Minimizing GPU Cost for Your Deep Learning on Kubernetes - Kai Zhang & Yang Che, Alibaba
Stop Wasting GPU Flops on Cold Starts: High Performance Inference with Model Streamer - AI Eng Paris
GPU Sharing for Machine Learning Workload on Kubernetes - Henry Zhang & Yang Yu, VMware
Most devs don't understand how LLM tokens work
GKE Cost Optimization Golden Signals: Workload Rightsizing
USENIX ATC '22 - Serving Heterogeneous Machine Learning Models on Multi-GPU Servers...
Using Kubernetes to Offer Scalable Deep Learning on Alibaba Cloud - Kai Zhang & Yang Che, Alibaba
Scaling AI Inference Workloads with GPUs and Kubernetes - Renaud Gaubert & Ryan Olson, NVIDIA
Kubernetes Cost Optimization That You Do NOT Know
Reduce GPU Costs for AI
The KV Cache: Memory Usage in Transformers
How to Optimize GPU Scheduling in Kubernetes
Production GPU Cluster with K8s for AI and DL Workloads - Madhukar Korupolu, NVIDIA
LLM Inference: A Comparative Guide to Modern Open-Source Runtimes | Aleksandr Shirokov, Wildberries
AI workloads on Kubernetes - How to maximize GPU utilization and cut costs
Lightning Talk: Managing Drivers in a Kubernetes Cluster - Renaud Gaubert, NVIDIA
Deploy and Scale AI Workloads with NVIDIA Run:ai on Azure Kubernetes Service (AKS)
The Path to GPU as a Service in Kubernetes - Renaud Gaubert, NVIDIA (Intermediate Skill Level)
Automating GPU Infrastructure for Kubernetes - Lucas Servén Marín, CoreOS