Accelerate and Autoscale Deep Learning Inference on GPUs with KFServing - Dan Sun
Serverless Machine Learning Inference with KFServing - Clive Cox, Seldon & Yuzhui Liu, Bloomberg
The Software GPU: Making Inference Scale in the Real World
Auto-scaling Hardware-agnostic ML Inference with NVIDIA Triton and Arm NN
Seldon Deploy and KFServing: Serverless Deployment of Machine Learning Models
Building Machine Learning Inference Through Knative Serverless... - Shivay Lamba & Rishit Dagli
Are You Really Out of GPUs? How to Better Understand Your GPU... - Natasha Romm & Raz Rotenberg
Kubeflow Inference on Knative — Dan Sun, Bloomberg
Visualizing Concurrency Bugs on GPUs
Auto Scaling GPU Based ML Workloads to 2 million+ requests per day on HashiCorp Stack
Vertical Autoscaling of GPU Resources for Machine Learning in the Cloud
Piotr Wojciechowski: Inference optimization techniques
How to Deploy Models at Scale with GPUs | TransformX 2022
What is KFServing?
Bristech MLOps: Clive Cox - ML Serving with KFServing (Sept 2020)
Accelerate Federated Learning Model Deployment with KServe (KFServing) - Fangchi Wang & Jiahao Chen
Optimizing Inference for Neural Machine Translation using Sockeye 2
Serving Machine Learning Models at Scale Using KServing - Animesh Singh, IBM
GPU as a Service Over K8s: Drive Productivity and Increase Utilization - Yaron Haviv, Iguazio