ICPP 2023 • 2023

FaST-GShare: Enabling efficient spatio-temporal GPU sharing in serverless computing for deep learning inference

Authors: J. Gu, Y. Zhu, P. Wang, M. Chadha, M. Gerndt

Enables fine-grained spatio-temporal GPU sharing for deep learning (DL) inference in serverless computing.

Euro-Par 2025 • 2025

HAS-GPU: Efficient Hybrid Auto-scaling with Fine-grained GPU Allocation for SLO-aware Serverless Inferences

Authors: J. Gu, P. Wang, I. Nunez, K. Huang, M. Gerndt