RIBBON: Cost-Effective and QoS-Aware Deep Learning Model Inference using a Diverse Pool of Cloud Computing Instances

Nov 1, 2021
Baolin Li
,
Rohan Basu Roy
,
Tirthak Patel
,
Vijay Gadepally
,
Karen Gettings
,
Devesh Tiwari
Abstract
Deep learning model inference is a key service in many businesses and scientific discovery processes. This paper introduces RIBBON, a novel deep learning inference serving system that meets two competing objectives: quality-of-service (QoS) targets and cost-effectiveness. The key idea behind RIBBON is to intelligently employ a diverse pool of cloud computing instances (heterogeneous instances) to meet the QoS target while maximizing cost savings. RIBBON devises a Bayesian Optimization-driven strategy that helps users build the optimal set of heterogeneous instances for their model inference service needs on cloud computing platforms, and it demonstrates superiority over existing inference serving systems that use homogeneous instance pools. RIBBON saves up to 16% of the inference service cost for different learning models, including emerging deep learning recommender system models and drug-discovery enabling models.
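To make the core idea concrete, here is a minimal sketch of the instance-pool selection problem the abstract describes: choose counts of each cloud instance type so the pool meets a QoS (latency) target and a throughput demand at minimum hourly cost. All instance names, prices, and performance numbers below are invented for illustration, and the brute-force search stands in for RIBBON's actual Bayesian Optimization, which explores far fewer candidate pools.

```python
import itertools

# Hypothetical per-instance characteristics (all numbers invented for
# illustration): cost in $/hour, throughput in inferences/sec,
# latency as p99 milliseconds per request.
INSTANCE_TYPES = {
    "cpu.large": {"cost": 0.15, "throughput": 120, "latency_ms": 45},
    "gpu.small": {"cost": 0.60, "throughput": 300, "latency_ms": 25},
    "gpu.large": {"cost": 1.20, "throughput": 700, "latency_ms": 15},
}

QOS_LATENCY_MS = 50  # QoS target: pool-wide p99 latency budget
DEMAND_RPS = 500     # required aggregate inference throughput


def pool_metrics(pool):
    """Aggregate cost, throughput, and worst-case latency of a pool.

    pool maps instance type -> instance count; the slowest active
    instance type bounds the pool's p99 latency in this simple model.
    """
    cost = sum(INSTANCE_TYPES[t]["cost"] * n for t, n in pool.items())
    thr = sum(INSTANCE_TYPES[t]["throughput"] * n for t, n in pool.items())
    lat = max((INSTANCE_TYPES[t]["latency_ms"]
               for t, n in pool.items() if n > 0), default=0)
    return cost, thr, lat


def cheapest_feasible_pool(max_per_type=4):
    """Exhaustively search small pools for the cheapest QoS-feasible mix.

    RIBBON replaces this brute force with a Bayesian Optimization loop
    that proposes promising pool configurations and evaluates only a
    handful of them on the real serving system.
    """
    best = None
    counts = range(max_per_type + 1)
    for combo in itertools.product(counts, repeat=len(INSTANCE_TYPES)):
        pool = dict(zip(INSTANCE_TYPES, combo))
        cost, thr, lat = pool_metrics(pool)
        if thr >= DEMAND_RPS and lat <= QOS_LATENCY_MS:
            if best is None or cost < best[0]:
                best = (cost, pool)
    return best
```

With these made-up numbers, the cheapest feasible pool mixes CPU and GPU instances rather than using any single instance type, which is the kind of heterogeneity advantage the paper quantifies.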
Type
Publication
In Proceedings of the 2021 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC)