Serving Machine Learning Inference Using Heterogeneous Hardware

Abstract

The growing popularity of machine learning algorithms and the wide availability of hardware accelerators have introduced new challenges in inference serving. This paper explores the opportunity to serve inference queries with a heterogeneous system. The system has a central optimizer that allocates heterogeneous hardware resources to serve queries cooperatively. The optimizer supports both energy minimization and throughput maximization while satisfying a latency target. The optimized heterogeneous serving system is evaluated against a homogeneous system on two representative real-world applications, radar nowcasting and object detection. Our evaluation results show that the power-optimized heterogeneous system achieves up to 36% power savings, and the throughput-optimized heterogeneous system increases query throughput by up to 53%.
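
To make the optimizer's role concrete, the sketch below frames the power-minimization case as a constrained allocation problem: pick instance counts per device type so that aggregate throughput meets demand, every serving device meets the latency target, and total power is minimized. This is only an illustrative brute-force search under hypothetical device profiles (`cpu`, `gpu`, `fpga` and their numbers are made up), not the paper's actual optimizer or measured data.

```python
from itertools import product
from dataclasses import dataclass

# Hypothetical device profiles (illustrative numbers, not from the paper):
# per-instance power draw (W), sustained throughput (queries/s),
# and tail latency (ms) at that load.
@dataclass
class Device:
    name: str
    power_w: float
    tput_qps: float
    latency_ms: float

DEVICES = [
    Device("cpu", 60.0, 40.0, 45.0),
    Device("gpu", 250.0, 400.0, 20.0),
    Device("fpga", 75.0, 120.0, 30.0),
]

def min_power_allocation(demand_qps, latency_slo_ms, max_per_type=4):
    """Exhaustively search instance counts per device type for the
    lowest-power allocation that meets demand within the latency SLO."""
    best = None
    for counts in product(range(max_per_type + 1), repeat=len(DEVICES)):
        # Only device types that individually meet the SLO may serve queries.
        usable = [(d, n) for d, n in zip(DEVICES, counts)
                  if n > 0 and d.latency_ms <= latency_slo_ms]
        tput = sum(d.tput_qps * n for d, n in usable)
        if tput < demand_qps:
            continue  # allocation cannot sustain the offered load
        power = sum(d.power_w * n for d, n in usable)
        if best is None or power < best[0]:
            best = (power, {d.name: n for d, n in usable})
    return best

# Example: serve 300 qps under a 35 ms latency target. With the profiles
# above, three FPGAs (225 W) beat a single GPU (250 W).
print(min_power_allocation(demand_qps=300.0, latency_slo_ms=35.0))
```

Swapping the objective from summed power to negated summed throughput (with a power budget as the constraint) gives the throughput-maximization counterpart; a real system would replace the exhaustive search with a proper solver.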

Publication
In Proceedings of 2021 IEEE High Performance Extreme Computing Conference (HPEC)
Baolin Li
Ph.D.

My research interests include high performance computing, cloud computing, and machine learning.