

Poster in Workshop: Machine Learning for Systems

On the Promise and Challenges of Foundation Models for Learning-based Cloud Systems Management

Haoran Qiu · Weichao Mao · Chen Wang · Hubertus Franke · Zbigniew Kalbarczyk · Tamer Basar · Ravishankar Iyer

presentation: Machine Learning for Systems
Sat 16 Dec 7 a.m. PST — 3 p.m. PST

Abstract:

Foundation models (FMs) are machine learning models that are trained broadly on large-scale data and can be adapted to a range of downstream tasks via fine-tuning, few-shot learning, or even zero-shot learning. Despite the success of FMs in the language and vision domains, we have yet to see an attempt to develop FMs for cloud systems management (also known as cloud intelligence or AIOps). In this work, we explore the opportunities for developing FMs for cloud systems management. We propose an initial FM design (the FLASH framework) based on meta-learning and demonstrate its use in two tasks: resource configuration search and workload autoscaling. Preliminary results show that FLASH achieves 52.3-90.5% less performance degradation with no adaptation and provides 5.5× faster adaptation. We conclude the paper by discussing the unique risks and challenges of developing FMs for cloud systems management.
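The abstract says only that FLASH is "based on meta-learning" and does not describe its internals. As a generic, hypothetical illustration of why a meta-learned starting point can adapt faster than training from scratch, the sketch applies Reptile, a standard first-order meta-learning algorithm, to toy tasks. This is not FLASH: the task model (latency as a linear function of load), the function names `make_task`, `adapt`, and `reptile`, and every hyperparameter are assumptions chosen for clarity.

```python
import random

# Hypothetical illustration: Reptile on toy linear tasks, NOT the FLASH framework.

def make_task(rng):
    """Hypothetical toy task: one workload's latency = a * load + b."""
    a = rng.uniform(2.0, 3.0)
    b = rng.uniform(1.0, 2.0)
    loads = [i / 10 for i in range(11)]
    return [(x, a * x + b) for x in loads]

def mse(params, data):
    """Mean squared error of the linear model (w, b) on one task."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def adapt(params, data, steps, lr=0.1):
    """Few-shot adaptation: a few full-batch gradient-descent steps on MSE."""
    w, b = params
    n = len(data)
    for _ in range(steps):
        gw = sum(2 * (w * x + b - y) * x for x, y in data) / n
        gb = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * gw, b - lr * gb  # update both from the old (w, b)
    return (w, b)

def reptile(rng, meta_iters=200, inner_steps=5, meta_lr=0.5):
    """Reptile: nudge the shared initialization toward each task's adapted weights."""
    w, b = 0.0, 0.0
    for _ in range(meta_iters):
        w_task, b_task = adapt((w, b), make_task(rng), inner_steps)
        w += meta_lr * (w_task - w)
        b += meta_lr * (b_task - b)
    return (w, b)

if __name__ == "__main__":
    meta_init = reptile(random.Random(0))
    new_task = make_task(random.Random(42))  # an unseen task
    for name, init in [("zero init", (0.0, 0.0)), ("meta-learned init", meta_init)]:
        loss = mse(adapt(init, new_task, 5), new_task)
        print(name, "-> MSE after 5 steps:", round(loss, 4))
```

Running the script prints the post-adaptation error for both starting points on an unseen task. Under these toy settings the meta-learned initialization ends lower, because Reptile moves it toward the center of the task distribution, so a few gradient steps suffice. Reptile is used here only because it is short and self-contained; the paper's actual meta-learning method may differ.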
