
Workshop: Machine Learning for Systems

PARM: Adaptive Resource Allocation for Datacenter Power Capping

Haoran Qiu · Linghao Zhang · Chen Wang · Hubertus Franke · Zbigniew Kalbarczyk · Ravishankar Iyer


Energy efficiency is a pressing concern in today's cloud datacenters. Various power management strategies, such as oversubscription, power capping, and dynamic voltage and frequency scaling, have been proposed and are in use by datacenter operators to better control power consumption at any management unit (e.g., node-level or rack-level) without exceeding power budgets. In addition, by gaining more control over different management units within a datacenter (or across datacenters), operators can shift energy consumption either spatially or temporally to optimize carbon footprint based on the spatio-temporal patterns of carbon intensity. The drive for automation has led to the exploration of learning-based resource management approaches. In this work, we first systematically investigate the impact of power capping on both latency-critical datacenter workloads and learning-based resource management solutions (i.e., reinforcement learning, or RL). We show that even a 20% reduction in the power limit (power capping) leads to an 18% degradation in resource management effectiveness (as measured by the RL reward function), which in turn causes 50% higher application latency. We then propose PARM, an adaptive resource allocation framework that provides a graceful, performance-preserving transition under power capping for latency-critical workloads. Evaluation results show that PARM achieves a 10.2-99.3% improvement in service-level objective (SLO) preservation under power capping while improving utilization by 3.1-5.8%.
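The abstract does not give PARM's actual reward function, but the trade-off it describes — preserving latency SLOs and raising utilization without violating a power cap — can be sketched as a toy RL reward. Everything below (function name, weights, the specific penalty shape) is an illustrative assumption, not the paper's formulation:

```python
def toy_reward(latency_ms, slo_ms, cpu_util, power_w, power_cap_w,
               util_weight=0.3, cap_penalty=10.0):
    """Illustrative reward for latency-critical resource management.

    NOT PARM's actual reward function -- a hypothetical sketch that
    rewards SLO headroom and utilization while heavily penalizing
    power-cap violations.
    """
    # SLO term: positive when latency is under the SLO, negative when over
    slo_term = (slo_ms - latency_ms) / slo_ms
    # Utilization term: prefer higher CPU utilization (cpu_util in [0, 1])
    util_term = util_weight * cpu_util
    # Hard penalty whenever the node exceeds its power cap
    cap_term = -cap_penalty if power_w > power_cap_w else 0.0
    return slo_term + util_term + cap_term
```

Under such a reward, tightening the power cap forces the agent into a region where either the cap penalty fires or resources must be withdrawn from the workload, degrading the SLO term — consistent with the abstract's observation that a 20% power-limit reduction degrades reward and inflates latency.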
