Managing Power Consumption and Performance of Computing Systems Using Reinforcement Learning
Gerald Tesauro · Rajarshi Das · Hoi Chan · Jeffrey O Kephart · David Levine · Freeman Rawson · Charles Lefurgy

Tue Dec 04 05:20 PM -- 05:30 PM (PST)
Electrical power management in large-scale IT systems such as commercial datacenters is an application area of rapidly growing interest from both economic and ecological perspectives, with billions of dollars and millions of metric tons of CO$_2$ emissions at stake annually. Businesses want to save power without sacrificing performance. This paper presents a reinforcement learning approach to simultaneous online management of both performance and power consumption. We apply RL in a realistic laboratory testbed using a Blade cluster and a dynamically varying HTTP workload running on a commercial web applications middleware platform. We embed a CPU frequency controller in the Blade servers' firmware, and we train policies for this controller using a multi-criteria reward signal that depends on both application performance and CPU power consumption. Our testbed scenario posed a number of challenges to the successful use of RL, including multiple disparate reward functions, limited decision sampling rates, and pathologies arising when using multiple sensor readings as state variables. We describe innovative practical solutions to these challenges, and demonstrate clear performance improvements over both hand-designed policies and obvious ``cookbook'' RL implementations.
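
To make the idea of a multi-criteria reward signal concrete, the following is a minimal sketch of how a performance term and a power term might be folded into one scalar. It is not the reward used in the paper: the function names, the response-time target, the power weight, and the functional form are all assumptions chosen for illustration.

```python
# Hypothetical illustration only; not the reward function from the paper.

def combined_reward(response_time_ms, power_watts,
                    target_ms=1000.0, power_weight=0.01):
    """Scalar reward trading off performance against power.

    The target, the weight, and the functional form are assumptions.
    """
    # Performance term: 1.0 when the response-time target is met,
    # decaying toward 0 as the response time exceeds the target.
    if response_time_ms <= target_ms:
        performance = 1.0
    else:
        performance = target_ms / response_time_ms
    # Power term: linear penalty on the measured power draw.
    return performance - power_weight * power_watts


def best_frequency(candidates):
    """Return the frequency whose measurements give the highest reward.

    candidates maps a CPU frequency (GHz) to a tuple of
    (response_time_ms, power_watts).
    """
    return max(candidates, key=lambda f: combined_reward(*candidates[f]))
```

A learned policy would estimate long-run value rather than pick greedily from one-step measurements; the greedy helper is only there to show how the weight shifts the trade-off between the two criteria.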

Author Information

Gerald Tesauro (IBM TJ Watson Research Center)
Rajarshi Das (IBM Research)
Hoi Chan
Jeffrey O Kephart (IBM Research)
David Levine
Freeman Rawson
Charles Lefurgy (IBM)
