Report of the Science of Power Management Workshop
April 9-10, 2009
Kirk W. Cameron, Virginia Tech
Kirk Pruhs, University of Pittsburgh
Sandy Irani, University of California, Irvine
Partha Ranganathan, Hewlett-Packard
David Brooks, Harvard University
This workshop was sponsored by the National Science Foundation (www.nsf.gov). The views expressed
in this report are those of the individual participants and are not necessarily those of their respective
organizations or the workshop sponsor.
Version 1.0, Technical Report No. VT/CS-09-19, August 31, 2009

Table of Contents
Preface
Executive Summary
Background
Key Findings & Recommendations to NSF
Conclusions
Appendix A: Organizing and Steering Committee
Appendix B: List of Attendees
Appendix C: Detailed Reports by Break-Out Group
  Software
  Data Centers
  Hardware
  Networks
  Storage
  Physicals
“The energy used by the nation’s servers and data centers is significant...more than the electricity consumed by the nation’s color televisions and similar to the amount of electricity consumed by approximately 5.8 million average U.S.
households.” EPA Report to Congress on Server and Data Center Energy Efficiency.
In response to Public Law 109-431, August 2, 2007.
Preface

A number of reports in the past several years have questioned the sustainability of the computing infrastructure of the United States. Reports by the U.S. EPA and others have concluded that, for the U.S. to maintain its competitiveness, the power and energy consumption challenges facing our IT infrastructure must be addressed.
Power consumption of IT equipment begins with the design of a microchip and extends across the traditional technological boundaries of system design and integration and the design of the facilities that house these systems. Since most power management techniques have been developed in isolation, there are serious gaps in our understanding of the “science” behind these complex systems, both across and within these boundaries.
In recognition of these developments, NSF sponsored a Workshop on the Science of Power Management on April 9-10, 2009, in Arlington, Virginia. The intent of the workshop was to bring together leading thinkers in power and thermal management, from chips to systems to facilities, and to integrate them with algorithm and theory experts to identify, prioritize, and recommend promising research directions, in the hope of incubating the development of a science of power management.
The workshop consisted of a series of keynote talks from industrial experts and academic leaders, followed by breakout sessions focusing on software, hardware, networks, storage, and physicals. Breakout groups met twice, and group leaders presented their findings to the committee and attendees to close the workshop. The steering committee was tasked with authoring this report and releasing it to the public upon delivery to NSF.
This document contains an executive summary of the key findings of the workshop and the key recommendations for future research to support the development of a science of power management. This workshop would not have been possible without the hard work and diligence of the breakout group leaders and the workshop attendees. We would also like to thank the steering committee members for the additional time and effort they volunteered despite their demanding schedules. Finally, a word of thanks to the National Science Foundation for sponsoring this workshop, without which this report would not have been possible.
Kirk W. Cameron, Virginia Tech Kirk Pruhs, University of Pittsburgh Krishna Kant, NSF
Executive Summary

We believe there is a need for a consolidated effort to establish a Science of Power Management: a comprehensive set of principles and techniques that provide practical solutions to the power issues facing the information technology community.
There is clear consensus that one of the most important grand challenges facing humanity in the next century is to develop technologies that allow continued advancement in a sustainable manner. There is increased scrutiny on the national and international stage for the United States to curb its energy use, and thus its carbon emissions. Toward this end, recent studies by the U.S. EPA and the Department of Energy have concluded that more effort is needed, in particular, to curb the power consumption of data centers. With the recent election of Barack Obama, the U.S. is more likely to sign a version of the Kyoto treaty that commits it to further emissions reductions.

Currently, IT devices consume about as much energy and produce about as much carbon dioxide as the airline industry. However, because use of IT technology is still growing exponentially (centralized deployments of enterprise volume servers in data centers are growing 12% annually), and because energy and power have not traditionally been first-order design constraints for IT technology, improvements in the energy efficiency of IT devices can be much more dramatic, and eventually have much greater impact, than in other areas of technology such as aircraft design. Some progress is already being made toward these goals. For example, the IT industry has formed groups such as the Green Grid and SPECpower aimed at self-regulation through the establishment of best practices for energy-efficient data centers.
While it is important to address power management in every aspect of IT use, improving the energy efficiency of large data centers is a particularly critical need. If the power consumption of data centers goes unchecked, the sustainability of our national computing infrastructure is in question. These servers support the electronic infrastructure critical to business, e-commerce, and the Internet. Power consumption and heat production increase cost and reduce reliability in current data centers, which in turn amplifies the need to build more of them.
There has recently been dramatic growth in the scope and diversity of research addressing power management, from chip and data center design to facility management. Nearly every major conference across the discipline of computer science includes sessions related to power and thermal management, from computational theory to compilers, systems, and software engineering. New workshops, conferences, and special journal issues are emerging that solicit papers on power and thermal management. Professional magazines such as IEEE Computer have created ongoing columns on greener computing, including power and thermal management. Architects investigate energy-efficient microarchitectures, systems researchers investigate operating system power scheduling policies, and thermal engineers investigate cooling systems to address server density issues. However, experts in each of these areas are often isolated from each other, which makes collaboration difficult. Currently, power and thermal management techniques are generally designed in isolation for each type of device, such as clock gating on chips, power state management of a laptop, and load management across servers. The lack of coordination among techniques can cause missed power management opportunities as well as power management policies that conflict with one another.
The time is appropriate to consider a “science” of power management. Quoting Webster’s dictionary, “science is knowledge or a system of knowledge covering general truths.” For example, the main purpose of the science of computers (i.e., computer science) is to establish a framework to reason about computation and to develop a collection of techniques applicable to a broad range of computational problems. Examples of techniques developed by computer scientists that have found uses in a variety of settings include hashing, public-key cryptography, and latency hiding with predictive prefetching. Analogously, a science of power management would establish a framework to reason about power, energy, and temperature, and would develop a collection of widely applicable power management techniques. It should be emphasized that we cannot reason about energy usage in isolation in the same way that we can formally reason about time in isolation as a computational resource. One interpretation of the Church-Turing thesis is that physical laws impose lower bounds on the time required to solve certain problems; in contrast, since all computation can be made reversible, it seems that physical laws do not impose any inherent lower bound on the energy required to solve a problem. Therefore, the energy characteristics of
models will have to be based on the characteristics of current and conceivable technologies, not on physical laws. Of course, we also ask that new models be robust to changes in technology, so that principles and techniques remain applicable even as the specific parameters of computer systems evolve over time. Furthermore, metrics used to evaluate algorithms will necessarily incorporate trade-offs among energy, time, communication, etc. A formal framework would ideally enable algorithm designers to design algorithms that balance the use of CPU cycles, communication, and memory to optimize energy usage and performance in completing a specific task. For example, under a particular model one might hope to identify the compression level that optimally trades off the energy savings of reduced communication against the additional energy costs of compression and decompression at the end points.
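The compression trade-off described above can be made concrete with a minimal sketch. The per-byte energy constants and the linear CPU-cost model below are illustrative assumptions, not measurements from any real platform or values from this report:

```python
import zlib

# Hypothetical per-byte energy costs (joules/byte); illustrative assumptions.
E_TX_PER_BYTE = 1e-6   # energy to transmit one byte over the network
E_CPU_PER_BYTE = 2e-7  # energy for one byte of (de)compression work, per level

def send_energy(payload: bytes, level: int) -> float:
    """Estimated energy to compress `payload` at zlib `level`, then transmit it."""
    compressed = zlib.compress(payload, level)
    # Crude CPU model: compression effort grows linearly with the level.
    cpu_cost = E_CPU_PER_BYTE * len(payload) * max(level, 1)
    tx_cost = E_TX_PER_BYTE * len(compressed)
    return cpu_cost + tx_cost

def best_level(payload: bytes) -> int:
    """Pick the zlib level (0 = store .. 9 = max) minimizing estimated energy."""
    return min(range(10), key=lambda lvl: send_energy(payload, lvl))
```

Under a richer model, the constants would themselves depend on the radio, the CPU's power states, and the data's compressibility; the point is only that the optimum is a property of the model, not of physical law.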
While algorithm designers are concerned with solving specific problems in a way that minimizes the use of limited resources, systems designers must devise policies that take a set of tasks whose resource needs are (at least partially) determined and decide how to allocate an ensemble of resources among these tasks. Energy-aware policies will exploit flexibility in load balancing, processor speed, sleep states, and other tunable parameters to allocate different resources within a system. To reason formally about these trade-offs, theoretical models will need to balance the requirements of different tasks as well as the availability of resources, just as, for example, abstract models for parallel computation incorporate resources such as CPU cycles, communication, and storage. A formal framework of this kind would also be useful in designing the optimal distribution of components at design time, given some knowledge of the expected workload of the system and some specified balance between performance and energy conservation. These problems will be especially challenging because systems designed to conserve energy will likely be composed of resources with heterogeneous characteristics. For example, a system may combine high-power, high-performance resources for critical tasks with lower-power, lower-performance resources for noncritical tasks. Past theoretical research on resource management in heterogeneous systems has not considered the energy characteristics of resources. Given that energy usage has significantly different mathematical characteristics than time and space, a new theory of energy-heterogeneous systems is needed.
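One concrete instance of processor speed as a tunable parameter is dynamic speed scaling. A minimal sketch, assuming the common convex power model P(s) = s^α; the exponent and the job parameters below are illustrative assumptions, not values from this report:

```python
ALPHA = 3  # assumed exponent: dynamic power grows roughly as speed**ALPHA

def energy(work: float, speed: float) -> float:
    """Energy to run `work` cycles at constant `speed` with power = speed**ALPHA."""
    return (speed ** ALPHA) * (work / speed)  # = work * speed**(ALPHA - 1)

# A job of 100 work units with a 10 s deadline.
# Uniform schedule: the slowest feasible constant speed, work/deadline = 10.
e_uniform = energy(100, 100 / 10)
# Sprint-then-crawl: 50 units at speed 20 (takes 2.5 s), then the remaining
# 50 units over the 7.5 s left (speed 50 / 7.5), meeting the same deadline.
e_split = energy(50, 20) + energy(50, 50 / 7.5)
# Because energy is convex in speed, the uniform schedule uses less energy.
```

This convexity is one way energy differs mathematically from time: finishing early at high speed costs strictly more energy than spreading the work evenly, which is exactly the kind of structure a theory of energy-aware scheduling must capture.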
Key Findings & Recommendations to NSF:
Vision for a Science of Power Management

Finding #1: The need for further scientific observation.