2nd International IEEE Conference on Communication System Software and Middleware

Research Article

Software Architecture for Dynamic Thermal Management in Datacenters

  • @INPROCEEDINGS{10.1109/COMSWA.2007.382430,
        author={Tridib Mukherjee and Qinghui Tang and Corbett Ziesman and Sandeep K. S. Gupta and Phil Cayton},
        title={Software Architecture for Dynamic Thermal Management in Datacenters},
        proceedings={2nd International IEEE Conference on Communication System Software and Middleware},
        publisher={IEEE},
        proceedings_a={COMSWARE},
        year={2007},
        month={7},
        keywords={Computer architecture, Corporate acquisitions, Cost function, Power system management, Processor scheduling, Resource management, Sensor systems, Software architecture, Thermal management, Thermal sensors},
        doi={10.1109/COMSWA.2007.382430}
    }
    
  • Tridib Mukherjee
    Qinghui Tang
    Corbett Ziesman
    Sandeep K. S. Gupta
    Phil Cayton
    Year: 2007
    Software Architecture for Dynamic Thermal Management in Datacenters
    COMSWARE
    IEEE
    DOI: 10.1109/COMSWA.2007.382430
Tridib Mukherjee (1), Qinghui Tang (1), Corbett Ziesman (1), Sandeep K. S. Gupta (1), Phil Cayton (2)
  • 1: Arizona State University, Tempe, AZ 85287
  • 2: Intel Corporation, Hillsboro, Oregon.

Abstract

Minimizing energy cost and improving the thermal performance of power-limited datacenters that deploy large computing clusters are key to optimizing their computing resources and fully exploiting their computational capabilities. In this paper, we develop a unique merger of the physical infrastructure and resource management functions of a cluster management system in order to take a holistic view of datacenter management and to make global (datacenter-level) thermal-aware job scheduling decisions. We present a software architecture for this purpose and implement it in a fully operational computational cluster in the ASU datacenter. The proposed architecture forms a feedback-control loop that combines information from ambient and on-board sensors with the node allocation and job scheduling mechanisms, so that the system load is managed according to the thermal distribution in the datacenter.
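
To make the feedback-control idea in the abstract concrete, the following is a minimal sketch of one scheduling pass that places jobs on the coolest idle nodes. It is not the paper's implementation: the `Node` class, the `inlet_temp_c` field, the `pick_nodes` and `schedule` functions, and the 30 °C threshold are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    inlet_temp_c: float  # latest sensor reading (hypothetical field)
    busy: bool = False


def pick_nodes(nodes, count, max_temp_c=30.0):
    """Return the `count` coolest idle nodes below max_temp_c, or [] if too few."""
    eligible = sorted(
        (n for n in nodes if not n.busy and n.inlet_temp_c < max_temp_c),
        key=lambda n: n.inlet_temp_c,
    )
    return eligible[:count] if len(eligible) >= count else []


def schedule(job_sizes, nodes, max_temp_c=30.0):
    """One pass of the loop: place each job, in order, on the coolest eligible nodes.

    job_sizes maps a job id to the number of nodes it needs (assumed input shape).
    A job that cannot be placed gets an empty node list and would wait for the
    next pass, after fresh sensor readings arrive.
    """
    placement = {}
    for job_id, size in job_sizes.items():
        chosen = pick_nodes(nodes, size, max_temp_c)
        for n in chosen:
            n.busy = True
        placement[job_id] = [n.name for n in chosen]
    return placement
```

In a running system, a pass like this would presumably be repeated as sensor readings change, which is what closes the loop between thermal state and load placement.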