Fifth International Conference on Simulation Tools and Techniques

Research Article

SST + gem5 = A Scalable Simulation Infrastructure for High Performance Computing

  • @INPROCEEDINGS{10.4108/icst.simutools.2012.247745,
        author={Mingyu Hsieh and Jie Meng and Michael Levenhagen and Kevin Pedretti and Ayse Coskun and Arun Rodrigues},
        title={SST + gem5 = A Scalable Simulation Infrastructure for High Performance Computing},
        proceedings={Fifth International Conference on Simulation Tools and Techniques},
        publisher={ICST},
        proceedings_a={SIMUTOOLS},
        year={2012},
        month={6},
        keywords={simulation architecture},
        doi={10.4108/icst.simutools.2012.247745}
    }
    
  • Mingyu Hsieh
    Jie Meng
    Michael Levenhagen
    Kevin Pedretti
    Ayse Coskun
    Arun Rodrigues
    Year: 2012
    SST + gem5 = A Scalable Simulation Infrastructure for High Performance Computing
    SIMUTOOLS
    ICST
    DOI: 10.4108/icst.simutools.2012.247745
Mingyu Hsieh (1,*), Jie Meng (2), Michael Levenhagen (1), Kevin Pedretti (1), Ayse Coskun (2), Arun Rodrigues (1)
  • 1: Sandia National Labs
  • 2: Boston University
*Contact email: myhsieh@sandia.gov

Abstract

High Performance Computing (HPC) faces new challenges in scalability, performance, reliability, and power consumption. Solving these challenges will require radically new hardware and software approaches. It is impractical to explore this vast design space without detailed system-level simulations at some scale. However, most existing simulators are insufficiently detailed, not scalable, or unable to evaluate key system characteristics such as energy consumption or reliability. To address this problem, we integrate the highly detailed gem5 simulator into the parallel Structural Simulation Toolkit (SST). We add a fast-forward capability to SST/gem5 and port the lightweight Kitten operating system to gem5. In addition, we improve the reliability model in SST with a more comprehensive analysis of system reliability. Using the simulation framework, we evaluate the impact of two energy-efficient, resource-conscious scheduling policies on system reliability. Results show that the effectiveness of the scheduling policies differs according to workload composition and system topology.