5th International Workshop on OMNeT++

Research Article

Towards Massively Parallel Simulations of Massively Parallel High-Performance Computing Systems

  • @INPROCEEDINGS{10.4108/icst.simutools.2012.247685,
        author={Robert Birke and German Rodriguez and Cyriel Minkenberg},
        title={Towards Massively Parallel Simulations of Massively Parallel High-Performance Computing Systems},
        proceedings={5th International Workshop on OMNeT++},
        publisher={ACM},
        proceedings_a={OMNET++},
        year={2012},
        month={6},
        keywords={high-performance computing, parallel distributed event simulation, interconnection networks},
        doi={10.4108/icst.simutools.2012.247685}
    }
    
  • Robert Birke
    German Rodriguez
    Cyriel Minkenberg
    Year: 2012
    Towards Massively Parallel Simulations of Massively Parallel High-Performance Computing Systems
    OMNET++
    ACM
    DOI: 10.4108/icst.simutools.2012.247685
Robert Birke¹, German Rodriguez¹, Cyriel Minkenberg¹,*
  • 1: IBM Research - Zurich
*Contact email: cyriel@hispeed.ch

Abstract

The power of high-performance computing (HPC) is applied to simulate highly complex systems and processes in many scientific communities, e.g., in particle physics, weather and climate research, bio-sciences, materials science, pharmaceutics, astronomy, or finance. Current HPC systems are so complex that the design of such a system, including architecture design space exploration and performance prediction, requires HPC-like simulation capabilities. To this end, we developed an Omnest-based simulation environment that enables studying the impact of an HPC machine’s communication subsystem on the overall system’s performance for specific workloads. As the scale of current high-end HPC systems is in the range of hundreds of thousands of processing cores, full-system simulation (at an abstraction level that still maintains a reasonably high level of detail) is infeasible without resorting to parallel simulation; the main limiting factors are simulation run time and memory footprint. We describe our experiences in adapting our simulation environment to take advantage of the parallel distributed simulation capabilities provided by Omnest. We present results obtained on a many-core SMP machine as well as on a small-scale InfiniBand cluster. Furthermore, we ported our simulation environment, including Omnest itself, to the massively parallel IBM Blue Gene/P platform. We report results from initial experiments on this platform using up to 512 cores in parallel.