EAI Endorsed Transactions on Future Internet 16(10): e4

Research Article

GPU Performance Prediction Through Parallel Discrete Event Simulation and Common Sense

@ARTICLE{10.4108/eai.14-12-2015.2262575,
    author={Guillaume Chapuis and Stephan Eidenbenz and Nandakishore Santhi},
    title={GPU Performance Prediction Through Parallel Discrete Event Simulation and Common Sense},
    journal={EAI Endorsed Transactions on Future Internet},
    volume={3},
    number={10},
    publisher={ACM},
    journal_a={UE},
    year={2016},
    month={1},
    keywords={parallel discrete event simulation, gpgpu, performance prediction},
    doi={10.4108/eai.14-12-2015.2262575}
}
Guillaume Chapuis1, Stephan Eidenbenz1,*, Nandakishore Santhi1
  • 1: Los Alamos National Laboratory (LANL)
*Contact email: eidenben@lanl.gov

Abstract

We present the GPU Module of a Performance Prediction Toolkit developed at Los Alamos National Laboratory, which enables code developers to efficiently test novel algorithmic ideas, particularly for large-scale computational physics codes. The GPU Module is a heavily parameterized model of the GPU hardware. Its input is a sequence of abstracted instructions, which the user either provides directly as a representation of the application or imports from the GPU's intermediate representation, the PTX format. These instructions are then executed in a discrete event simulation framework of the entire computing infrastructure, which can include multi-GPU and multi-node components as typically found in high performance computing applications. Our GPU Module aims at a trade-off between the cycle-accuracy of GPU simulators and the fast execution times of analytical models. This trade-off is achieved by simulating only a portion of the computations at cycle level and using this partial runtime to analytically predict the total execution time of the modeled application. We present GPU models that we validate against three different benchmark applications that span the range from bandwidth-limited to cycle-limited. Our runtime predictions are within an error of 20%. We then predict performance of a next-generation GPU (Nvidia's Pascal) for the same benchmark applications.
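The core extrapolation idea in the abstract — simulate only a sample of the work at cycle level, then scale the partial runtime analytically to the full kernel — can be illustrated with a minimal sketch. This is not the authors' toolkit; the cost model, function names, and parameters below are all hypothetical placeholders for a real cycle-level block simulation.

```python
def simulate_block_cycles(instructions_per_thread, threads_per_block,
                          cycles_per_instruction=1.0):
    """Toy stand-in for a detailed cycle-level simulation of one thread block.

    A real simulator would replay the abstracted (or PTX-derived) instruction
    stream through a parameterized hardware model; here we just charge a fixed
    cost per instruction.
    """
    return instructions_per_thread * threads_per_block * cycles_per_instruction


def predict_total_runtime(num_blocks, sampled_blocks, instructions_per_thread,
                          threads_per_block, clock_hz, num_sms):
    """Simulate `sampled_blocks` blocks in detail, then extrapolate.

    Assumes the sampled blocks are representative of the whole grid and that
    blocks are perfectly load-balanced across `num_sms` streaming
    multiprocessors -- simplifications a real model would refine.
    """
    sampled = min(sampled_blocks, num_blocks)
    cycles = sum(
        simulate_block_cycles(instructions_per_thread, threads_per_block)
        for _ in range(sampled)
    )
    avg_cycles_per_block = cycles / sampled
    # Analytic step: scale the per-block cost to the full grid, divided
    # over the available SMs, and convert cycles to seconds.
    total_cycles = avg_cycles_per_block * num_blocks / num_sms
    return total_cycles / clock_hz


# Example: a 4096-block kernel, of which only 64 blocks are simulated.
runtime_s = predict_total_runtime(num_blocks=4096, sampled_blocks=64,
                                  instructions_per_thread=1000,
                                  threads_per_block=256,
                                  clock_hz=1.0e9, num_sms=16)
```

The payoff of this structure is that the expensive cycle-level loop runs over 64 blocks instead of 4096, while the analytic scaling step keeps the prediction tied to the full problem size.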