11th EAI International Conference on Performance Evaluation Methodologies and Tools

Research Article

Scalable analytical model of the reliability of multi-core systems-on-chip by interacting Markovian agents

  • @INPROCEEDINGS{10.4108/eai.5-12-2017.2274471,
        author={Andrea Bobbio and Cristiana Bolchini and Davide Cerotti and Marco Gribaudo and Antonio Miele},
        title={Scalable analytical model of the reliability of multi-core systems-on-chip by interacting Markovian agents},
        proceedings={11th EAI International Conference on Performance Evaluation Methodologies and Tools},
        publisher={ACM},
        proceedings_a={VALUETOOLS},
        year={2018},
        month={8},
        keywords={markovian agents reliability multicore cpu embedded systems},
        doi={10.4108/eai.5-12-2017.2274471}
    }
    
Andrea Bobbio¹, Cristiana Bolchini², Davide Cerotti¹, Marco Gribaudo²,*, Antonio Miele²
  • 1: Università Piemonte Orientale
  • 2: Politecnico di Milano
*Contact email: marco.gribaudo@polimi.it

Abstract

The reliability of multicore systems-on-chip has been the subject of several studies in recent years, since these devices are heavily used in modern digital equipment at every level of complexity. This level of integration has reduced the time to failure, because the rapid scaling down of dimensions increases core temperatures and current densities. Past studies have relied on discrete-event simulation, a technique that is very difficult to master in this scenario because of the number of components and the rarity of failure events. The present study proposes an analytical framework based on Markovian Agent Models (MAM) that can capture systems with the large number of cores made possible by current and future technology, while also accounting for the effect of core position (center, border, corner) on temperature and for the dynamic redistribution of the workload as cores progressively fail. The paper presents the model, adopting realistic parameters and interaction phenomena taken from the literature.