sis 16(9): e3

Research Article

An Inverse Problem Approach for Content Popularity Estimation

  • @ARTICLE{10.4108/eai.14-12-2015.2262621,
        author={Felipe Olmos and Bruno Kauffmann},
        title={An Inverse Problem Approach for Content Popularity Estimation},
        journal={EAI Endorsed Transactions on Scalable Information Systems},
        volume={3},
        number={9},
        publisher={ACM},
        journal_a={SIS},
        year={2016},
        month={1},
        keywords={popularity distribution, mixture model, maximum likelihood estimation, performance models, caching},
        doi={10.4108/eai.14-12-2015.2262621}
    }
    
  • Felipe Olmos
    Bruno Kauffmann
    Year: 2016
    An Inverse Problem Approach for Content Popularity Estimation
    SIS
    EAI
    DOI: 10.4108/eai.14-12-2015.2262621
Felipe Olmos¹,*, Bruno Kauffmann²
  • 1: Orange Labs / CMAP École Polytechnique
  • 2: Orange Labs
*Contact email: felipe@olmos.cl

Abstract

The Internet increasingly focuses on content, as exemplified by the now-popular Information Centric Networking paradigm. In particular, estimating content popularities becomes essential to manage and distribute content efficiently. In this paper, we show how to properly estimate content popularities from a traffic trace. Specifically, we consider the problem of popularity inference in order to tune content-level performance models, e.g. caching models. In this context, special care must be taken because an observer measures only the flow of requests, which differs from the model parameters, although the two quantities are related by the model assumptions. Current studies, however, ignore this difference and use the observed data as model parameters. We highlight the inverse problem, which consists of determining the parameters so that the model properly predicts the flow of requests. We then show how this inverse problem can be solved using Maximum Likelihood Estimation. Based on two large traces from the Orange network and two synthetic datasets, we finally quantify the importance of this inversion step for the accuracy of the performance evaluation.
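
The gap between observed request counts and model parameters can be illustrated with a toy example. The sketch below is not the authors' estimator: it assumes a deliberately simple one-parameter model in which each content item has a latent popularity drawn from an exponential distribution, requests in the observation window are Poisson given that popularity, and items with zero requests never appear in the trace. Under these assumptions the observed counts follow a zero-truncated geometric law, whose maximum likelihood estimate has a closed form. All function names and parameter values are hypothetical.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_observed_counts(n_items, mean_popularity, seed=0):
    """Simulate a trace: only items with at least one request are observed."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_items):
        lam = rng.expovariate(1.0 / mean_popularity)  # latent popularity
        n = sample_poisson(lam, rng)                  # requests in the window
        if n > 0:                                     # unseen items are invisible
            counts.append(n)
    return counts


def naive_mean_popularity(counts):
    """Naive approach: use the observed counts directly as popularities."""
    return sum(counts) / len(counts)


def mle_mean_popularity(counts):
    """MLE of the mean popularity m under the assumed model.

    Marginally P(N = n) = (1 / (1 + m)) * (m / (1 + m)) ** n for n >= 0, so
    conditioned on N >= 1 the counts are geometric on {1, 2, ...} with mean
    1 + m. Maximising the truncated likelihood gives m_hat = mean(counts) - 1.
    """
    return sum(counts) / len(counts) - 1.0


def estimate_catalog_size(counts):
    """Estimate the total number of items, including those never requested."""
    m_hat = mle_mean_popularity(counts)
    seen_prob = m_hat / (1.0 + m_hat)  # P(N >= 1) under the assumed model
    return len(counts) / seen_prob


if __name__ == "__main__":
    counts = simulate_observed_counts(n_items=50000, mean_popularity=2.0)
    print("observed items:     ", len(counts))
    print("naive mean estimate:", naive_mean_popularity(counts))
    print("MLE mean estimate:  ", mle_mean_popularity(counts))
    print("estimated catalog:  ", estimate_catalog_size(counts))
```

In this toy setting the naive estimate overshoots the true mean popularity because it ignores both the Poisson noise and the items that were never requested, while the inverted estimate accounts for both. A richer popularity family would generally require numerical maximisation of the likelihood rather than a closed form.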