An Inverse Problem Approach for Content Popularity Estimation

The Internet increasingly focuses on content, as exemplified by the now popular Information Centric Networking paradigm. This means, in particular, that estimating content popularities becomes essential to manage and distribute content pieces efficiently. In this paper, we show how to properly estimate content popularities from a traffic trace. Specifically, we consider the problem of popularity inference in order to tune content-level performance models, e.g. caching models. In this context, special care must be taken over the fact that an observer measures only the flow of requests, which differs from the model parameters, though both quantities are related by the model assumptions. Current studies, however, ignore this difference and use the observed data as model parameters. In this paper, we highlight the inverse problem that consists in determining parameters such that the flow of requests is properly predicted by the model. We then show how such an inverse problem can be solved using Maximum Likelihood Estimation. Based on two large traces from the Orange network and two synthetic datasets, we finally quantify the importance of this inversion step for the accuracy of performance evaluation.


INTRODUCTION
"Content is king", says nowadays a popular Internet meme. This advent of ubiquitous content is reflected on the Internet, both by the importance of Content Distribution Networks (CDNs) and transparent caching for coping with an ever-increasing traffic demand, and by the emergence of the Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Copyright 20XX ACM X-XXXXX-XX-X/XX/XX ...$15.00.
Information Centric Networking (ICN) paradigm. Understanding content and, in particular, its popularity is now essential to improve the Internet and its applications. Content-level performance models are therefore a key tool in the analysis, design and dimensioning of networks.
Sparse models are particularly useful, since they capture the salient features of the system while remaining simple enough for analysis, depending only on a few parameters. These parameters have a large impact on the model output; yet they often cannot be observed directly in measurements. Carrying out a sensible analysis with the chosen model therefore requires solving an inverse problem: finding, from the measurements, the model parameters that best describe the system.
Due to the rise of content, the number of available documents and their popularity distribution are now key parameters for traffic models. They have attracted significant attention from the community in the context of user generated content [3,9], HTTP traffic [10,13], and peer-to-peer networks [4,19]. However, the measurement methods used in these works are not suited for parameterizing a performance model. In fact, they fail to take into account that, within the framework of a stochastic model, the request count for a given document in a given observation period is not a fixed value but a random variable. In particular, they ignore the fact that objects with no request are not observed in traffic traces, so that the trace is a zero-censored sample.
Our main objective in this paper is to provide a sound methodology for popularity estimation, with the aim of correctly fitting performance models. This requires taking into account the stochastic relation between the model parameters and the request counts that are observed in a given dataset. To this aim, we follow [4] in constructing Maximum Likelihood estimates. We illustrate the aforementioned issues and methodologies in the case of Poisson based traffic models in the context of caching performance. Nonetheless, the essential paradigm that we propose is applicable to other traffic models and contexts. Note that the choice of relevant performance models is outside the scope of this paper.
The rest of this paper is organized as follows. We first review the literature in Section 2, and describe in Section 3 the datasets we use. We then explicitly identify and formulate in Section 4 the inverse problem that consists in correctly calibrating performance models from trace measurements. To our knowledge, such a formulation has not been provided in previous studies. In Section 5, we propose a ML estimation method for this inverse problem. Section 6 provides a numerical evaluation of our approach. Our results and possible extensions are discussed in Section 7.

RELATED WORK
The related work we here review falls into two broad categories: content popularity estimation from measurements and statistical methods.
Since popularity distributions usually exhibit a power-law behavior, a common method to estimate them is to fit the rank-frequency distribution on a double logarithmic scale. This approach has been recently criticized by Clauset et al. [4]. The main issue is that the rank-frequency plot is not a reliable statistic since, for example, it can exhibit power-law behavior even if the ground-truth does not.
Despite these problems, the use of this method is still pervasive in performance evaluation [8,11] and traffic characterization studies [12,13,2]. Authors try to improve it by means of various adjustments. In [13], for example, the authors split the rank-frequency plot into three parts and fit a different curve to each piece, while in [12] the authors fit "stretched exponential" curves instead of power laws.
The latter adjustments indeed solve some of the fitting issues. In previous studies [11,17], we have noted another issue in the context of performance models, which arises from the fact that objects are allowed to receive zero requests. Consequently, from the point of view of the network operator, objects with no request are not observed in traces. In statistical jargon, the sample is zero-censored, and not taking this fact into account leads one to underestimate the catalog size, which has an impact on the conclusions drawn from the fitted model (see Section 6).
In the present work, we address the previous issues by using Maximum Likelihood (ML) estimates. This method seamlessly handles the zero-censored case, and it is proposed by Clauset et al. [4] as a robust method to fit heavy-tailed data, a common property of popularity distributions. Maximum likelihood methods have already been used for flow size estimation [16] and call center modeling [18]. The latter work uses an approach similar to ours, but it is limited to a specific parametric model for non-censored data. More importantly, our work highlights the fact that the assumptions of the performance model must be taken into account for a proper popularity estimation.
The statistical basis of our methods is the estimation of mixed discrete distributions, a subject that has been extensively studied in the literature. The non-parametric case has been addressed from two points of view. The first one searches for the mixing density in the space generated by Laguerre polynomials with an exponential cut-off; the estimator is then obtained by a projection onto that space [20,5]. It, however, converges slowly with the sample size unless the density belongs to the aforementioned space. We therefore base our methodology on the second point of view, which assumes the mixing distribution to be a sum of Dirac masses. The estimation methods are then similar to an Expectation-Maximization (EM) scheme [15]. As regards the parametric case, EM schemes for finding the parameters of the mixing distribution are provided for many families in [14]. In both the parametric and non-parametric cases, the available estimation algorithms do not handle censored data, and we thus simply use an all-purpose nonlinear optimization solver to obtain our results.

DATASETS
We base our analysis on two real-traffic datasets, called #yt and #vod respectively. Dataset #yt comes from the YouTube traffic delivered for three months in 2013 by the Orange Network in Tunisia, while #vod comes from the Video-on-Demand Orange service in France over 3.5 years. The traffic consists of 46 000 000 (resp. 3 400 000) requests to 6 300 000 (resp. 120 000) videos in the #yt (resp. #vod) set. More details on the collection and processing of these two datasets can be found in [17].
We also use two synthetic datasets, called #prt and #delta. They allow us to highlight some of our findings more clearly and, more importantly, to validate the results through controlled experiments when no ground-truth is available. The set #prt (resp. #delta) is generated by first drawing 10 000 000 (resp. 100 000) random samples with distribution Pareto(1.6, 0.1) (resp. Dirac delta at 4.0) representing the popularity (see Section 5.1 for a model description). The number of requests for each document is then drawn according to the Poisson distribution with mean equal to the document popularity. After discarding the documents with zero request, this results in 2 600 000 (resp. 400 000) requests to 1 900 000 (resp. 98 000) documents.
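For illustration, the following Python sketch reproduces this construction for a #prt-like dataset under the stated assumptions (Pareto(1.6, 0.1) popularities, Poisson request counts, zero-censoring); the variable names and the random seed are ours and purely illustrative.

    import numpy as np

    rng = np.random.default_rng(0)

    # Popularities: i.i.d. Pareto(alpha=1.6, x_m=0.1) samples
    # (numpy's pareto draws the Lomax form, hence the shift and scale).
    alpha, x_m, K = 1.6, 0.1, 10_000_000
    popularities = x_m * (1.0 + rng.pareto(alpha, size=K))

    # Request counts: one Poisson(lambda_k) draw per document (time window W = 1).
    counts = rng.poisson(popularities)

    # Zero-censoring: documents with no request do not appear in the trace.
    observed = counts[counts > 0]
    print(f"{observed.sum()} requests to {observed.size} observed documents (catalog size {K})")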

PROBLEM DEFINITION
In the following, we are given a stochastic object-level model predicting some performance indicator. The predicted performance explicitly depends on a few parameters which characterize each object (e.g., document popularities, lifespans, sizes). It also strongly depends, however, on some implicit assumptions about the traffic or request process.
An example of such a situation is the evaluation of the hit ratio of a Least Recently Used (LRU) cache, which is typically performed using the Independent Reference Model (IRM). In this context, users request documents from a catalog of K documents. These requests are intercepted by a cache server, which can store and serve only an evolving subset of the catalog. The IRM assumes that the sequence of requests for document k, 1 ≤ k ≤ K, is a Poisson process with intensity λ_k, where λ_k is proportional to the popularity of document k; all such processes are mutually independent and their superposition builds up the total request process. In this model, the number N_k of requests for document k in a time window W is an independent Poisson random variable P(λ_k W) of mean λ_k W. Up to a time normalization, we assume in the following that W = 1. Figure 1 illustrates the two stages of such a model, both for an arbitrary performance model and for the IRM case. The first stage maps the model parameters to a request flow (or a request flow distribution). The second stage computes the performance indicator based on this request flow. In order to keep this paper concise, we now limit ourselves to the IRM model (see Section 7.1 for extensions).
Assume now that an observer has access to a sample of the actual request flow, e.g., a trace dataset or server logs. In the case of the IRM, a sufficient statistic of the request process is the set of request counts n_1, n_2, . . . , n_{K_0} for all observed documents, where K_0 is the number of observed documents in the sample. Following the point of view of an Internet Service Provider (ISP), we here assume that objects with zero request are not observable in the sample. Our main objective is to solve the following inverse problem: estimate the popularity distribution such that the request flow predicted by the model using these parameters represents the data at best.

A simple way to do this, henceforth called the naive method, consists in estimating the popularity of a document by its request count and the catalog size by the number of observed objects, that is: K̂^nv = K_0 and λ̂^nv_k = n_k for 1 ≤ k ≤ K̂^nv. Two problems can be identified at this stage. First, since the trace is zero-censored, with high probability the observed number of documents K_0 is strictly smaller than the catalog size K. Second, each document popularity λ_k is estimated by a single sample n_k of the random count N_k. This last limitation is well illustrated by the #delta dataset. By definition, the ground-truth (real) popularities are λ_k = 4. In the dataset, however, the counts of document requests are Poisson random variables of mean 4, hence λ̂^nv_k ∼ P(4) and the naive estimation "dilutes" the mass of popularities over the set of positive integers.

Figure 2 (hit ratio of a cache fed by the #prt trace: ground-truth (GT) and prediction by the naive estimation; the cache size is normalized with respect to that of the GT) shows the impact of these limitations on the hit ratio estimation, based on the #prt trace. The first curve is our ground-truth. It is obtained via simulation of an LRU cache starting from an empty cache; the cache is fed by the traffic trace, which is randomly shuffled to enforce the IRM assumption. The second curve is the prediction of the IRM model when fed by the real popularities of the trace (see Section 8.1 for a quick derivation of the transient hit ratio for the IRM). As expected, it perfectly fits the ground-truth. The third curve shows the results obtained by the IRM model when fed by the parameters K̂^nv and λ̂^nv_k, 1 ≤ k ≤ K̂^nv, from the naive estimation. The hit ratio curves clearly differ, and the naive method proves inaccurate for estimating document popularities when fitting a performance model.
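A minimal numerical check of this dilution effect, assuming the #delta setting (all ground-truth popularities equal to 4); the snippet and its names are illustrative and not taken from the paper's code:

    import numpy as np

    rng = np.random.default_rng(1)
    K = 100_000
    counts = rng.poisson(4.0, size=K)      # every document has true popularity 4
    observed = counts[counts > 0]          # zero-censored trace

    # Naive estimation: catalog = observed documents, popularity = raw request count.
    K_nv = observed.size                   # underestimates the catalog size K
    lambda_nv = observed.astype(float)     # mass spread over 1, 2, 3, ... instead of a point mass at 4
    print(K, K_nv, np.unique(lambda_nv)[:10])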
In the absence of any prior knowledge about the popularity distribution, the only available data for the estimation of each document popularity is a single request count, which limits the accuracy of this approach. To overcome this lack of information, we thus aim at jointly estimating the set of popularities, from the joint set of request counts. The latter approach allows us to use all the information contained in the joint Poisson distribution rather than just the mean.

MAXIMUM LIKELIHOOD ESTIMATION
In this section, we show how to solve the latter inverse problem via the Maximum Likelihood method.
In the IRM setting, the parameters (λ_1, λ_2, . . . , λ_K, K) are not ordered, and thus every request count could correspond to any of the popularities. The likelihood given the observations n_1, n_2, . . . , n_K therefore runs through every permutation σ of size K (a possible form is sketched below). This combinatorial explosion for large K makes the ML method intractable for the IRM model. We thus propose in the following a slightly modified model, which is simultaneously tractable for ML estimation and simple to analyze.
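For illustration only (this is our own reconstruction, not a formula quoted from the original text, and it assumes that all K counts, including zeros, are available), such a permutation likelihood takes the form

    L(λ_1, . . . , λ_K; n_1, . . . , n_K) = Σ_{σ∈S_K} Π_{k=1}^{K} e^{-λ_{σ(k)}} λ_{σ(k)}^{n_k} / n_k! ,

where S_K denotes the set of permutations of {1, . . . , K}; the K! terms of this sum are what makes its evaluation intractable for large K.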

IRM Mixture Model (IRM-M)
In order to succinctly describe the popularity parameters λ_1, λ_2, . . . , λ_K and to ease their estimation, we slightly modify the IRM model by considering them as random variables. Specifically, we assume that λ_1, λ_2, . . . , λ_K are an i.i.d. sample from an unknown mixing distribution with density g. Given the value of λ_k, the request process to the k-th document remains a Poisson process of intensity λ_k, and thus the counts of each document follow a mixed Poisson distribution with mixing distribution g. In particular, the number of requests N for any document satisfies

P[N = j] = E_g[ e^{-λ} λ^j / j! ]    (1)

for j ∈ ℕ, where the operator E_g[·] represents the expectation under the mixing distribution g.

ML estimation on IRM-M
By modifying the model, we have changed the problem of estimating the static parameters λ_1, λ_2, . . . , λ_K to that of estimating the mixing distribution g.
We remark that, in this setting, the catalog size K is decoupled from the popularity distribution. Thus, we can first obtain an estimator ĝ of the mixing distribution g, and then approximate K by

K̂ = K_0 / (1 − E_ĝ[e^{-λ}]),    (2)

which is asymptotically close to the ML estimator; indeed, E_ĝ[e^{-λ}] estimates the probability that a document receives no request and is thus censored. We now proceed with the detailed form of the likelihood function for the parametric and non-parametric estimation procedures. In both approaches, we numerically solve the problems with a generic non-linear optimization solver in MATLAB based on an interior-point algorithm. Our code is freely available online at http://www.olmos.cl/code/mixed_poisson.tgz. We discuss the use of specialized algorithms in Section 7.

Parametric Estimation
In this setting, we determine the mixing distribution within a parametric family of density functions. The choice of that parametric family relies on a priori knowledge. The computation of the ML estimator obviously depends on this choice and, due to space restrictions, we here limit ourselves to the two-parameter Pareto family with densities g(x) = α x_m^α / x^{α+1} for x > x_m, with α and x_m the shape and scale parameters, respectively. The log-likelihood of parameters α and x_m then reads

ℓ(α, x_m) = Σ_{k=1}^{K_0} log P[N = n_k] − K_0 log(1 − P[N = 0]),

where P[N = j] denotes the mixed Poisson probability (1) computed under the Pareto density g_{α,x_m}.
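As an illustration, the sketch below evaluates this zero-truncated Pareto log-likelihood numerically and hands it to a generic solver. It is not the authors' MATLAB implementation: the use of SciPy, the function names, bounds and starting point are our own assumptions.

    import numpy as np
    from scipy import integrate, optimize, special

    def pareto_mixed_pmf(j, alpha, xm):
        # P[N = j] under a Pareto(alpha, xm) mixing density, by numerical integration.
        def integrand(lam):
            log_poisson = j * np.log(lam) - lam - special.gammaln(j + 1)
            log_pareto = np.log(alpha) + alpha * np.log(xm) - (alpha + 1) * np.log(lam)
            return np.exp(log_poisson + log_pareto)
        value, _ = integrate.quad(integrand, xm, np.inf)
        return value

    def neg_log_likelihood(params, counts):
        # Zero-truncated log-likelihood: only documents with at least one request are observed.
        alpha, xm = params
        p_zero = pareto_mixed_pmf(0, alpha, xm)
        uniq, mult = np.unique(counts, return_counts=True)
        log_p = np.array([np.log(pareto_mixed_pmf(j, alpha, xm)) for j in uniq])
        return -(np.dot(mult, log_p) - counts.size * np.log1p(-p_zero))

    # Hypothetical usage, once `counts` holds the per-document request counts of a trace:
    # result = optimize.minimize(neg_log_likelihood, x0=[1.5, 0.2], args=(counts,),
    #                            bounds=[(1.05, 5.0), (1e-3, 1.0)], method="L-BFGS-B")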

Non-Parametric Family
In the absence of a priori knowledge about the distribution g, the non-parametric (NP) approach provides a method to obtain an estimator. In this setting, we determine a discrete distribution g of the form P[λ = x_i] = θ_i for 1 ≤ i ≤ I, where the support points x_1, . . . , x_I are fixed. The log-likelihood correspondingly reads

ℓ(θ) = Σ_{k=1}^{K_0} log ( Σ_{i=1}^{I} θ_i e^{-x_i} x_i^{n_k} / n_k! ) − K_0 log ( 1 − Σ_{i=1}^{I} θ_i e^{-x_i} ),

to be maximized subject to θ_i ≥ 0 and Σ_{i=1}^{I} θ_i = 1.
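The sketch below illustrates this non-parametric fit on a fixed grid of atoms, together with the catalog-size estimate of Equation (2). It is our own Python approximation of the described procedure (the paper relies on a MATLAB interior-point solver); the grid choice, solver and names are illustrative.

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import poisson

    def fit_np_mixing(counts, grid):
        # Zero-truncated ML fit of a discrete mixing distribution on a fixed grid of atoms.
        counts = np.asarray(counts)
        grid = np.asarray(grid, dtype=float)
        uniq, mult = np.unique(counts, return_counts=True)
        pmf_matrix = poisson.pmf(uniq[None, :], grid[:, None])  # P[N = uniq[j] | lambda = grid[i]]
        p_zero = np.exp(-grid)                                  # P[N = 0 | lambda = grid[i]]

        def neg_loglik(theta):
            p_obs = theta @ pmf_matrix                          # P[N = n_k] for each distinct count
            p0 = theta @ p_zero                                 # P[N = 0]
            return -(mult @ np.log(p_obs) - counts.size * np.log1p(-p0))

        theta0 = np.full(grid.size, 1.0 / grid.size)
        res = minimize(neg_loglik, theta0, method="SLSQP",
                       bounds=[(1e-12, 1.0)] * grid.size,
                       constraints=[{"type": "eq", "fun": lambda t: t.sum() - 1.0}])
        theta_hat = res.x / res.x.sum()
        K_hat = counts.size / (1.0 - theta_hat @ p_zero)        # catalog size, Equation (2)
        return theta_hat, K_hat

    # Example grid: exponentially spaced atoms between 0.01 and slightly above max(counts), e.g.
    # grid = np.geomspace(0.01, 1.05 * counts.max(), num=60)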

Hit Ratio Analysis
As detailed in the Appendix (Section 8), the IRM-M model proves tractable for evaluating the performance of an LRU cache. In particular, the so-called "Che approximation" is easily adapted to the IRM-M case; furthermore, we are able to derive formulas for the transient analysis of the hit ratio when starting from an empty cache.

NUMERICAL EVALUATION
The accuracy of the parameter estimation can be evaluated at three different levels, as expressed by the following questions: (1) Is the estimated popularity density close to the actual popularity density? (2) Is the request flow predicted by the model statistically similar to the actual request flow? (3) Is the performance indicator of the fitted model, e.g., the hit ratio, accurately predicted?
Throughout this section, we assess the precision of a curve estimate by computing the so-called mean absolute relative error (MARE). More precisely, the MARE between a reference sequence of points (or curve) (x_i)_{1≤i≤N} and an estimated sequence (y_i)_{1≤i≤N} is defined by

MARE = (1/N) Σ_{i=1}^{N} |y_i − x_i| / |x_i| .
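For reference, a one-function sketch of this error measure (names are ours):

    import numpy as np

    def mare(reference, estimate):
        # Mean absolute relative error between two curves sampled at the same points.
        reference = np.asarray(reference, dtype=float)
        estimate = np.asarray(estimate, dtype=float)
        return float(np.mean(np.abs(estimate - reference) / np.abs(reference)))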

Estimation of popularity distribution
We start with the most general question, namely the estimation of the mixing distribution itself. Such an inverse problem is known to be ill-posed.
For the NP estimation, we obtain an estimate ĝ_np of the popularity density by applying the NP method, using a support with 0.01 as lower bound, exponentially increasing spacings and an upper bound slightly larger than the maximum observed request count (e.g., 2 400 for #prt and 16 for #delta). The naive fitting corresponds to the empirical measure of the request counts, that is, the mixture of Dirac measures (1/K_0) Σ_{k=1}^{K_0} δ_{n_k}(·).

Figure 3 shows the NP estimator of the mixing distribution for the #delta and #prt datasets. In the #delta case, the ground-truth is a Dirac measure at λ = 4; the naive method fails to estimate its shape correctly, whereas the ML estimator concentrates its mass around the value λ = 4. In the #prt case, the estimated distribution is irregular, tending to accumulate mass at certain points (see Section 7.2 for possible regularization solutions). This concentration is no surprise since, in the non-censored case, the ML estimator is a discrete probability distribution [15]. The peaks nevertheless capture the power-law trend, as reflected by the good estimation quality of the mixture distribution. In contrast, the naive method fails to correctly estimate both the trend of the distribution body and its tail.

Using Equation (2), we also calculate the catalog size, giving K̂ ≈ 11 600 000 (resp. 105 278) for the #prt (resp. #delta) case. This represents a relative error of 11.6% and 5.2%, respectively. In view of Equation (2), this shows that estimating the probability that a document receives no request for the duration of the trace, based on that very trace, is a difficult task, and the resulting error is not negligible. It is, however, smaller than the relative error of the naive method, markedly so in the #prt case (recall that K̂^nv = K_0 = 1 900 000 and K̂^nv = 92 046 for the #prt and #delta traces, respectively).
When some a priori knowledge about the distribution shape is available, the estimates can be improved via the parametric approach. In the #prt case, the resulting Pareto fit gives the estimates α̂ = 1.597 and x̂_m = 0.099, which are very close to the original parameters α = 1.6 and x_m = 0.1. We compare these results to those of the "log-log" approach, which consists in estimating the tail index by fitting a least-squares line to the log-log rank-frequency plot, as shown in Figure 4. The rank-frequency plot roughly decays as a power law with exponent 1/α. Using the first 20 000 objects to compute the regression, the estimation gives 1.704, which is worse than the ML estimate.
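A sketch of this "log-log" baseline, assuming the frequency decays as rank^(-1/α) so that α is recovered from the regression slope (the function name and defaults are illustrative):

    import numpy as np

    def loglog_tail_index(counts, n_top=20_000):
        # Least-squares slope of the log-log rank-frequency plot (naive "Zipf" fit).
        freq = np.sort(np.asarray(counts, dtype=float))[::-1][:n_top]
        rank = np.arange(1, freq.size + 1)
        slope, _ = np.polyfit(np.log(rank), np.log(freq), 1)
        return -1.0 / slope   # frequency ~ rank^(-1/alpha), hence alpha ~ -1/slope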

Request flow estimation
In this section, we focus on estimating the zero-censored request count distribution (or mixture distribution in statistical terms) P[N = j | N > 0], j ≥ 1.
For the naive approach, we generate 50 000 IRM traces using the estimated parameters and then calculate the average empirical distribution of the requests per document. The number of generated traces ensures a coefficient of variation lower than 10^{-4} for all points of the distribution. As regards the ML approach, we compute from the ĝ_np density the associated zero-censored request distribution using (1).
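As a sketch, assuming the fitted mixing distribution ĝ_np is discrete with atoms x_i and weights θ_i, the zero-censored request distribution follows directly from (1); the function below is our own illustration:

    import numpy as np
    from scipy.stats import poisson

    def censored_request_distribution(atoms, weights, j_max):
        # P[N = j | N > 0] for j = 1..j_max under a discrete mixing distribution.
        atoms, weights = np.asarray(atoms, float), np.asarray(weights, float)
        p = np.array([weights @ poisson.pmf(j, atoms) for j in range(j_max + 1)])
        return p[1:] / (1.0 - p[0])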
In Figure 5, we show the resulting zero-censored request distribution estimated by each method. For comparison, we include the real mixture distribution for the #prt dataset, which can be calculated explicitly. For the #yt and #vod datasets, we show instead the observed request distribution.
Two issues arise with the naive approach that are not present in the maximum likelihood estimation:
- first, at the head of the distribution, where most of the mass is concentrated, the naive approach produces large estimation errors. Such errors shift mass towards the tail of the distribution. On the contrary, the NP estimation matches the head of the distribution perfectly;
- second, the naive method over-fits the tail of the distribution. We observe in Figure 5d that the naive estimate shows a "horizontal branch" at the tail and differs significantly from the ground-truth, which is approximately a straight "diagonal" line. This horizontal branch is in fact a few isolated masses, though they look like a line on the figure. The naive estimation therefore concentrates the mass of the ground-truth distribution on a few points.
On the other hand, the ML estimation correctly captures the trend of the distribution at all scales, though noisy inaccuracies appear at the tail. This is quantified by a MARE of 1.67 for the ML estimation, whereas the naive method leads to a MARE of 668 over the full range of the distribution. As regards the #yt and #vod cases in Figures 5e and 5f, we observe the same horizontal branch at the tail for the naive distribution. In the absence of ground-truth, we do not compute the MARE, but the similarity of behavior suggests that the ML method also performs better on these traces.

Hit Ratio Estimation
We finally compare the hit ratios predicted by the IRM-M model with popularity distributions fitted using the naive and the ML methods, both for the #prt and #yt traces. Figure 6 shows the hit ratio curves obtained in each case. The ground-truth curves are obtained by simulating an LRU cache fed by the shuffled traces. The Naive (resp. NP) curves are obtained using Formula (6) (resp. (9)) with the parameters given by the naive (resp. NP) method. Finally, the Zipf curve, for the #prt trace, corresponds to the hit ratio prediction obtained with the "log-log" parametric fitting method detailed in Section 6.1.
The naive approach leads to small inaccuracies for the #yt trace and large errors for the #prt trace, with respective MAREs of 0.06 and 1.44. This difference in estimation accuracy can be explained by the variability of the random variable N. Indeed, in the #yt dataset, documents receive an average of 7.3 requests, whereas this average drops to 1.4 in the #prt trace. It follows that the coefficient of variation of the request count distribution is greater in the #prt trace than in the #yt trace; as expected, the inaccuracy of the naive method is greater for the former than for the latter. Note also that, from an operational point of view, the focus is on the miss ratio, which determines the dimensioning requirements upstream of the cache. The inaccuracy of the naive hit ratio prediction for the #yt dataset becomes relatively significant in this context. As shown by the Zipf curve, the knowledge of a relevant parametric family allows us to improve the hit-ratio estimation. The error, however, remains significant, with a MARE of 0.96. In contrast, the non-parametric ML curves match the original ones perfectly, as shown by the MAREs of 0.002 for the #yt trace and 0.005 for the #prt trace. We conclude that, as regards the hit ratio, our estimation method accurately estimates the model parameters. In contrast, in the Zipf case, a seemingly small error of 0.1 in the estimation of the tail exponent α leads to a significant error in the hit ratio estimation.

Extendability to other models
Our method relies exclusively on the fact that we explicitly know the request count distribution of a document, given its popularity. As a consequence, our framework can be extended to other traffic models such as renewal traffic [6,1] and the Shot-Noise (SN) traffic [17,21], though the formulation of the ML function changes in each case.
In practice, however, this reformulation introduces new challenges to the inverse problem. First, note that in the IRM-M case, the request processes are particularly tractable for the inverse problem, because they are entirely characterized by a single parameter and exhibit no correlation. For other processes, however, fitting the point processes may require fitting a multivariate distribution [17] or a whole random distribution [6] per document. Second, due to the greater importance of the time variable in other models, the censoring effects coming from the finite observation window are more severe. Additional care must therefore be taken when adapting our estimation method to other models.

Maximization techniques
The main current limitation of our maximization approach is that the estimated mixing density exhibits many peaks, which is consistent with the results of Lindsay [15]. This may be a problem when one aims at understanding the nature of the popularity distribution.
A possible solution to enforce smoothness in the estimated mixing density is to introduce a penalization of irregularities. Classical candidates are the L2 penalization or a logarithmic penalization such as R(θ) = Σ_{i=1}^{I−1} (θ_{i+1} − θ_i)(log θ_{i+1} − log θ_i)/(x_{i+1} − x_i). One then maximizes the penalized log-likelihood ℓ(θ) − ρR(θ), where ρ represents the trade-off factor between fitness and smoothness. Regularization, however, comes at the price of choosing the right penalization function R(·) and the right value of ρ; in our case, the results have been satisfactory only for concentrated mixing distributions.
Another possibility is to exploit the fact that the peaks preserve the overall trend of the distribution. We thus extract the peak locations, and a second ML optimization is then performed using these locations as the new support. Though non-standard, this approach gives satisfactory results for the #prt dataset (not shown here due to lack of space).

Summary of results
In this paper, we have presented and solved the inverse problem that consists in estimating from a trace the popularity parameters to be used in a performance model. A key point in our approach is that we consider the probability that a document receives a given number of requests, rather than the probability that a request is directed to a given document. This representation is consistent with recently developed caching models [17,21,6]. Moreover, it allows us to avoid fitting a rank-frequency plot, which is in essence an order statistic and may exhibit undesirable properties. Our second contribution on the modeling side is that we consider popularities as random variables, rather than parameters, leading to a tractable mixture model.
The inverse problem stems from the random nature of the request count N for a given document. In particular, a traffic trace contains a single sample of these request counts. The accuracy of any method that fits the popularity of each document independently is therefore limited by the inherent variability of the random variable N. The importance of using a sound methodology correspondingly increases when the variability of the request counts is large, which is typically the case when N is small.
Determining the parameters of the model allows one to use the performance model for several objectives, including the dimensioning of operational networks or the design of new mechanisms. More importantly, in contrast with simulation-based analysis, it enables one to explore "what-if" scenarios more easily, by keeping some parameters at their current values and modifying others to reflect future or possible changes.

APPENDIX
We here detail the derivation of hit ratio formulas for the IRM-M model.

IRM Model
For comprehension purposes, we first briefly review the "Che approximation" method for the hit ratio estimation in the IRM model (additional details can be found in [7]). Given popularities λ_1, λ_2, . . . , λ_K, let X_k(t) denote the number of different documents, apart from the k-th, requested in a time window [0, t], that is,

X_k(t) = Σ_{j≠k} 1{document j is requested at least once in [0, t]}.

Let T_C^k = inf{t > 0 : X_k(t) ≥ C} be the exit time to level C for the process X_k; T_C^k represents the eviction time for content k in an LRU cache of size C, given that it is not requested during this time period. The core of the "Che approximation" then consists in the two following steps:
1. all T_C^k have the same distribution, i.e., for all k, T_C^k has the distribution of a common random variable T_C;
2. the random variable T_C is well approximated by a constant t_C called the "characteristic time". The time t_C is implicitly defined by the equation

Σ_{k=1}^{K} (1 − e^{-λ_k t_C}) = C.    (3)

Intuitively, t_C is the time at which, on average, C different objects have been requested.
In the stationary case, the hit ratio HR can then be derived as follows. Using the PASTA property, the hit ratio of document k for a cache of size C is equal to 1 − e^{-λ_k t_C}, and by averaging over all documents it follows that

HR = Σ_{k=1}^{K} λ_k (1 − e^{-λ_k t_C}) / Σ_{k=1}^{K} λ_k.    (4)

In the transient case, we simply assume that T_C^k ≤ W (the hit ratio does not increase with T_C^k when T_C^k > W). By independence, it can be shown (see Proposition 3, [17]) that the average number of hits for the k-th document in a time window of size W, starting from an empty cache, is E[h(λ_k, T_C^k)], where the expectation bears on T_C^k and the function h(λ, t) is defined by

h(λ, t) = (λW − 1)(1 − e^{-λt}) + λt e^{-λt},   t < W.
In consequence, setting Λ = Σ_{k=1}^{K} λ_k, the transient hit ratio HR(W) is given by

HR(W) = (1 / (ΛW)) Σ_{k=1}^{K} E[h(λ_k, T_C^k)].    (5)

Applying the "Che approximation", i.e., replacing T_C^k by the constant t_C, we then obtain

HR(W) = (1/Λ) Σ_{k=1}^{K} λ_k (1 − e^{-λ_k t_C}) + (1/(ΛW)) Σ_{k=1}^{K} [ λ_k t_C e^{-λ_k t_C} − (1 − e^{-λ_k t_C}) ].    (6)

The second term of (6) vanishes as W → ∞, leading to equality (4) for the stationary hit ratio.

IRM-M Model
We now address the IRM-M case. We first show how to derive the hit ratio in this setting; we then formally prove the validity of the "Che approximation" in the regime where C = δK and K tends to infinity.
• Given the popularities λ_1, λ_2, . . . , λ_K, let us define X_k and T_C^k as in the previous section, and let δ = C/K be the proportion of stored documents. As the popularities are here an i.i.d. sample, and since X_k and T_C^k are independent of λ_k, the distributions of these quantities do not depend on the document index k. This validates the first step of the "Che approximation".
For the second step, we define the characteristic time t_δ as the solution of

E_g[1 − e^{-λ t_δ}] = δ,    (7)

which is equivalent to dividing both sides of (3) by K. Following the same steps as in the previous section, it is easy to derive the following hit ratio formulas:

HR = E_g[λ (1 − e^{-λ t_δ})] / E_g[λ],    (8)

HR(W) = E_g[λ (1 − e^{-λ t_δ})] / E_g[λ] + (1 / (E_g[λ] W)) E_g[ λ t_δ e^{-λ t_δ} − (1 − e^{-λ t_δ}) ].    (9)

Equations (8) and (9) are the IRM-M analogues of (4) and (6).
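The following sketch evaluates these quantities for a discrete mixing distribution (e.g., the NP estimate ĝ_np): it solves the characteristic-time equation by bisection and applies the stationary and transient hit ratio formulas above. Function names and numerical tolerances are ours.

    import numpy as np
    from scipy.optimize import brentq

    def hit_ratio_irm_m(atoms, weights, delta, W=None):
        # Che-approximation hit ratio under IRM-M with a discrete mixing distribution;
        # atoms/weights may come from the NP fit, delta = C/K is the relative cache size.
        atoms, weights = np.asarray(atoms, float), np.asarray(weights, float)
        # Characteristic time t_delta: E_g[1 - exp(-lambda t)] = delta
        f = lambda t: weights @ (1.0 - np.exp(-atoms * t)) - delta
        t_delta = brentq(f, 1e-12, 1e12)
        mean_rate = weights @ atoms
        hr_stationary = weights @ (atoms * (1.0 - np.exp(-atoms * t_delta))) / mean_rate
        if W is None:
            return hr_stationary
        # Transient correction for a window of length W, starting from an empty cache.
        correction = weights @ (atoms * t_delta * np.exp(-atoms * t_delta)
                                - (1.0 - np.exp(-atoms * t_delta))) / (mean_rate * W)
        return hr_stationary + correction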
• We now show that the second step of the Che approximation is asymptotically exact, that is, the random variable T_C can be replaced by the associated characteristic time t_δ. Consider the case where the cache size scales with the catalog size, that is, δ remains constant while C and K grow to infinity. Recall that the distribution of T_C is given by

P[T_C > t] = P[X_k(t) < C]

for t ≥ 0, which can be rewritten as

P[ (1/K) Σ_{j≠k} 1{document j is requested in [0, t]} < δ ].

An application of the law of large numbers shows that (1/K) Σ_{j≠k} 1{document j is requested in [0, t]} converges almost surely to E_g[1 − e^{-λt}] as K → ∞; applying then the bounded convergence theorem (Section 13.6, [22]) to the expected number of hits and dividing by the expected number of requests E[λ] leads to formulas (8) and (9), as claimed.