Approximate performance analysis of generalized join the shortest queue routing

In this paper we propose a highly accurate approximate performance analysis of a heterogeneous server system with a processor sharing service discipline and a general job-size distribution under a generalized join the shortest queue (GJSQ) routing protocol. The GJSQ routing protocol is a natural extension of the well-known join the shortest queue routing policy that takes into account the non-identical service rates in addition to the number of jobs at each server. The performance metrics that are of interest here are the equilibrium distribution and the mean and standard deviation of the number of jobs at each server. We show that the latter metrics are near-insensitive to the job-size distribution using simulation experiments. By applying a single queue approximation we model each server as a single server queue with a state-dependent arrival process, independent of other servers in the system, and derive the distribution of the number of jobs at the server. These state-dependent arrival rates are intended to capture the inherent correlation between servers in the original system and behave in a rather atypical way.


Motivation
This work is motivated by web server farms. Server farms have gained popularity for providing scalable and reliable computing and web services. Most commonly the objective in analyzing such a system lies in the determination of an optimal or near-optimal load balancing routing protocol so as to maximize the performance of the system, see, e.g., [1,6,8], where the performance of interest is usually the mean response time for an arbitrary job. In this paper the objective is to report some interesting properties of the arrival flow to each server and suggest an approximation approach for the GJSQ routing protocol. We consider farms with heterogeneous servers, which is motivated by the different hardware and the wide variety of computing capacities regarding processing power and memory access performance seen in practice in server farms [13]. We assume that service requests arrive to the system according to a Poisson process. Upon arrival, a front-end dispatcher routes the request to one of the servers. After the request has been routed to the server, we assume that it cannot balk or jokey. All requests routed to a server are sharing the provided service (think of bandwidth, CPU, or RAM). We assume a PS service discipline at each server since it closely approximates the scheduling policies [7,10] employed by most commodity operating systems (e.g., Linux CPU time-sharing) and is a popular policy in computing centers (e.g., Cisco Local Director, IBM Network Dispatcher and Microsoft Sharepoint, see [3] for a survey).
In [5] the authors consider a server farm consisting of homogeneous servers, where upon arrival jobs are routed according to the JSQ routing protocol. This protocol in case of homogeneous servers, due to the PS service discipline, is performing near-optimal in terms of the mean response time. However, as indicated by Whitt in [15], the JSQ policy is far from optimal in case of heterogeneous servers. In [2] the authors comment on the performance of various systems under different routing protocols and conclude that the shortest expected delay (SED) routing protocol is near-optimal in terms of mean response time. The SED policy is a policy that routes jobs upon arrival to the queue promising the minimum expected delay (which also includes the processing time). In case of exponential job-size distributions, the GJSQ and SED routing protocols are identical and in case of homogeneous servers GJSQ and JSQ are the same. However, in case of general job-size distributions and heterogeneous servers we assume that the only available information are the service rates and the number of jobs at each server, i.e. we do not keep track of residual processing times.
Due to the complexity and the various challenges that the model at hand presents, we restrict our analysis to the case of two heterogeneous servers with a general job-size distribution under the GJSQ routing protocol. From here onwards we refer to this model as the M/G(1, s)/2/GJSQ/P S system, abbreviated as the GJSQ model, where G is the job-size distribution and 1 and s are the service rates at servers 1 and 2, respectively. The approach described in this paper can be seen as a first stepping stone towards the analysis of heterogeneous server farms with PS servers; a very broad area, full of interesting problems. Moreover, the ideas presented here extend the work of Gupta et al. [5] on the analysis of the JSQ routing for homogeneous web server farms.

Related work
To the best of our knowledge there is no previous mathematical analysis of the GJSQ system. In [14], Selen et al. derive the joint equilibrium distribution of the number of jobs at each server in the M/M (1, s)/2/SED/F CF S model. They prove that this distribution can be expressed as an infinite series of product forms using the compensation approach. The benefit of that approach is that it produces, by truncating the series expression, numerical results with an a priori set accuracy level. Unfortunately, the compensation approach is not appropriate (in its current setting) for multiple servers, nor for general job-size distributions. Before [14], very little was known regarding the mathematical analysis of the SED policy. In [11], the authors suggest two models that act as upper and lower bounds to the SED system. However, they do not provide closed form expressions for the equilibrium distribution of these two bounding models, but only an algorithmic approach based on matrix analytic methods. Furthermore, in [4,9], the authors show that the SED routing policy is asymptotically optimal in terms of the mean response time and results in complete resource pooling in the heavy traffic limit. This heavy traffic limit result may be used in a similar manner as in [12]. However, after a few numerical experiments, we concluded that this approximation in our case results in poor estimates and for this reason we did not proceed in this direction. On the contrary, the approach developed by Gupta et al. [5] on approximating the distribution of the number of jobs at each server, as we show in this paper, is appropriate for the GJSQ setting with heterogeneous servers. More concretely, in [5], the authors develop the SQA method that accurately determines the distribution of the number of jobs at each server by modeling each queue as an M n /M/1/P S system with state-dependent arrival rates. These state-dependent arrival rates are referred to as the conditional arrival rates and are constructed in such a way that they capture the inherent correlated behavior of the complete server farm.

Contributions
We believe that we provide the first approximate analysis of the equilibrium distribution and moments of the number of jobs at each server in the GJSQ system (and by Little's law also the mean response time for an arbitrary job). Moreover, the approximation is highly accurate: we encounter a maximum relative difference between the approximation and simulations of 2.2%. In deriving these approximations, we provide three key contributions: 1. The mean and standard deviation of the number of jobs at each server and the conditional arrival rates are near-insensitive to the job-size distribution. This allows us to study the more tractable model with an exponential job-size distribution.
2. In case of an exponential job-size distribution, the SQA method yields the same equilibrium distribution for the number of jobs at each server as in the original GJSQ model.
3. For the application of the SQA method we present an approach for the derivation of the conditional arrival rates. In particular, we show that the conditional arrival rates, say λ i (n), i = 1, 2, n ∈ N 0 , to server 1 satisfy where ρ is the load on the system, see Section 2, and the conditional rates to server 2 for large n oscillate between s different points. Note that the former result is similar to the result obtained in [5] for the case s = 1, however the latter result is very atypical and is discussed in greater detail in Section 2.3.

Outline
The rest of the paper is organized as follows. In Section 2 we give a detailed model description and formally define and investigate the time-average and conditional arrival rates. Section 3 is devoted to showing that the performance metrics of interest are near-insensitive to the job-size distribution. We describe the SQA and determine the conditional arrival rates in Section 4. The approximations are evaluated in Section 5. In Section 6 we present some conclusions.
2 Model description

Heterogeneous servers
We consider a system of two heterogeneous servers and a single dispatcher. The servers employ a PS service discipline and can have different service rates, i.e. server 1 has service rate 1 and server 2 has service rate s. Jobs arrive to the dispatcher according to a Poisson process with rate λ and are routed immediately to one of the servers. Jobs cannot switch servers after being routed. We detail the routing policy in Section 2.2. The size of a job is drawn from a general distribution G. Without loss of generality we assume that the mean job size is 1. Note that, for example, the (residual) processing time of a (residual) size G job that runs on server 2 that is currently serving q 2 jobs is given by Gq 2 /s. In what follows we assume that s is a positive integer number. In the general case s ∈ R + we can bound the corresponding system by two systems with service rates given by the closest two integers to s.

Routing policy
The routing policy employed by the dispatcher is a state-aware policy, i.e. the dispatcher is aware of the number of jobs at each server just before an arrival instant, q 1 and q 2 , and the service rates. The GJSQ routing policy routes an arriving job to the server with the smallest index (q i + 1)/s i , where s i is the service rate at server i. In case of a tie, the job is randomly routed to one of the servers. These indexes may be interpreted as an estimate of the expected processing time for the arriving job, made by the dispatcher who is unaware of the job-size distribution and the remaining processing times of the jobs currently in service, and furthermore ignores future arrivals.
Under this routing policy, we define the load on this system as Throughout the rest of this paper we assume that ρ < 1.
Although not necessarily optimal, GJSQ routing outperforms JSQ routing when servers are non-identical. GJSQ routing attempts to balance the load on the servers by taking into account the different service rates in addition to the information on the current number of jobs at each server. In Figure 1 we show that the long-term fraction of jobs routed to the two servers is a function of the load ρ. In light traffic GJSQ assigns all jobs to the fast server and in heavy traffic the load is divided proportionally according to the service rates. This is in contrast with JSQ routing, which assigns a long-term fraction of the jobs to server 1 that decreases from 1/2 to 1/(1 + s) for increasing load ρ (verified through simulation).

Arrival rates
We briefly introduce two important concepts related to the (time-average) arrival rates to each server. These concepts will be used throughout the paper.
Definition 2.1. In the GJSQ model, the time-average arrival rate to server i is defined as Definition 2.2. In the GJSQ model, the conditional arrival rate to server i, given that server i has n jobs, is defined as where A i,n (t) is the number of arrivals at server i during the time interval [0, t] that see n jobs at server i on arrival (excluding themselves), and T i,n (t) is the total time spent by server i with n jobs during the time interval [0, t].
The two definitions above are related. Assuming it exists, let π i (n) be the equilibrium probability that there are n jobs at server i, then λ i = ∞ n=0 λ i (n)π i (n). Figure 2 depicts the conditional arrival rates to server 2 for varying s. Intuitively it makes sense that if a server has many jobs, the other server will probably have few jobs and thus it is less likely that the dispatcher routes the job to that server. However, what we see here is a peculiar repeating pattern that has s different points and does not align with this intuition. We see that if server 2 has a multiple of s jobs (or one less), fewer jobs are routed to server 2. This pattern is difficult to explain, but it is definitely related to the probability that server 1 has a lower index than server 2, given that server 2 currently has n jobs. We expect and indeed verify that this probability also follows such a repeating pattern. Additionally, states in server 2 are somewhat similar if they differ by a multiple of s jobs, which can be derived from the equilibrium distribution in [14].

Near-insensitivity
In [5] the authors establish a near-sensitivity property in the setting of a homogeneous server farm with JSQ routing. In particular, the first and second moment of the number of jobs at server i, Q i , and the conditional arrival rates are near-insensitive to the job-size distribution. The near-insensitivity of these two metrics seems related to the insensitivity of the equilibrium distribution to the job-size distribution in PS servers, see, e.g., [5,Theorem 4.2]; and the fact that the routing policy only uses the number of jobs at each server when making a decision, as opposed to, e.g., using residual processing times. The GJSQ routing decisions are based on the dynamically changing number of jobs at each server as well as the service rates. Indeed, one expects the near-insensitivity properties to extend also to the case of heterogeneous servers and GJSQ routing. Establishing this near-insensitivity property is important, since it allows us to limit our attention to the more tractable GJSQ system with an exponential job-size distribution.
Log-normal (0, ∞) 10  Table 2: Simulated mean and standard deviation of Q i , for the GJSQ system with various s, ρ and job-size distributions. Sample standard deviation is shown in parentheses. Last two columns show the value obtained by the SQA and the relative difference with respect to the exponential case.

Simulation settings
To support our claims, we simulate the GJSQ model. A simulation consists of 2 · 10 6 job departures and each simulation is repeated 50 times. Statistics are only computed for departed jobs, i.e. data of jobs that are still in service at the end of the simulation are discarded. In Table 1 we list the four job-size distributions considered in this paper.

Near-insensitivity results
In Table 2 we show simulated statistics on the mean and standard deviation σ(·) of Q i for the GJSQ model with various job-size distributions. For the settings considered in Table 2, the mean number of jobs at server i deviates by no more than 3.6% from the exponential case, while the standard deviation deviates by at most 4.4%. The largest deviations from the exponential case occur for the log-normal job-size distribution. This is as expected, since this job-size distribution has a variance that is 10 times higher than the variance of the exponential job-size distribution. Although the results are not as strong as those shown in [5, Figure 3], we conclude that the more volatile environment of heterogeneous servers and GJSQ routing also has the near-insensitivity property for E[Q i ] and σ(Q i ). Moreover, the performance in terms of the mean response time is also near-insensitive to the job-size distributions by Little's law.
Concerning the conditional arrival rates, we see in Figure 3 that the simulated values for   [14] for the exponential job-size distribution.
the job-size distributions of Table 1 match the results of the algorithm for the exponential case [14]. Simulated values for states where the sample standard deviation is not too high differ by at most 5% from the results for the exponential case. So, also the conditional arrival rates are near-insensitive to the job-size distribution.

Single queue approximation
We have established near-insensitivity of E[Q i ], σ(Q i ) and the conditional arrival rates to the job-size distributions. Thus, we may limit our attention to systems with an exponential jobsize distribution. In this section we derive an approximation for the distribution of the number of jobs at each server using the SQA, which models server i as an M n /M i /1/P S queue with state-dependent arrival rates λ i (n), see also [5,Section 3]. SQA is exact when the job-size distribution is exponential and the routing belongs to a specific class of routing policies; the following theorem is a version of [5, Theorem 3.1] that is applicable to the GJSQ model.  /S queueing model, where R is any stationary state-aware routing policy, e.g. GJSQ, and S is any stationary, size-independent, workconserving service discipline, e.g. PS. Consider server i in this model. Then SQA with the exact conditional arrival rates λ i (·) yields the same equilibrium distribution for the number of jobs at each server as in the original model.
It remains to specify the conditional arrival rates λ i (n) for both servers. We combine exact limiting results for n ≥ N i and approximation results for n < N i , where N 1 = 3 and N 2 = 2s. These choices for N i result in accurate approximations.
We note that Theorem 4.2 implies that in order to determine the conditional arrival rates, we may assume a FCFS service discipline.
In Figure 2 we have seen that the conditional arrival rates λ i (n) exhibit a repeating pattern from some n and onwards. We rigorously characterize this limiting repeating pattern in the next theorem. where
Proof. See Appendix B.
For the rates λ 1 (n), n < 3 and λ 2 (n), n < 2s we provide approximations that are functions of s and ρ. For server 1 we use a multiple linear regression model to fit an approximate function for the conditional arrival rates on data obtained from the algorithm in [14] for s = 1, 2, 3, 4 and ρ from 0.3 to 0.99. Obviously, one can also use conditional arrival rates obtained by simulation for these fitting purposes. We carefully select a set of 5 independent variables for each conditional arrival rate. This leads to the following approximate conditional arrival rates for server 1: For server 2, let us note that λ 2 (n) = λ, n = 0, . . . , s − 2 due to the GJSQ routing. Using a multiple regression model in this case is more difficult, since the number of states for which we need to obtain a fit increases with s. To circumvent a possibly complex fitting procedure, we establish a relation between the conditional arrival rates for the states n = s − 1, s, . . . , 2s − 1 and the limiting conditional arrival rates determined in Theorem 2. Namely, where for convenience λ lim 2 (−1) = λ lim 2 (s−1). The approximations (4.4)-(4.7) behave in various limiting regimes as expected: 1. For s → ∞, we have that λ 1 (n) ↓ 0 and λ 2 (n) = λ for all n ∈ N 0 . No job will join server 1, since the processing times in server 2 are instantaneous.

Evaluating the approximation
We are now in a position to evaluate the proposed approximations. First, we show that the approximations for the conditional arrival rates follow closely the exact values, which were determined using the algorithm [14]. Second, we establish that the mean and standard deviation of the number of jobs at each server is also well approximated. Figure 4 compares the conditional arrival rates obtained from the algorithm in [14] and the approximations derived in the previous section. For the cases considered in the figure, the maximum relative difference of the approximation with respect to the values determined by the algorithm is 1.5% for λ 1 (·) and 4.1% for λ 2 (·). Since both methods consider exponential job-size distributions, the difference is due to the fitting error introduced in the approximations of the conditional arrival rates in Section 4 and the truncation error in the algorithm in [14]. However, since the truncation error has been chosen to be of the order 10 −5 , it has little influence.
In the two rightmost columns of Table 2 we provide the mean and standard deviation of the number of jobs at both servers determined using the SQA. We report highly accurate approximations that deviate less than 2.2% from the case with an exponential job-size distribution for  Figure 5: Simulated conditional arrival rates for a system with three servers with service rates 1 ( ), 2 ( ), and 5 ( ), with ρ = 0.7.
the listed values of s and ρ. Although our approximations are not aimed at the case s = 1, we report accurate approximations also in this setting with maximum relative differences of the same order as in [5, Section 6.1].

Conclusion
In this paper, we provide an approximate performance analysis of a queueing system consisting of two heterogeneous PS servers with service rates 1 and s ∈ N, respectively, a general jobsize distribution and GJSQ routing. More concretely, we derived the approximate equilibrium distribution of the number of jobs at each server using the SQA method. In order to apply SQA, we established that the GJSQ system is near-insensitive to the job-size distribution and thus we approximated the system at hand with exponentially distributed job-sizes. We then approximated the conditional arrival rates for the exponential case, by combining exact limiting results for large number of jobs and approximation results, which were obtained using a multiple linear regression model, for small number of jobs. Ultimately, the aforementioned approach resulted in approximations that are highly accurate; we reported a maximum relative difference with respect to exact or simulation results of 4.1% for the conditional arrival rates and 2.2% for the mean and standard deviation of the number of jobs at each server. In this paper we set the groundwork for the analysis of server farms with heterogeneous servers under the GJSQ routing policy by analyzing the case of two servers. Of course, server farms consist of multiple servers so it is in our future plans to extend the analysis presented in this paper to more than two servers. The most difficult aspect of this task would be the derivation of the conditional arrival rates, which possibly has to rely on simulation data, since the approach in [14] is in its current setting restricted to two servers. In Figure 5 we present an example of the simulated conditional arrival rates in case of three servers with service rates 1, 2 and 5. Note that the structure of the conditional arrival rates is as expected, i.e. the number of points in the repeating pattern is directly related to the rate of service, but the exact values of these points differ from the values obtained by formulas (4.1) and (4.2).

A Definition of the variables and functions in Theorem 4.3
The definitions can be found in [14,Lemma 5.11], but we summarize them here. We denote α = ρ 1+s .

B Proof of Theorem 4.3
The proof is based on the exact results of the related M/M (1, s)/2/SED/F CF S system, with s ∈ N, presented in [14]. Although we obtain similar results for the limiting conditional arrival rates for server 1 as in [5], we use here a completely different approach in deriving the limits.
In [14], the state space {(q 1 , q 2 ) | (q 1 , q 2 ) ∈ N 2 0 } of the Markov process is transformed to the state space {(m, n, r) | m ∈ N 0 , n ∈ Z, r = 0, 1, . . . , s − 1} where m = min(q 1 , q 2 s ), n = q 2 s − q 1 and r = mod(q 2 , s). Let us denote the equilibrium probabilities for the threedimensional state space as p(m, n, r). The equilibrium probability p(m, n, r) has a series expression, i.e. p(m, n, r) = ∞ l=0 x(l, m, n, r), namely, for m ≥ 0, n ≥ 1, For the exact interpretation of each variable we refer the reader to [14]. In [14] the authors establish the following properties: 1. There exists a positive integer N such that the series in (B.1) and (B.3) converge absolutely for all m ≥ 0, |n| ≥ 1 with m + |n| > N and the series (B.2) converges absolutely for all m ≥ N .
In this proof we make use of the dominated convergence theorem for complex-valued functions.

B.1 Server 1
The limiting conditional arrival rate to server 1 can be determined from where the interchange of the limit and the series for the third term on the right-hand side of (B.6) is allowed by the dominated convergence theorem, because one can bound p(n, m, r) from above by p(0, m, r) and ∞ m=0 p(0, m, r) < ∞ since it is a subseries of m+|n|>N p(m, n, r), which converges absolutely by property 3. Furthermore, we know that lim m→∞ p(m, n, r) = lim m→∞ ∞ l=0 x(l, m, n, r) which is equal to ∞ l=0 lim m→∞ x(l, m, n, r) by the dominated convergence theorem for complex-valued functions in combination with property 2. This allows us to compute the second and third term on the right-hand side of (B.6). The first term on the right-hand side of (B.6) can be determined as follows i − (α l,i , β l,i(s+1) , r) Interchange of the limit and series is again allowed here since one can bound the absolute value of the summands from above by v(l) for sufficiently large n and ∞ l=0 v(l) < ∞. One can finally establish that