Clones: CLOsed queueing Networks Exact Sampling

We present Clones , a Matlab toolbox for exact sampling from the stationary distribution of a closed queueing network with ﬁnite capacities. This toolbox is based on recent results using a compact representation of sets of states that enables exact sampling from the stationary distribution without considering all initial conditions in the coupling from the past (CFTP) scheme. This representation reduces the complexity of the one-step transition in the CFTP al-gorithm to O ( KM 2 ), where K is the number of queues and M the total number of customers; while the cardinality of the state space is exponential in the number of queues. In this paper, we focus on the algorithmic and implementation issues. We propose a new representation, that leads to one-step transition complexity of the CFTP algorithm that is in O ( KM ). We provide a detailed description of our matrix-based implementation. The toolbox can be downloaded at http://www.di.ens.fr/~rovetta/Clones .


INTRODUCTION
Closed queueing networks are largely used in various application domains due to their modeling simplicity and productform stationary distribution in the unlimited capacity case [5].This structure is no longer guaranteed when the queues have a finite capacity, due to the effect of blocking.There are different types of blocking, see [8] for a detailed discussion.According to the terminology introduced in that survey, our blocking model is RS-RD (repetitive servicerandom destination).Under this blocking policy, the stationary distribution has a product form only if the routing is reversible.When the stationary distribution does not have a product-form, the exact analysis may not be computationally tractable, and we may turn to approximations [2] or simulation.The main difficulty of the Markov chain Monte Carlo approach is the stopping criterion.Usual tools are the asymptotic variance in the Central Limit Theorem or mixing times [1,7], unfortunately there is no generic technique for the non-reversible Markov chain case.
In the 1990's, Propp and Wilson introduced a method for sampling a random variable according to the stationary distribution of a finite ergodic Markov chain [9]: the coupling from the past (CFTP) algorithm.The CFTP algorithm automatically detects and stops when the sample has the correct distribution.The main drawback of the original CFTP is that it considers a coupling from all initial conditions.In the case of closed queueing networks the cardinality of the state space is exponential in the number of queues, which is untractable.
When the network has a product-form distribution, Kijima and Matsui [6] proposed a perfect sampling algorithm with overall complexity O(K 3 ln(KM )).However, their method strongly relies on the product form representation of the stationary distribution and it cannot be applied to the general case of queues with finite capacity.In a recent paper by Bouillard and al. [3], a new representation of the sets of states has been proposed.This representation is used to derive a bounding chain for the CFTP algorithm for closed queueing networks, that enables exact sampling from the stationary distribution without considering all initial conditions in the CFTP.This method is far more general, as it does not rely on the product-form property.
The complexity of one-step transition of the CFTP algorithm in [3] is O(KM 2 ).In this paper, we derive a new algorithm, based on even more compact representation.The complexity of one-step transition of this new CFTP algorithm is O(KM ).Note that this is the same as the complexity of the computation of the normalizing constant in the product-form case using Buzen's algorithm [4].The overall complexity of our algorithm depends however also on the coupling time of the chain, so it becomes of a practical interest when the network does not have a product form.The main focus of this paper is on algorithmic and implementation issues.We give a detailed description of the matrixbased representations used in our Matlab toolbox Clones.
The paper is organized as follows.In Section 2 we describe the model and give a short overview of results in [3].We then introduce a new gap-free diagram representation and a new CFTP algorithm with one-step transition complexity of O(KM ).In Section 3 we present our Matlab toolbox Clones and discuss its functionalities and implementation issues.Section 4 is devoted to numerical results.Final remarks and conclusions are contained in Section 5.

BACKGROUND 2.1 Model
Consider a closed network of K ./M/1/Cqueues with M customers in total.Each queue k ∈ {1, . . ., K} =: Q has a finite capacity C k ≤ M and a service rate ν k .After a customer has been served in queue i, it is directed to queue j with probability pi,j, independently of the current state and past evolution of the network.If queue j is full, then the customer remains in queue i and has to be served again.Its new destination will be chosen independently from j.We assume that the routing matrix P = (pi,j)i,j∈Q is stochastic and irreducible (that is, the network of queues is strongly connected).We set R = {(i, j) | pi,j > 0}.
The state space is denoted by S: it is the set of elements , which grows exponentially with K.
For (i, j) ∈ R, ti,j : S → S is the function that describes the routing of a customers from queue i to queue j: ti,j(x) = x + (ej − ei)1 x i >0 and x j <C j where ei is the vector that has all its coefficients equal to 0 except the i-th, equal to 1 and 1A is the characteristic function of A. This network can be described by an ergodic Markov chain: at each step, a pair (i, j) is randomly chosen, with probability ν i p i,j k∈K ν k .Our objective is to sample the stationary distribution with perfect sampling techniques (as in [9]).Algorithm 1 (PSS) is the direct application of the perfect sampling algorithm to this model.It produces a sample of the stationary distribution in finite expected time.
Algorithm 1: Perfect sampling using sets of state (PSS) This algorithm has a complexity at least linear in |S|, which grows exponentially with K.In [3] a new and compact representation of the state space has been proposed, for which sets of states have a strucured representation, polynomial in K and M .The main idea is to represent states as paths in a directed graph called diagram.In the next subsection, we give a summary of results in [3].

Diagram
Let D = (N, A) be a directed graph where N = {0, . . ., K}× {0, . . ., M } and g : S → P(N 2 ) denote the function which associates a set of arcs to each state: These arcs form a path from node (0, 0) to (K, M ).The The complete diagram is D = (N, A) where A = g(S), the image of the state space.An example of such a diagram is given in Figure 1 We can now define the transformation Ti,j corresponding to the service of a customer from queue i to queue j: The transformation Ti,j can be performed directly without having to compute ψ, ti,j and φ.It will be shown in Section 3 that Ti,j(D) can be computed in time polynomial in O(KM 2 ) (while computing ti,j for all states, as in PSS, has exponential complexity).
Theorem 1. PSD algortihm terminates in finite expected time and produces an exact sample from the stationary distribution.

Gap-free diagram
In this subsection, we present a sub-class of diagrams that can be represented even more compactly.
The second and third implications state that the values of arcs from (to) a node are contiguous.It can be easily seen that the complete diagram is gap-free and that gap-free diagrams have a more compact representation: it suffices to know for each value s and each queue k what are the minimal and maximal arc (from the first implication).Consequently, it can be represented by less than 2K(M + 1) arcs.
In most cases, if D is gap-free, Ti,j(D) is also gap-free.But it is not true in general, as shown on Figure 8.For a diagram D, we denote D gf the smallest gap-free diagram (for arc inclusion) that contains D. This construction is illustrated on Figure 2. From this transformation, we can define a new transition function on gap-free diagrams: for (i, j) ∈ R, The perfect sampling algorithm can be adapted to gap-free diagrams.It suffices to replace every occurrences of T in Algorithm 2 by T gf .This is PSF algortihm.

D D gf
Algorithm 2: Perfect sampling using diagrams (PSD) Theorem 2. PSF algortihm terminates in finite expected time and produces a sample distributed according to the stationary distribution.
Proof sketch.The algorithm terminates in finite expected time for the same reason as Algorithm 2: the cou-pling sequence exhibited in [3] produces gap-free diagrams only.

Let (Un)
, and as diagrams representing one state are gapfree, we are ensured that the result of the algorithm is distributed as the stationary distribution.

TOOLBOX IMPLEMENTATION
This section is devoted to the implementation of the transition function of diagrams.As we will see, it is convenient to represent diagrams with matrices.Then a natural choice for implementation is Matlab, as it handles matrix computations (specially when they are sparse) very well.The toolbox Clones is a Matlab toolbox that implements perfect sampling algorithms using different state space representations: state space, diagrams and gap-free diagrams.It is available at http://www.di.ens.fr/∼rovetta/Clones.At the same address there is also a detailed documentation on how to use the Toolbox.In this section, we first introduce the matrix representation of diagrams, then present the transition algorithms and finally describe some additional functionalities of the toolbox.

Representation of gap-free diagrams
Gap-free diagrams can be more efficiently implemented.Because of the implication ), (k, m2 + s) ∈ A, each column can be represented by a 2 × (C k + 1) matrix Fk:

Diagram transition algorithms
In this section, we present the main algorithm of the Clones toolbox: the transition on diagrams.First, we need to find some classes of paths in the diagram: those that will remain unchanged by the transition and those that will evolve.In the whole section we denote by D the diagram matrix representation of diagram D = (N, A) and describe the transition Ti,j(D).

Finding paths in a diagram
To compute Ti,j(D), we need to know what are the paths of D that will remain unchanged because queue i is empty or queue j is full, and what are the paths that will be modified.These paths can be determine only by the knowledge of their arc value at the i-th and j-th columns.So, we need to define matrices D V i→j that represents the paths of D from columns i to j whose arcs at column i have value in V .It is constructed as follows.
Let V ⊆ {0, 1, . . ., Ci} be a set of values, D V i is the matrix that represents arcs in the column i with values in V : Paths from column i to column j generated by arcs with value in V in column i are represented by the matrix Let 1M be the identity matrix of size M .Note that as diag(v) is a diagonal Boolean matrix, computing diag(1M+1Y k−1 )Dk does not require matrix multiplication.It boils down to copying the lines of Dk corresponding to non-null elements of the diagonal of diag(1M+1Y k−1 ).The most complex operation is to compute 1M+1Y k−1 , which can be done in O(M 2 ).As |j − i| + 1 ≤ K, the complexity of finding those path is in O(KM 2 ).

Transition algorithm
We are now ready to present the transition algorithm.Let D be a diagram and (i, j) ∈ R. We explain how to compute Ti,j(D).First let us explain the transformation on diagram D (see Figure 8).We begin by computing three sets of arcs (not necessarily disjoint), depending on whether they will be modified by the transition: • Empty is the set of arcs that belong to paths with arc value 0 at column i.Those paths correspond to states where the queue i is empty.
• Full is the set of arcs that belong to paths with arc value Cj at column j.Those paths correspond to states where the queue j is full.
• Transit is the set of arcs that belong to paths with value < Cj at column j and > 0 at column i.Those paths correspond to states that are modified by ti,j.
Second, arcs in Transit are modified in accordance to ti,j: • In column i, value of arcs are decreased by one (keeping the same source).
• In column j, slopes of arcs are increases by one (keeping the same destination).
If we call Transit ′ this new set, Ti,j(D) = (N, Empty ∪Full∪ Transit ′ ).The transformation using matrix operations is given in Algorithm 3, where the Boolean operators and and or are component-wise and where The algorithm is performed on |i−j|+1 matrices.As |i−j|+ 1 ≤ K, and all the matrix operations can be done in O(M 2 ), Algorithm 3 requires O(KM 2 ) elementary operations.It is implemented by fonction Clones T(D, C, [i, j]) in the toolbox Clones.
To compute transitions on gap-free diagrams, the steps of the algorithm are the same.The only difference is that there is no need to compute all the arcs in Empty, Full and Transit, but only the extremal ones in each column, which ; can be computed directly from matrix F. As a consequence, the transformation can be done in O(KM ).This function is implemented in Clones gfT(D, C, M, [i, j]).

Additional functionalities of Clones
The  Object-oriented programming can also be used with Clones.Three classes have been defined to handle transitions and perfect sampling using respectively sets of states, diagrams and gap-free diagrams.Let R be the routing matrix.The functions for perfect sampling with sets of states, diagram and gap-free diagrams are respectively Clones PSS(C, M, R, N), Clones PSD(C, M, R, N) and Clones PSF(C, M, R, N).They produce N samples.

PERFECT SAMPLING WITH Clones
In this section we are interested in the running time of the perfect sampling algorithms.Experiments have been performed on a desktop machine.
We chose a small number of queues (K = 5) in order to compare running time of Algorithms PSD and PSF against PSS (that has an exponential complexity).Each queue has a capacity M/2 and the routing matrix is fixed and randomly chosen for the whole experimentation.For each sample, the same sequence of random variables (Un)n is used for the three algorithms.Thus they produce the same sample.For each experiment, we produce 100 samples and are interested in the mean running time.Figure 11 compares the perfect sampling algorithms using respectively diagrams and gap-free diagrams when the number of customers is M = {5, 10, . . ., 200}.We can see that for values from M = 5 to 60, the algorithm with diagrams is slightly faster than the one in gap-free diagrams, whereas for higher values of M , using gap-free diagrams is much more efficient than using diagrams.The latter result was expected, as the algorithmic complexity of one step transition is in O(KM ) for gap-free diagrams instead of O(KM 2 ) for diagrams.The running time when M ≤ 60 is more surprising.We believe that the main reason is that Matlab is optimized for matrix operations.Specially, when it comes to sparse matrices.On the other hand, our implementation of gap-free diagrams cannot benefit from this optimization, and there is room for improvement.

CONCLUSION
In this paper, we presented the toolbox Clones, that performs efficient coupling from the past of closed queueing networks.This toolbox enables the comparison between three different algorithms (CFTP with sets of states, diagrams and gap-free diagrams).Experimentally, PSD and PSF algorithms are very efficient compared to PSS algorithm.For small values of M , the CFTP with diagram implementation terminates faster than with gap-free diagrams.Future work will focus on the improvement of the gap-free one-step transition implementation.
Further investigation also include the study on the coupling Experimentally, the difference between the coupling times of the three implementations is rather small, but obtaining a theoretical bound would prove the efficiency of our approach.Last, we will also explore possible extensions to other classes of closed queueing networks.

Figure 2 :
Figure 2: Transformation of a diagram with a gap into a gap-free diagram.Left: D has a gap (no arc ((1, 1), (2, 2)); Right: D gf is constructed by adding this arc to D.
Let D = (N, A) be a diagram and k ∈ Q.In the toolbox, diagrams are implemented with incidence matrices.More precisely, Dk is a Boolean matrix of size (M + 1) × (M + 1) representing the k-th column of D: Dk(m, m ′ ) = 1 ((k−1,m),(k,m ′ ))∈A .Because a ∈ A implies that 0 ≤ v(a) = m − m ′ ≤ C k , matrices Dk are Boolean upper-triangular and Dk(m, m ′ ) = 0 if m ′ − m > C k .Figure 3 gives an example.

Figure 3 :
Figure 3: Matrix representation D2 of the second column of D.

Figure 4 :
Figure 4: Diagram representation of the complete diagram in Figure 1.

Figure 5 :
Figure 5: Gap-free representation of a column.The gap-free representation of the diagram of Figure 1 is given in Figure 5.Using the notations defined for D, a gap-free diagram is represented by matrix F = F1,K of size 2 × ( K k=1 C k + K). Figure 6 shows F, the gap-free representation of the complete diagram of Figure 1.It is a 2 × 14-matrix.

Figure 6 :
Figure 6: Gap-free representation of the complete diagram in Figure 1.

Figure 7 :
Figure 7: Diagram D and D {2} 3→5 in bold (red) paths.It corresponds to paths from column 3 to column 5 generated by arcs of values 2.

Algorithm 3 :
Transition algorithm Data: D the matrix representation of a diagram D, (i, j) ∈ R Result: D ′ the matrix representation of Ti,j(D).
conversion of a diagram D into a set of states, ψ(D), is performed by Clones Psi(D).Function Clones CardPsi(D) returns the cardinal of ψ(D).Function Clones StateSpace(C, M) uses Clones Psi(D) to compute the entire state space.Diagrams can be plotted with Clones plotD(D, C, mode); mode allows to write parameters or |ψ(D)| in the graphic (Figure 9).It is also possible to have diagram animations.

Figure 9 :
Figure 9: Plot of a complete diagram.

Figure 10
Figure10compares the running times for the three algorithms with M ∈ {2, 4, . . ., 60}.It can be observed that the running time of Algorithm 1 grows exponentially fast, and the two other algorithms (with diagrams and gap-free diagrams) terminate much faster.