Mobility Patterns Mining Algorithms with Fast Speed

In recent years, mobile networks and their applications have developed rapidly, so ensuring quality of service (QoS) has become a key issue for service providers. Predicting the movement of Mobile Users (MUs) is an important problem in cellular communication networks; its applications include automatic bandwidth adjustment, smart handover, and location-based services. In this work, we propose two new algorithms, Find_UMP_1 and Find_UMP_2, for mining the next movements of mobile users. The Find_UMP_1 algorithm reduces the complexity of the traditional UMPMining algorithm. The Find_UMP_2 algorithm reduces the number of transactions of the User Actual Paths (UAPs) database. Our experimental results show that the proposed algorithms outperform the traditional UMPMining algorithm in terms of execution time. In addition, we propose the UMP_Online algorithm to reduce the execution time when new data are added. The benefit of the UMP_Online algorithm is that the system can run online in real time, so the applications listed above can be served effectively.


Introduction
Currently, due to the rapid development of mobile communication networks, many people use personal mobile devices to search for information on the Internet. Almost everyone has a mobile device such as a cell phone, personal digital assistant (PDA), or notebook. In addition, many people search for information while traveling all over the world. About 6.8 billion mobile phones were in use around the world in 2013, a rate of about 96 to 97% of the world population [1].
Therefore, the problem is how to ensure the quality of mobile services.
In cellular communication networks [2], a mobile user can move from one cell to a neighboring cell in the network. As MUs move, their location is constantly updated in the Visitor Location Register (VLR) ([3], [4], [5]) of the system. The VLR is an intermediate database that stores temporary information about mobile users in the service area of the Mobile Switching Center (MSC). The location information of MUs is then transferred to the Home Location Register (HLR). The HLR is a database for long-term storage of MU information. The movement history of MUs is extracted from the log files and stored in the HLR of the MSC. This historical data is used to predict the mobility of MUs.
Cellular communication networks are characterized by mobility, disconnection, long delays, handoff, and continuously changing bandwidth. Several recent studies ([7], [8], [9], [10]) have applied the traditional User Mobility Pattern (UMP) Mining algorithm to overcome these problems. However, the UMPMining algorithm has a long execution time and runs offline, which reduces the effectiveness of the applications built on it.
In particular, our main contributions can be summarized as follows:
- Our proposed algorithms increase the running speed of the traditional UMPMining algorithm in two ways: (1) we reduce the complexity of the traditional UMPMining algorithm; (2) we reduce the number of transactions processed when mining the movements of MUs.
- We propose the UMP_Online algorithm to avoid rescanning the full database. This algorithm mines only the new dataset (the new transactions added to the database). Therefore, mobile service providers (MSPs) can supply their applications more efficiently.
- The results of our experiments show that:
  - The execution time of the first improvement (Find_UMP_1 algorithm) is reduced by more than 25% compared with the traditional UMPMining algorithm.
  - The execution time of the second improvement (Find_UMP_2 algorithm) is reduced by more than 75% compared with the traditional UMPMining algorithm.
  - The execution time of the third improvement (UMP_Online algorithm) is about 57.94% lower than that of the Find_UMP_2 algorithm.
The rest of the paper is organized as follows. Section 2 presents related work. Section 3 explains our proposed scheme. Section 4 presents the experimental results, and Section 5 concludes our work.

Related work
The problem of mining sequential patterns is addressed in [6], [7], [8], [9]. The algorithm in [6] applies the Apriori algorithm in grid computing and does not take the topology of the network into account when creating the candidate patterns. In [14], Mira H. Gohil and S. V. Patel compared different methods of next location prediction.
The UMPMining algorithm in [7] predicts the next location of mobile users using data mining techniques. In [7], Yavas et al. presented an AprioriAll-based sequential pattern mining algorithm to find the frequent sequences and to predict the next location of the user. They compared the results of their algorithm with Mobility Prediction based on Transition Matrix (TM). In [10], Byungjin Jeong applied the UMPMining algorithm to smart handover decisions in order to reduce the number of unnecessary handovers in macro/femto-cell network architectures.
In [11] and [12], the authors also applied the UMPMining algorithm to location-based services (LBSs) in cellular communication networks. In [11], Abo-Zahhad et al. presented LBSs such as emergency, safety, traffic management, and public information applications. In [12], Lu et al. proposed a method to find segmenting time intervals in which similar mobile characteristics exist.
The above works apply the traditional UMPMining algorithm to enhance the quality of services. Our algorithms improve the traditional UMPMining algorithm to further enhance the quality of services.

Proposed scheme
Before giving our proposed algorithms, we present the traditional UMPMining algorithm of [7] and calculate its complexity in subsection A.

The traditional UMPMining algorithm:
Suppose that User Actual Paths (UAPs) have the form C = {c_1, c_2, ..., c_n}, where each c_k denotes the ID number of the k-th cell in the coverage area.
For example, we have the simulated coverage map shown in Figure 1 and the UAP data shown in Table 1. G is a directed graph that corresponds to the cells in the mobile coverage area. Each cell is a node of G, as in Figure 1. If two cells A and B are neighbors (share a common border) in the mobile coverage area, G has a directed, unweighted edge from A to B and from B to A.
In addition, B is a subsequence of A if all cells of B exist in A (they need not be consecutive in A).
For example, in Figure 1, suppose that A = {c_4, c_0, c_6, c_7, c_8, c_5} and B = {c_6, c_8}. Then B is a length-2 subsequence of A, and the UAP B is contained in the UAP A.
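The containment test in the example above can be sketched in a few lines of Python. This is only an illustration: the function name `is_contained` is hypothetical, cells are represented by their integer IDs, and the sketch assumes that the order of cells must be preserved, as is usual for sequential patterns.

```python
from typing import Sequence


def is_contained(b: Sequence[int], a: Sequence[int]) -> bool:
    """Return True if path b is a subsequence of path a (order kept, gaps allowed)."""
    it = iter(a)
    # `cell in it` consumes the iterator, so later cells must appear later in a.
    return all(cell in it for cell in b)


A = [4, 0, 6, 7, 8, 5]  # cell IDs of the example path A
B = [6, 8]              # cell IDs of the example path B
```

With these values, `is_contained(B, A)` holds, while the reversed path `[8, 6]` is not contained in A under the ordering assumption.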
6.     S = {s | s ∈ C_k and s is a subsequence of a}
7.     for each s ∈ S do
8.       s.count = s.count + s.suppInc
9.     endfor
10.  endfor

At line 13, the Cand_Gen() function is written as follows:

Thus, the complexity of this algorithm is O(mn).

To reduce the complexity of the UMPMining algorithm, we proceed as follows:
Here P(X) denotes the set of all subsets of X. The pair of functions (ρ, λ), with ρ: P(I) → P(O) and λ: P(O) → P(I), is called a Galois connection. The value ρ(S) is the set of transactions that contain all cells in S, and the value λ(X) is the set of cells that occur in all transactions of X.

Definition 6: M_dd mobility matrix
The M_dd mobility matrix is similar to the binary matrix of Definition 3, with one addition: each entry M[o_m, i_n] is a location of a mobile user traveling in the mobile network (Table 2).
Column i_n: the code of a cell in the mobile network.
Row o_m: the actual path of a mobile user.
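A minimal sketch of how such a matrix could be built is given below. It rests on an assumption that the text does not state explicitly: each entry holds the 1-based position at which the cell is visited in the path, and 0 if the cell is not visited. The function name `build_mobility_matrix` and the toy data are hypothetical and are not taken from Table 1 or Table 2.

```python
from typing import Dict, List


def build_mobility_matrix(uaps: Dict[int, List[int]],
                          cells: List[int]) -> Dict[int, Dict[int, int]]:
    """Map each UAP to a row; entry = 1-based position of the cell, 0 if absent."""
    matrix: Dict[int, Dict[int, int]] = {}
    for uap_id, path in uaps.items():
        row = {cell: 0 for cell in cells}
        for position, cell in enumerate(path, start=1):
            if row[cell] == 0:  # assumption: keep only the first visit of a cell
                row[cell] = position
        matrix[uap_id] = row
    return matrix


# Hypothetical toy data.
uaps = {1: [4, 0, 6], 2: [6, 7, 8]}
cells = [0, 4, 5, 6, 7, 8]
m_dd = build_mobility_matrix(uaps, cells)
```

Storing positions rather than plain 0/1 flags keeps the visiting order available in each row, which is what later allows a pattern to be checked against a row without rescanning the path.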
We convert the data from Table 1 to Table 2 as follows:

1. L_1 = ∅
2. for each (i ∈ I and j ∈ field of M_dd) // i: cell ID, which is also a column of M_dd
3.   S = {s | s ∈ M_dd and s_ij ≠ 0}
4.   for each s ∈ S
5.     s.count = s.count + 1

The complexity of the Find_support algorithm is reduced by a factor of n (one loop is removed) compared with the UMPMining algorithm.

Find_UMP_2 algorithm
The Find_UMP_2 algorithm is similar to the Find_UMP_1 algorithm; they differ only in the function that finds the support, as follows:
Decreasing the number of transactions: According to Clause 2, we have:

UMP_Online algorithm
In this section, we develop an incremental algorithm to find the large sets from the mobile database. The proposed algorithm is named UMP_Online. To avoid rescanning the full database, this algorithm mines only the new dataset (the new transactions added to the database).
The purpose of this algorithm is to reduce the execution time of mining the movements of MUs. Therefore, MSPs can supply their applications more efficiently.
Here is the UMP_Online algorithm: the old result table consists of the candidate sets C_i and the large patterns L_i (C_i and L_i are found by running the Find_UMP_2 algorithm). The algorithm uses these previous results and updates the mobility patterns as follows:

UMP_Online algorithm
- Find the candidate patterns (C_inew) from the new data set.
- Calculate the support value for each sequence c ∈ C_inew and check whether it satisfies the min_supp value.
- Merge the candidate patterns (if c.supp ≥ min_supp) into the old candidate patterns (C_i, L_i).
- Return the new large set L.
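The steps above can be sketched as an incremental count update. This is a simplified illustration under stated assumptions, not the authors' implementation: the old result table is modeled as a `Counter` of pattern counts plus the old database size, min_supp is treated as a fraction of all transactions, and a pattern that is missing from the old table is assumed to have count 0 in the old data. All names (`ump_online_update`, `contains`) are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Set, Tuple

Pattern = Tuple[int, ...]


def contains(path: Sequence[int], pattern: Pattern) -> bool:
    """True if `pattern` is an ordered subsequence of `path`."""
    it = iter(path)
    return all(cell in it for cell in pattern)


def ump_online_update(old_counts: Counter, old_size: int,
                      new_paths: List[List[int]],
                      new_candidates: Iterable[Pattern],
                      min_supp: float) -> Tuple[Counter, Set[Pattern]]:
    """Add counts from the new paths only; return updated counts and large set."""
    counts = Counter(old_counts)  # copy, so the old table stays intact
    for pattern in set(old_counts) | set(new_candidates):
        # Only the new transactions are scanned here.
        counts[pattern] += sum(contains(p, pattern) for p in new_paths)
    total = old_size + len(new_paths)
    large = {p for p, c in counts.items() if c / total >= min_supp}
    return counts, large


old_counts = Counter({(6, 8): 2, (4, 6): 1})
counts, large = ump_online_update(old_counts, old_size=3,
                                  new_paths=[[4, 6, 8], [6, 8]],
                                  new_candidates=[(4, 8)],
                                  min_supp=0.5)
```

The point of the design is that the old database is never read again: only its size and the stored counts are reused.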
Because the UMP_Online algorithm returns the set of large sets L, we should prove that the set L is complete.
Theorem 1: the Find_L_k algorithm is guaranteed to find all keys.
We use induction to prove that the Find_L_k algorithm finds all keys.

Finding the mobility rules
The data mining phase (UAPs → UMPs) yields the mobility patterns of mobile users (UMPs). In this section, we derive the mobility rules from the UMPs.
Example: for the UMP (3, 4, 5), the mobility rules are (3) → (4, 5) and (3, 4) → (5).
Suppose that we have the UMP L = {i_1, ..., i_k}, where k > 1. All mobility rules generated from this pattern have the form R: (i_1, i_2, ..., i_{m-1}) → (i_m, i_{m+1}, ..., i_k), and a confidence value is calculated for each rule R. By using the UMPs, all mobility rules are generated and their confidence values are calculated. Rules with confidence ≥ min_conf are selected.
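The rule generation can be illustrated with a short Python sketch. The confidence formula is not reproduced in the text above, so the sketch assumes the usual definition for sequential rules: the count of the whole pattern divided by the count of the rule head, expressed as a percentage. The function name `gen_rules` and the counts are hypothetical.

```python
from typing import Dict, List, Tuple

Pattern = Tuple[int, ...]


def gen_rules(pattern: Pattern, counts: Dict[Pattern, int],
              min_conf: float) -> List[Tuple[Pattern, Pattern, float]]:
    """Split a UMP into head -> tail rules; keep those with confidence >= min_conf (%)."""
    rules = []
    for m in range(1, len(pattern)):
        head, tail = pattern[:m], pattern[m:]
        # Assumed definition: confidence = count(pattern) / count(head) * 100.
        confidence = 100.0 * counts[pattern] / counts[head]
        if confidence >= min_conf:
            rules.append((head, tail, confidence))
    return rules


# Hypothetical support counts for the UMP (3, 4, 5) and its prefixes.
counts = {(3,): 10, (3, 4): 4, (3, 4, 5): 2}
rules = gen_rules((3, 4, 5), counts, min_conf=30.0)
```

With these hypothetical counts, the rule (3) → (4, 5) has confidence 20% and is dropped, while (3, 4) → (5) has confidence 50% and is kept.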

Finding the mobility rules:
We have the rules generation algorithm as follows:

2. for all m from 1 to k-1 do
3.   // get all the mobility rules
4.   head = (i

At line 4, the head is the part of the rule before the arrow. At line 5, the tail is the part of the rule after the arrow (rule = head → tail).
After running the Gen_Rules algorithm, we obtain the results table from actual data (min_conf = 5%).
- Database (total) = database (old) + database (new).
- Run the Find_UMP_2 algorithm on database (total).
When applying the UMP_Online algorithm, we proceed as follows:
- Get the old results (C_n, L_n).
- Run the UMP_Online algorithm on the new database and combine its result with the old results to obtain the new results.
To compare the two methods above, we have the following actual results:
- Precision: the number of correctly predicted cells / the total number of predictions made.
Change of the recall values according to the min_supp values: in Figure 6, if the min_supp value increases, the recall value decreases. The reason is that an increasing min_supp value reduces the number of prediction rules, so the number of correct predictions decreases.
When the minimum confidence value (min_conf) changes, the precision value changes as shown in Figure 7.
In Figure 7, when the min_conf value increases, the precision value also increases, because with high min_conf values only the rules that have high confidence values are used for prediction.
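The two measures can be computed as in the sketch below. Precision follows the definition given above. The definition of recall is not stated in the text, so the sketch assumes recall = correctly predicted cells / total number of prediction requests; the function name and the data are hypothetical.

```python
from typing import List, Optional, Tuple


def precision_recall(predictions: List[Optional[int]],
                     actual: List[int]) -> Tuple[float, float]:
    """predictions[i] is the predicted next cell for request i, or None if no rule matched."""
    made = sum(p is not None for p in predictions)
    correct = sum(p == a for p, a in zip(predictions, actual))
    precision = correct / made if made else 0.0
    # Assumed definition: recall is measured against all requests.
    recall = correct / len(actual) if actual else 0.0
    return precision, recall


predicted = [5, None, 7, 8, None]  # hypothetical predictions
actual = [5, 6, 7, 9, 4]           # hypothetical next cells actually visited
precision, recall = precision_recall(predicted, actual)
```

Under these definitions, fewer rules mean more requests with no prediction, which lowers recall but does not by itself lower precision. This is consistent with the trends described for Figure 6 and Figure 7.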

CONCLUSION
The mobility prediction of Mobile Users is one of the important issues in mobile computing systems. Applications of MU mobility prediction include network bandwidth adjustment, location-based services, and smart handover. However, these applications require the UMPMining algorithm to execute as quickly as possible. In this work, we proposed the Find_UMP_1 algorithm and the Find_UMP_2 algorithm to address this execution time problem. The results of our experiments show that the proposed algorithms outperform the traditional UMPMining algorithm in terms of execution time.
In addition, we proposed the UMP_Online algorithm to reduce the execution time when new data are added. The benefit of applying this algorithm is that the system can run online in real time. Therefore, MSPs can serve the above applications effectively.

Figure 1. The simulation of the cellular network and graph G

7.   endfor
8. endfor
9. return Candidates

For the UMPMining algorithm, lines 5 to 10 (finding the support of C_n) are rewritten as follows:

Find_support_UMP(S_k)
Input: database D
Output: SP(S_k) (support of S_k)
1. for each (UAP a ∈ D) do // scan the whole database D
2.   for (i = 1; i ≤ |a|; i++) do // |a|: length of sequence a
3.     Find position (s_1, s_2, ..., s_k) ∈ S_k in sequence a
return SP(S_k)

The complexity of the Find_support_UMP function:
- For the loop at line 1: the complexity is O(m), where m = |D|.
- For the second loop (line 2): the complexity is O(n), where n = |a| is the average length of a string a ∈ D.
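The two nested loops behind the O(mn) cost can be made concrete with the following sketch. It is a simplified illustration: the function name `find_support_ump` is hypothetical, and each matching path adds 1 to the count, whereas the original listing adds s.suppInc.

```python
from typing import List, Sequence


def find_support_ump(pattern: Sequence[int], database: List[List[int]]) -> int:
    """Count the UAPs that contain `pattern` as an ordered subsequence."""
    count = 0
    for path in database:      # outer loop: m = |D| paths
        k = 0                  # index of the next pattern cell to match
        for cell in path:      # inner loop: n = |a| cells per path
            if k < len(pattern) and cell == pattern[k]:
                k += 1
        if k == len(pattern):  # every cell of the pattern was found in order
            count += 1         # simplification: increment by 1, not by suppInc
    return count


# Hypothetical toy database of user actual paths.
database = [[4, 0, 6, 7, 8, 5], [6, 7, 8], [8, 6]]
```

Every path is scanned cell by cell for every candidate, which is the cost that the matrix-based approach aims to avoid.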

Endorsed Transactions on Context-aware Systems and Applications 10 -11 2015 | Volume 2 | Issue 6 | e2
Giang Minh Duc et al.

Find_UMP_1 algorithm
We map the UAPs database (D) to the M_dd mobility matrix (Definition 6). The steps are as follows:

Definition 2: Data Mining Context
Let O be a non-empty finite set of transactions (UAP IDs), let I be a non-empty finite set of cells, and let R be a binary relation between O and I such that, for o ∈ O and i ∈ I, (o, i) ∈ R ⇔ transaction o contains the i-th cell. The data mining context is the triple (O, I, R).

Definition 3: Data Mining Context Matrix
Given a table of mobile user paths with two attributes, UAP_ID (code of a transaction) and UAP (path of a mobile user through the cells of the mobile coverage map), let O be the set of transactions, I the set of cells, and R a binary relation between O and I, R ⊆ O × I, where (o, i) ∈ R if and only if transaction o contains the i-th cell.

Definition 4: Galois Connection
Given a data mining context (O, I, R), two functions ρ and λ are defined as follows: ρ: P(I) → P(O) and λ: P(O) → P(I):

λ(ρ(λ(X))) = λ(X) and ρ(λ(ρ(S))) = ρ(S)

Definition 5: The frequent set
Given a data mining context (O, I, R) and S ⊆ I, the frequency level of S is defined as the ratio of the number of transactions that contain S to the number of all transactions. The frequency of S is called the support of S, written SP(S), and is computed as follows:
SP(S) = |ρ(S)| / |O|
where |.| is the size of a set. Given S ⊆ I and a minimum support threshold min_supp, S is a frequent set with respect to the min_supp threshold if and only if SP(S) ≥ min_supp. FS(O, I, R, min_supp) is the set of subsets that satisfy the min_supp threshold, that is, FS(O, I, R, min_supp) = {S ∈ P(I) | SP(S) ≥ min_supp}.

Clause 1: Given S ∈ FS(O, I, R, min_supp), if T ⊆ S, then T ∈ FS(O, I, R, min_supp).
Demonstration: since T ⊆ S, by property (1.1) of the Galois connection of the pair of functions (ρ, λ), we have ρ(S) ⊆ ρ(T); therefore min_supp ≤ SP(S) ≤ SP(T) ⇒ T ∈ FS(O, I, R, min_supp).

Clause 2: Given T ∉ FS(O, I, R, min_supp), if T ⊆ S, then S ∉ FS(O, I, R, min_supp).
Demonstration: since T ⊆ S, by property (1.1) of the Galois connection of the pair of functions (ρ, λ), we have ρ(S) ⊆ ρ(T); therefore SP(S) ≤ SP(T) < min_supp ⇒ S ∉ FS(O, I, R, min_supp).
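The definitions and clauses above can be checked on a small example. In the sketch below the two functions of the Galois connection are called `rho` and `lam`; these names, the dictionary representation of the context, and the toy data are assumptions made for illustration only.

```python
from typing import Dict, FrozenSet, Set

Context = Dict[int, FrozenSet[int]]  # transaction ID -> set of cells it contains


def rho(cells: Set[int], ctx: Context) -> Set[int]:
    """Transactions that contain every cell in `cells`."""
    return {o for o, items in ctx.items() if cells <= items}


def lam(transactions: Set[int], ctx: Context) -> Set[int]:
    """Cells shared by every transaction in `transactions`."""
    rows = [ctx[o] for o in transactions]
    if not rows:  # empty intersection convention: all cells
        return set().union(*ctx.values())
    return set(frozenset.intersection(*rows))


def support(cells: Set[int], ctx: Context) -> float:
    """SP(S) = |rho(S)| / |O|."""
    return len(rho(cells, ctx)) / len(ctx)


# Hypothetical context with three transactions.
ctx = {1: frozenset({0, 4, 6}), 2: frozenset({6, 7, 8}), 3: frozenset({4, 6, 8})}
```

Here `support({6}, ctx)` is 1.0 and `support({6, 8}, ctx)` is 2/3, so a subset never has lower support than its superset, as Clause 1 states.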

6.     endfor
7.   endfor
8.   L = {s | s ∈ C_1, s.count ≥ min_supp}
9.   L_1 = L_1 ∪ L
10. return L_1

At line 4 of the Find_UMP_1 algorithm, a function finds L_k from L_{k-1} as follows:

Find_L_k(L_{k-1}) algorithm
Input: L_{k-1}, G, M_dd
Output: L_k
1. L_k = ∅
2. for (each X ∈ L_{k-1}) do
3.   for (each Y ∈ L_{k-1} and X ≠ Y
return L_k

At line 6 of Find_L_k(), a function finds the support of S_k as follows:

Find_support(S_k) algorithm
Input: S_k, M_dd
Output: SP(S_k)
1. for each o ∈ M_dd do // scan all of M_dd
2.   Find location (s_1, s_2, ..., s_k) ∈ S_k of o ∈ M_dd
3.   Find S_k.count
4. endfor
5. return SP(S_k)

The complexity of the Find_support() algorithm:
- For the loop at line 1: the complexity is O(m), where m = |O| is the total number of records of M_dd.
- Thus, the complexity of the algorithm is O(m).
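A possible reading of Find_support() is sketched below. It assumes that each row of M_dd stores the 1-based position of every visited cell (0 if the cell is not visited), so that a pattern can be checked with one lookup per pattern cell instead of a scan of the whole path. The function name `find_support`, the matrix layout, and the data are hypothetical, and the result is returned as a ratio in line with Definition 5.

```python
from typing import Dict, Sequence

Matrix = Dict[int, Dict[int, int]]  # UAP ID -> {cell ID -> 1-based position, 0 if absent}


def find_support(pattern: Sequence[int], m_dd: Matrix) -> float:
    """Fraction of rows in which the cells of `pattern` occur in the given order."""
    count = 0
    for row in m_dd.values():  # single loop over the m rows
        positions = [row.get(cell, 0) for cell in pattern]
        visited = all(p > 0 for p in positions)
        in_order = all(x < y for x, y in zip(positions, positions[1:]))
        if visited and in_order:
            count += 1
    return count / len(m_dd) if m_dd else 0.0


# Hypothetical matrix for three paths.
m_dd = {
    1: {0: 2, 4: 1, 6: 3, 7: 0, 8: 0},  # path 4, 0, 6
    2: {0: 0, 4: 0, 6: 1, 7: 2, 8: 3},  # path 6, 7, 8
    3: {0: 0, 4: 1, 6: 2, 7: 0, 8: 3},  # path 4, 6, 8
}
```

The work per row depends on the pattern length k rather than on the path length n, which matches the idea that the inner loop over the path is removed.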

The set L is complete when every L_i is found in full. When L_i grows (lines 9 and 15), the algorithm calls the Find_L_k() function (lines 10 and 16).

First, L_1 is correct because L_1 = {S ∈ P(I) | SP(S) ≥ min_supp ∧ |S| = 1}. Suppose that L_{k-1} is correct; we must prove that the Find_L_k algorithm creates a correct L_k, that is, that L_k contains all large sets S with |S| = k. Indeed, since X ∈ F_{k-1} and Y ∈ F_{k-1}, we have |X| = |Y| = k-1. In addition, for S = X ∪ Y to be a candidate, we need |S| = k (line 6 of the Find_L_k algorithm). According to Clause 2, every subset of a large set S ∈ L_k must itself be a large set, so the L_k candidates should be created from L_{k-1} (lines 2 and 3 of the Find_L_k function).
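The join step used in the proof can be illustrated as follows. This sketch treats patterns as unordered sets of cells, as the proof does, and ignores the graph G and the order of cells, so it is a simplification of Find_L_k rather than a reconstruction of it. The function name `gen_candidates` and the data are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Set


def gen_candidates(prev_large: Set[FrozenSet[int]], k: int) -> Set[FrozenSet[int]]:
    """Join pairs from L_{k-1} into size-k candidates and prune them by Clause 2."""
    candidates = set()
    for x, y in combinations(prev_large, 2):
        s = x | y
        if len(s) != k:  # keep only unions of size exactly k
            continue
        # Clause 2: a set with a non-large (k-1)-subset cannot be large.
        if all(frozenset(sub) in prev_large for sub in combinations(s, k - 1)):
            candidates.add(s)
    return candidates


# Hypothetical L_2.
l2 = {frozenset({4, 6}), frozenset({6, 8}), frozenset({4, 8}), frozenset({6, 7})}
c3 = gen_candidates(l2, 3)
```

In this example only {4, 6, 8} survives, because {4, 6, 7} and {6, 7, 8} each have a 2-subset that is not in L_2.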

Figure 3. The execution time results of three algorithms

Figure 4. The total execution time of three algorithms

Figure 5. The execution time of two algorithms

Figure 6 compares the recall value changes of three data sets. When the size of the training set increases, the recall values also increase (because the number of prediction rules increases).

Figure 6. Changes of recall according to min_supp for three data sets

Figure 7. Precision of the prediction rules

Table 1. Paths of mobile users

Table 2. Mobility matrix of mobile users
For example, in Table 2, mobile user 2 (UAP ID = 2) moves between the cells as follows:
Input: New candidate sets of length i: C_inew

Table 6. The results of three algorithms