A Novel Self-Similar Traffic Prediction Method Based on Wavelet Transform for Satellite Internet

As the service types and requirements of broadband satellite internet continue to increase, improving the QoS (Quality of Service) of satellite internet has attracted extensive attention. To reduce the impact of the self-similarity caused by various service traffic sources converging on the satellite communication system, this paper establishes a novel model from the perspective of self-similar traffic prediction. A method combining wavelet transform and the ARIMA (Autoregressive Integrated Moving Average) model is proposed to predict the self-similar traffic of satellite internet. The optimal prediction model is presented. The selection of the number of prediction samples and the impact of the prediction step on the accuracy of the prediction system are discussed, and the parameters are determined. Simulation results show that the ARIMA model combined with wavelet transform achieves better prediction than the traditional autoregressive model without wavelet technology.


Introduction
Nowadays, the prosperity of the Internet promotes the rapid development of the Next Generation Network, which consists of an air-space-ground integrated network [1]. As an auxiliary to terrestrial networks, the broadband satellite network, which has the advantage of global coverage, can relieve the pressure on terrestrial networks. However, it also faces a series of challenging problems to be solved [2][3]. The requirements of different services, such as bandwidth and delay, are clearly different. The convergence of various service traffic sources results in self-similarity of the service traffic on the satellite communication system [4], which differs clearly from the traditional Poisson characteristics. The self-similar process, which can reflect the long-range dependence of the traffic, is one of the simple models with long-range dependence, reflecting the behavior of the traffic at any time.
* Corresponding author. Email: 13B905008@hit.edu.cn
As service types and quantity grow, the degree of self-similarity will continue to increase. As a result, the self-similarity of traffic has an adverse effect on the satellite communication system, aggravating congestion at the network nodes as well as delay and delay jitter. A large number of studies show that the self-similarity of network traffic is the main cause of problems such as a high buffer overflow rate, lengthened transmission delay and increasingly frequent periodic congestion. Traffic self-similarity directly affects the structure design, control mechanisms, analysis methods and management measures of the next generation network communication system. In addition, the packet loss rate rises as the degree of self-similarity increases, while the bandwidth utilization decreases [5][6][7][8]. Therefore, it is necessary to predict the network traffic reasonably and accurately, in order to provide predicted queue information to the scheduling satellite and reliable data for the cross-layer scheduling of satellite Internet. Accurate traffic prediction can ease network congestion and optimize resource allocation, improving the performance of the satellite network.
Currently, research on the prediction of self-similar Internet traffic has made many achievements at home and abroad, such as the wavelet prediction model in [9], the neural network prediction model and various time series prediction models in [10], the multi-fractal wavelet model [11], and the multi-fractal finite impulse response neural network prediction model [12]. However, the prediction accuracy of these models is not very satisfactory, and their condition variables are not discussed in detail. It is extremely difficult to ensure the accuracy of real-time traffic prediction under the high bit error rate of satellite networks.
In this paper, a method combining wavelet transform and the ARIMA model is proposed to predict the self-similar traffic of the satellite network. Wavelet decomposition makes the non-stationary, periodic and self-similar network traffic stationary and reduces its degree of self-similarity, so that the traffic can be predicted reasonably and accurately. Through a large number of simulation experiments, the parameters of the prediction model are analyzed in detail and selected, so as to obtain the optimal prediction model of network traffic and improve the prediction accuracy. Reasonable and accurate prediction of self-similar satellite network traffic in turn aims to enhance the QoS of satellite internet.

ARIMA Model
The ARIMA model is employed to process non-stationary time series. The noise is processed into white noise that is independent of the history, so as to improve the prediction accuracy. Compared with traditional prediction models such as AR, MA and ARMA, the ARIMA model, which is established based on the Markov process, can accurately capture several features of the network. Predicting the traffic by processing the collected traffic data with the established prediction model yields high prediction accuracy.
The basic idea of ARIMA is to approximate a time series with a specific mathematical model. Once the model is identified, it can predict future values from the past and present values of the time series.
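The ARIMA(p, d, q) model can be described as
$$\Phi(B)\,\nabla^{d} x_t = \theta(B)\,\varepsilon_t,$$
where $\varepsilon_t$ is a white noise sequence and $\Phi(B)$ and $\theta(B)$ are polynomials of order p and q respectively. The expressions are
$$\Phi(B) = 1 - a_1 B - a_2 B^2 - \cdots - a_p B^p, \qquad \theta(B) = 1 + b_1 B + b_2 B^2 + \cdots + b_q B^q,$$
where $B$ is the delay operator, $B x_t = x_{t-1}$, and $\nabla = 1 - B$ is the difference operator. In general, d is zero or one in the ARIMA(p, d, q) model. After differencing, the non-stationary time series is changed into a stationary time series [13].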
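As a small illustration of the difference operator in the ARIMA model, the following Python sketch applies (1 − B)^d to a short series through its binomial expansion. The function name `difference` and the sample values are illustrative assumptions and are not taken from the paper's experiments.

```python
from math import comb


def difference(x, d=1):
    """Apply the d-th order difference (1 - B)^d to the sequence x.

    Uses the binomial expansion y_t = sum_k (-1)^k * C(d, k) * x_{t-k},
    so the result has len(x) - d values.
    """
    weights = [(-1) ** k * comb(d, k) for k in range(d + 1)]
    return [sum(w * x[t - k] for k, w in enumerate(weights))
            for t in range(d, len(x))]


# Illustrative values only.
series = [3, 5, 9, 15, 23]
print(difference(series, 1))  # [2, 4, 6, 8]
print(difference(series, 2))  # [2, 2, 2]
```

With d = 1 this reduces to the ordinary first difference x_t − x_{t−1}, which is the case used later in this paper.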

ARIMA Prediction Model Based on Wavelet Transform
Wavelet transform can exhibit both the global and local characteristics of traffic, and it reduces the degree of self-similarity of the signal by decomposing it into different frequency ranges. The high-frequency part of the decomposed signal is short-range dependent, so it can be predicted directly using the ARMA model. The low-frequency part, however, is long-range dependent and cannot be predicted accurately using the ARIMA model alone. Thus, the signal is decomposed level by level using wavelet transform, and the levels are then predicted separately and recombined to obtain the network traffic prediction. In this way, wavelet transform improves the accuracy of network traffic prediction. The self-similar traffic is decomposed by wavelet transform at different decomposition levels, with the wavelet function symN, N = 2, 3, 4. The Hurst parameter of each decomposed component, shown in Table 1, indicates that the detail parts of the wavelet decomposition are short-range correlated and only the approximation part has long-range dependence. Therefore, this paper selects three-level wavelet decomposition to process the self-similar network traffic, so as to make the traffic time series stationary, weaken its correlation, and guarantee the efficiency of the system.
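The decompose, predict and recombine idea can be sketched in a few lines of Python. The sketch below is only an illustration under stated assumptions: it substitutes the Haar filter for the symN wavelets selected in this paper so that it needs no external library, and the names `haar_decompose` and `haar_reconstruct` are hypothetical. It performs a three-level decomposition into one approximation part and three detail parts, and recombining them recovers the input.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_step(x):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    if len(x) % 2:
        raise ValueError("length must be even at every level")
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_decompose(x, level=3):
    """Return [A_level, D_level, ..., D_1] for the input sequence x."""
    approx, details = list(x), []
    for _ in range(level):
        approx, detail = haar_step(approx)
        details.append(detail)
    return [approx] + details[::-1]


def haar_reconstruct(coeffs):
    """Invert haar_decompose, combining the parts from coarse to fine."""
    approx = coeffs[0]
    for detail in coeffs[1:]:
        rebuilt = []
        for a, d in zip(approx, detail):
            rebuilt.extend(((a + d) / SQRT2, (a - d) / SQRT2))
        approx = rebuilt
    return approx


# Illustrative input: 96 samples, i.e. one day at 15-minute intervals.
signal = [float((7 * i) % 13) for i in range(96)]
parts = haar_decompose(signal, level=3)
print([len(p) for p in parts])  # [12, 12, 24, 48]
```

In a full pipeline each part would be forecast by its own model before recombination; here the parts are passed back unchanged only to show that the decomposition loses no information.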
In the ARIMA(p, d, q) model, a first-order difference achieves the expected effect: the high degree of traffic self-similarity is reduced to a low degree and the data becomes stationary. Therefore, we take d = 1. The autocorrelation function and partial autocorrelation function of the detail parts and of the differenced approximation part obtained by the three-level wavelet decomposition both tail off, so the ARMA model should be chosen. The AIC (Akaike information criterion) is used to select the orders p and q of the model. The order of each part of the wavelet decomposition model is shown in Table 2.
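Order selection by AIC can be illustrated with a short Python sketch. It assumes one common form of the criterion for an ARMA(p, q) fit, AIC(p, q) = ln(σ²) + 2(p + q)/N, where σ² is the residual variance of the fitted model and N the number of observations; the paper does not state which form it uses. The function names and the residual variances below are hypothetical and are not results from this paper.

```python
import math


def aic(residual_variance, p, q, n_obs):
    """AIC of an ARMA(p, q) fit in the form ln(sigma^2) + 2 (p + q) / N."""
    return math.log(residual_variance) + 2.0 * (p + q) / n_obs


def select_order(residual_variances, n_obs):
    """Return the (p, q) pair with the smallest AIC.

    residual_variances maps (p, q) to the residual variance of that fit.
    """
    return min(residual_variances,
               key=lambda pq: aic(residual_variances[pq], pq[0], pq[1], n_obs))


# Hypothetical residual variances, for illustration only.
candidates = {(1, 0): 1.00, (1, 1): 0.80, (2, 1): 0.79}
print(select_order(candidates, n_obs=100))  # (1, 1)
```

The example shows the role of the penalty term: ARMA(2, 1) has a slightly smaller residual variance than ARMA(1, 1), but not by enough to justify the extra parameter.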
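After the order of the prediction model is determined, the ARIMA(p, d, q) model is established. The parameter estimation method of the ARMA model is presented next. The autocovariance function $\gamma_k$ of an ARMA(p, q) series with autoregressive coefficients $a_1, a_2, \ldots, a_p$ satisfies the extended Yule-Walker equations
$$\begin{pmatrix}\gamma_{q+1}\\ \gamma_{q+2}\\ \vdots\\ \gamma_{q+p}\end{pmatrix} = \begin{pmatrix}\gamma_{q} & \gamma_{q-1} & \cdots & \gamma_{q-p+1}\\ \gamma_{q+1} & \gamma_{q} & \cdots & \gamma_{q-p+2}\\ \vdots & \vdots & & \vdots\\ \gamma_{q+p-1} & \gamma_{q+p-2} & \cdots & \gamma_{q}\end{pmatrix}\begin{pmatrix}a_1\\ a_2\\ \vdots\\ a_p\end{pmatrix} = \Gamma_{p,q}\begin{pmatrix}a_1\\ a_2\\ \vdots\\ a_p\end{pmatrix}. \tag{5}$$
The parameters are estimated by replacing $\gamma_k$ in (5) with the sample autocovariance $\hat\gamma_k$ computed from N observations $x_1, x_2, \ldots, x_N$, which gives the moment estimates
$$(\hat a_1, \hat a_2, \ldots, \hat a_p)^T = \hat\Gamma_{p,q}^{-1}\,(\hat\gamma_{q+1}, \hat\gamma_{q+2}, \ldots, \hat\gamma_{q+p})^T. \tag{6}$$
The $p \times p$ matrix $\Gamma_{p,q}$ in (5) is invertible. When $N \to \infty$ in the ARMA(p, q) model, $\hat\Gamma_{p,q}$ in (6) is invertible as well. Under these conditions the moment estimates (6) are consistent, namely $\lim_{N\to\infty} \hat a_j = a_j$, $1 \le j \le p$.
Next, the parameters of the MA(q) part are estimated from $\hat a_1, \hat a_2, \ldots, \hat a_p$. Then
$$y_t = x_t - \sum_{j=1}^{p} \hat a_j x_{t-j}, \quad t = p+1, p+2, \ldots, N, \tag{7}$$
is approximate observation data of an MA(q) series. Its autocovariance function is given by
$$\hat\gamma_y(k) = \sum_{j=0}^{p}\sum_{l=0}^{p} \hat a_j \hat a_l\, \hat\gamma_{k+l-j}, \quad k = 0, 1, \ldots, q, \tag{8}$$
with $\hat a_0 = -1$ and $\hat\gamma_{-m} = \hat\gamma_{m}$.
Finally, the stationary time series is predicted. In order to predict $X_{n+k}$ from $\mathbf{X}_n = (X_1, X_2, \ldots, X_n)^T$, the optimal linear prediction $\hat Y_{n+k}$ of $Y_{n+k}$ is first obtained from the ARMA sequence $Y_t = \nabla^d X_t$. Based on the formula
$$X_t = Y_t - \sum_{j=1}^{d} (-1)^j \binom{d}{j} X_{t-j},$$
we arrive at the recursive formula
$$\hat X_{n+k} = \hat Y_{n+k} - \sum_{j=1}^{d} (-1)^j \binom{d}{j} \hat X_{n+k-j},$$
where $\hat X_{n-j} = X_{n-j}$, $j \ge 0$.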
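The two linear-algebra steps of this procedure, solving the extended Yule-Walker equations for the autoregressive coefficients and turning forecasts of the differenced series back into forecasts of the original series, can be sketched in plain Python. The sketch assumes the ARMA convention X_t = a_1 X_{t−1} + ... + a_p X_{t−p} + (moving average terms); the function names and the autocovariance values are hypothetical illustrations, not the paper's data.

```python
from math import comb


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def extended_yule_walker(gamma, p, q):
    """Estimate a_1..a_p from autocovariances gamma[0..p+q] of an ARMA(p, q)."""
    g = lambda k: gamma[abs(k)]  # autocovariance is symmetric in the lag
    matrix = [[g(q + i - j) for j in range(1, p + 1)] for i in range(1, p + 1)]
    rhs = [g(q + i) for i in range(1, p + 1)]
    return solve_linear(matrix, rhs)


def undifference(y_forecast, history, d=1):
    """Convert forecasts of Y = (1 - B)^d X into forecasts of X recursively."""
    if len(history) < d:
        raise ValueError("need at least d past values of X")
    x = list(history)
    for y in y_forecast:
        # x[-j] is X_{t-j} for the value X_t about to be appended.
        x.append(y - sum((-1) ** j * comb(d, j) * x[-j] for j in range(1, d + 1)))
    return x[len(history):]


# Hypothetical autocovariances that satisfy gamma_k = 0.5 * gamma_{k-1} for k > 1.
print(extended_yule_walker([2.0, 1.2, 0.6, 0.3], p=1, q=1))  # [0.5]
print(undifference([2, 2], history=[1, 3], d=1))             # [5, 7]
```

For d = 1 the recursion simply accumulates the forecast increments onto the last observed value.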

Simulation and Analysis
Since traffic data of a satellite communication system cannot be obtained, this paper uses data collected on a terrestrial backbone network, available from http://mawi.wide.ad.jp/mawi/ditl/ditl2009/. The data is collected continuously for 96 hours at 15-minute intervals, giving 380 data points in total. The calculated average rate of the terrestrial backbone network reaches 98 Mbps, slightly less than the 100 Mbps traffic receiving rate of the satellite switches in a broadband satellite network, so this data set can be used to simulate the traffic in a broadband satellite network and to establish the prediction model. With a period of one day (24 hours), the traffic is non-stationary and markedly bursty. Simulation gives a Hurst parameter H ≈ 0.7308; since H ∈ (0.5, 1), the network traffic shows obvious self-similarity. This traffic data is used throughout the paper. The impact of the prediction step k and of the number n of prediction samples on the prediction accuracy of the prediction system is discussed below.
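The paper does not state which estimator produced the Hurst parameter. As one possibility, the following Python sketch uses the aggregated-variance method, which is an assumption on our part: the variance of block means of size m scales as m^(2H − 2), so H follows from the slope of a log-log fit. The function name and block sizes are hypothetical, and the input is synthetic white noise rather than the MAWI traffic trace.

```python
import math
import random


def hurst_aggregated_variance(x, block_sizes=(2, 4, 8, 16, 32, 64)):
    """Estimate the Hurst parameter from the variance of block means."""
    log_m, log_var = [], []
    for m in block_sizes:
        n_blocks = len(x) // m
        if n_blocks < 2:
            continue
        means = [sum(x[i * m:(i + 1) * m]) / m for i in range(n_blocks)]
        mu = sum(means) / n_blocks
        var = sum((v - mu) ** 2 for v in means) / (n_blocks - 1)
        if var > 0:
            log_m.append(math.log(m))
            log_var.append(math.log(var))
    if len(log_m) < 2:
        raise ValueError("not enough data for the chosen block sizes")
    # Least-squares slope of log(variance) against log(block size).
    mean_x = sum(log_m) / len(log_m)
    mean_y = sum(log_var) / len(log_var)
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(log_m, log_var))
             / sum((a - mean_x) ** 2 for a in log_m))
    return 1.0 + slope / 2.0


# Synthetic white noise has no long-range dependence, so H should be near 0.5.
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(4096)]
print(round(hurst_aggregated_variance(noise), 2))
```

A value clearly above 0.5, such as the H ≈ 0.7308 reported for the traffic trace, indicates long-range dependence.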
First, the impact of the number n of prediction samples on the prediction accuracy is analyzed. The smaller the relative root mean square error (RRMSE) of the prediction results, the higher the prediction accuracy.
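The RRMSE formula is not written out in the text. A minimal Python sketch, assuming the common definition in which each error is divided by the true value before squaring and averaging, is given below; the function name and the sample values are illustrative only.

```python
import math


def rrmse(actual, predicted):
    """Relative root mean square error: sqrt(mean(((pred - true) / true)^2)).

    This definition is an assumption; every true value must be non-zero.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(((p - a) / a) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(total / len(actual))


# Both predictions are off by 10 % of the true value.
print(round(rrmse([100.0, 200.0], [110.0, 180.0]), 6))  # 0.1
```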
The original traffic data is decomposed with the three-level wavelet transform, and the prediction step of the prediction system is set to 1. The RRMSE between the predicted values and the true values for different numbers of prediction samples is obtained by simulation, as shown in Fig. 1.
Network traffic with self-similarity is subdivided into different frequency ranges by wavelet decomposition in order to observe the self-similarity of the detail and approximation parts. The decomposed signals with short-range dependence are predicted directly in the next step, while the signal with long-range dependence is differenced to produce a signal that is short-range dependent, or approximately so. The accuracy of the analysis is improved because the original signal is made stationary and processed at multiple resolutions.
As observed in Fig. 1, the better prediction results lie in the intervals [95, 100], [190, 200] and [290, 300]. Sample numbers in [95, 100] give the best effect, where the RRMSE reaches its minimum. The three intervals with high prediction accuracy lie approximately at multiples of 96. This is mainly because the traffic data used in the model is collected every 15 minutes, that is, 96 records within 24 hours, so 96 samples form one data cycle. The selection of the number of prediction samples is thus related to the periodicity of the network traffic data itself, and choosing a number of prediction samples that accords with this periodicity yields highly accurate prediction results. Fig. 2 compares the real traffic with the predicted traffic when the number of samples is set to 100. As demonstrated in Fig. 2, the fit with this number of samples is good: the predicted and real values remain highly consistent at the burst points, and the prediction also reflects the periodicity of the original traffic data. Next, the impact of different prediction steps k on the prediction accuracy is analyzed, with the number of prediction samples set to 100.
As shown in Table 3, the smaller the prediction step, the better the prediction effect. However, the RRMSE values of one-step, two-step and five-step prediction do not differ greatly. One-step prediction has the best effect, but the ratio between the number of prediction samples and the number of predicted values is then 100:1, so the efficiency of the prediction model is very low.
In Fig. 3 and Fig. 4, the fit between the predicted and real values differs little between two-step and five-step prediction. In particular, five-step prediction is better than two-step prediction at predicting burst traffic data. Moreover, the ratio between the number of prediction samples and the number of predicted values is 50:1 in two-step prediction but 20:1 in five-step prediction. Thus, the prediction step is set to 5, with which the time series in the autoregressive prediction model obtains the best prediction values.
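The roles of the number of prediction samples n and the prediction step k can be made concrete with a small rolling-forecast harness in Python. This is a sketch under stated assumptions rather than the simulation code of this paper: the names `rolling_forecast` and `persistence` are hypothetical, and the placeholder forecaster simply repeats the last observed value where the wavelet and ARIMA predictor would be plugged in.

```python
def rolling_forecast(series, n, k, forecaster):
    """Slide a window of n samples over the series, predicting k steps each time.

    forecaster(window, k) must return a list of k predicted values.
    Returns the aligned lists (actual, predicted).
    """
    actual, predicted = [], []
    start = 0
    while start + n + k <= len(series):
        window = series[start:start + n]
        predicted.extend(forecaster(window, k))
        actual.extend(series[start + n:start + n + k])
        start += k  # advance by the k values that were just predicted
    return actual, predicted


def persistence(window, k):
    """Placeholder forecaster: repeat the last observed value k times."""
    return [window[-1]] * k


# Toy series with n = 4 and k = 2, i.e. a sample-to-prediction ratio of 2:1.
actual, predicted = rolling_forecast(list(range(10)), n=4, k=2,
                                     forecaster=persistence)
print(actual)     # [4, 5, 6, 7, 8, 9]
print(predicted)  # [3, 3, 5, 5, 7, 7]
```

With n = 100, the choices k = 1, 2 and 5 correspond to the ratios 100:1, 50:1 and 20:1 discussed above: a larger k means fewer model fits for the same number of predicted values.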

Conclusion
This paper shows that the self-similarity of traffic has an adverse effect on the satellite communication system, and a traffic prediction model based on the ARIMA model with wavelet transform is established to reduce the

Figure 1. RRMSE for different numbers of prediction samples.

Figure 2. Comparison between the real and predicted traffic (number of samples is 100).

Figure 3. Comparison between the real and predicted traffic in two-step prediction.

Figure 4. Comparison between the real and predicted traffic in five-step prediction.

Table 1. Wavelet decomposition with Hurst parameter

$B x_t = x_{t-1}$, where $B$ is the delay operator, $\nabla^d$ is the difference operator of order d, and the two are related by $\nabla = 1 - B$, where $\nabla$ is the difference operator. The binomial expansion is
$$\nabla^d = (1 - B)^d = \sum_{k=0}^{d} (-1)^k \binom{d}{k} B^k.$$

Table 2. Order of each part of the wavelet decomposition model

Table 3. RRMSE for different prediction steps