A local field correlated and Monte Carlo based shallow neural network model for non-linear time series prediction

Water resource problems are increasingly important for proper planning, especially in arid regions such as Gansu in China. Prediction of groundwater status is critical for agricultural and industrial activities. As a main branch of neural networks, shallow artificial neural network models have been deployed in prediction areas such as groundwater and rainfall since the late 1980s. In this paper, an artificial neural network (ANN) model with a newly proposed algorithm is developed for groundwater status forecasting. Considered against previous algorithms for ANN models in time series forecasting, this new Monte Carlo based algorithm demonstrates a good result. The experiments with this ANN model in predicting groundwater status were conducted on the Heihe River area dataset, which was curated from the collected data. Compared with the original physically based model, this ANN model achieved a more stable and accurate result. A comparison and an analysis of this ANN model are also presented in this paper.

Time series forecasting has remained a challenging problem in environmental research. Since the late 1980s, not only shallow neural networks but also multi-layer perceptrons (MLP) have been widely deployed for time series forecasting. Later on, feed-forward neural networks and support vector machines were well developed. In recent years, deep learning, which was derived from multi-layer perceptrons, has been regarded as one of the most brain-like artificial neural networks and has been applied to many fields in machine learning and data analysis. At the same time, many algorithms aiming to design an efficient artificial neural network have been proposed and applied to corresponding data sets, such as "Back-Propagation", "Extreme Learning Machine" and "Boltzmann" [1]. Neural network models are suitable for analyzing internal relations in a variety of data, not only in pattern recognition but also in time series forecasting. They are considered a powerful tool for problems including classification and forecasting in most areas, such as computer vision, text mining, ubiquitous computing [2] and web services [3,4]. A properly designed ANN model can also serve as a robust and efficient tool in water resources modeling and forecasting [5].
Most ancient civilizations have made use of wells, spring water and groundwater since the Bronze Age [6]. Groundwater is considered the most valuable natural resource for the future, and it is becoming more important in many different climatic regions all over the world [7]. Now that the groundwater supply is in danger, sustainable management of groundwater resources is by far the most important issue for us. Groundwater use in irrigation is a vital problem in many arid and semi-arid regions, especially in the Heihe Region in Gansu Province of China. In this paper, we mainly focus on the Heihe Region, and all the data range from 1986 to 2008.
As model validation is a main tool in optimization, several models are used in groundwater management to help optimize groundwater usage [8]. The Box-Jenkins model [9], ANN, autoregressive integrated moving average (ARIMA) [10] and the physically based model [11] are used for groundwater research, including time series forecasting.
In this paper, we propose a new algorithm for the conventional artificial neural network, which differs from Back Propagation, SVM and others. Given its simple architecture and strong performance, we show its experimental results on the Heihe River dataset. The rest of the paper is organized as follows. Section II reviews the Box-Jenkins model, the ARIMA model, the physically based model and ANN for time series forecasting. The newly proposed algorithm is introduced in Section III. We report and discuss empirical results on the Heihe River data sets in Section IV. Section V concludes with these empirical results.

II. RELATED WORK
Among the many approaches to time series forecasting, linear models and non-linear models are the two major branches. Among linear models, the ARIMA model provides a smooth mapping from data space to feature space and predicts future values from past values, constrained to a specific linear function [10]. It is simple to fit to different data sets. However, in the real world, linear models cannot fit many data sets, because these contain not only internal structure but also many other signals, such as noise. Simply adding several ARIMA models together does not solve this problem. For certain non-linear patterns observed in real problems, the bilinear model, the threshold autoregressive (TAR) model and the autoregressive conditional heteroscedastic (ARCH) model are derived from the ARIMA model.
Even though these extended models have been proposed for non-linear time series forecasting, they are mostly based on simple linear models and their performance is not satisfactory. In particular, models such as TAR and ARCH were developed only for specific non-linear time series patterns, which limits them when they deal with other non-linear time series data sets.
Since artificial neural networks were developed, they have been expected to be an excellent alternative for time series forecasting. Their non-linear modeling ability shows its power in both linear and non-linear time series forecasting. Since the 1980s, ANN, especially the feedforward neural network, has been applied not only in time series forecasting but also in other computational intelligence areas. With variants such as the multilayer perceptron (MLP), recurrent neural networks (RNN) and the classical feedforward neural network, ANN is currently one of the most promising tools. For non-linear time series forecasting, ANN could be a better model than ARIMA models. The Universal Approximation Theorem states that a feedforward neural network with a single hidden layer can approximate continuous functions when the activation functions of the hidden layer neurons are set properly [12]. Several proofs have been given, and further proofs show that multilayer neural networks have the potential to be universal approximators regardless of the activation function.
In [13] and [14], the authors proposed using the Back Propagation algorithm to train a neural network to predict the water level. The Back Propagation algorithm depends mostly on the parameter and learning rate settings. The activation functions in the hidden layer and the output layer are also important in this algorithm. In this paper, we compare our model against Back Propagation as the baseline method.
However, [1] also discussed a few existing problems of ANN from several aspects. Defining a proper ANN structure for time series forecasting is a well-known difficulty, because an unsuitable structure introduces over-fitting. Solving this problem depends mostly on the operator's practical experience. The other problem is choosing a suitable learning rate for training. A learning rate that is either too high or too low leads to different optimization problems during learning. In this paper, we propose a new algorithm for ANN models and apply this model to time series forecasting on the Heihe River data sets.

III. ALGORITHM
The basic structure of a classic artificial neural network is shown in Figure 1. Only a single hidden layer is deployed, and the number of neurons usually plays a key role in the performance of every algorithm.
These formulas involve a set of parameters, namely {W_ij, W_lj, h_i, b_i, β_i, ε, f}. W_ij is the connection strength between the input layer units x_i and the hidden layer units h_j. Accordingly, W_lj is the connection strength between the hidden layer units h_l and the output layer units y_j. Formula (1) shows how the input layer units act on the hidden layer neurons: every input layer unit has an effect on these hidden layer neurons. Formula (2) contains an activation function f, which is regarded as the reaction to these inputs. The outputs of the hidden layer neurons are determined by b_i, β_i and f. The back-propagation (BP) algorithm and support vector machine (SVM) methods are normally used in this ANN model.
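As a rough illustration of the forward computation described above, a minimal Python sketch follows. The exact forms are assumptions rather than the paper's Formulas: the local field of hidden unit j is taken to be the weighted sum of the inputs, the hidden output is taken to be f(β_j (h_j - b_j)), and the single output is the W_lj-weighted sum of the hidden outputs. The function name `forward` and its argument names are hypothetical.

```python
def forward(x, W_in, b, beta, W_out, f):
    """Assumed forward pass of a network with one hidden layer and one output.

    x      : list of input values x_i
    W_in   : W_in[i][j], connection from input i to hidden unit j (W_ij)
    b, beta: per-hidden-unit bias and transfer-function coefficient
    W_out  : connection from hidden unit j to the output (W_lj)
    f      : transfer function applied in the hidden layer
    """
    y = 0.0
    for j in range(len(b)):
        # Local field of hidden unit j: weighted sum over all inputs.
        h_j = sum(W_in[i][j] * x[i] for i in range(len(x)))
        # Assumed hidden output: f(beta_j * (h_j - b_j)).
        y += W_out[j] * f(beta[j] * (h_j - b[j]))
    return y
```

With an identity transfer function and unit coefficients, the output reduces to a plain weighted sum of the local fields, which makes the sketch easy to check by hand.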
In our design, we use this classic artificial neural network for our groundwater prediction model. Similar to our earlier work [15], the Monte Carlo Algorithm substitutes for classic algorithms such as BP and SVM in parameter searching. It can be regarded as a computational algorithm based on repeated random sampling to approach an optimal result. While in [15] we used this algorithm for a classification problem, it can also be adapted to our prediction model. In our experiments it shows strong ability in time series learning, not only for underground water prediction but also for Smart Grid prediction. In this paper we focus on underground water prediction.
Unlike SVM, which searches for support vectors and the best-fit kernel function for the model, the Monte Carlo Algorithm conducts a general vector search and then maps the data into the model as fully and precisely as it can. Support vectors always demonstrate good performance for their structure once a suitable kernel function is found for the dataset; in this paper, the Monte Carlo Algorithm is likewise deployed in a shallow network. With acceptable time consumption, this structure could lead to a better prediction result.
In our original algorithm, all the parameters in the set {W_ij, W_lj, h_i, b_i, β_i, ε, f} could be updated to suit the goal. In this paper, we simply fix the parameters W_lj, which lie between the hidden layer and the output layer. The remaining parameters, {W_ij, b_i, β_i, ε}, can be updated one parameter at a time, whenever the update leads to a better value of the cost function. We use a different cost function from [15] to approach the final result.
In this shallow model, W_ij and W_lj stand for the connections between two pairs of layers: input layer to hidden layer, and hidden layer to output layer. b_i represents the bias of each hidden layer unit. The β_i are the transfer function coefficients. ε is a correcting unit applied to each parameter in the Monte Carlo Algorithm. In each iteration we apply ε to the other parameters, here mainly {W_ij, b_i, β_i}. When using ε to adjust the other parameters to fit the relationships in the data, only one randomly selected parameter is tried for an update each time. The parameter is updated only if the cost function does not get worse. In other words, we accept the adjustment only when the cost function ∆ decreases or stays the same.
The cost function ∆ is defined by Formula (4).
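For orientation, one plausible way to write the cost function and the acceptance rule is sketched here. Both forms are assumptions: the text only states that Formula (4) is a square-root cost, so a root-mean-square error over N samples is assumed, and the random sign of the ε step is likewise an assumption. The symbols N, \hat{y}_t, y_t and \theta_k are introduced only for this illustration.

```latex
% Assumed form of the cost: root-mean-square error over N training samples,
% with \hat{y}_t the network output and y_t the observed value.
\[
  \Delta = \sqrt{\frac{1}{N} \sum_{t=1}^{N} \left( \hat{y}_t - y_t \right)^2 }
\]

% Trial update of one randomly chosen parameter \theta_k from {W_ij, b_i, \beta_i},
% accepted only when the cost does not get worse.
\[
  \theta_k' = \theta_k \pm \varepsilon,
  \qquad
  \text{accept } \theta_k' \iff \Delta(\theta') \le \Delta(\theta)
\]
```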
The main procedure of this Monte Carlo Algorithm is presented in detail below.

A. Initialization
The network requires initial values to start working. Here we set the parameters W_ij, W_lj, b_i, β_i, ε to random values within fixed intervals. These intervals are selected empirically and they are critical in our analysis.

B. Update parameters
Choose one group of parameters from {W_ij, W_lj, b_i, β_i}. The model randomly chooses one parameter from this group, adjusts it with ε and calculates y_i and ∆. If ∆ does not become worse, we accept this change and move on to the next round. This step stops after one round of Monte Carlo steps.

C. Repeat Monte Carlo Steps on other parameters
Switch to another group of parameters from {W_ij, W_lj, b_i, β_i} after each round of Monte Carlo steps.

D. Stop training
Repeatedly train this shallow neural network via steps B and C, and stop when either the time limit is reached (t ≥ t_0) or the cost function is small enough (∆ ≤ ∆_0).
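Steps B to D can be sketched as a short Python loop. This is an illustration under stated assumptions, not the authors' implementation: parameters are held in a dictionary of flat lists, the ε step is applied with a random sign, the number of trials per round (`mc_steps`) is a free parameter, and the stopping test is checked once per full sweep over the groups. All names (`monte_carlo_train`, `groups`, `cost`) are hypothetical. Fixed parameters such as W_lj are simply left out of `groups`.

```python
import random
import time

def monte_carlo_train(groups, cost, eps, mc_steps, delta_0, t_0, rng):
    """Accept-if-not-worse random search over named parameter groups.

    groups  : dict mapping a group name to a list of floats (updated in place)
    cost    : callable taking `groups` and returning the cost value
    eps     : size of the trial adjustment
    mc_steps: number of trial updates per group and round (assumption)
    delta_0 : stop when the cost is at or below this value
    t_0     : stop when this many seconds have elapsed
    rng     : a random.Random instance, for reproducibility
    """
    start = time.monotonic()
    best = cost(groups)
    # Step D: stop once the cost is small enough or the time budget is spent.
    while best > delta_0 and time.monotonic() - start < t_0:
        for values in groups.values():        # Step C: switch parameter group
            for _ in range(mc_steps):         # Step B: one round of trials
                k = rng.randrange(len(values))
                old = values[k]
                values[k] = old + rng.choice((-eps, eps))  # assumed +/- step
                trial = cost(groups)
                if trial <= best:             # accept when not worse
                    best = trial
                else:                         # otherwise undo the change
                    values[k] = old
    return best
```

Because a trial is kept only when the cost does not increase, the returned value never exceeds the initial cost; how quickly it falls depends on ε, the number of trials per round and the cost function itself.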
In our work, this shallow neural network is applied to the underground water time series prediction problem. The original underground water data have been collected over decades [16,17]. In [15] we used a modified cost function instead of the square root value to overcome the over-fitting problem. In this paper, the square root form shown in Formula (4) is used to approximate the best prediction model for underground water.
The cost function is the key factor in the Monte Carlo Algorithm. It can vary considerably to fit different situations, taking time consumption and accuracy into consideration for practical results. We have shown the strong ability of this Monte Carlo Algorithm to fit different situations in our studies of different applications. In the next section we focus on the datasets and the experiments.

IV. DATASETS AND EXPERIMENTS
Even though the data were collected by a reliable method, there is a data sparsity problem due to inadequate measuring methods in past decades. Hence we reconstruct these data in a reasonable way.
No pre-processing is applied to these data or to their inherent noise, which is unavoidable in these data but is overcome by our Monte Carlo Algorithm. We report our results without any pre-processing. Local field theory shows that noise can be used to facilitate a model's learning and improve its result, as illustrated in [18] and [19]. Therefore, by curating a dataset from the original data, we obtain a model containing enough units.

A. Dataset
Over the period 1986-2008, underground water data were collected at 26 related observation spots. They are all located in the Heihe River basin, so the data from these 26 spots may have some internal relationship. Our curated data come from a subgroup of these 26 observation spots, namely the Banqiaodongliu spot, Pingchuangongcheng spot, Shandanqiao spot, Wangqizha spot, Xiaohe spot, Xingou spot, Yaruanzhangwan spot, Zhangyenongchang spot and Shajingzi spot.
In this paper, we build a dataset with eight observation feature spots and one observation label spot for time series prediction. Noise is present throughout this dataset as a result of the data collection technology. However, we still input the data directly into the shallow neural network for training. This is slightly different from [15]: the cost function used in this paper and the output layer setting have been changed accordingly.
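The layout of such a dataset can be sketched as follows. This is a minimal illustration, assuming that each spot provides one reading per time step and that the features and the label are taken at the same time step; the text does not state which spot is the label or how time steps are aligned, so `label_spot` is left as an argument. The function name `build_samples` is hypothetical.

```python
def build_samples(series_by_spot, label_spot):
    """Turn per-spot series into (features, label) pairs, one per time step.

    series_by_spot: dict mapping a spot name to its list of readings
    label_spot    : name of the spot used as the prediction target
    """
    # Every spot other than the label spot becomes one input feature.
    feature_spots = sorted(s for s in series_by_spot if s != label_spot)
    n_steps = len(series_by_spot[label_spot])
    samples = []
    for t in range(n_steps):
        features = [series_by_spot[s][t] for s in feature_spots]
        samples.append((features, series_by_spot[label_spot][t]))
    return feature_spots, samples
```

With nine spots in `series_by_spot`, each sample carries eight feature values and one label, matching the 8-input, 1-output network used in this paper.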

B. Model
Figure 2 shows the original shallow neural network with 8 input units, 1 output unit and 200 or 500 hidden neural units.

C. Parameters Value Interval
Selecting suitable parameter value intervals is an empirical matter. In this paper, {W_ij, b_i, β_i} are updated whenever doing so leads to a better cost function, while W_lj remains unchanged once it is set at the initialization stage.
W_lj is fixed in two groups of experiments, to {+0.10, -0.10} and to {+0.01, -0.01}. In addition, by changing the neural transfer function, we modify the hidden neural units to observe the effect on performance.
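The initialization can be sketched in Python as below. Several points are assumptions: a single interval is used for all updatable groups (the text only says the intervals are chosen empirically), the sign of each fixed W_lj is drawn at random, and the names `init_params`, `interval` and `w_out_magnitude` are hypothetical.

```python
import random

def init_params(n_in, n_hidden, interval, w_out_magnitude, rng):
    """Random initial parameters for a network with one hidden layer.

    interval       : (low, high) range for W_ij, b_i and beta_i (assumption:
                     one shared interval; the paper's intervals are not given)
    w_out_magnitude: magnitude of the fixed W_lj, e.g. 0.10 or 0.01
    rng            : a random.Random instance, for reproducibility
    """
    low, high = interval
    return {
        "W_in": [[rng.uniform(low, high) for _ in range(n_hidden)]
                 for _ in range(n_in)],
        "b": [rng.uniform(low, high) for _ in range(n_hidden)],
        "beta": [rng.uniform(low, high) for _ in range(n_hidden)],
        # Fixed after initialization; the random sign is an assumption.
        "W_out": [rng.choice((w_out_magnitude, -w_out_magnitude))
                  for _ in range(n_hidden)],
    }
```

For example, `init_params(8, 200, (-1.0, 1.0), 0.10, random.Random(0))` would produce parameters for the 8-input, 200-hidden-unit setting, where the interval (-1.0, 1.0) is only a placeholder.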

D. Procedure
In this subsection, the main procedure of our Monte Carlo Algorithm (see Algorithm 1) is presented.

E. Experiments
Experiments are conducted under four different settings, and all of them achieve good performance. We deploy two different neural transfer functions, x^2 + x and x. The results under the different settings are shown below.
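The two transfer functions and one possible reading of the four settings can be written down in a few lines of Python. Treating the four settings as the combinations of the two transfer functions with 200 and 500 hidden units is an assumption based on the configurations reported in this section; the names used here are hypothetical.

```python
from itertools import product

def quadratic_transfer(x):
    """Transfer function x^2 + x."""
    return x * x + x

def linear_transfer(x):
    """Transfer function x."""
    return x

TRANSFERS = {"x^2 + x": quadratic_transfer, "x": linear_transfer}
HIDDEN_SIZES = (200, 500)

# Assumed experiment grid: every transfer function with every hidden size.
SETTINGS = list(product(TRANSFERS, HIDDEN_SIZES))
```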
[Algorithm 1, recoverable steps only: select a random W_ij from the W_ij matrix; select a random β_i from the β_i matrix.]
As a baseline, we chose the Back Propagation algorithm as our comparison method, since most of the literature [13,14] used BP as the method. As shown in Fig. (5) and (6), a hidden layer of 200 neural units was allocated for our first experiment. In this network, an x^2 + x transfer function was used and the value of W_lj was {+0.1, -0.1}. It delivered good prediction performance. Further results are given in Fig. (7) and (8). They demonstrated that, with the Monte Carlo Algorithm, this shallow neural network had a better learning and fitting ability for understanding the data. High scalability could be obtained via the set of parameters W_ij, b_i, β_i, ε even when W_lj is fixed.
Fig. (9) and (10) demonstrated better prediction performance when we changed the transfer function from x^2 + x to x. The trends and values in our test showed a better result with the x transfer function. Furthermore, we also tried a different number of hidden neural units. As shown in Fig. (11) and (12), the x transfer function fitted the data better regardless of the number of hidden neural units. In our previous test, where x^2 + x was set as the transfer function, the best result came from 200 hidden neural units.
From Fig. (9) and (10), a suitable neural transfer function achieved an even better and more stable result with 200 and 500 hidden neural units. This showed that searching for general vectors in this shallow neural network was much easier with the Monte Carlo Algorithm. Its performance, however, depends strongly on the neural network and on the cost function that drives the updating process.

V. CONCLUSION
Tackling the water prediction problem is increasingly important for a proper regional development strategy, especially in the arid Heihe River basin. In this paper, we have curated a dataset from two decades of collected data. Considering the relationships between areas is important in this paper: we tried to work out the relationship between the nine observation spots selected out of all 26 spots. The Monte Carlo Algorithm [15] is an efficient approach, as shown in this paper, and we also discussed the internal settings and structure of the deployed shallow neural network.
The baseline in our paper is the Back Propagation algorithm. We achieve a better result than BP: the MSE is 0.56 for BP and 0.113 for our model. The results are promising for dealing with the difficulties of this kind of problem.
In future work, we are considering extending this method to other data, curated not only from our own data but also from other researchers' available data. We also plan to introduce a bio-inspired algorithm [20] to improve the training speed and performance of the Monte Carlo algorithm. A comparison between our approach and state-of-the-art deep learning methods will also be undertaken.

EAI Endorsed Transactions on Scalable Information Systems, 12 2015 - 08 2016 | Volume 3 | Issue 8 | e5

Fig. 12: Result under 500 hidden units and x function