Comparative Analysis of Wind Speed Forecasting Using LSTM and SVM
Satyam Gangwar, Vikram Bali and Ajay Kumar

The objective of this work is to present a comprehensive exploration of a deep-learning-based wind forecasting model. Wind speed forecasting (or prediction) is the estimation of future wind speed. It is carried out to make power generation and production more sustainable. Because wind energy is available in ample amounts, it can readily be put to a variety of uses. The main aim of this research is to forecast wind speed from a set of parameters using LSTM and then to compare the result with SVM. Both are machine learning approaches, but they work quite differently. The comparison is made to identify the better technique, which can then be applied to larger datasets to build a more accurate and efficient wind speed forecasting model. The survey and implementation of both techniques showed clearly that long short-term memory (LSTM) gives better and enhanced wind speed forecasts. The forecasting is based on several atmospheric variables. The dataset is taken from Kaggle and has numerous attributes, of which only a few are used for prediction.


Introduction
Energy demand is increasing rapidly, while the available energy resources are limited. To address this problem, alternative resources are being considered to meet the need. Wind is a renewable form of energy and is available in ample amounts, so wind speed can be exploited for power production. The research discussed here is based on data mining. In data mining, techniques are applied to the given datasets to find patterns or relationships, for example K-nearest neighbour classification [1]. Data mining is used to process datasets containing enormous amounts of information and to extract their vital characteristics. A key reason for using data mining in prediction algorithms is clustering. Clustering identifies groups of records and processes that are similar to one another in some way, without assumptions about the structure of the data. Examining patterns in this way widens the scope of preprocessing the available data, which helps in developing good and accurate forecasting models.
Wind speed forecasting is the estimation of future wind speed, carried out so that wind can be used for power generation. Wind speed forecasting methods are grouped into four horizons: very short-term, short-term, medium-term and long-term. Each horizon serves different applications, such as load dispatching, reserve planning, operational security and cost optimization. Forecasting helps close the gap between power generation and power utilization.
In this research, a short-term forecasting model is developed separately with LSTM and with SVM. Both models are based on supervised learning. Short-term forecasting models are well suited to load dispatching and related power-system activities. Applying LSTM and SVM to the same dataset gives different results, because the two models are quite different. LSTM (long short-term memory) can remember patterns over a longer duration than SVM (support vector machine) [2]. This gives forecasting models a wider scope. Neural-network-based models also select the best function, and LSTM's ability to retain patterns over long periods makes it a more reliable and effective component of a mixed forecasting model. The main objective of this work is to compare the two models (LSTM and SVM) and to select the most appropriate and efficient one for wind speed prediction. The rest of this paper is structured as follows: related work, proposed methodology, results, and conclusion.

Related work
In this section, wind speed forecasting models proposed by various authors are analysed. Different models have different parameter and environment requirements, and they are usually grouped by forecasting horizon. Some of the work done in this area is summarised below.

Mehmet Yesilbudak, Seref Sagiroglu and Ilhami Colak [1] presented a very short-term model based on K-nearest neighbour classification. Its error rate was examined graphically and found to be less accurate.

Da Liu, Dongxiao Niu, Hui Wang and Leilei Fan [2] gave an SVM-based forecasting model. The model is stable because it removes certain fluctuations in the prediction. It combines the wavelet transform (WT), SVM and a genetic algorithm (GA):
- the wavelet transform is used for signal processing, with the discrete wavelet transform (DWT) used to analyse the data;
- SVM helps avoid the overfitting that affects other machine learning approaches;
- the GA improves the model by selecting the best of the available options.

S.A. Pourmousavi Kani and M.M. Ardehali [3] proposed an ANN-MC model that reduces the MAPE and the uncertainty of wind speed and other weather-related forecasts. This hybrid method requires fewer variables and shows no overtraining, which overcomes the limitation of a single ANN model.

Yu Jiang, Zhe Song and Andrew Kusiak [4] proposed a Bayesian model with low computational efficiency during prediction. Such models are mainly used where wind consumption is low but wind speed is high. The model uses few parameters and predicts a single value, i.e. the mean or the most probable value.

A. Troncoso, S. Salcedo-Sanz, C. Casanova-Mateo, J.C. Riquelme and L. Prieto [5] proposed an approach comprising eight regression tree algorithms with short computational time. The method has two limitations. First, it approximates similar functions. Second, it is difficult to implement, because it requires many parameters and large datasets, so it is hard to obtain a good wind speed model from only a few parameters.

Gong Li, Jing Shi and Junyi Zhou [6] proposed a Bayesian adaptive neural network model that uses back propagation and radial basis function (RBF) networks. Its predictions were accurate overall but not consistent at all times, because the model depends on a suitable environment.

Hui Liu, Xiwei Mi and Yanfei Li [7] proposed a smart deep learning model, WPD-CNNLSTM-CNN, with distinct feature extraction stages. Wavelet packet decomposition (WPD) decomposes the wind speed into sub-layers. CNNLSTM is then applied to the high-frequency sub-layers to predict wind speed from 1-D time series data. The three-layer structure gives efficient predictions that are useful under sudden wind changes.

Erasmo Cadenas and Wilfrido Rivera [8] presented an ANN-based methodology whose main requirement is well-trained data: the better the training, the better the result. In this methodology, two layers and three neurons were best for training and prediction. The main drawback is that the data must be overtrained to obtain better predictions, which increases the complexity of the model.
Erasmo Cadenas and Wilfrido Rivera [9] proposed ARIMA-ANN approaches built with various layer configurations and numbers of input and output neurons. Four models were evaluated in total, each giving different results on the linear and nonlinear time series used for wind speed prediction.

Jianzhou Wang, Shanshan Qin and Qingping Zhou [10] proposed a hybrid model for three different sites. Their results show that hybridising several models greatly improves the accuracy of wind speed prediction. The model combines support vector regression (SVR), the Kruskal-Wallis test (K-W test), SIA and an Elman recurrent neural network (ERNN), but it is complex and hard to use.

Thanasis G. Barbounis and John B. Theocharis [11] proposed a technique based on three local recurrent neural networks (RNNs). It is complex but stable in implementation.

Rajesh G. Kavasseri and Krithika Seetharaman [12] presented an F-ARIMA model with a simple structure. They compared it with the ARIMA model and found it more accurate. Its main drawback is that it requires very large datasets, which are difficult to handle.

T.G. Barbounis and J.B. Theocharis [13] proposed a local recurrent neural network model that needs a good training procedure to achieve high adaptability. Because a neural network is involved, the training procedure is the main focus for better wind speed prediction. Larger datasets are used effectively in this model.

Several further works are relevant:
- Kumar Ajay et al. [14] presented methodologies for document clustering using K-nearest neighbour classification and artificial neural networks, covering approaches to the classification and clustering of data.
- Kumar Ajay et al. [15] described methodologies for electricity load forecasting using artificial neural networks and machine learning. The paper covers the concepts of load prediction, the datasets used, and the advantages and disadvantages of the different approaches.
- Bali et al. [16] discussed an optimization technique for rock prediction using artificial neural networks.
- Bali et al. [17] applied optimization using a goal programming approach.
- Bali et al. [18] proposed an optimization technique for optimal component selection, formulated as a fuzzy model.
- Kumar Ajay et al. [19] reviewed methodologies for semantic similarity measures using latent semantic analysis, covering supervised and unsupervised approaches.
- Kaur, P. et al. [20] applied supervised learning and nature-inspired computing techniques to the diagnosis of human psychological disorders.
- Gautam [23] presented weather forecasting techniques based on data mining and summarised their results in terms of efficiency and accuracy.

EAI Endorsed Transactions on Scalable Information Systems 01 2020 - 03 2020 | Volume 7 | Issue 25 | e1
The models and techniques above predict wind speed reasonably well, but their requirements differ. Some need small datasets and others need large ones, so their results vary considerably. Neural-network-based models need properly trained data for efficient prediction. Most models are sensitive to changes in the parameters, since many can use only one or a few wind parameters rather than a variety of them together. Some models also require overtraining, which is complex, and data handling in such approaches is difficult. Several models are often hybridised to improve accuracy. However, combining models is cumbersome, because it requires knowledge of every algorithm involved. A simple and compact model is therefore needed for accurate wind speed prediction. For this reason, an LSTM-based model is considered here, to make better use of the parameters and datasets and to produce accurate predictions.

Proposed methodology
This paper presents two methodologies, LSTM and SVM. They share similar elements, but the approaches they use differ. In recent years, data mining and its applications have attracted increasing attention. Data mining is the process of finding patterns in large datasets using a variety of techniques, with the aim of extracting knowledge from the discovered patterns [24]. The process runs as follows:
- First, during preprocessing, the data are cleaned, i.e. absent and incomplete records are removed.
- The data are then selected and transformed to obtain the final dataset.
- Data mining processes are applied to this final dataset to generate patterns, which are evaluated and analysed to discover knowledge.

Knowledge discovery is iterative and may move backward or forward between these steps. Applications of data mining to wind energy include:
- forecasting wind direction, power and speed;
- managing energy storage;
- optimizing power output;
- detecting turbine faults;
- planning or controlling turbine placement and installation [25,26].
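The cleaning step just described, removing absent and incomplete records, could look like the following pandas sketch. It is an illustration under stated assumptions, not the authors' code. The column names are hypothetical, and an "incomplete" record is taken to mean a row with at least one missing value.

```python
import numpy as np
import pandas as pd


def clean_records(df: pd.DataFrame) -> pd.DataFrame:
    """Remove absent or incomplete records, i.e. rows containing any missing value."""
    return df.dropna(how="any").reset_index(drop=True)


# Toy data with hypothetical column names; rows 2 and 3 are incomplete.
raw = pd.DataFrame({
    "wind_speed": [3.1, np.nan, 2.4],
    "temperature": [7.0, 6.5, np.nan],
    "pressure": [1002.1, 1001.8, 1001.5],
})
clean = clean_records(raw)
print(len(clean))  # 1
```

Dropping rows is the simplest reading of the text. Interpolating short gaps is a common alternative for time series, but the paper does not say it was used.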
The main phases of the proposed methodologies are raw data, preprocessing and prediction. All steps must be followed carefully to obtain a reliable, efficient and accurate wind speed forecasting model. In both methodologies, the raw data are first collected and inspected to decide whether they can be used without changes. If not, preprocessing techniques are applied to make the dataset suitable. The available dataset has many parameters, but the methodologies require only a few, so preprocessing reduces the dimensionality of the data. The output of preprocessing is the refined data. The prediction algorithms are run on this data to build the wind speed forecasting models. The accuracy of both algorithms is measured with different error measures, namely RMSE and MAPE.
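The two error measures used to score both algorithms can be computed as follows. This is a generic NumPy sketch of the standard definitions, not the paper's own code. The guard against zero observations is an added assumption: MAPE divides by the observed value, and a calm hour gives a wind speed of zero.

```python
import numpy as np


def rmse(actual, predicted) -> float:
    """Root mean squared error between observed and predicted values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


def mape(actual, predicted) -> float:
    """Mean absolute percentage error, in percent.

    Undefined when an observed value is zero (e.g. calm wind), so such
    samples must be filtered out or handled separately beforehand.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if np.any(actual == 0):
        raise ValueError("MAPE is undefined when an observed value is zero")
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100.0)


observed = [2.0, 4.0, 5.0]
forecast = [2.5, 3.0, 5.0]
print(rmse(observed, forecast))  # sqrt((0.25 + 1 + 0) / 3), about 0.6455
print(mape(observed, forecast))  # (25% + 25% + 0%) / 3, about 16.67
```

RMSE is expressed in the units of the target (m/s for wind speed), while MAPE is scale-free. This is why the two measures can rank models differently.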
Figure 1 shows the system architecture of the basic methodology used by LSTM and SVM for wind speed forecasting. Each element of the architecture is described below.

Raw Data:
Raw data are unprocessed data that are not yet ready for information extraction. They are also called source data, because they have not undergone any processing, whether manual, algorithmic or automated. The primary dataset was taken from Kaggle and contains 420552 tuples with 15 attributes. The data shown in Figure 2 are recorded in a seconds-based time format. The next step is therefore to process them with suitable algorithms so that they become usable.
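The Results section states that the data were converted to an hourly basis. One way to do this is to average all observations that fall within each clock hour. The sketch below assumes a hypothetical `timestamp` column and numeric measurement columns. The averaging rule is also an assumption, since the paper does not state how the hourly values were formed.

```python
import pandas as pd


def to_hourly(df: pd.DataFrame, time_col: str = "timestamp") -> pd.DataFrame:
    """Average fine-grained observations into hourly records.

    `time_col` is a hypothetical column name; all other columns are assumed numeric.
    """
    indexed = df.assign(**{time_col: pd.to_datetime(df[time_col])}).set_index(time_col)
    # Mean of all observations inside each 60-minute bin; empty hours are dropped.
    return indexed.resample("60min").mean().dropna()


raw = pd.DataFrame({
    "timestamp": ["2009-01-01 00:10:00", "2009-01-01 00:20:00",
                  "2009-01-01 01:10:00", "2009-01-01 01:50:00"],
    "wind_speed": [1.0, 3.0, 2.0, 4.0],
})
hourly = to_hourly(raw)
print(hourly["wind_speed"].tolist())  # [2.0, 3.0]
```

Averaging smooths short gusts. Taking the last or maximum value per hour would preserve different information, so the choice should match the forecasting goal.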

Preprocessing:
In the preprocessing phase, the main objective is to select the required attributes: wind speed, temperature, pressure, relative humidity and solar radiation. Preprocessing is carried out through a visualization technique, and training is done with a recurrent neural network. This step is essential in every proposed methodology for developing a distinctive and more accurate model. Data processing is essentially data filtering that makes the data more convenient to use. The attributes used for prediction after this step are shown in Figure 3. Predictions are then made by applying the appropriate algorithms. This paper uses two of them, LSTM (long short-term memory) and SVM (support vector machine). The two algorithms produce different predictions, which are evaluated in terms of RMSE and MAPE and then used for data visualization.
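Selecting the five attributes listed above can be sketched as a column subset. The sketch also applies min-max scaling to [0, 1], which is an added assumption: scaling is common practice before training LSTM or SVM models, but the paper does not mention it. The column names are hypothetical.

```python
import pandas as pd

# Hypothetical column names for the five attributes listed above.
FEATURES = ["wind_speed", "temperature", "pressure", "relative_humidity", "solar_radiation"]


def select_and_scale(df: pd.DataFrame, features=FEATURES):
    """Keep only the chosen attributes and min-max scale each one to [0, 1].

    Returns the scaled frame plus the (min, max) Series needed to invert the scaling.
    """
    subset = df[list(features)].astype(float)
    lo, hi = subset.min(), subset.max()
    span = (hi - lo).replace(0, 1.0)  # avoid division by zero for constant columns
    return (subset - lo) / span, (lo, hi)


frame = pd.DataFrame({
    "wind_speed": [1.0, 3.0, 5.0],
    "temperature": [-2.0, 0.0, 2.0],
    "pressure": [990.0, 1000.0, 1010.0],
    "relative_humidity": [60.0, 80.0, 100.0],
    "solar_radiation": [0.0, 50.0, 100.0],
    "wind_direction": [10.0, 200.0, 350.0],  # not selected, so it is dropped
})
scaled, (lo, hi) = select_and_scale(frame)
print(scaled["wind_speed"].tolist())  # [0.0, 0.5, 1.0]
```

In a real forecasting setup, the minimum and maximum should be computed on the training period only and reused for the test period. Otherwise information from the future leaks into training.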

Visualization:
Visualization is the analysis of the obtained results. In this phase, the predicted wind data are visualized. The RMSE and MAPE values are analysed to generate prediction graphs, bar graphs and other pictorial representations.

LSTM based methodology
In this methodology, prediction is based on neural network techniques. In a recurrent neural network, the connections between nodes form directed cycles. The training algorithm is back propagation, in which the parameters are shared across all time steps.
The proposed methodology proceeds in three tasks:
- The first task is to gather the data for implementation. The data consist of basic environmental parameters.
- The second task, in the preprocessing phase, is to reduce the dimensionality of the data through a data visualisation technique. The main objective of data visualisation is to select the necessary parameters from the complete dataset. The resulting set of variables is trained using a recurrent neural network algorithm (back propagation).
- Once training is complete, the third task is to apply LSTM to the trained data to obtain predictions and compute the error rate (RMSE).

The working of LSTM is briefly described below.
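Before an LSTM (or the SVM) can learn from a single series, the series is usually reshaped into fixed-length input windows, with the next value as the target. The windows are then split in time order. The paper gives neither its window length nor its split ratio, so `lookback` and `train_fraction` below are hypothetical parameters in a minimal sketch.

```python
import numpy as np


def make_windows(series, lookback: int):
    """Turn a 1-D series into (samples, lookback) inputs and next-step targets."""
    series = np.asarray(series, dtype=float)
    if len(series) <= lookback:
        raise ValueError("series must be longer than lookback")
    X = np.stack([series[i:i + lookback] for i in range(len(series) - lookback)])
    y = series[lookback:]
    return X, y


def chronological_split(X, y, train_fraction: float = 0.8):
    """Split without shuffling so the test period strictly follows the training period."""
    cut = int(len(X) * train_fraction)
    return X[:cut], y[:cut], X[cut:], y[cut:]


X, y = make_windows(range(1, 11), lookback=3)
# X[0] = [1, 2, 3] -> y[0] = 4, ..., X[6] = [7, 8, 9] -> y[6] = 10
X_tr, y_tr, X_te, y_te = chronological_split(X, y, train_fraction=0.8)
print(X.shape, len(X_tr), len(X_te))  # (7, 3) 5 2
```

Shuffling before splitting would let the model see windows that overlap the test period, which inflates the apparent accuracy of a forecaster.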
Back propagation is performed through time, because the output gradient depends on previous values as well as on the current ones. LSTM uses different functions to compute its hidden states. An LSTM memory cell has four main elements: an input gate, a neuron, a forget gate and an output gate. The memory cell interacts with its environment only through these gates. For example, to calculate the gradient at time step 3, we need to back propagate 2 steps and sum the gradients. The working of the LSTM gates and their interaction is shown in Figure 4. In that figure, closed gates are drawn as straight lines and open gates as blank circles, and the forget gates are the circles and lines drawn horizontally just below the hidden layer. Through its neural network, LSTM can perform one-to-one or many-to-many mappings. It is essentially a time series prediction model: the input is given to the network, and the output obtained is passed as input to the next state. The main strength of this methodology is that it remembers patterns over long durations, which is very helpful in forecasting. The observed value is compared with the calculated value to find the prediction error. The layer interaction is shown in Figure 5.
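The gradient summation mentioned above (back propagating from step 3 and adding up contributions) can be checked on a deliberately tiny recurrence. The sketch below assumes a scalar linear RNN, h_t = w·h_{t-1} + x_t with h_0 = 0 and loss L = h_T, chosen only for readability. It is plain back propagation through time, not a full LSTM. The result is compared with a finite-difference estimate.

```python
def forward(w, xs):
    """Scalar linear RNN h_t = w * h_{t-1} + x_t with h_0 = 0; returns [h_0, ..., h_T]."""
    hs = [0.0]
    for x in xs:
        hs.append(w * hs[-1] + x)
    return hs


def bptt_grad(w, xs):
    """dL/dw for L = h_T, accumulated by walking backwards through time."""
    hs = forward(w, xs)
    grad, upstream = 0.0, 1.0  # upstream holds dL/dh_t, starting at t = T
    for t in range(len(xs), 0, -1):
        grad += upstream * hs[t - 1]  # local term: partial dh_t/dw = h_{t-1}
        upstream *= w                 # chain rule: dh_t/dh_{t-1} = w
    return grad


def finite_difference(w, xs, eps=1e-6):
    """Central-difference estimate of dL/dw, used only as a check."""
    return (forward(w + eps, xs)[-1] - forward(w - eps, xs)[-1]) / (2 * eps)


xs = [1.0, 2.0, 3.0]
print(bptt_grad(0.5, xs))  # 3.0, i.e. h_2 + w * h_1 = 2.5 + 0.5
```

With T = 3, the term for t = 1 vanishes because h_0 = 0. Only the steps from 3 back to 1 contribute, which matches the "back propagate 2 steps and sum" description. An LSTM adds gates to this chain, but the gradients are summed over time in the same way.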

Figure 5. Layer interaction in LSTM
The error value is simply the difference between the observed value and the predicted value. The hidden layer applies weights to the given input and computes the output value. Each gate is represented by a linear function, and different sets of weights filter the input. In an LSTM, the output of one node is given as input to the next state. The state is updated according to equation (1):

$$C_t = f_t \odot C_{t-1} + i_t \odot C'_t \qquad (1)$$

The symbols in equation (1) are:
- $C_t$ is the new (updated) state;
- $f_t$ is the forget-gate function value;
- $C_{t-1}$ is the old state;
- $i_t$ is the input-gate value;
- $C'_t$ is the candidate value;
- $\odot$ denotes element-wise multiplication.

The proposed neural network model uses LSTM to forecast wind speed. It requires relatively little data and gives accurate results.
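A single time step of an LSTM cell, including the state update of equation (1), can be written in NumPy as follows. This is a textbook-style sketch, not the paper's implementation. The gate ordering in the stacked weight matrices, the tanh candidate and output activations, the layer sizes and the random weights are all assumptions for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (4*H, D) input weights, U: (4*H, H) recurrent weights, b: (4*H,) biases,
    stacked (by assumption) in the order forget, input, candidate, output.
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    f = sigmoid(z[0:H])                # forget gate f_t
    i = sigmoid(z[H:2 * H])            # input gate i_t
    c_cand = np.tanh(z[2 * H:3 * H])   # candidate value C'_t
    o = sigmoid(z[3 * H:4 * H])        # output gate o_t
    c = f * c_prev + i * c_cand        # state update: C_t = f_t*C_{t-1} + i_t*C'_t
    h = o * np.tanh(c)                 # hidden state passed on to the next step
    return h, c


rng = np.random.default_rng(0)
D, H = 3, 4  # e.g. 3 input features (wind speed, temperature, pressure), 4 hidden units
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(5, D)):  # feed a short sequence of 5 time steps
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape)  # (4,)
```

Because the output of each step (h, c) is fed into the next call, the loop mirrors the description of one state's output becoming the next state's input. A framework layer such as an LSTM in Keras or PyTorch performs the same recurrence and also learns W, U and b.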

SVM based methodology
SVM is a discriminative classifier formally defined by a separating hyperplane. Given labelled (supervised) training data, it finds an optimal hyperplane. In the proposed methodology, the data for implementation are first collected; they consist of basic environmental parameters. In the preprocessing phase, data visualisation is applied to the collected data to reduce its dimensionality and extract the useful parameters. The resulting set of variables is trained using a recurrent neural network algorithm (back propagation). Once training is complete, SVM and SVC are applied to the trained data to obtain predictions and compute the error rate (RMSE). The working of SVM is briefly described below. A hyperplane divides two classes in two or more dimensions. The resulting separation of two classes is shown in Figure 6.

Figure 6. Separation in less complex data
Such a division is very difficult to achieve when the data are complex. In that case, the two classes in the x-y plane are separated by a transformation that adds a third dimension, the z-axis, in which the classes can be divided. These transformations are the kernels. The separation places similar data points on the same side of a plane, as shown in Figure 7. In SVM, it is simple to place a linear hyperplane between two such classes.

Do we need to add this extra feature manually to obtain a hyperplane? No: SVM uses the kernel trick. Kernels are functions that take a low-dimensional input space and transform it into a higher-dimensional space, i.e. they turn a non-separable problem into a separable one. This is mostly useful for non-linear separation problems. Put simply, the kernel performs highly complex data transformations and then determines how to separate the data according to the labels or outputs that have been defined.
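The kernel trick can be made concrete with a degree-2 polynomial kernel, chosen here as an illustrative example; the paper does not say which kernel it used. The kernel (p·q)² equals the inner product of the explicit feature map φ(x) = (x₁², √2·x₁x₂, x₂²), so the higher-dimensional space never has to be built. The second part uses x² + y² as the added z-axis. This separates points inside a circle from points outside it, even though no straight line can, because the inner points lie within the triangle formed by the outer ones. All the points are made up for the example.

```python
import numpy as np


def phi(p):
    """Explicit degree-2 feature map (x1^2, sqrt(2)*x1*x2, x2^2) for a 2-D point."""
    x1, x2 = p
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])


def poly2_kernel(p, q):
    """Degree-2 polynomial kernel: the same inner product, computed without phi."""
    return float(np.dot(p, q)) ** 2


p, q = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(np.dot(phi(p), phi(q)), poly2_kernel(p, q))  # both (1*3 + 2*(-1))^2 = 1, up to rounding

# Made-up points inside vs. outside a circle: not linearly separable in (x, y),
# but a third axis z = x^2 + y^2 separates them with a single threshold.
inner = np.array([[0.1, 0.2], [-0.3, 0.1], [0.2, -0.2]])
outer = np.array([[1.5, 0.0], [0.0, -1.4], [-1.1, 1.0]])
z_inner = inner[:, 0] ** 2 + inner[:, 1] ** 2
z_outer = outer[:, 0] ** 2 + outer[:, 1] ** 2
print(z_inner.max() < 1.0 < z_outer.min())  # True
```

The z-axis here is just the sum of the first and last coordinates of φ. A linear separator in the feature space therefore corresponds to a circle in the original plane.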

Figure 7. Separation in Complex Data
The prediction for a new input vector $I_q$ is given by equation (2). It is computed from the dot product of the input vector $I_q$ with each support vector $q_i$:

$$f(I_q) = C_0 + \sum_i r_i \,(I_q \cdot q_i) \qquad (2)$$

Here, the coefficients $C_0$ and $r_i$ are estimated from the training data. Equation (2) therefore requires the inner products of the new input vector $I_q$ with all the support vectors in the training dataset.
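The prediction rule above is a weighted sum of kernel evaluations against the support vectors. The sketch evaluates it directly, first with the plain dot product and then with a Gaussian (RBF) kernel swapped in. The RBF is an added example of a non-linear kernel, and its `gamma` value is arbitrary. The support vectors and coefficients are hypothetical; in practice they come from training.

```python
import numpy as np


def svm_predict(I_new, support_vectors, r, C0, kernel=np.dot):
    """Evaluate f(I_new) = C0 + sum_i r_i * K(I_new, q_i).

    With kernel=np.dot this is the linear form above; passing a nonlinear
    kernel gives the kernelised version of the same prediction rule.
    """
    return C0 + sum(r_i * kernel(I_new, q_i) for r_i, q_i in zip(r, support_vectors))


def rbf(a, b, gamma=0.5):
    """Gaussian (RBF) kernel exp(-gamma * ||a - b||^2)."""
    return float(np.exp(-gamma * np.sum((np.asarray(a) - np.asarray(b)) ** 2)))


# Hypothetical support vectors q_i, coefficients r_i and intercept C0.
q = np.array([[1.0, 2.0], [2.0, 0.5]])
r = np.array([0.4, -0.3])
C0 = 0.1
x = np.array([1.0, 1.0])
print(svm_predict(x, q, r, C0))              # 0.1 + 0.4*3 - 0.3*2.5, about 0.55
print(svm_predict(x, q, r, C0, kernel=rbf))  # about 0.182
```

For classification, the sign of f gives the predicted class. For regression (support vector regression), f itself is the predicted value.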
The main objective of a linear SVC (support vector classifier) is to return the best-fit hyperplane that divides the classes. Other Python libraries, such as matplotlib for data visualisation, make it easy to inspect the predicted values.
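Wind speed is a continuous target, and the results are reported as RMSE. For that reason, the following sketch uses the regression counterpart of the SVM, support vector regression via scikit-learn's `SVR`, rather than the classifier. It runs on a synthetic daily-cycle series standing in for hourly wind speed, not the Jena data. The lag length, `C`, `epsilon` and the 80/20 chronological split are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.svm import SVR


def lagged(series, lookback):
    """Inputs are the previous `lookback` values; the target is the next value."""
    X = np.stack([series[i:i + lookback] for i in range(len(series) - lookback)])
    return X, series[lookback:]


# Synthetic stand-in for an hourly wind-speed series (not the paper's data).
t = np.arange(400)
noise = 0.1 * np.random.default_rng(1).normal(size=t.size)
series = 5 + 2 * np.sin(2 * np.pi * t / 24) + noise

X, y = lagged(series, lookback=6)
cut = int(0.8 * len(X))  # chronological split: train on the past, test on the future
model = SVR(kernel="rbf", C=10.0, epsilon=0.05)
model.fit(X[:cut], y[:cut])
pred = model.predict(X[cut:])
rmse = float(np.sqrt(np.mean((y[cut:] - pred) ** 2)))
print(f"SVR test RMSE on synthetic data: {rmse:.3f}")
```

Unlike an LSTM, the SVR sees only the fixed `lookback` window and has no internal state. This is one concrete form of the difference in pattern memory that the paper discusses.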

Results
Both methodologies are applied to the same dataset, "weather archive Jena 2009-2016" (420528 tuples) from Kaggle. Its attributes are recorded in seconds; after conversion to an hourly basis, 1751 tuples are used for training. The predictions differ considerably in RMSE: the error rate is 0.427 for LSTM and 0.768 for SVM. Figure 8 shows the graphs obtained during model implementation.

Figure 8. Visualized Parameters
These are the visualized parameters considered in implementing both methodologies: wind speed, pressure, relative humidity, temperature and solar radiation.
Figure 9. Training and Testing Loss
Figure 9 shows the training and testing loss recorded while the models were trained in the implementation phase. Losses occur during training and testing because of the uneven distribution of the data and imperfections in the algorithms. The following graph shows the actual and predicted wind speed for the LSTM methodology.
The graphs in Figure 10 represent LSTM wind speed forecasting. Both methodologies use the same dataset, so their graphs look quite similar but differ in error rate. The graphs show irregular fluctuations in both the real and the predicted wind speed.

The performance graph is generated from the results of both techniques on the same dataset, with performance expressed as RMSE: 0.427 for LSTM and 0.768 for SVM. The error rate is clearly higher for SVM than for LSTM, because SVM does not retain patterns over long durations as LSTM does. The two techniques are fundamentally different: SVM is based on hyperplanes, whereas LSTM is based on a neural network. SVM can be treated as a feed-forward type of model, whereas LSTM can back propagate through time as the system requires. Because of these properties, the LSTM-based methodology performs better than the SVM-based one, as the performance graph clearly shows. In the performance graph, the x-axis shows the error rate and the y-axis the algorithm used (LSTM and SVM). The RMSE value is calculated using the formulas given in equations (4) and (5).
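For reference, the standard definition of RMSE over $N$ test samples, with observed values $y_k$ and predictions $\hat{y}_k$, is given below. It is assumed here to be what equations (4) and (5) express. Applying it to the reported results shows that the LSTM error is about 44% lower than the SVM error:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{k=1}^{N}\left(y_k - \hat{y}_k\right)^2}$$

$$\frac{\mathrm{RMSE}_{\mathrm{SVM}} - \mathrm{RMSE}_{\mathrm{LSTM}}}{\mathrm{RMSE}_{\mathrm{SVM}}} = \frac{0.768 - 0.427}{0.768} = \frac{0.341}{0.768} \approx 0.444$$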

Conclusion
A detailed examination of the results of both methodologies shows that LSTM is more effective than SVM. Its lower error rate makes it the better choice for forecasting. Because it remembers patterns over long durations, LSTM combined with deep learning can give more efficient forecasts. It can also be hybridised with other models to produce more accurate predictions. Overall, the study shows that LSTM is the more significant technique for forecasting. LSTM can therefore be applied to larger datasets to obtain highly accurate results, and organisations can use it to improve weather forecasting. For wind speed prediction, it can help close the gap between power generation and power utilization.