Day-ahead electricity price forecasting model based on artificial neural networks for energy markets

Day-ahead electricity price forecasting is still an open problem in electricity markets. One major method used to solve this problem is the artificial neural network (ANN). However, ANNs usually train slowly and need a large number of patterns. A neural network (NN) trained using Levenberg-Marquardt (LM) learning is proposed, and partial autocorrelation is applied to the time series data to select suitable input values. The forecasting performance of the NN-LM is higher than that of the traditional ANN and some other hybrid approaches. To show the effectiveness and accuracy of the NN-LM method, the Indian and the Austrian energy exchange markets are considered. It is significant to note that, for the very first time, the NN-LM based approach is being tested on both of these energy markets. Finally, the flexibility of the proposed approach is checked using a 4-fold cross-validation technique. The 4-fold cross-validation strategy is capable of improving the generalization ability of the model and achieving higher forecast precision.


Introduction
Since the 1980s, numerous nations have changed the economics of their electricity markets from monopolies to oligopolies in an effort to increase competition. A significant feature of this change is to allow competition among generators and to create the market conditions necessary to decrease the cost of energy production and distribution, eliminate certain inefficiencies and increase customer choice [1]. Deregulation means the reduction or elimination of government control over a particular industry. Its purpose is to promote more competition within the same industry in the same geographical jurisdiction. It is generally believed that fewer and simpler regulations will lead to a higher level of competitiveness, and that the overall result will be higher productivity and more efficiency at lower prices [1].
Deregulation of electrical markets calls for the restructuring of the electricity industry. The traditional vertically integrated system is broken down into three separate businesses: generation, transmission and distribution. These three businesses are operated by three different entities. Advocates of deregulation argue that a deregulated electricity market will bring cheaper electricity and, in the meantime, provide more choices for customers [3]. In a deregulated market, instead of one generation provider, there are several generation providers in a local area. The local regulatory body can no longer fix the electricity price. Consumers have a choice regarding their local electricity providers and can choose different providers depending on their requirements and demand.
The deregulation of the power industry creates new challenges in the electricity market, and price forecasting has become a major issue globally. The price is volatile in nature. From an economic point of view, electricity is a non-storable good, which makes balancing demand and supply a herculean task. Electricity prices in competitive markets are directly or indirectly affected by a number of interlinked factors. Uncertain factors such as load, weather, market forces and bidding strategy are fluctuating, and hence the prediction of price is difficult. Accurate forecasting of price is not a trivial task. Electricity price forecasting is important for all market players, and the behaviour of the electricity price is different from that of other commodities. Through competition, market deregulation strives to improve generation availability and efficiency.
Electricity price prediction is more complex than load forecasting because of uncertainties in operation as well as in the bidding strategies of market participants. The volatility and non-linearity of the system directly affect the accuracy of price prediction. The significance of price prediction and its complexity have motivated researchers to be more innovative and to propose numerous strategies. Among these methodologies are the time series and artificial neural network (ANN) models [3,4].
Time series models, for example dynamic regression and transfer function, autoregressive integrated moving average (ARIMA), generalized autoregressive conditional heteroskedastic and hybrid methods based on wavelet transform (WT) and ARIMA (WT-ARIMA), have been proposed in the literature. Time series models are linear predictors, and they have difficulty in predicting the non-linear behaviour of electricity prices. Hence, ANNs are used to solve this problem. The advantage of ANNs is their non-linear modelling ability and their ability to capture highly volatile prices [5]. In deregulated pool-based power markets, participating organizations submit bids for selling and purchasing power for the next 24 h period. When a supplier offers a price equal to or below the market clearing price (MCP), the offer is settled at the MCP for that hour. In electricity price forecasting, the ANN takes the previous day's prices as input factors. Accuracy in forecasting the MCP depends upon intrinsic and extraneous factors [5].
An ANN with a modified Levenberg-Marquardt (LM) learning algorithm has been implemented in the Pennsylvania-New Jersey-Maryland (PJM) market. This ANN approach predicts the 24 h locational marginal price (LMP) of the day-ahead energy market [6]. The results obtained are compared with dynamic regression, transfer function, ARIMA, WT and a simple application of neural networks (NN). In addition, the fuzzy c-means method has been used to classify the data into three clusters, and NNs have been used to forecast the MCP for the day-ahead energy market. A three-layered back-propagation (BP) network was chosen as the structure of the NN. The results showed a 16% error on weekdays and less than a 20% error on weekends. The accuracy can be improved by combining several techniques such as fuzzy logic, NN and dynamic clustering. The use of NNs in an open market was presented in [7].
Among the existing approaches, [19] proposes a hybrid ANN forecast engine solely for the Indian energy exchange, since only meagre work has been carried out in this market. Hybrid ANN models that incorporate heuristic search algorithms, such as ANN-ANN-PSO, Wavelet-based ANN and Wavelet-based ANN-ANN-PSO, were developed to forecast the MCP [19]. In the process of training the ANN, the weights were updated using the conventional gradient descent method. It should be noted that the weights can be updated in either incremental or batch mode [19]. However, the conventional gradient descent method [19] has the obvious drawback of getting stuck in local minima.
This paper develops an electricity price forecasting model based on an NN trained using LM learning (NN-LM), which is used for forecasting the MCP. The proposed work uses a novel approach to eliminate the drawbacks mentioned in the earlier paragraphs and is implemented on two test systems, an Indian energy market and an Austrian energy market, to compare the forecast results. The findings show the accuracy and efficacy of the proposed approach. The validity and versatility of the proposed approach are verified by comparing the obtained results with those from 4-fold cross-validation.
The rest of the paper is organized as follows. The methodology and detailed NN-LM learning are presented in Section 2. The experimental results and discussion are provided in Section 3. The conclusion is described in Section 4.

Methodology
This section describes the data source, the input feature selection for the NN-LM model, the NN-LM model for day-ahead price forecasting in the Indian and Austrian energy markets, and the evaluation of its prediction performance.

Data source
In this paper, the electricity price data are taken from the daily trading reports of the Indian and Austrian energy markets and are presented on a monthly basis. The Indian and Austrian datasets consist of the MCP [20,21].

Input feature selection using correlation
Choosing the most suitable inputs to a model is the essential initial phase of model building. It is particularly critical for NNs, which are powerful, non-linear processors. For time series, the inputs additionally include lags (memory length). As the input dimensionality grows, the complexity of the model increases and learning becomes more difficult, leading to poor convergence. With fewer, more relevant inputs, a network can concentrate on building the required relationships more efficiently. The challenge is to choose, from all the potential inputs, a subset of inputs that will lead to a superior model. If there are several input time series, then it is necessary to find, for each time series, the appropriate lags that are significant to the output [22].
The electricity price information of the earlier day (n-th day) is mapped to that of the following day ((n+1)-th day) while modelling the neural networks. The reason is that the correlation of both linear and nonlinear parameters is strongest between the n-th and (n+1)-th days [19]. For instance, the MCP profile of Monday (the earlier day) is mapped to that of Tuesday (the following day). So when a test input of the n-th day is fed into the prediction model, the MCP of the (n+1)-th day is predicted. However, it should be noted that researchers have discussed short-term electricity price forecasting in which the electricity price or MCP is forecasted for a day or a week [19].
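The day-to-day mapping described above can be sketched as follows. This is a minimal illustration, assuming hourly prices stored in a flat array with 24 values per day; the function name `make_day_pairs` and the synthetic series are hypothetical and are not taken from the paper.

```python
import numpy as np

def make_day_pairs(hourly_prices, hours_per_day=24):
    """Split an hourly price series into (day n, day n+1) training pairs."""
    prices = np.asarray(hourly_prices, dtype=float)
    n_days = prices.size // hours_per_day
    # Drop any trailing partial day, then give each day its own row.
    days = prices[: n_days * hours_per_day].reshape(n_days, hours_per_day)
    inputs = days[:-1]   # price profile of day n
    targets = days[1:]   # price profile of day n+1
    return inputs, targets

# Synthetic three-day series, used only to show the shapes (not market data).
demo = np.arange(72, dtype=float)
X, Y = make_day_pairs(demo)
```

Each row of `X` is paired with the row of `Y` that holds the next day's 24 prices, so the target of one pair is the input of the next.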
The correlation gives the degree of linear relationship between two variables, that is, how strongly the two variables are related to each other [23]. The extent to which the variables are related can be determined from the correlation coefficient, whose value is limited between −1 and 1. The three potential outcomes of being positively correlated, negatively correlated and not correlated correspond to correlation coefficients with values close to 1, −1, and 0, respectively.
Correlation is the most popular analytical method for choosing the inputs and the number of lags. Correlation analysis has been used by several researchers [22][23][24] for input selection in electricity price forecasting.
The inputs for the proposed models are based on this correlation analysis. The chosen inputs (lagged prices) reflect the impact of the short-run trend, the daily periodicity and the weekly periodicity. The inputs of the proposed prediction model differ between the Indian and Austrian power markets during all of the test periods.
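A lag-selection step of this kind can be sketched as follows. This is only an illustration, assuming a plain Pearson correlation between the series and its lagged copies and a hypothetical cut-off value; the paper does not state its threshold, and the function names and the synthetic series are invented here.

```python
import numpy as np

def lag_correlations(series, max_lag):
    """Pearson correlation between the series and itself shifted by each lag."""
    x = np.asarray(series, dtype=float)
    return {lag: float(np.corrcoef(x[lag:], x[:-lag])[0, 1])
            for lag in range(1, max_lag + 1)}

def select_lags(series, max_lag, threshold=0.5):
    """Keep lags whose absolute correlation reaches a (hypothetical) threshold."""
    corr = lag_correlations(series, max_lag)
    return [lag for lag, r in corr.items() if abs(r) >= threshold]

# Synthetic hourly series with a pure 24-hour cycle (illustration only).
hours = np.arange(24 * 14)
demo_prices = 50 + 10 * np.sin(2 * np.pi * hours / 24)
chosen = select_lags(demo_prices, max_lag=48, threshold=0.9)
```

On this synthetic series the daily lags 24 and 48 are retained, while a quarter-period lag such as 6 is discarded because its correlation is close to zero.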

Neural network trained using Levenberg-Marquardt learning
An ANN is made up of neurons organized in layers, as illustrated in Figure 1. The data are fed into the network through an input layer, which is followed by at least one intermediate (hidden) layer. The output data come out of the network's last layer [4]. The transfer function in the individual layers can be nearly anything. This section describes the mathematics behind the NN with the LM training algorithm. When an input vector is presented to the NN, the output error is computed as a squared error, that is, the sum of the squared differences between the target values and the output values:
$$E(n) = \sum_{k} \left( t_k(n) - y_k(n) \right)^2 \qquad (1)$$
where $t_k(n)$ is the target output for the k-th unit in the output layer when pattern n is presented and $y_k(n)$ is the net output for the k-th unit in the output layer when pattern n is presented. The output error for all N input vectors presented to the feed-forward network is given by
$$E = \sum_{n=1}^{N} E(n) \qquad (2)$$
The goal of the training algorithm is to iteratively adjust the weights in the network so as to generate the desired output by minimizing the output error. BP is a gradient-descent approach in that it uses first-order derivatives to locate an optimal solution. It works with a training set of input vectors f and target output vectors t. The training algorithm iteratively tries to force the generated outputs, represented by the vector y, towards the desired target vector t by modifying the weights in the network.
Quasi-Newton methods are popular algorithms for nonlinear optimization. They use second-order derivatives to find the optimal solution, so they generally converge faster than first-order techniques such as the gradient-descent method used in BP [25]. Quasi-Newton methods can be used to train NNs, and they can be used in most configurations that work for BP [26]. The second-order partial derivatives are collected in a Hessian matrix, H. The weight update is the product of the inverse of the Hessian matrix H and the direction of steepest descent, g. Since the method works on the average gradient of the error surface, a batch update of the weights is performed at the end of each epoch [27,28].
Since determining the weight updates involves the use of a Hessian matrix with all the second-order derivatives, the computation is difficult and time consuming. By using approximations to the Hessian matrix, speed can be increased. In general, Quasi-Newton techniques can become stuck in local minima more often than the other optimization techniques [27,28].
The LM algorithm is a nonlinear optimization method based on the use of second-order derivatives [25][26][27][28]. It has been adapted for training NNs. The main weakness of the LM algorithm is that it requires the storage of several matrices that can be quite large for certain problems. It also works only with summed squared error functions, so it is often used for estimation (i.e., regression) applications.
The LM algorithm combines features of the gradient descent found in BP and of the Newton method [27,28]. It assumes that the underlying function being modelled is linear and that the minimum error can be found in one step. It calculates the weight change needed to make this single step and then tests the network with the new weights to determine whether the new error is lower. A change in weights is only accepted if it reduces the error. When the error decreases, the weight change is accepted and the linear assumption is reinforced by decreasing a control parameter, µ. When the error increases, the weight change is rejected and, like BP, the algorithm follows a gradient descent by increasing the control parameter to de-emphasize the linear assumption. In this way, the LM algorithm is a compromise between a Newton and a gradient-descent process [25][26][27][28]. Close to a minimum, the linear assumption is approximately true, so the LM algorithm makes very fast progress by using this second-order Newton-like feature. The procedure is repeated until the desired error or the maximum number of iterations is reached.
The LM algorithm approximates the Hessian matrix used in the Quasi-Newton technique as the product of a Jacobian matrix of first-order partial derivatives with its transpose, as shown in Eq. (4). Since it uses a Jacobian matrix J rather than the Hessian matrix H, the computation is simpler [28].
$$H \approx J^{T} J \qquad (4)$$
The gradient is calculated as the product of the transposed Jacobian, which contains the first-order partial derivatives, and a vector e that contains the errors being minimized.
$$g = J^{T} e \qquad (5)$$
This gives the weight-update formulation in Eq. (6), where I is the identity matrix and µ is the control parameter.
$$\Delta W = -\left( J^{T} J + \mu I \right)^{-1} J^{T} e \qquad (6)$$
From Eq. (6), it can be seen that when µ is 0, this is a Newton routine with an approximated Hessian matrix.
Larger values of µ make it behave more like a gradient-descent method.
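The two roles of µ can be checked numerically. The following is a minimal sketch, assuming the standard LM update dw = -(J^T J + µI)^{-1} J^T e; the function name `lm_step` and the small Jacobian and error vector are made up for illustration and do not come from the paper.

```python
import numpy as np

def lm_step(J, e, mu):
    """One LM weight update: solve (J^T J + mu*I) dw = -J^T e for dw."""
    H_approx = J.T @ J   # approximated Hessian
    g = J.T @ e          # gradient
    return np.linalg.solve(H_approx + mu * np.eye(J.shape[1]), -g)

# Small made-up Jacobian (3 errors, 2 weights) and error vector.
J = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
e = np.array([0.5, -1.0, 2.0])

newton_like = lm_step(J, e, mu=0.0)    # mu = 0: Newton step with approximated Hessian
descent_like = lm_step(J, e, mu=1e8)   # large mu: close to -g / mu
```

With µ = 0 the step equals the least-squares (Gauss-Newton) solution, and with a very large µ it shrinks towards a short step along the negative gradient, which is why increasing µ makes the method more cautious.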
The LM training procedure is as follows:
(i) Initialize the weights. Set $W_{ih}(n)$ and $W_{ho}(n)$ to small random values,
(ii) Present each pattern to the input of the network,
(iii) Propagate the data forward and produce the output pattern. Determine the error between the target output and the actual output,
(iv) If there are more patterns (i.e., n < N) in the training set, loop back to step (ii),
(v) Estimate the error vector e between the target and actual outputs for all the patterns presented, using the summed squared error as in Eq. (1),
(vi) Calculate the Jacobian matrix, J, from the first-order partial derivatives,
(vii) Calculate the weight update as given in Eq. (6),
(viii) Recalculate the error with the updated weights. If the error has decreased, accept the weight change and decrease µ; otherwise reject the weight change and increase µ,
(ix) If the norm of the gradient g is less than the preferred amount, stop; otherwise loop back to step (ii).
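The procedure above can be sketched in a few lines of NumPy. This is a minimal illustration under several assumptions that are not taken from the paper: a network with one input, one output and three tanh hidden units, a finite-difference Jacobian instead of an analytic one, a factor of 10 for adjusting µ, and a synthetic sine target. All function names are hypothetical.

```python
import numpy as np

def residuals(w, x, t, n_hidden):
    """Errors e = y - t of a 1-input, 1-output network with tanh hidden units."""
    w_ih = w[:n_hidden]                   # input-to-hidden weights
    b_h = w[n_hidden:2 * n_hidden]        # hidden biases
    w_ho = w[2 * n_hidden:3 * n_hidden]   # hidden-to-output weights
    b_o = w[3 * n_hidden]                 # output bias
    hidden = np.tanh(np.outer(x, w_ih) + b_h)
    return hidden @ w_ho + b_o - t

def jacobian(w, x, t, n_hidden, eps=1e-6):
    """Finite-difference Jacobian of the errors with respect to the weights."""
    base = residuals(w, x, t, n_hidden)
    J = np.empty((base.size, w.size))
    for j in range(w.size):
        step = np.zeros_like(w)
        step[j] = eps
        J[:, j] = (residuals(w + step, x, t, n_hidden) - base) / eps
    return J

def train_lm(x, t, n_hidden=3, mu=0.01, max_iter=200, grad_tol=1e-8, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.uniform(-0.5, 0.5, 3 * n_hidden + 1)   # step (i): small random weights
    e = residuals(w, x, t, n_hidden)               # steps (ii)-(v): errors for all patterns
    sse = float(e @ e)
    history = [sse]
    for _ in range(max_iter):
        J = jacobian(w, x, t, n_hidden)            # step (vi)
        g = J.T @ e
        if np.linalg.norm(g) < grad_tol:           # stop when the gradient norm is small
            break
        while mu < 1e10:
            # Step (vii): trial weight update for the current mu.
            dw = np.linalg.solve(J.T @ J + mu * np.eye(w.size), -g)
            e_new = residuals(w + dw, x, t, n_hidden)
            sse_new = float(e_new @ e_new)
            if sse_new < sse:                      # accept: trust the linear model more
                w, e, sse = w + dw, e_new, sse_new
                mu = max(mu / 10.0, 1e-7)          # floor keeps the system well conditioned
                break
            mu *= 10.0                             # reject: move towards gradient descent
        else:
            break                                  # no improving step was found
        history.append(sse)
    return w, history

# Synthetic regression target, for illustration only.
x_demo = np.linspace(-1.0, 1.0, 40)
t_demo = np.sin(np.pi * x_demo)
weights, sse_history = train_lm(x_demo, t_demo)
```

Because a weight change is only accepted when it lowers the summed squared error, `sse_history` is strictly decreasing. An analytic Jacobian obtained by back-propagation would normally replace the finite-difference version in practice.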

Prediction performance evaluation
The input features and the target output (actual electricity price) are linearly normalized to the range [-1, 1] before being presented to the NN-LM model, and the output of the NN-LM model is de-normalized before being used in the performance evaluation. The performance of the trained network is then evaluated by comparing the network output with the actual value via statistical evaluation indices. The mean absolute percentage error (MAPE), the normalized mean square error (NMSE) and the error variance (EV) are used to evaluate the performance of electricity price forecasting.
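The normalization and de-normalization steps can be sketched as follows. This is a minimal illustration, assuming min-max scaling with the minimum and maximum taken from the data being scaled; the paper only states that the scaling is linear, and the function names and prices here are made up.

```python
import numpy as np

def fit_range(values):
    """Record the minimum and maximum used for scaling."""
    v = np.asarray(values, dtype=float)
    return float(v.min()), float(v.max())

def normalize(values, lo, hi):
    """Map the interval [lo, hi] linearly onto [-1, 1]."""
    return 2.0 * (np.asarray(values, dtype=float) - lo) / (hi - lo) - 1.0

def denormalize(scaled, lo, hi):
    """Invert normalize(): map [-1, 1] back onto [lo, hi]."""
    return (np.asarray(scaled, dtype=float) + 1.0) * (hi - lo) / 2.0 + lo

prices = np.array([2000.0, 3500.0, 5000.0])   # made-up prices
lo, hi = fit_range(prices)
scaled = normalize(prices, lo, hi)
restored = denormalize(scaled, lo, hi)
```

The same `lo` and `hi` must be reused when de-normalizing the network output, otherwise the forecast is not returned to the original price scale.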
The MAPE can be defined as
$$\mathrm{MAPE} = \frac{100}{N} \sum_{h=1}^{N} \frac{\left| A_h - F_h \right|}{A_h}$$
where $A_h$ and $F_h$ are the actual and forecasted electricity prices of the h-th hour, respectively, and N is the number of forecasted hours.
The NMSE is given by
$$\mathrm{NMSE} = \frac{1}{\Delta^{2} N} \sum_{h=1}^{N} \left( A_h - F_h \right)^2, \qquad \Delta^{2} = \frac{1}{N-1} \sum_{h=1}^{N} \left( A_h - \bar{A} \right)^2$$
where $\bar{A}$ is the average of the actual data. The EV is the variance of the forecast errors about their mean. The MAPE, NMSE and EV were used in the experimental results of this case study. A model with smaller MAPE, NMSE and EV performs better, and its price predictions are more precise. A detailed discussion of the three error indices is presented in [19].
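The three indices can be computed as in the sketch below. The exact formulas are assumptions: the MAPE is the mean relative absolute error in percent, the NMSE is the mean squared error divided by the sample variance of the actual prices, and the EV is taken as the variance of the hourly relative absolute errors about their mean. The function names and the three-hour example are invented for illustration.

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    a, f = np.asarray(actual, dtype=float), np.asarray(forecast, dtype=float)
    return float(100.0 * np.mean(np.abs(a - f) / a))

def nmse(actual, forecast):
    """Mean squared error normalized by the sample variance of the actual prices."""
    a, f = np.asarray(actual, dtype=float), np.asarray(forecast, dtype=float)
    variance = np.sum((a - a.mean()) ** 2) / (a.size - 1)
    return float(np.mean((a - f) ** 2) / variance)

def error_variance(actual, forecast):
    """Variance of the hourly relative absolute errors (assumed EV definition)."""
    a, f = np.asarray(actual, dtype=float), np.asarray(forecast, dtype=float)
    rel = np.abs(a - f) / a
    return float(np.mean((rel - rel.mean()) ** 2))

# Made-up actual and forecasted prices for three hours.
actual_demo = [100.0, 200.0, 400.0]
forecast_demo = [110.0, 190.0, 400.0]
mape_value = mape(actual_demo, forecast_demo)
nmse_value = nmse(actual_demo, forecast_demo)
ev_value = error_variance(actual_demo, forecast_demo)
```

For this small example the relative errors are 10%, 5% and 0%, so the MAPE is 5%.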

Numerical results
This section presents the case study of the Indian and Austrian energy exchanges, whose electricity prices were forecasted by the proposed NN-LM model.

Case studies
The day-ahead electricity market of the Indian energy exchange and energy exchange of Austria are considered in this real-world case study.
In the Indian energy market, price changes are driven by the strategic behaviour of the dominant player, which is difficult to predict. It can be observed that the series shown in Figs. 2-5 (four weeks of June, September and two weeks of October) have an unstable mean and variance. This erratic behaviour makes forecasting hard. It can thus be clearly seen that the Indian power market is a genuine case study with sufficient volatility. Hence, researchers have used the Indian electricity market as a benchmark case study [19]. Therefore, the day-ahead Indian energy exchange market during the year 2014 is used as a case study in price forecasting [19]. For comparison, four weeks are chosen; weeks with especially good price behaviour were purposely not picked, and the most unstable prices were used for forecasting [9]. It is significant to note that, for the very first time, an NN-LM based approach is being tested on the Indian energy market price data. To construct the forecasting model for each of the forecasted weeks, the input data incorporate the hourly historical prices of the 42 days prior to the day of the week whose prices are to be predicted. Larger training sets are not used, in order to avoid overtraining during the learning procedure; when training is attempted with more than 42 days of data, it does not give better forecasting accuracy [4,5,9]. Training the NN-LM on more than 42 days' worth of data is also time-consuming. Hence, a data set of 42 days is used: the previous 42 days of electricity prices are used for training, and the electricity prices of the following 7 days are predicted.
For the Indian market the testing data are the week in June
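The split into 42 training days and 7 forecast days can be sketched as follows. This is a minimal illustration, assuming hourly prices in a flat array indexed from day 0; the function name `split_train_test` and the synthetic series are hypothetical.

```python
import numpy as np

HOURS_PER_DAY = 24
TRAIN_DAYS = 42
TEST_DAYS = 7

def split_train_test(hourly_prices, test_start_day):
    """Return the 42 days before `test_start_day` and the 7 days starting at it."""
    prices = np.asarray(hourly_prices, dtype=float)
    if test_start_day < TRAIN_DAYS:
        raise ValueError("need 42 days of history before the test week")
    start = (test_start_day - TRAIN_DAYS) * HOURS_PER_DAY
    middle = test_start_day * HOURS_PER_DAY
    end = (test_start_day + TEST_DAYS) * HOURS_PER_DAY
    if end > prices.size:
        raise ValueError("series is too short for a full test week")
    return prices[start:middle], prices[middle:end]

# Synthetic 60-day hourly series (illustration only, not market data).
demo_series = np.arange(60 * 24, dtype=float)
train, test = split_train_test(demo_series, test_start_day=45)
```

Here `train` holds 42 × 24 = 1008 hourly values and `test` holds the 7 × 24 = 168 values of the forecast week.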

Electricity price forecasting with NN-LM model
In NN-LM, the architecture of the neural network is determined using a stochastic approach. Simulations were repeated until the best number of hidden layers and the corresponding numbers of neurons were obtained [4,8,9]. The network architecture is typically settled on when both training and testing give minimal MAPE.
The resulting numbers of neurons in the input, hidden and output layers that produced minimal MAPE in both training and testing for the testing weeks of the Indian and Austrian energy exchange markets are shown in Table 1.

Comparison with other approaches
Different methodologies are tested for the Indian and Austrian energy markets, and the results of these investigations are discussed in this section. Table 2 shows the statistical analysis and metrics used to assess the accuracy of the proposed NN-LM model in forecasting the electricity prices in both the Indian and Austrian energy markets. The first column shows the deregulated power market, the second the forecast week, the third the MAPE, the fourth the NMSE and the fifth the EV. It is observed that the MAPE for the Indian power market in the year 2014 has an average value of 6.3021% when the proposed NN-LM model is used. Table 2 also reports an average weekly MAPE close to 6.3406% for the Indian power market for the year 2019 and, for the four weeks of the year 2015 in the Austrian market, an average weekly MAPE close to 6.1627%, both obtained using the NN-LM model. This shows the effectiveness of the proposed model for a recent year and under other market environments. Table 3 shows the comparison between the NN-LM model and four different models (ANN, ANN-ANN-PSO, Wavelet-based ANN and Wavelet-based ANN-ANN-PSO). The proposed NN-LM is a single, compact and robust architecture (without hybridizing different hard and soft computing models) that consistently performs better than the other models. With the exception of ANN, the four other models are hybrids of soft computing models. As observed from Table 3, the NN-LM has excellent forecast accuracy with less calculation time, making it a single, compact and robust model that is better than the ANN and the hybrid approaches ANN-ANN-PSO, Wavelet-based ANN and Wavelet-based ANN-ANN-PSO. From Table 3, it is also observed that, with its very low NMSE and small EV, the proposed model performs well in terms of both accuracy and time, and its price predictions are more precise.
The total setup time of the proposed technique, including pre-processing (normalization), training of the NN (tuning by trial and error), testing of the NN and post-processing (de-normalization), was around 3 s on a 2 GHz AMD processor with 1 GB of RAM. After training, the average computation (response) time of the NN-LM was around 15 ms (since it only involves the forward propagation of the NN). This approach is an efficient and accurate method for forecasting electricity prices in a deregulated power market. Consequently, the NN-LM presents the best combination of forecasting accuracy and computation time, and furthermore reduces modelling complexity, which is essential for real-time applications. In a deregulated power market, fast and accurate forecasting of prices is likewise essential for real-life applications.

Effect of 4-fold cross-validation on forecasting accuracy
The NN-LM network is taken as an example to illustrate the effectiveness of cross-validation in the proposed training model. In this research, 4-fold cross-validation is applied to the Indian energy market for the year 2014 for training and testing. The training and testing data are divided into four disjoint sets of equal size. Let the sets be labelled week 1 (w1), week 2 (w2), week 3 (w3) and week 4 (w4). Group 1 represents the cross-validation w2w3w4 (training) vs. w1 (testing), group 2 represents w1w3w4 vs. w2, group 3 represents w1w2w4 vs. w3 and, finally, group 4 represents w1w2w3 vs. w4. The statistical results of the 4-fold cross-validation are presented in Table 4. Table 4 shows that 4-fold cross-validation has the highest accuracy, yielding an average weekly MAPE near 5.7717%. The forecast accuracy of week 4 is lower than that of NN-LM, while weeks 1, 2 and 3 have higher forecast precision than NN-LM; however, cross-validation increases the processing time of the model. The NMSE and EV of the test system under 4-fold cross-validation, which took longer to compute, are also shown in Table 4. This shows that the 4-fold cross-validation strategy can improve the generalization ability of the model.
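The four groups can be enumerated with a short sketch. The week labels follow the text above, while the function name `four_fold_groups` is hypothetical and the training and scoring of the model in each group are left out.

```python
def four_fold_groups(weeks=("w1", "w2", "w3", "w4")):
    """Return (training weeks, testing week) for each leave-one-week-out group."""
    groups = []
    for held_out in weeks:
        training = tuple(w for w in weeks if w != held_out)
        groups.append((training, held_out))
    return groups

groups = four_fold_groups()
for number, (training, testing) in enumerate(groups, start=1):
    print(f"group {number}: train on {''.join(training)} vs test on {testing}")
```

Each week is used for testing exactly once, so the average of the four test errors gives the cross-validated estimate.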

Conclusion
An accurate determination of the electricity price is an essential concern for all energy market players and stakeholders, whether for creating bidding strategies or for making investment decisions. In past research, no single available model has been applied across data from a broad spectrum of power markets. More research is also needed in different markets; this will help in interpreting and understanding the price development and behaviour in various power markets from an advanced point of view. Hence, in this paper, the input parameters were selected using correlation analysis of the raw data to remove redundant components, and the LM training method was implemented, which helped the neural network to train better. The forecast results of the benchmark energy market of India for the four weeks of the year 2014 were analyzed, yielding an average weekly MAPE near 6.3021%, with a very low NMSE and a small EV. The NN-LM implementation results illustrate that it has better forecasting accuracy than other forecast methodologies, such as ANN, ANN-ANN-PSO, Wavelet-based ANN and Wavelet-based ANN-ANN-PSO. The technique is very straightforward and yields an average weekly MAPE close to 6.3406% for the Indian power market for the year 2019 and an average weekly MAPE close to 6.1627% for the Austrian energy market for the four weeks of the year 2015. This shows the effectiveness of the proposed model for a recent year and under different market environments. The 4-fold cross-validation technique can improve the generalization ability of the proposed model: the error of the test weeks decreased, with an average weekly MAPE near 5.7717%. It improves the accuracy on the test data, but it also increases the processing time of the model. In a deregulated power market, an electricity price forecast model with lower computation time and quicker forecasting of prices is essential for real-life applications.