Linear Regression Model for Estimating Sustainable Generation: A Case Study in Tamil Nadu

This article develops a statistical model for predicting the DC and AC power generated by an installed PV plant. A proper understanding of PV plant characteristics is needed to predict the yield from solar and atmospheric parameters. This study investigates the relationship between beam and diffuse solar radiation, atmospheric temperature and wind speed for predicting the hourly generated power. The study location is Chennai city, Tamil Nadu state, India. Meteorological data for the location were obtained from NREL, and linear regression prediction equations for the DC and AC solar output power were built using Minitab version 16.2.1. The method achieves a better correlation coefficient than the other techniques it is compared with. The developed regression models show R² values of 99.24% and 99% for DC and AC power, and predicted R² (Rpred) values of 86.54% and 83.22% for DC and AC power, respectively.


Introduction
Recently, energy demand has been increasing rapidly as the use of electrical appliances in day-to-day life grows, and dependence on fossil fuels has grown with it. The depletion of fossil fuels and their negative impact on the environment are growing concerns. At the same time, technologies for sustainable sources such as solar and wind have developed rapidly [1,2]. Among these, solar power plants attract particular research interest because solar energy is eco-friendly, clean, unlimited and free compared with conventional fuels [3].
As a developing country, India is investing heavily in solar energy systems to meet its energy demand [4]. The state of Tamil Nadu alone had a total installed PV capacity of 4000 MW by 2020, and its installed plants generated 3842 million units of solar electricity in 2019-2020 [5]. These figures show the solar resource available in Tamil Nadu and its value for meeting the current energy demand.
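As a rough consistency check, the annual generation can be converted into a capacity factor. The sketch below assumes that one "unit" is 1 kWh (the usual Indian convention) and that the full 4000 MW was in service for all 8760 hours of the year; if part of that capacity was commissioned during the year, the true capacity factor would be higher than this estimate. The function name is illustrative.

```python
def capacity_factor(energy_kwh: float, capacity_kw: float, hours: float = 8760.0) -> float:
    """Energy actually generated divided by the energy the installed capacity
    would produce running at full rating for the whole period."""
    return energy_kwh / (capacity_kw * hours)


# Figures quoted in the text: 3842 million units (assumed 1 unit = 1 kWh)
# against about 4000 MW (4,000,000 kW) of installed PV capacity.
# Assumption: the full capacity was available for the whole year.
cf = capacity_factor(3842e6, 4000e3)
print(f"Approximate capacity factor: {cf:.1%}")
```

Under these assumptions the estimate comes out at roughly 11%.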
Predicting this yield at a specific location can therefore help investors design solar plants properly and install them at the right capacity. Prediction techniques in the literature fall into two groups: historical-data-based methods and model-based methods [6,7]. Among these, regression analysis, artificial neural networks, support vector machines and other machine learning techniques are the most widely used and most accurate models for predicting the power generated by solar panels [8-10].
Based on India's land area, the country's total solar energy potential is estimated at 5000 trillion kWh/year. Each square metre is estimated to receive a maximum of 4-6 kWh of solar energy per day. The National Institute of Solar Energy (NISE) estimated that covering a minimum of 3% of India's total wasteland with solar PV could produce a maximum of about 750 GW [11].
Beam and diffuse radiation are the two main components of solar radiation, and their sum is the global solar radiation [12]. This article uses hourly meteorological data for the selected location. Chennai, one of the hottest cities in Tamil Nadu, is selected as the location for prediction; its latitude and longitude are 13.08° N and 80.27° E, respectively. Information about solar radiation is collected using satellite data, in situ measurements or statistical methods. Buying and installing measuring instruments such as pyranometers and sunshine recorders for in situ measurement at every location under study is very expensive and challenging, and maintaining and servicing these instruments is tedious. Satellite data, on the other hand, are accurate and reliable, but they depend heavily on efficient communication with the satellite stations.
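The relation between the components can be written out explicitly. Strictly, global horizontal irradiance is the diffuse horizontal irradiance plus the beam (direct normal) irradiance projected onto the horizontal plane through the cosine of the solar zenith angle. The sketch below assumes the zenith angle is already known (for example, from a solar-position calculation for 13.08° N, 80.27° E); the function name and the example values are illustrative, not NREL data.

```python
import math


def global_horizontal_irradiance(dni: float, dhi: float, zenith_deg: float) -> float:
    """Global horizontal irradiance (W/m^2) from its beam and diffuse parts.

    dni: direct normal (beam) irradiance, W/m^2
    dhi: diffuse horizontal irradiance, W/m^2
    zenith_deg: solar zenith angle in degrees (0 = sun directly overhead)
    """
    # The beam term contributes only while the sun is above the horizon.
    cos_z = max(math.cos(math.radians(zenith_deg)), 0.0)
    return dni * cos_z + dhi


# Illustrative values: 800 W/m^2 beam, 100 W/m^2 diffuse, sun 60 degrees
# from the zenith, giving about 500 W/m^2 on a horizontal surface.
print(global_horizontal_irradiance(800.0, 100.0, 60.0))
```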
Regression modelling, a statistical method, overcomes these difficulties by using the available historical data to generate a set of prediction equations that fit the empirical data. Linear regression is a relatively simple technique that can be carried out quickly and reliably. Such models can also be trained quickly and effectively on systems with less computing capacity than more sophisticated methods require, and linear regression has a much lower time complexity than several other machine learning methods. Its mathematical formulae are also easy to understand and interpret. The prediction equations may be yearly, monthly, daily or hourly, depending on the available data [13-15]. However, studies on predicting the power generated by PV plants in India are limited. Hence, this article focuses on estimating the DC and AC power generated at a specific location in India that experiences weather with little seasonal variation over the years.

Methodology
Hourly data for the selected location were collected from the National Renewable Energy Laboratory (NREL). On average, Chennai receives about 9 hours of solar radiation per day. The exact location and the parameters used to obtain the historical data are given in Figure 1 and Table 1, respectively. The hourly DC and AC generated power, together with the direct beam and diffuse radiation, ambient temperature and wind speed at the selected location, are gathered.
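One plausible preprocessing step, not described in the article, is to keep only daylight hours before fitting, since night-time rows (zero irradiance and zero output) carry no information about generation. The sketch below parses hourly CSV text with Python's standard csv module. The column headers are assumptions modelled on the variable labels used in this article, not a documented NREL schema, and the two data rows are made up for illustration.

```python
import csv
import io

# Hypothetical column headers; adjust them to the header row of the file
# actually downloaded from NREL.
COLUMNS = {
    "B": "Beam Irradiance (W/m^2)",
    "D": "Diffuse Irradiance (W/m^2)",
    "T": "Ambient Temperature (C)",
    "S": "Wind Speed (m/s)",
    "PDC": "DC Array Output (W)",
    "PAC": "AC System Output (W)",
}


def load_daylight_rows(text: str) -> list[dict[str, float]]:
    """Parse hourly CSV text and keep only hours with positive DC output."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        row = {key: float(record[col]) for key, col in COLUMNS.items()}
        if row["PDC"] > 0:  # drop night-time hours with no generation
            rows.append(row)
    return rows


# Made-up sample: one night-time hour and one daylight hour.
sample = (
    "Beam Irradiance (W/m^2),Diffuse Irradiance (W/m^2),Ambient Temperature (C),"
    "Wind Speed (m/s),DC Array Output (W),AC System Output (W)\n"
    "0,0,24.1,1.2,0,0\n"
    "512,118,29.8,3.4,2710,2590\n"
)
print(len(load_daylight_rows(sample)))
```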
The hourly generated power for Chennai is predicted using regression analysis (RA), and the characteristics of the developed model are compared with the observed data. In its simplest form, the model relates a single independent variable 'u' linearly to an output variable 'v'. Equation (1) denotes the linear regression model.

$$v = \lambda_0 + \lambda_1 u + \varepsilon \tag{1}$$

Here, the intercept λ0 and the slope λ1 are unknown constants estimated from the meteorological data, and ε denotes the random error. The line of best fit is the one that minimises the sum of squared residuals. A sample of the NREL data for 1 January is given in Table 2.
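Minimising the sum of squared residuals for a single predictor has a closed-form solution: the slope is λ1 = Σ(u − ū)(v − v̄) / Σ(u − ū)² and the intercept is λ0 = v̄ − λ1ū, so the fitted line passes through the point of means. A minimal pure-Python sketch of that ordinary least-squares calculation; the function name and the example data are illustrative.

```python
def fit_simple_ols(u: list[float], v: list[float]) -> tuple[float, float]:
    """Return (lambda0, lambda1) minimising the sum of squared residuals
    for the model v = lambda0 + lambda1 * u + error."""
    n = len(u)
    if n != len(v) or n < 2:
        raise ValueError("need at least two paired observations")
    u_mean = sum(u) / n
    v_mean = sum(v) / n
    s_uv = sum((ui - u_mean) * (vi - v_mean) for ui, vi in zip(u, v))
    s_uu = sum((ui - u_mean) ** 2 for ui in u)
    if s_uu == 0:
        raise ValueError("u must not be constant")
    lambda1 = s_uv / s_uu                 # slope
    lambda0 = v_mean - lambda1 * u_mean   # intercept: line passes through the means
    return lambda0, lambda1


# Illustrative data lying exactly on v = 2 + 3u.
print(fit_simple_ols([0, 1, 2, 3], [2, 5, 8, 11]))
```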

Regression Model for DC and AC Power
Based on the average hourly data for Chennai, Tamil Nadu, India, regression models for the DC and AC power output are obtained, with beam irradiance, diffuse irradiance, ambient temperature and wind speed as the independent variables.
In these models, B is the beam irradiance (W/m²), D is the diffuse irradiance (W/m²), T is the ambient temperature (°C) and S is the wind speed (m/s); PDC and PAC are the generated DC and AC power in watts, respectively. The coefficients of beam irradiance, diffuse irradiance, temperature and wind speed are provided in Table 3.
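The single-variable model extends to several predictors by giving each one its own coefficient. Because the fitted equations and the Table 3 coefficients are not reproduced in this section, the sketch below assumes a first-order form, P = b0 + b1·B + b2·D + b3·T + b4·S; the model behind Table 3 may include additional squared or interaction terms. It uses NumPy's least-squares solver in place of Minitab, and the coefficients and data in the example are synthetic. The helper names are hypothetical.

```python
import numpy as np


def design_matrix(B, D, T, S):
    """Columns: intercept, beam, diffuse, temperature, wind speed."""
    return np.column_stack([np.ones(len(B)), B, D, T, S])


def fit_first_order_model(B, D, T, S, P):
    """Least-squares coefficients [b0, b1, b2, b3, b4] for
    P = b0 + b1*B + b2*D + b3*T + b4*S."""
    X = design_matrix(B, D, T, S)
    coef, *_ = np.linalg.lstsq(X, np.asarray(P, dtype=float), rcond=None)
    return coef


def predict(coef, B, D, T, S):
    return design_matrix(B, D, T, S) @ coef


# Synthetic, noise-free data generated from made-up coefficients.
rng = np.random.default_rng(0)
n = 40
B = rng.uniform(0, 900, n)
D = rng.uniform(0, 300, n)
T = rng.uniform(24, 38, n)
S = rng.uniform(0.5, 6, n)
true_coef = np.array([-50.0, 4.0, 3.5, -6.0, 8.0])
P = predict(true_coef, B, D, T, S)

coef = fit_first_order_model(B, D, T, S, P)
print(np.round(coef, 3))
```

With noise-free synthetic data the solver recovers the generating coefficients, which is a quick check that the design matrix is assembled correctly before real NREL data are used.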
Graphical analysis of residuals is an important check of whether a regression model fits well. A residual is the vertical distance between a data point and the fitted regression line, that is, the observed value minus the predicted value. The residual plots for DC and AC power are shown in Figures 2 and 3, respectively. Figures 2 (a) and 3 (a) are normal probability plots; the points lie along a straight line, which indicates that the residuals are approximately normally distributed. Figures 2 (b) and 3 (b) plot the residuals against the predicted (fitted) values, and they show only a very small difference between the predicted and observed (trial run) values.
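The quantities behind a normal probability plot can be computed directly. The sketch below forms residuals as observed minus predicted values, sorts them, and pairs each with the standard-normal quantile expected at its rank; a correlation close to 1 between the two corresponds to points lying on a straight line. The plotting-position formula (i − 0.375)/(n + 0.25) is an assumption and may differ from the one Minitab uses, and the observed and predicted values are made up.

```python
from statistics import NormalDist


def normal_probability_points(residuals):
    """Pair each sorted residual with its expected standard-normal quantile,
    using the plotting position (i - 0.375) / (n + 0.25)."""
    n = len(residuals)
    ordered = sorted(residuals)
    quantiles = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    return quantiles, ordered


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Made-up observed and predicted hourly DC power values (W).
observed = [310.0, 415.0, 520.0, 640.0, 700.0, 830.0, 905.0]
predicted = [312.1, 416.0, 520.4, 640.0, 699.7, 829.1, 902.8]
residuals = [o - p for o, p in zip(observed, predicted)]

q, r_sorted = normal_probability_points(residuals)
print(f"correlation with normal quantiles: {pearson_r(q, r_sorted):.3f}")
```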
Figures 2 (c) and 3 (c) show histograms of the DC and AC residuals, which summarise the distribution of the residuals. The residuals versus observation order are shown in Figures 2 (d) and 3 (d); the residuals take both positive and negative values about zero. Taken together, the DC and AC residual plots indicate that the models are adequate. The coefficient of determination (R²) of the developed models is high, which indicates sufficient accuracy: 99.24% and 99% for DC and AC power, respectively. The adjusted R² (Radj) values, which account for the number of terms in the model, are 96.95% and 95.98% for DC and AC power, respectively, so the models remain strong after this correction. The predicted R² (Rpred) values, which measure how well the models predict observations not used in fitting, are 86.54% and 83.22% for DC and AC power, respectively.
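The three statistics quoted here can be reproduced from an ordinary least-squares fit. The sketch below assumes the usual definitions: R² = 1 − SSE/SST; adjusted R² divides SSE and SST by their degrees of freedom, n − p and n − 1, where p is the number of model parameters including the intercept; predicted R² = 1 − PRESS/SST, where PRESS sums the squared leave-one-out prediction errors, obtained without refitting as eᵢ/(1 − hᵢᵢ) from the hat-matrix leverages hᵢᵢ. Since 0 ≤ hᵢᵢ < 1, PRESS is never smaller than SSE, so predicted R² never exceeds R². The function name and the synthetic data are illustrative.

```python
import numpy as np


def fit_statistics(X, y):
    """Return (R2, adjusted R2, predicted R2) for an OLS fit of y on X.

    X must already contain an intercept column; p is its number of columns.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sse = float(resid @ resid)
    sst = float(((y - y.mean()) ** 2).sum())
    # Leverages: diagonal of the hat matrix H = X (X^T X)^-1 X^T.
    h = np.einsum("ij,ij->i", X @ np.linalg.pinv(X.T @ X), X)
    # PRESS: sum of squared leave-one-out prediction errors.
    press = float(((resid / (1.0 - h)) ** 2).sum())
    r2 = 1.0 - sse / sst
    r2_adj = 1.0 - (sse / (n - p)) / (sst / (n - 1))
    r2_pred = 1.0 - press / sst
    return r2, r2_adj, r2_pred


# Synthetic example: a noisy straight line.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 30)
y = 3.0 + 2.0 * x + rng.normal(0, 1.0, 30)
X = np.column_stack([np.ones_like(x), x])
r2, r2_adj, r2_pred = fit_statistics(X, y)
print(f"R2={r2:.2%}  R2(adj)={r2_adj:.2%}  R2(pred)={r2_pred:.2%}")
```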
The effect of irradiance and temperature on the DC generated power is shown in Figure 4. In Figure 4 (a), maximum array output is observed at maximum beam irradiance and median temperature.
A further increase in temperature may reduce the output, consistent with the drop in PV module efficiency at higher temperatures. Figure 4 (b) shows high array output at median diffuse radiation and median temperature. Thus, for a higher yield from the installed plant, the temperature should be at its median value rather than low or high.
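An output that peaks at an intermediate temperature cannot come from a term that is only first order in T, because a straight line has no interior maximum; the model needs a curvature term such as T². Assuming the fitted model contains terms b_T·T + b_TT·T² (Table 3 lists the actual terms, which are not reproduced here), setting the derivative with respect to T to zero gives the temperature of maximum predicted output, T* = −b_T / (2·b_TT), valid only when b_TT < 0. The coefficient names and values below are hypothetical. If interaction terms such as B·T are also present, b_T should be replaced by the total linear coefficient of T at the chosen irradiance.

```python
def temperature_optimum(b_T: float, b_TT: float) -> float:
    """Temperature at which b_T*T + b_TT*T**2 reaches its maximum.

    Only meaningful for a downward-opening parabola (b_TT < 0).
    """
    if b_TT >= 0:
        raise ValueError("no interior maximum unless the squared-term coefficient is negative")
    return -b_T / (2.0 * b_TT)


# Hypothetical coefficients, chosen only to illustrate the calculation.
print(temperature_optimum(60.0, -1.0))  # 30.0
```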
The effect of irradiance and wind speed on the DC generated power is shown in Figure 5. The plot in Figure 5 (a) shows the interaction between beam irradiance and wind speed and its effect on DC output: high DC output requires high beam irradiance and medium wind speed. Similarly, in Figure 5 (b), the maximum output is obtained at the middle of the plot. Hence, a high output yield is obtained with maximum beam irradiance and medium diffuse radiation, temperature and wind speed. The present work is compared with previous work in the same area in Table 4.
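Plots such as Figures 4 and 5 are commonly produced by evaluating the fitted equation over a grid of two variables while holding the remaining variables at fixed values (often mid-range or median values). A sketch of that procedure, which also locates the grid point of maximum predicted output; `toy_model` is a made-up function with a built-in optimum at medium wind speed, not the article's fitted equation, and the hold values are arbitrary.

```python
import numpy as np


def best_grid_point(model, b_values, s_values, D_fixed, T_fixed):
    """Evaluate model(B, D, T, S) on a B-S grid with D and T held fixed and
    return (B, S, P) at the grid point with the largest predicted power."""
    BB, SS = np.meshgrid(b_values, s_values, indexing="ij")
    P = model(BB, D_fixed, T_fixed, SS)
    i, j = np.unravel_index(np.argmax(P), P.shape)
    return BB[i, j], SS[i, j], P[i, j]


# Hypothetical model: output rises with beam irradiance and peaks at a
# medium wind speed of 3 m/s (illustration only).
def toy_model(B, D, T, S):
    return 3.0 * B + 2.0 * D - 5.0 * T - 40.0 * (S - 3.0) ** 2


B_grid = np.linspace(0.0, 1000.0, 11)
S_grid = np.linspace(0.0, 6.0, 13)
B_best, S_best, P_best = best_grid_point(toy_model, B_grid, S_grid, D_fixed=150.0, T_fixed=30.0)
print(B_best, S_best, P_best)
```

The same grid values `P` could be passed to a contour or surface plotting routine to draw the response surface itself.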

Conclusion
The scarcity of fossil fuels and rising environmental pollution have driven the development of renewable energy sources. Accurate prediction of the solar power produced by PV panels at a particular location is much needed in today's scenario. This article can therefore give researchers insight into predicting the DC and AC power generated by the installed power plant in Chennai, Tamil Nadu, India. The DC and AC output power has been predicted from beam radiation, diffuse radiation, atmospheric temperature and wind speed using meteorological data from NREL. Prediction equations based on a linear regression model were obtained using the software tool Minitab 16.2.1. The developed regression models achieve high R² and Rpred values, indicating a good fit with high accuracy.