A computing method of predictive value based on fitting function in linear model

Linear models are common prediction models in collaborative computing; they mainly generate a fitting function that expresses the relationship between feature vectors and predictive values. For the process of computing the predictive value from the fitting function and feature vector, this paper conducts the following research. Firstly, a change interval of the predictive value is defined according to the training set. Secondly, the change interval of the predictive value corresponding to each feature vector in the test set is computed. Finally, according to the distribution of the training set in the change interval, the predictive values corresponding to the feature vectors in the test set are computed. Standard data sets are used in the experiments, and MAE (Mean Absolute Error) and RMSE (Root Mean Square Error) are used to evaluate the prediction results. The experimental results show that the method proposed in this paper can reduce the prediction error to a certain extent.

Research Article. EAI Endorsed Transactions on Collaborative Computing, Online First. Received on 07 June 2020; accepted on 23 September 2020; published on 02 October 2020.


Introduction
Prediction models in collaborative computing include linear models and non-linear models. Non-linear models include RF (Random Forest), NN (Neural Network), GBDT (Gradient Boosting Decision Tree), XGBoost (eXtreme Gradient Boosting) and so on. Linear models include Linear Regression, Ridge Regression, Lasso Regression and so on. The main advantage of linear models is that they are easy to model. The most important technique in a linear model is linear fitting: the process of generating the fitting function from the training set. Linear fitting proceeds as follows: a loss function is defined over the training data, and the parameters of the fitting function are obtained when the loss function takes its minimum value. The parameters characterize the quantitative relationship between the feature vector and the predictive value, so the predictive value corresponding to a feature vector of test data can be computed according to these parameters. The quality of the predictive result is evaluated by the error between the predictive value and the true value of the test data.

* Corresponding author. Email: scnuzhonghao@foxmail.com
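The fitting process described above can be illustrated with a minimal sketch; the toy data and the 1D closed-form least-squares solution below are illustrative assumptions, not the paper's implementation.

```python
# Minimal 1D linear-fitting sketch (illustrative data, not from the paper):
# the loss is the sum of squared errors, and the closed-form least-squares
# solution yields the parameters w and omega of Y = w*X + omega.

def fit_linear(xs, ys):
    """Fit y = w*x + omega by minimizing squared error (1D case)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    w = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    omega = mean_y - w * mean_x
    return w, omega

def predict(w, omega, x):
    """Compute the predictive value for a feature x."""
    return w * x + omega

# Toy training data, roughly y = 2x + 1.
w, omega = fit_linear([1.0, 2.0, 3.0, 4.0], [3.1, 4.9, 7.2, 8.8])
```

The same parameters are then reused to compute the predictive values for test feature vectors.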
The fitting function of a linear model reflects the relationship between the feature vector and the predictive value. Because the modeling of linear models is simple and intuitive, Linear Regression, Ridge Regression, Lasso Regression and other linear models have many applications in collaborative computing of social networks. For example, Wang et al. [1] applied Linear Regression to the detection of Alzheimer's disease. Hang et al. [2] applied Ridge Regression to the analysis of remote sensing data. Al-Obeidat et al. [3] applied Ridge Regression and Lasso Regression to temperature prediction in buildings. Topuz et al. [4] applied Elastic-Net to predict the number of pediatric outpatients. Li et al. [5] applied Orthogonal Matching Pursuit to dynamic gesture recognition with radar sensors. Groen et al. [6] applied Bayesian Ridge to macroeconomic data analysis.
The above research work introduces various applications of linear models. Generally, linear models can be used in fields or directions that need data analysis, such as Underwater Wireless Sensor Networks [7], Underwater Routing [8] and Underwater Resurrection Routing Synergy [9]. The purpose of this paper is to optimize linear models and reduce predictive errors. Based on the parameters of the fitting function, this paper proposes a method for computing a new predictive value, which can improve the accuracy of the predictive result. The main contributions are as follows:

• Based on the fitting function in the linear model, a change interval of the predictive value is defined. The change interval represents the potential change range of the predictive value, which can better express the relationship between the feature vector and the predictive value.
• Based on the change interval of the predictive value, a new computing method of the predictive value is proposed. Experiments show that the new predictive value is closer to the true value than the original predictive value.
The remainder of the paper is organized as follows. Section 2 introduces related work. Section 3 briefly describes the computing method. Section 4 describes the process of the computing method in detail. Section 5 describes the experimental results on standard data sets. Section 6 summarizes the paper.

Related Works
Linear Regression is the basis of all linear models. For the optimization of Linear Regression, the main direction is to optimize the solution process of the loss function and improve the convergence speed. The methods used mainly include the Least Squares Method, Batch Gradient Descent [10], Stochastic Gradient Descent [11] and Mini-Batch Gradient Descent [12]. In addition, Zhang et al. [13] proposed a method of feature weighting: by giving important features a greater weight and minor features a relatively smaller weight, the fitted curve is brought closer to the data distribution. Wang et al. [14] considered that high-dimensional features are more suitable for regression models and proposed combining GBDT with regression models, using GBDT to raise the dimensionality of the input features and improve prediction accuracy.
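The gradient-descent variants cited above share one idea: iteratively move the parameters against the gradient of the loss. A hedged 1D sketch of Batch Gradient Descent follows; the learning rate, epoch count, and toy data are assumptions of this illustration.

```python
def gd_fit(xs, ys, lr=0.05, epochs=2000):
    """Batch Gradient Descent on mean squared error for y = w*x + omega:
    each epoch uses the full training set to compute the gradient."""
    w = omega = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = 2.0 / n * sum((w * x + omega - y) * x for x, y in zip(xs, ys))
        grad_o = 2.0 / n * sum(w * x + omega - y for x, y in zip(xs, ys))
        w -= lr * grad_w
        omega -= lr * grad_o
    return w, omega

# On data drawn from y = 2x + 1 the iterates converge to w ≈ 2, omega ≈ 1.
w, omega = gd_fit([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

Stochastic and mini-batch variants differ only in how many samples enter each gradient step.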
By transforming the loss function of Linear Regression, various linear models are generated and applied in different scenarios. For example, for the over-fitting problem of Linear Regression, Ridge Regression [15] added a regularization term of the L2 norm to the loss function, which reduces the regression coefficients of sparse features and improves the stability of Linear Regression. Aiming at the problem that there are a large number of sparse features in high-dimensional data, Lasso Regression [16] added a regularization term of the L1 norm to the loss function. It can delete features which have small regression coefficients, reduce the dimension of the features, and improve the generalization ability of Linear Regression. Elastic-Net [17] combines the advantages of Ridge Regression and Lasso Regression by adding both regularization terms to the loss function. By setting parameters that balance the proportion of the L1 and L2 regularization terms, the Elastic-Net results tend toward Ridge Regression or Lasso Regression as the parameters change. Elastic-Net improves the stability of Linear Regression while also improving its generalization ability to some extent. Orthogonal Matching Pursuit [18] adds a restriction term to the loss function, which limits the maximum number of non-zero elements in the regression coefficients. Under a specified number of non-zero elements, it tends to obtain optimal regression coefficients and improve the prediction accuracy of Linear Regression. Bayesian Ridge [19] assumes that the distribution of the regression coefficients is a spherical Gaussian distribution, and estimates the regression coefficients by maximizing the marginal likelihood function. ARD Regression [20] assumes that the distribution of the regression coefficients is an elliptical Gaussian distribution parallel to the coordinate axes, and estimates the regression coefficients the same way. When using the loss function to calculate errors on samples, Huber Regressor [21] sets a threshold.
When the error value of a sample exceeds the threshold, the sample is classified as abnormal data and given a small weight, which improves the stability of Linear Regression to some extent. In the process of finding the minimum value of the loss function, Least Angle Regression [22] computes the angle between each feature vector and the current loss residual, and takes the feature vector with the smallest angle as the direction in which to search for the minimum. It can obtain ideal results after a limited number of iterations and increases the speed at which the loss function converges. Lasso Lars [23] combines the advantages of Least Angle Regression and Lasso Regression: the regularization term of the L1 norm is added to the loss function, the angle to the current loss residual is computed over the training set, and the feature vector with the smallest angle is taken as the search direction. This improves the generalization ability of Linear Regression and also increases the speed of loss-function convergence. Passive Aggressive Regressor [24] is a linear model proposed for big data, which incorporates the idea of the perceptron into the minimization process of the loss function. A regularization parameter is added to the perceptron, which limits the parameter size of the perceptron and avoids the over-fitting problem caused by the perceptron's rough classification.
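To make the regularization idea concrete, here is a hedged one-dimensional Ridge sketch; centering the data and dropping the intercept are simplifying assumptions made only for this illustration.

```python
def ridge_fit_1d(xs, ys, lam):
    """Minimize sum((y - w*x)**2) + lam * w**2 (1D, centered data, no
    intercept). Setting the derivative to zero gives the closed form:
    w = sum(x*y) / (sum(x*x) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)
```

With lam = 0 this reduces to ordinary least squares; a larger lam shrinks the coefficient toward zero, which is the stabilizing effect of the L2 term described above.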
In view of the loss function of Linear Regression, a series of modifications and improvements have been made in current research work, improving the stability of the model, the generalization ability of the model, and the convergence speed of the loss function. Based on the fitting functions generated by the various regressions, this paper redefines a computing method of the predictive value without changing the algorithm complexity of the linear model, in order to reduce the predictive error.

Introduction of computing method
The definitions of symbols in this paper are shown in Table 1.
As shown in Algorithm 1, the process of the computing method is divided into the following four parts. The first step is the training process of the linear model. According to the feature vectors TRx = {TRx_1, TRx_2, ..., TRx_n} and the corresponding true values TRy = {TRy_1, TRy_2, ..., TRy_n}, the fitting function Y = W^T·X + ω is generated. According to the fitting function, the predictive value of each feature vector is calculated and the set {TRy-P_1, TRy-P_2, ..., TRy-P_n} is obtained. According to the parameter α, a potential change interval [TRy-P_n·α, TRy-P_n·(2-α)] is defined, where the original predictive value TRy-P_n is the center of the interval.
In the second step, for the training data, the true values of the feature vectors are known. When TRy_n is greater than TRy-P_n·(2-α), TRy-P_n·(2-α) is the new predictive value corresponding to feature vector TRx_n. When TRy_n lies in the interval [TRy-P_n·α, TRy-P_n·(2-α)], TRy-P_n is the new predictive value corresponding to TRx_n. When TRy_n is less than TRy-P_n·α, TRy-P_n·α is the new predictive value corresponding to TRx_n. When the parameter α changes in the interval (0, 1), the MAE and RMSE values of the training data change accordingly; the parameter α is determined when MAE and RMSE take their minimum values. In the third step, the original predictive value and the change interval of each feature vector in the test set are computed. In the fourth step, according to the distribution of the training data in the change interval, the new predictive value of each test feature vector is computed.
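The second step above — replacing the training prediction by the nearest interval bound when the true value falls outside — can be sketched as follows (assuming positive predictive values, so that TRy-P_n·α < TRy-P_n·(2-α)):

```python
def new_train_prediction(p, y_true, alpha):
    """Given original prediction p and true value y_true, return the new
    predictive value: the interval bound nearest to y_true if y_true lies
    outside [p*alpha, p*(2-alpha)], otherwise p itself."""
    low, high = p * alpha, p * (2.0 - alpha)
    if y_true > high:
        return high
    if y_true < low:
        return low
    return p
```

For example, with p = 10 and alpha = 0.8 the interval is [8, 12], so a true value of 15 yields the new prediction 12, while a true value of 9 leaves the prediction at 10.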

Change interval of predictive value
In the training process of the linear model, according to the feature vectors of the training set TRx = {TRx_1, TRx_2, ..., TRx_n}, the parameters W^T and ω are set and the loss function is defined. By solving for the minimum of the loss function, the parameters W^T and ω are obtained and the fitting function Y = W^T·X + ω is generated. According to the fitting function, the predictive values {TRy-P_1, TRy-P_2, ..., TRy-P_n} of all training data are computed. Each predictive value TRy-P_n is taken as the center of the interval [TRy-P_n·α, TRy-P_n·(2-α)]. The paper then makes the following definitions:

• If the true value TRy_n of TRx_n is greater than TRy-P_n·(2-α), set TRy-P_n·(2-α) as the predictive value of TRx_n.
• If the true value TRy_n lies in the interval [TRy-P_n·α, TRy-P_n·(2-α)], set TRy-P_n as the predictive value of TRx_n.
• If the true value TRy_n of TRx_n is less than TRy-P_n·α, set TRy-P_n·α as the predictive value of TRx_n.
Suppose the value of TRy-P_n·α is represented by a, the value of TRy-P_n·(2-α) is represented by b, and let TRy-N_n denote the new predictive value of TRx_n. The specific method is shown in formula (1):

          { b,        TRy_n ≥ b
TRy-N_n = { TRy-P_n,  TRy_n ∈ (a, b)     (1)
          { a,        TRy_n ≤ a

As shown in formula (1), since the true values TRy = {TRy_1, TRy_2, ..., TRy_n} of the training set are known, the objective functions based on the computation of MAE and RMSE are defined as shown in formula (2) and formula (3):

MAE-TR = (1/n) · Σ_{i=1}^{n} |TRy_i − TRy-N_i|     (2)

RMSE-TR = sqrt( (1/n) · Σ_{i=1}^{n} (TRy_i − TRy-N_i)² )     (3)

The range of the parameter α is (0, 1). The parameter α is obtained when both MAE-TR and RMSE-TR are minimal; this α is then used to compute the change interval for each test data item.
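The paper does not specify how the minimizing α is found; one hypothetical approach is a simple grid search over (0, 1) that evaluates the training MAE at each candidate (the grid step of 0.01 is an assumption of this sketch, and positive predictions are assumed).

```python
def clipped(p, y, alpha):
    """New training prediction per the clipping rule, assuming p > 0."""
    low, high = p * alpha, p * (2.0 - alpha)
    return high if y > high else low if y < low else p

def best_alpha(preds, trues):
    """Return the alpha in {0.01, ..., 0.99} minimizing the training MAE."""
    def mae(a):
        return sum(abs(y - clipped(p, y, a))
                   for p, y in zip(preds, trues)) / len(preds)
    return min((k / 100.0 for k in range(1, 100)), key=mae)
```

The RMSE objective can be minimized the same way; the paper selects the α at which both objectives are minimal.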

Computing method for new predictive values
After a test data item TEx_m is input, its original predictive value TEy-P_m is computed according to the fitting function Y = W^T·X + ω, and the change interval [TEy-D_m, TEy-U_m] = [TEy-P_m·α, TEy-P_m·(2-α)] is obtained according to the parameter α. In the training set, every data item TRx_d satisfying TEy-D_m ≤ TRy-P_d ≤ TEy-U_m is added to the set TRx = {TRx_1, TRx_2, ..., TRx_d}, and the corresponding location label L_d of TRx_d is shown in formula (4):

      { 2,  TRy_d ≥ TRy-P_d·(2-α)
L_d = { 1,  TRy_d ∈ (TRy-P_d·α, TRy-P_d·(2-α))     (4)
      { 0,  TRy_d ≤ TRy-P_d·α

According to formula (4), the numbers of 0s, 1s and 2s in the location label set L = {L_1, L_2, ..., L_d} are denoted by S0, S1 and S2 respectively. The larger the value of S2, the greater the probability that the true value TEy_m of TEx_m satisfies TEy_m ≥ TEy-U_m. The larger the value of S1, the greater the probability that TEy_m ∈ (TEy-D_m, TEy-U_m). The larger the value of S0, the greater the probability that TEy_m ≤ TEy-D_m. In practice, the relationship between the true value TEy_m and the change interval [TEy-D_m, TEy-U_m] cannot be judged exactly. Taking the three values S0, S1 and S2 as probabilistic weights, the final predictive value TEy-F_m corresponding to TEx_m is defined as shown in formula (5):

TEy-F_m = (S0·TEy-D_m + S1·TEy-P_m + S2·TEy-U_m) / (S0 + S1 + S2)     (5)
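The counting and combination steps can be sketched as follows; interpreting formula (5) as a count-weighted average of the lower bound, the original prediction, and the upper bound is an assumption of this sketch, as are positive predictive values.

```python
def final_prediction(te_pred, alpha, train_preds, train_trues):
    """Sketch of formulas (4)-(5): collect training items whose prediction
    falls in the test item's change interval, label each by where its true
    value lies relative to its own interval (0 below, 1 inside, 2 above),
    then weight the interval bounds by the counts S0, S1, S2."""
    low, high = te_pred * alpha, te_pred * (2.0 - alpha)
    s0 = s1 = s2 = 0
    for p, y in zip(train_preds, train_trues):
        if low <= p <= high:                  # training item in the interval
            a, b = p * alpha, p * (2.0 - alpha)
            if y >= b:
                s2 += 1                       # true value above its interval
            elif y <= a:
                s0 += 1                       # true value below its interval
            else:
                s1 += 1                       # true value inside its interval
    total = s0 + s1 + s2
    if total == 0:
        return te_pred                        # no evidence: keep original
    return (s0 * low + s1 * te_pred + s2 * high) / total
```

When most nearby training items have true values above their intervals, the final prediction is pulled toward the upper bound, which matches the probabilistic reading given in the text.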

Complexity analysis
The proposed method is based on the fitting function that the linear model already generates, so the training process of the linear model is unchanged. For each test data item, computing the change interval takes constant time, and counting the location labels S0, S1 and S2 requires one pass over the training predictions, i.e. O(n) additional time per test item. The method therefore does not change the algorithm complexity of the linear model.

Data Sets
From the sklearn library [25], three standard data sets suitable for regression tasks are selected to test the effectiveness of the proposed method. The selected data sets and their invocation methods are shown in Table 2.

Experimental result of data sets
In the first experiment of the paper, Linear Regression (represented by LR) was used to predict and analyze the three data sets. At the same time, according to the method proposed in this paper, the new predictive values of Linear Regression (represented by LR+) are computed.
The predictive results were evaluated using MAE and RMSE; the experimental results are shown in Table 3. As shown in Table 3, the new method of computing the predictive value proposed in this paper can reduce the predictive error to a certain extent without changing the complexity. This is because, when a test data item is input, the corresponding original predictive value is computed from the fitting function. In practice, the true value of the test data lies above or below the original predictive value. If the position of the true value can be judged correctly, the prediction error can be reduced. In this paper, we take the original predictive value as a center and define the change interval of the predictive value. According to the data distribution in this interval, the positional relationship between the true value of the test data and the original predictive value is judged, and the new predictive value is computed according to the change interval and this relationship. The experimental results show that the new predictive value is closer to the true value.
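The two evaluation metrics used throughout the experiments are the standard ones; for completeness, a direct computation:

```python
import math

def mae(trues, preds):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(trues, preds)) / len(trues)

def rmse(trues, preds):
    """Root Mean Square Error."""
    return math.sqrt(sum((t - p) ** 2
                         for t, p in zip(trues, preds)) / len(trues))
```

Because RMSE squares the residuals, it penalizes large individual errors more heavily than MAE, which is why the paper reports both.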

Experimental result of linear models
In the second experiment of this paper, the boston data set was analyzed with linear models such as Ridge Regression and Lasso Regression. At the same time, according to the method proposed in the paper, new predictive values are computed for these linear models (represented by Ridge Regression+, Lasso Regression+, and so on). MAE and RMSE are used to evaluate the predictive effect; the experimental results are shown in Table 4. As shown in Table 4, based on the fitting functions generated by linear models such as Ridge Regression and Lasso Regression, this paper compares the errors of the original predictive values and the new predictive values. The experimental results show that the new predictive values proposed in this paper can reduce the predictive error of all the linear models to a certain extent. This is because the purpose of every linear model is to generate a fitting function that represents the relationship between the feature vector and the predictive value; however, there is always an error between the predictive value and the true value. Based on the fitting function of the linear model, this paper proposes a method for computing the predictive value. The new predictive value can be closer to the true value than the original predictive value, so the predictive errors of all the linear models can be reduced to a certain extent.

Conclusion
The work of this paper includes the following two parts. Firstly, based on the fitting function of the linear model, a change interval of the predictive value is defined. If the relationship between the true value and the interval can be judged correctly, the prediction error can be reduced effectively. Secondly, when predicting the test data, the relationship between the true value and the interval cannot be judged exactly. Therefore, according to the distribution of the training data in the change interval, the relationship between the true value and the change interval of the predictive value is estimated. According to this relationship, a new predictive value is computed to replace the original predictive value. Tested on standard data sets, the new predictive values can reduce the predictive error to a certain extent. The paper estimates the relationship between the true value and the change interval based on probability, so it can only improve the predictive error to a certain extent. If the relationship between the true value and the change interval could be computed concretely, the improvement of the predictive error would be more obvious. In the next step, we will continue to study how to analyze the relationship between the true value and the change interval concretely based on classification algorithms.