Matrix Factorization Based Recommendation System using Hybrid Optimization Technique

This paper presents a matrix factorization recommendation algorithm that recommends items to users through a hybrid optimization technique combining Alternating Least Squares (ALS) and Stochastic Gradient Descent (SGD), with SGD applied in the update stage. The hybrid technique is set side by side with ALS and SGD used individually to assess the pros and cons of each and the requirements to be met when choosing a technique for a specific domain. The hybrid optimization algorithm is straightforward to apply in practice because it mitigates the cold-start problem. The metric used for comparison and evaluation is Mean Squared Error (MSE).


Introduction
The explosive growth of information makes it hard to find what is relevant in the tremendous amount of data available online. The required information becomes obscured on the internet [3] [7], so we rely heavily on recommendations [22]. A recommender system belongs to the family of information filtering systems whose main purpose is to predict [11] the rating a user would give to an item, and thereby to infer the user's interests from the available data. Recommendation systems are deployed in a wide range of areas, including news, movies, text mining, and books. Not every recommender system can handle every kind of situation. Recommender systems can be classified into two strategies. The first is the content-based filtering approach, which creates a profile for every user or product to describe its nature. A movie profile, for example, contains characteristics such as its genre, cast and crew, and popularity. Similarly, user profiles contain demographic information or answers to a suitable questionnaire. The resulting profiles allow programs to match users with suitable products. The disadvantage of a content-based recommendation system is that it requires external information that is very difficult to gather in practice [9].
Collaborative filtering (CF), which plays a crucial part in generating personalized recommendations, is the alternative strategy and is among the most established and prominent recommendation algorithms [10] [19]. CF analyzes dependencies among users and relationships among products to identify new user-item associations [8] [17]. Some CF systems identify pairs of items that are rated similarly, or like-minded users with a similar purchasing or rating history, to infer unknown relationships between users and items. They do so using only the users' history, which could be the way those users rated items or their previous transactions [13] [14]. CF is domain-free, yet it generally performs better than the content-based approach because it can deal with kinds of information that are ambiguous and difficult to profile [9]. However, CF suffers from the sparse user-rating matrix problem, which results in poor recommendation precision. A common remedy is to replace the missing values with the average of all ratings specific to that user or item. This reduces the imprecision only to some extent, and the value chosen to replace the missing entries has a serious effect on the credibility of the resulting recommendations [23].
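The mean-imputation remedy described above can be illustrated with a small sketch. The toy rating matrix, the use of `None` for a missing rating, and the function name `fill_with_user_mean` are assumptions made for illustration only; the paper does not specify an implementation.

```python
# Hypothetical toy data: rows are users, columns are items, None marks a missing rating.
ratings = [
    [5.0, None, 3.0],
    [4.0, 2.0, None],
    [None, 1.0, 4.0],
]

def fill_with_user_mean(matrix):
    """Replace each missing entry with the mean of that user's known ratings."""
    filled = []
    for row in matrix:
        known = [r for r in row if r is not None]
        # Assumption of this sketch: a user with no ratings at all falls back to 0.0.
        mean = sum(known) / len(known) if known else 0.0
        filled.append([mean if r is None else r for r in row])
    return filled

filled = fill_with_user_mean(ratings)
```

Every imputed entry is identical for a given user, which is one way to see why the choice of fill value strongly shapes the recommendations that follow.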

EAI Endorsed Transactions
Matrix factorization latent factor models have recently attracted attention because they maintain accuracy as the data scales, have a low estimation cost, and mitigate the problems caused by high sparsity [6] [21]. Matrix factorization is among the most widely used collaborative filtering approaches for identifying hidden factors that affect users' preferences. It is more memory-efficient and more precise than the similarity-based recommendation process, which considers only the resemblances between users and objects when making suggestions [4] [12]. Either SGD or ALS can be applied as the learning algorithm. The method is called matrix factorization because a large matrix is decomposed into smaller matrices, and it was popularized by the Netflix Prize winners. It improved the performance of recommendation systems over older methods, which were mostly neighbourhood-based. The decomposition aims to find latent factors and reduce the dimensionality [18].
Present matrix factorization methods are engineered for explicit feedback data, which makes the data easy to analyze [18]. However, service providers have to design feedback pages for customer reviews and ratings, which is a difficult and time-consuming process because it depends on user involvement. Recommenders can therefore infer user preferences from the more abundant implicit feedback, which reflects preferences indirectly through the user's behaviour [4]. Online shopping history, browsing data, previous search patterns, and even mouse movements are examples of implicit feedback. For example, a user who bought many products from a brand probably likes that brand [5] [6]. To develop a modified matrix factorization that handles explicit feedback data, we integrate the advantages of alternating least squares and stochastic gradient descent into a matrix factorization recommendation system that uses an incremental SGD and ALS updating technique [1].
A prominent part of the literature focuses on handling explicit feedback, since such direct data from users is highly valued. In many practical scenarios, however, recommender systems have to focus mainly on implicit data [5]. This can overcome users' hesitation or disinterest in rating items and allows the system to gather feedback data implicitly. In an implicit model, once the user accepts the cookies and gives permission to collect user data, they no longer need to provide explicit feedback (e.g., ratings). To interpret implicit feedback, the recommender has to choose appropriate measures. In the conventional methods [15] [20], a user specifies a numeric score, and there are clear metrics such as mean squared error to measure the success of predictions. With implicit models, moreover, the availability of the item, its competition with other items, and repeat feedback have to be considered [24].
The remainder of the paper is organized as follows: Section 2 reviews related work, Section 3 describes the methodology, Section 4 explains the hybrid optimization technique, Section 5 details the environmental setup, Section 6 describes the experimental evaluation, Section 7 presents the conclusion, and Section 8 outlines future work [25].

Related Work
In the past few years, researchers have done extensive work on recommendation systems. Hongmei H. Li et al. [1] proposed a recommendation framework with an all-weighted strategy, a more efficient and better-optimized scheme. Experiments on a pair of recommendation techniques indicate that the suggested approach performs better on several prediction-oriented and ranking-oriented evaluation metrics. C. Lin et al. [2] presented RI-SGD, a model for efficient computation and accurate time-variant implicit-feedback MF recommendation, consisting of ALS with weight regularization in the developing stage and SGD in the modifying stage. Compared with retraining the full model, the reported scores show that RI-SGD obtains similar recommendation accuracy but requires only about 0.02% of the retraining time. Recommendation quality is evaluated with discounted cumulative gain (DCG), a measure of ranking quality. M. Li et al. [5] put forth an improved model named TimeMF, which is based on implicit feedback and includes temporal information. It is presented as one of the main solutions to information overload in social e-commerce networks, addressing the absence of negative information in user history. The optimized model gives a unique learning rate to each feature of the latent feature matrix and adopts adaptive gradient descent to update the learning rate and raise the accuracy level. The experimental results show that this model performs better on ranking-oriented evaluation. Y. He et al. [6] have proposed a novel model Ciao. A thorough analysis of four different datasets shows that CMF is competitive with, and better than, present state-of-the-art baselines. J. Z. Sun et al. [16] proposed a new algorithm for prediction, estimation, and recommendation called the collaborative Kalman filter.
In that paper, the authors proposed an extended Gaussian PMF that considers user behaviour trajectories. The approach extends probabilistic matrix factorization in time through a state-space model. This leads to an estimation procedure with parallel Kalman filters and smoothers coupled through item factors. Global parameters are learned with the expectation-maximization algorithm. Compared with existing methods, the technique performs better on simulated data and on real-world movie recommendation data. The evaluation metrics used are root mean square error (RMSE) and collaborative Kalman filtering (CKF).
According to the literature survey, many methods have been examined for recommendation systems using different frameworks. This paper explores a recommendation system based on matrix factorization with a hybrid optimization technique to improve the algorithm's efficiency.

Methodology
The proposed technique is an improved latent factor collaborative filtering model using a combination of the optimization algorithms stochastic gradient descent and alternating least squares for Matrix Factorization recommendation systems using explicit feedback from the users.

Latent Factor Collaborative Filtering Approach
The latent factor model is an advanced form of content-based recommendation techniques. It rests on the assumption that the factors governing a user's preference for an item can be identified. A user i is described by a vector $u_i$ that expresses the user's affinity to the hidden factors, and an item j is described by its corresponding latent factor vector $v_j$. A high rating is usually obtained when the two vectors match (as in content-based filtering), which is measured by the inner product of the user's latent factors with the item's latent factors. The rating of the i-th user on the j-th item is modelled as:
$$x_{ij} = u_i^{T} v_j \qquad (1)$$
Taking into account the complete rating matrix X for M users and N items, equation (1) can be written in matrix form as equations (2) and (3):
$$X = UV \qquad (2)$$
$$U = [u_1, \dots, u_M]^{T}, \quad V = [v_1, \dots, v_N] \qquad (3)$$
If the rating matrix had no missing values, generating personalized recommendations would pose no difficulty. In practice this is unrealistic: with the sheer volume of data, the matrix is far more empty than filled. Estimating these missing values is the main issue. Once it is done, predictions for users on items can be made with improved accuracy. The observed data can be represented in mathematical form as:
$$Y = R \cdot X = R \cdot (UV) \qquad (4)$$
In equation (4), R is a binary sampling matrix whose entries are 1 where the rating is known and 0 where it is unknown, and the symbol · denotes the element-wise product. The latent factor matrices for the users and the products can be estimated by solving the following problem:
$$\min_{U,V} \; \| Y - R \cdot (UV) \|_F^2 + \lambda \left( \|U\|_F^2 + \|V\|_F^2 \right) \qquad (5)$$
In equation (5), $\|\cdot\|_F$ is the Frobenius norm, and the penalty terms weighted by $\lambda$ guard against over-fitting. Many techniques are available for solving (5), from alternating least squares to multiplicative updates and gradient descent. In practice, simple alternating least squares is the algorithm most commonly used.
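The inner-product rating model and the masked, regularized objective described above can be checked numerically. The following is a minimal sketch under stated assumptions: the sizes `M`, `N`, `K`, the random factors, the 50% sampling rate, the value of `lam`, and the function name `objective` are all hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, K = 4, 5, 2                      # users, items, latent factors (toy sizes)

U = rng.normal(size=(M, K))            # row i is the user vector u_i
V = rng.normal(size=(K, N))            # column j is the item vector v_j
X = U @ V                              # full rating matrix: x_ij = u_i . v_j

R = (rng.random((M, N)) < 0.5).astype(float)   # 1 = rating known, 0 = unknown
Y = R * X                              # observed ratings (element-wise mask)

def objective(U, V, Y, R, lam):
    """Masked squared error plus Frobenius-norm penalties on both factors."""
    residual = Y - R * (U @ V)
    return np.sum(residual ** 2) + lam * (np.sum(U ** 2) + np.sum(V ** 2))

lam = 0.1
value_at_truth = objective(U, V, Y, R, lam)
```

At the factors that generated the data the residual vanishes, so only the penalty term remains; this shows how the regularizer pulls the solution toward small factors even when the fit is perfect.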
For a bilinear cost function such as (5), iterative procedures are not guaranteed to converge to a global minimum; only convergence to a local minimum can be guaranteed. One way around this is to solve for the ratings themselves rather than the factors, under the assumption that the rating matrix is low-rank. The rank of the matrix is taken to equal the number of factors. The most direct formulation is to find the lowest-rank X that best explains the data Y. This problem has no tractable solution, since rank minimization is known to be NP-hard. Theoretical studies have shown, however, that a low-rank solution can be obtained (under certain assumptions) by relaxing the NP-hard rank minimization problem to its closest convex surrogate, the nuclear norm. This can be written as:
$$\min_{X} \; \| Y - R \cdot X \|_F^2 + \lambda \|X\|_{*} \qquad (6)$$
In equation (6), $\|X\|_{*}$ is the nuclear norm of X. In practical scenarios, less than 1% of the information is present, so collaborative filtering becomes a highly underdetermined problem. In such cases, secondary information can also be used to improve the results, as shown in figure 1.
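The relationship between rank and the nuclear norm mentioned above can be made concrete: the rank counts the non-zero singular values, while the nuclear norm sums them. The sketch below assumes NumPy and uses a hypothetical random rank-2 matrix; the `1e-10` threshold for treating a singular value as zero is also an assumption of this example.

```python
import numpy as np

rng = np.random.default_rng(1)
# A rank-2 matrix built from two latent factors (toy sizes, hypothetical data).
X = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 5))

singular_values = np.linalg.svd(X, compute_uv=False)
rank = int(np.sum(singular_values > 1e-10))     # number of non-zero singular values
nuclear_norm = float(np.sum(singular_values))   # sum of the singular values
```

Because the sum of singular values is a convex function of the matrix whereas their count is not, penalizing the former is the tractable stand-in for penalizing the latter.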

Proposed Hybrid optimization technique approach
Considering the techniques proposed in the literature, a hybrid optimization technique is introduced that combines the Alternating Least Squares optimization technique with the Stochastic Gradient Descent technique for updating the model. Integrating the two optimization techniques on explicit user feedback makes practical updating of user data feasible and thus reduces computational time and effort.

Training the model
To develop a matrix factorization model effectively for a few million items and users, ALS is used to train the model, using the equations shown in (7).
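The alternating update that ALS performs can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the standard regularized least-squares cost over the known ratings, and the function names `als_step` and `train_als`, the toy rating matrix, and the values of `K` and `lam` are hypothetical. NumPy is assumed to be available.

```python
import numpy as np

def als_step(ratings, mask, fixed, lam):
    """Solve one ridge regression per row of `ratings`, holding `fixed` constant.

    ratings: (A, B) matrix, mask: (A, B) with 1 for known and 0 for unknown,
    fixed: (B, K) factors of the other side. Returns the (A, K) updated factors.
    """
    K = fixed.shape[1]
    out = np.zeros((ratings.shape[0], K))
    for a in range(ratings.shape[0]):
        known = mask[a] > 0
        F = fixed[known]                        # factors of the rated entries only
        A_mat = F.T @ F + lam * np.eye(K)       # regularized normal equations
        b = F.T @ ratings[a, known]
        out[a] = np.linalg.solve(A_mat, b)
    return out

def train_als(ratings, mask, K=2, lam=0.1, iterations=10, seed=0):
    rng = np.random.default_rng(seed)
    users = rng.normal(scale=0.1, size=(ratings.shape[0], K))
    items = rng.normal(scale=0.1, size=(ratings.shape[1], K))
    history = []
    for _ in range(iterations):
        users = als_step(ratings, mask, items, lam)       # fix items, solve users
        items = als_step(ratings.T, mask.T, users, lam)   # fix users, solve items
        err = mask * (ratings - users @ items.T)
        penalty = lam * (np.sum(users ** 2) + np.sum(items ** 2))
        history.append(float(np.sum(err ** 2) + penalty))  # regularized cost
    return users, items, history

# Hypothetical toy ratings: 0 marks an unknown entry.
ratings = np.array([[5, 3, 0, 1],
                    [4, 0, 0, 1],
                    [1, 1, 0, 5],
                    [0, 1, 5, 4]], dtype=float)
mask = (ratings > 0).astype(float)
users, items, history = train_als(ratings, mask)
```

Each half-step has a closed-form solution once the other side is fixed, so the regularized cost recorded in `history` cannot increase from one iteration to the next. The per-row solves are independent of each other, which is what makes ALS straightforward to parallelize.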

Minimizing the error function
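The vector for each user ($x_u$) and each item ($y_i$) in the feature dimensions has to be obtained by minimizing the loss below:
$$C(x, y) = \sum_{u,i} \left( r_{ui} - x_u^{T} y_i \right)^2 + \lambda \left( \sum_u \|x_u\|^2 + \sum_i \|y_i\|^2 \right) \qquad (9)$$
In equation (9), $r_{ui}$ is the true rating of item i by user u, and the sum runs over the known ratings. The two terms at the end of the function are added to prevent overfitting of the user and item vectors. The goal is to minimize this loss function.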
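As a worked illustration of this loss, the sketch below evaluates a squared-error term over the known ratings plus the two penalty terms on the user and item vectors. It assumes a single regularization weight `lam` shared by both penalties; the function names and the toy vectors are hypothetical.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def cost(ratings, x, y, lam):
    """Squared error over known ratings plus L2 penalties on all vectors.

    ratings: list of (user, item, rating) triples
    x, y: dicts mapping user / item ids to latent vectors (lists of floats)
    """
    error = sum((r - dot(x[u], y[i])) ** 2 for u, i, r in ratings)
    penalty = sum(dot(v, v) for v in x.values()) + sum(dot(v, v) for v in y.values())
    return error + lam * penalty

# Hypothetical toy values.
x = {"u1": [1.0, 0.0], "u2": [0.0, 2.0]}
y = {"i1": [3.0, 1.0], "i2": [1.0, 1.0]}
ratings = [("u1", "i1", 4.0), ("u2", "i2", 2.0)]
total = cost(ratings, x, y, lam=0.5)
```

Here the first rating is predicted as 3 against a true value of 4 and the second is predicted exactly, so the error term is 1; the squared norms sum to 17, giving a total of 1 + 0.5 × 17 = 9.5.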

Training the model using SGD
The ALS technique iteratively updates the complete model whenever new entries are incorporated into it, which is extremely time-consuming. SGD, in contrast, is comparatively quick because each step involves only one user and one item. An incrementally updated SGD is therefore also put forward to modify the matrix factorization recommendation model. From equation (9), the first-order gradients of the cost function for a single rating $r_{ui}$ are given by equations (10) and (11):
$$\frac{\partial C(x,y)}{\partial x_u} = -2 \left( r_{ui} - x_u^{T} y_i \right) y_i + 2 \lambda x_u \qquad (10)$$
$$\frac{\partial C(x,y)}{\partial y_i} = -2 \left( r_{ui} - x_u^{T} y_i \right) x_u + 2 \lambda y_i \qquad (11)$$
Now suppose that new rating values $r_{ui}$ arrive as inputs and that the latent feature matrices X and Y fit in each machine's memory. Assuming the incoming values are shuffled, Stochastic Gradient Descent (SGD) can be used to update the feature matrices X and Y, as shown in equations (12) and (13):
$$x_u \leftarrow x_u - \eta \, \frac{\partial C(x,y)}{\partial x_u} \qquad (12)$$
$$y_i \leftarrow y_i - \eta \, \frac{\partial C(x,y)}{\partial y_i} \qquad (13)$$
Here $\lambda$ is the regularization weight of equation (9) and $\eta$ is the learning rate.
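A single SGD update on one rating can be sketched as below. This is an illustration under assumptions, not the authors' code: it takes the per-rating cost to be the squared prediction error plus L2 penalties on the two vectors involved, and the learning rate `lr`, the weight `lam`, the starting vectors, and the function names are hypothetical.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def sgd_step(x_u, y_i, r_ui, lr, lam):
    """One stochastic gradient step on a single rating; returns the new vectors."""
    err = r_ui - dot(x_u, y_i)
    # Gradients of (r - x.y)^2 + lam * (|x|^2 + |y|^2) with respect to x and y.
    grad_x = [-2 * err * yk + 2 * lam * xk for xk, yk in zip(x_u, y_i)]
    grad_y = [-2 * err * xk + 2 * lam * yk for xk, yk in zip(x_u, y_i)]
    new_x = [xk - lr * g for xk, g in zip(x_u, grad_x)]
    new_y = [yk - lr * g for yk, g in zip(y_i, grad_y)]
    return new_x, new_y

# Hypothetical starting vectors and rating.
x_u, y_i, r_ui = [0.5, 0.1], [0.2, 0.4], 4.0
before = (r_ui - dot(x_u, y_i)) ** 2
for _ in range(50):
    x_u, y_i = sgd_step(x_u, y_i, r_ui, lr=0.05, lam=0.001)
after = (r_ui - dot(x_u, y_i)) ** 2
```

Both gradients are computed from the old vectors before either is changed, so the user and item updates stay consistent with each other. Each step touches only one user vector and one item vector, which is why it is cheap compared with a full ALS sweep.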

Algorithm-2: Streaming ALS using SGD
for new r_ui do
    update x_u using equation (12)
    update y_i using equation (13)
end for
The flowchart in figure 2 explains how the recommendation data is processed, taking the users' rating dataset as input. The model is first trained using the ALS optimization technique within the MF algorithm. The model is then updated using the SGD optimization technique, and the results obtained are combined.
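The streaming update loop can be sketched in a few lines. This sketch assumes the factors have already been trained (the values below are hypothetical stand-ins for ALS output) and that each incoming rating triggers one gradient step on the affected user and item vectors; `stream_update`, `lr`, `lam`, and the repeated toy ratings are illustrative assumptions.

```python
import random

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def stream_update(X, Y, new_ratings, lr=0.05, lam=0.001, seed=0):
    """Apply one SGD update per incoming (user, item, rating) triple, in shuffled order."""
    incoming = list(new_ratings)
    random.Random(seed).shuffle(incoming)
    for u, i, r in incoming:
        err = r - dot(X[u], Y[i])
        x_old = list(X[u])  # keep the old user vector for the item update
        X[u] = [xk + lr * (2 * err * yk - 2 * lam * xk) for xk, yk in zip(X[u], Y[i])]
        Y[i] = [yk + lr * (2 * err * xk - 2 * lam * yk) for xk, yk in zip(x_old, Y[i])]
    return X, Y

def squared_error(X, Y, ratings):
    return sum((r - dot(X[u], Y[i])) ** 2 for u, i, r in ratings)

# Stand-ins for factors produced by an earlier ALS training run (hypothetical values).
X = {0: [1.2, 0.8], 1: [0.3, 1.5]}
Y = {0: [1.0, 1.1], 1: [0.4, 1.9]}
new_ratings = [(0, 1, 4.0), (1, 0, 2.0)] * 20   # repeated so the effect is visible

before = squared_error(X, Y, new_ratings)
X, Y = stream_update(X, Y, new_ratings)
after = squared_error(X, Y, new_ratings)
```

Only the vectors of users and items that appear in the new ratings are modified, so the rest of the trained model is left untouched and no full retraining pass is needed.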

Environmental Setup
The dataset used in this article is the ml small dataset from the MovieLens website, used for both training and testing, with a rating scale of 1 lakh (100,000) users. The method is implemented in Python 3.7, using Jupyter Notebook within the Anaconda distribution.

Experimental Evaluation
The evaluation metric for this approach is the Mean Squared Error (MSE), shown in equation (14). MSE measures the error on the training and testing datasets by comparing each predicted value with the observed (known) value. The smaller the MSE, the closer the predicted and observed values are, and the better the accuracy. If $X_{obs,i}$ is the true rating on item i by user u and $X_{model,i}$ is the predicted rating on item i by user u, the MSE of n corresponding rating-prediction pairs is defined as:
$$MSE = \frac{1}{n} \sum_{i=1}^{n} \left( X_{obs,i} - X_{model,i} \right)^2 \qquad (14)$$
Figure 4 shows the MSE learning curve of the training and testing data for the SGD algorithm of the Matrix Factorization model. From this curve, overfitting is approximately 20% lower than with the ALS algorithm. In the experimental analysis, the best-performing parameters for the SGD algorithm were 80 latent factors and a regularization parameter of 0.001, with 200 iterations.
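The MSE used as the evaluation metric in this section can be computed as in the short sketch below. The rating values are hypothetical and serve only to make the arithmetic visible.

```python
def mean_squared_error(observed, predicted):
    """Average of squared differences between observed and predicted ratings."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)

# Hypothetical rating-prediction pairs.
observed = [4.0, 3.0, 5.0, 1.0]
predicted = [3.5, 3.0, 4.0, 2.0]
mse = mean_squared_error(observed, predicted)
```

The squared differences are 0.25, 0, 1, and 1, so the result is 2.25 / 4 = 0.5625. Computing this separately on the training and testing ratings after each iteration yields learning curves of the kind the figures describe.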
The recommendations obtained individually from the two algorithms, ALS and SGD, are combined by an inner join to produce the user's final recommendations. When comparing optimization algorithms, the crucial detail for end users is time-to-solution. Setting the convergence rates of ALS, SGD, and the hybrid optimization algorithm side by side shows that no algorithm performs best in all domains. Gradient descent is consistently quicker than alternating least squares in most domains. Alternating least squares scales better on the MovieLens dataset, which is extremely sparse. Alternating least squares also often performs better than gradient descent, but it typically does not overcome the performance loss that the algorithm starts with. For implicit datasets, which are usually not sparse, SGD is not practical, and ALS is a much more efficient optimization technique. Combining the two algorithms brings together advantages of both, such as better performance on sparse data and faster execution, but it can also bring some of their disadvantages, such as difficulty in dealing with feedback data. With explicit data, SGD helps overcome the performance loss of ALS, and ALS can help parallelize the iterative execution.
Similarly, with implicit data, ALS can increase the efficiency of calculating latent factors, whereas SGD cannot, since it has to deal with user history. Because the proposed hybrid model uses the users' explicit data, its performance overcomes the issues faced by the individual algorithms to some extent. Factors such as the size and sparsity of the dataset and the type of user data still need to be considered and analyzed when choosing a particular technique.
EAI Endorsed Transactions on Energy Web 07 2021 -09 2021 | Volume 8 | Issue 35 | e14
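The inner join that combines the ALS and SGD recommendations into the final list can be sketched as follows. The paper does not describe the data layout, so this example assumes each model produces a per-user list of item ids; the function name `inner_join` and the ids are hypothetical.

```python
def inner_join(als_recs, sgd_recs):
    """Keep, per user, only the items recommended by both models (ALS order preserved)."""
    joined = {}
    for user in als_recs.keys() & sgd_recs.keys():
        in_sgd = set(sgd_recs[user])
        joined[user] = [item for item in als_recs[user] if item in in_sgd]
    return joined

# Hypothetical top-N lists from each model.
als_recs = {"u1": ["m1", "m2", "m3"], "u2": ["m4", "m5"]}
sgd_recs = {"u1": ["m3", "m1", "m9"], "u2": ["m6"], "u3": ["m2"]}
final = inner_join(als_recs, sgd_recs)
```

An inner join can leave a user with an empty list when the two models disagree completely, as happens for `u2` here, so a deployment would likely need a fallback such as one model's list on its own.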

Conclusion
This paper proposes an effective recommendation system that uses the matrix factorization technique with a fast optimization scheme. The proposed framework addresses the cold-start problem by recommending popular items to new users from user feedback, while iteratively updating the model using both the Alternating Least Squares and Stochastic Gradient Descent algorithms in the training phase. The proposed framework obtained better performance than the state-of-the-art algorithms.

Future Work
Future work will explore other collaborative filtering algorithms for hybrid optimization and the use of implicit feedback data, since the ALS algorithm works better with user history than with ratings.