Intelligent Systems of Machine Learning Approaches for Developing E-Services Portals

This paper provides a framework for intelligent servers that deliver e-services through portals and e-commerce applications: servers that can dynamically plan and restructure themselves to respond to users' future needs and provide the appropriate e-services at the right time. We use data mining techniques to learn how web users behave from the log files collected from their connections. Our experimental study uses real data sets. Applying further data mining techniques to the development of intelligent e-services portals is left for future work. Identifying regular trends in big data is a very difficult task, because the duration of each trend also has to be discovered, which makes the analytical search space grow exponentially.


Introduction
The development of intelligent web servers, particularly for e-commerce applications, has become increasingly popular in intelligent business services. Intelligent servers can change their architectures automatically to satisfy anticipated requirements. The layout of the smart cloud server is shown in Figure 1. Customer insight sits at the edge of the infrastructure so that the right users receive the right knowledge at the right moment. Web usage mining is the task of detecting the behavior of web users while they browse the web. The aim is to identify site users' navigation preferences so that online portals can be customized, the network layout enhanced, and web server efficiency optimized. The cloud mining activities are displayed in Figure 2.

Web User Identification
According to [1], web user identification is one of the most difficult steps. The difficulty depends on the purpose of the web mining operations to be carried out. Over brief periods, the user's IP address may be an adequate identifier. There are two types of identification strategies [2]: proactive and reactive. Proactive strategies are designed to distinguish users before or after they access a website. Reactive strategies aim to link individuals to log entries after the log has been written [3]. The rest of the paper is organized as follows. Related work is reviewed in Section 2. The experimental design is discussed in Section 3. The proposed algorithms are presented in Sections 4, 5, and 6, followed by the conclusion in Section 7.

Related Work
Web usage mining is a hot research area in the field of web mining. The expansion of the World Wide Web (Web for short) has generated a large amount of data stored in the logs of web servers. Different hidden patterns can be discovered in these logs, such as user navigational patterns and frequently visited web pages or page sets. The information obtained from weblog data can be used for various purposes. In [1], association rule mining techniques are implemented that help find pages that are visited together even if they are not directly connected, revealing correlations between groups of users sharing a specific event or interest [4][5]. The derived hidden patterns can be used to restructure web sites by adding links between pages that are visited together. Another web mining task is web user clustering, which identifies categories of users in order to discover user profiles. Several studies have analyzed the nature of user navigation. In [6], a partitioning clustering approach is introduced that visualizes users' navigation routes in each cluster. In [7], an expectation-maximization (EM) algorithm is suggested to find the maximum-likelihood parameters of a probabilistic model that relies on unobserved latent variables in order to evaluate user navigation patterns. In [8][9], a guideline framework is introduced that mines navigation patterns through web usage mining to forecast future user movements.

Experimental Design
We are mainly interested in users' navigation through websites. Web servers record information about web access in a log file when users visit a website. Each log file record is a request made by a web user to the server. Each record (row) usually includes the following information: the IP (Internet Protocol) address of the user, the connection date and time, the requested page URL (Uniform Resource Locator), the request protocol, the request status code, and the page size (if the request is successful). The log also contains the different web user requests, as shown in Figure 3. The first digit of the status code specifies one of five response classes. Microsoft IIS may use additional decimal subcodes to provide more specific information, but they are not included here. For example, the 2xx class has subcodes from 200 to 207, each of which represents a valid HTTP (Hypertext Transfer Protocol) request, for example 200 (OK). The pre-processed log file that is ready for mining is shown in Table 3.
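As a minimal sketch of how the first digit of the status code maps to a response class, the Python snippet below uses the five standard HTTP classes. The function names and the idea of keeping only 2xx requests are assumptions made for illustration; the paper does not describe its filtering step at this level of detail.

```python
# Standard HTTP response classes, keyed by the first digit of the status code.
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def status_class(code: int) -> str:
    """Return the response class of an HTTP status code such as 200 or 404."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]


def is_successful(code: int) -> bool:
    """True for 2xx codes (hypothetical filter for a pre-processing step)."""
    return code // 100 == 2
```

For example, `status_class(200)` gives "Success" and `status_class(404)` gives "Client Error".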

Navigation Patterns Mining
A user navigation pattern is a navigation behavior that is typical of a group of users. Web mining clustering techniques are used to find the hidden patterns in users' navigation. The data obtained from the log file cannot be used directly; as previously mentioned, it must first be pre-processed into a format appropriate for mining tasks. Table 4 displays the log file contents obtained from the web server www.aceonlinestore.com and used for the analysis.

Session identification
A user session is a finite collection of web pages that the same user accesses during a particular visit. According to [7] and [10], a user session is a collection of user accesses from the same IP address during a predefined period. When the time between two requests exceeds a certain threshold, it is assumed that the user is starting a new session. We use a default timeout of 30 minutes.
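The timeout rule can be sketched in a few lines of Python. This is an illustrative sketch only: the record layout `(ip, timestamp, url)`, the function name `split_sessions`, and the sample records are assumptions, not the paper's implementation.

```python
from collections import defaultdict
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # default timeout stated in the text


def split_sessions(records, timeout=SESSION_TIMEOUT):
    """Group (ip, timestamp, url) records into per-IP sessions.

    A new session starts whenever the gap between two consecutive requests
    from the same IP address is longer than `timeout`.
    """
    by_ip = defaultdict(list)
    for ip, ts, url in records:
        by_ip[ip].append((ts, url))

    sessions = []
    for ip, hits in by_ip.items():
        hits.sort(key=lambda hit: hit[0])  # order requests by time
        current = [hits[0][1]]
        for (prev_ts, _), (ts, url) in zip(hits, hits[1:]):
            if ts - prev_ts > timeout:
                sessions.append((ip, current))
                current = []
            current.append(url)
        sessions.append((ip, current))
    return sessions


# Hypothetical log records, for illustration only.
records = [
    ("10.0.0.1", datetime(2021, 3, 1, 9, 0), "/home"),
    ("10.0.0.1", datetime(2021, 3, 1, 9, 10), "/products"),
    ("10.0.0.1", datetime(2021, 3, 1, 10, 0), "/home"),
    ("10.0.0.2", datetime(2021, 3, 1, 9, 5), "/cart"),
]
sessions = split_sessions(records)
```

Here the 50-minute gap before the third request of 10.0.0.1 opens a second session for that address.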
If the website is made up of n pages, each URL is assigned a unique number j = 1, ..., n. A user session is then defined by an n-dimensional vector whose j-th term expresses the user's interest in the j-th web page. The level of interest in a web page may be described in various ways. In this analysis, the degree of interest in a URL is mainly linked to the extent of website use (number of pages visited / total number of pages during the session) and the time the user spends on the page. Formally, the i-th user session is represented by a vector $s^{(i)} = (s_1^{(i)}, s_2^{(i)}, s_3^{(i)}, \ldots, s_n^{(i)})$ with the property that, for j = 1, ..., n, $s_j^{(i)}$ is determined by $f_j^{(i)}$ and $t_j^{(i)}$, which denote the connection duration and the time spent by the user on the j-th page in the i-th session, respectively. After the pre-processing step, a set of N sessions is obtained from the log data. Table 6 shows the URL requests of each session converted to frequency ranges, which is useful for the clustering phase. Table 7 shows the binary representation that records, for each session, which URL requests occur in it. The pre-processing in Table 7 is useful for discovering frequent correlated patterns.
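The two session encodings described above (frequencies for clustering, binary flags for association mining) can be illustrated with the following Python sketch. It uses raw visit counts only and ignores the time spent on each page, so it is a simplification of the interest measure in the text; all function names and the sample sessions are hypothetical.

```python
def url_index(sessions):
    """Assign each distinct URL a sequential number j = 1, ..., n."""
    index = {}
    for urls in sessions:
        for url in urls:
            index.setdefault(url, len(index) + 1)
    return index


def frequency_vector(urls, index):
    """n-dimensional vector whose j-th entry counts visits to page j."""
    vec = [0] * len(index)
    for url in urls:
        vec[index[url] - 1] += 1
    return vec


def binary_vector(urls, index):
    """n-dimensional 0/1 vector: 1 if page j occurs in the session."""
    return [1 if count > 0 else 0 for count in frequency_vector(urls, index)]


# Hypothetical sessions (lists of requested URLs), for illustration only.
sessions = [["/home", "/products", "/home"], ["/cart"], ["/products", "/cart"]]
index = url_index(sessions)
freq = [frequency_vector(s, index) for s in sessions]
binary = [binary_vector(s, index) for s in sessions]
```

The frequency rows are the kind of input a clustering step would consume, while the binary rows act as transactions for frequent pattern mining.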

Clustering Algorithms
Data clustering is an efficient means of identifying similarities in data and grouping related data together [11]. Some clustering techniques require the number of clusters to be specified in advance; K-means, for example, tries to divide the data into the given number of clusters. In some cases, the number of clusters is not known from the beginning. Instead, the algorithm starts by finding the first large cluster, then seeks the second, and so on. K-means and the modified K-means are presented and explained below.

K-means clustering
K-means partitions the data by minimizing a cost function based on the Euclidean distance between each vector $x_k$ in group $G_i$ and the corresponding cluster center $c_i$ [12]:

$$J = \sum_{i=1}^{c} J_i = \sum_{i=1}^{c} \Bigl( \sum_{k,\, x_k \in G_i} \lVert x_k - c_i \rVert^2 \Bigr),$$

where $J_i = \sum_{k,\, x_k \in G_i} \lVert x_k - c_i \rVert^2$ is the cost function within group $i$. The clustered groups are described by a $c \times n$ binary membership matrix $U$, where the element $u_{ij}$ is 1 if the $j$-th data point $x_j$ belongs to group $i$ and 0 otherwise. Once the cluster centers are fixed, the minimizing $u_{ij}$ is [12]:

$$u_{ij} = \begin{cases} 1 & \text{if } \lVert x_j - c_i \rVert^2 \le \lVert x_j - c_k \rVert^2 \text{ for each } k \ne i, \\ 0 & \text{otherwise,} \end{cases}$$

that is, $x_j$ belongs to group $i$ if $c_i$ is the closest of all the centers. Conversely, when the membership matrix is fixed, i.e., when $u_{ij}$ is given, the optimal center $c_i$ that minimizes the cost is the mean of all the vectors in group $i$ [12]:

$$c_i = \frac{1}{|G_i|} \sum_{k,\, x_k \in G_i} x_k, \qquad |G_i| = \sum_{j=1}^{n} u_{ij}.$$

Given a data set $x_j$, $j = 1, \ldots, n$, the algorithm determines the cluster centers $c_i$ and the membership matrix $U$ iteratively in four stages (Stage 1 to Stage 4). Because the K-means output depends on the initial position of the cluster centers, the algorithm must be run several times. Clustering is an attempt to reduce the cost function of each cluster. The performance of the algorithm is measured by its accuracy on the test set. After the cluster centers have been determined, the test data vectors are assigned to their respective clusters according to the distance between each vector and each cluster center. An error measure is then calculated, namely the root mean square error (RMSE). The algorithm is evaluated repeatedly to obtain better results.
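The four stages can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' implementation: the random initialization, the stopping rule (stop when the cost improves by no more than `tol`), and all names are assumptions.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, c, tol=1e-9, max_iter=100, seed=0):
    """Cluster `points` (equal-length tuples) into `c` groups.

    Returns (centers, labels, cost); labels[j] is the group of points[j].
    """
    rng = random.Random(seed)
    # Stage 1: pick c data points as the initial cluster centers.
    centers = [list(p) for p in rng.sample(points, c)]
    prev_cost = float("inf")
    for _ in range(max_iter):
        # Stage 2: membership, each point joins its nearest center.
        labels = [
            min(range(c), key=lambda i: squared_distance(p, centers[i]))
            for p in points
        ]
        # Stage 3: cost = sum of squared distances to the assigned centers;
        # stop once the cost no longer improves by more than `tol`.
        cost = sum(squared_distance(p, centers[l]) for p, l in zip(points, labels))
        if prev_cost - cost <= tol:
            break
        prev_cost = cost
        # Stage 4: move each center to the mean of its members.
        for i in range(c):
            members = [p for p, l in zip(points, labels) if l == i]
            if members:  # an empty group keeps its previous center
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels, cost


# Hypothetical two-dimensional data with two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels, cost = kmeans(points, c=2)
```

Because the result depends on the initial centers, a caller would typically run `kmeans` with several seeds and keep the run with the lowest cost.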

Modified K-means clustering
The modified K-means (MK-means) is a scalable cluster analysis algorithm designed for very large datasets. It is based on the K-means algorithm and can handle categorical as well as continuous fields. It requires only one pass over the data. It has two phases [12]:
• Pre-cluster the records into many small sub-clusters, as shown in Figure 4.
• Cluster the sub-clusters resulting from the pre-cluster phase into the required number of clusters.
The number of clusters may also be determined automatically. For both symbolic and range fields, the algorithm uses a log-likelihood distance measure, which is a probability-based distance. The distance between two clusters is related to the decrease in log-likelihood when they are merged into one cluster. To compute the log-likelihood, normal distributions are assumed for range fields and multinomial distributions for symbolic fields. The fields are assumed to be independent of each other, and so are the records. The distance between clusters i and j is defined as in [13], where $K^A$ is the number of range-type input fields, $N_v$ is the number of records in cluster $v$, $\sigma_k^2$ is the estimated variance of the $k$-th continuous variable over all records, $\sigma_{vk}^2$ is the estimated variance of the $k$-th continuous variable for the records in the $v$-th cluster, and $\langle i, j \rangle$ is an index representing the cluster formed by combining clusters $i$ and $j$.
In summary, the modified K-means uses a two-stage mechanism that determines the number of clusters automatically. K-means is observed to need more time than the modified K-means because of the number of iterations required to find the optimal clusters.

Kohonen Clustering Algorithm
The Kohonen model [13] consists of two layers of neurons (units): an input layer and an output layer. The input layer is fully connected to the output layer, and each connection has a weight.
The network structure can also be viewed in another way. The Kohonen model parameters can be regarded either as weights between input units and output units or as a cluster center associated with each output unit. Input records are presented to the network and the cluster centers are updated; the clusters are arranged spatially in a two-dimensional grid. Each record affects the unit (cluster) to which it is assigned and the units in the neighborhood of the winning unit, as shown in Figure 5.

Figure 5. Kohonen network model [13]
Distance in a Kohonen network is calculated as the Euclidean distance between the encoded input vector and the cluster center of the output unit:

$$d_{ij} = \sqrt{\sum_{k} (x_{ik} - w_{jk})^2},$$

where $x_{ik}$ is the value of the $k$-th input field for the $i$-th record, and $w_{jk}$ is the weight for the $k$-th input field on the $j$-th output unit. The activation of an output unit is the Euclidean distance between the unit's weight vector (the unit's center) and the input vector. Recall that the winning unit in a Kohonen network is the output unit with the lowest activation. This is in contrast to other types of neural networks, where a higher activation represents a stronger response.
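To make the winner selection concrete, the Python sketch below computes the activation of each output unit and picks the one with the smallest value. The weight update shown is the usual Kohonen rule (move the winner toward the record by a learning rate `eta`); it is not spelled out in the text and is included here as an assumption, with the neighborhood update omitted. All names and numbers are hypothetical.

```python
import math


def activation(record, weights):
    """Euclidean distance between an input record and a unit's weights."""
    return math.sqrt(sum((x - w) ** 2 for x, w in zip(record, weights)))


def winning_unit(record, units):
    """Index of the output unit with the smallest activation."""
    return min(range(len(units)), key=lambda j: activation(record, units[j]))


def update_unit(record, weights, eta):
    """Move a unit's weights a fraction `eta` of the way toward the record.

    Assumed standard Kohonen update; neighborhood units are not handled here.
    """
    return [w + eta * (x - w) for x, w in zip(record, weights)]


# Hypothetical weights for three output units over two input fields.
units = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
record = [0.9, 0.8]
winner = winning_unit(record, units)
units[winner] = update_unit(record, units[winner], eta=0.5)
```

In a full network, units in the neighborhood of the winner would be moved as well, typically by a smaller amount.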

Clustering algorithms experimental results
Finding related navigation patterns can help improve the content of clustered websites and pages, particularly for customers interested in those pages. This is one of the steps toward making the Web adaptive to its users, and it leads in turn to the smart server. Based on the number of accesses to the URL requests, we cluster our data into 8 classes using the K-means algorithm; Figure 6 shows the clusters. The Importance column indicates the field's overall importance to the model. It is reported as 1 minus the p-value, where the p-value (probability value) measures significance using a t-test or chi-square test. The number of accesses per cluster is shown in Table 8. Cluster 2 and Cluster 6 have the highest numbers of accesses, so the webmaster should take both clusters into account and update them continuously to make the sites in these clusters more appealing to consumers. Table 9 displays how many URLs the users have accessed in each cluster. We begin with the initial number of clusters to see how the results are grouped. The standard deviations of the two techniques are compared in Table 10. Table 10 shows that, in most clusters, cluster members are more similar to each other under MK-means than under K-means. We attribute this lower variation in most clusters to the log-likelihood computation and the hierarchical (pre-cluster) step. As Figure 9 shows, MK-means has a lower deviation for most of the clusters in our data set. It is also expected to have a low overall deviation on larger datasets.

Frequent patterns algorithms
As noted above, customizing an e-service web site offers useful insights into consumer behavior and makes it possible to recognize web users' surfing behavior across e-service web sites. For example, associations in user navigation can be exploited by suggesting specific products or services to website users based on their surfing behavior. Therefore, we need frequent pattern mining techniques to discover such significant associations in users' navigation patterns.

CARMA Algorithm
The Continuous Association Rule Mining Algorithm (CARMA) is an alternative to Apriori that reduces I/O costs, time, and space requirements. It requires only two passes over the data and delivers results at much lower support levels than Apriori [14]. It also permits the support level to be changed during execution. CARMA operates on transactions made up of items. An item is a flag-like condition that indicates whether a particular thing is present or absent in a particular transaction. An itemset is a group of items that may or may not co-occur in transactions.
CARMA proceeds in two phases. First, the frequent itemsets are identified in the data, and then rules are generated from the frequent itemsets [14]. The procedure is as follows:
1. Find all frequent itemsets:
• Find the frequent items: items whose frequency in the database is greater than or equal to the minimum support level.
• Get the frequent itemsets:
o Generate candidate frequent itemsets.
o Prune the results to identify the frequent itemsets.
2. Generate strong rules from the frequent itemsets:
• Rules that reach the minimum support and minimum confidence levels.
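The first phase (find frequent items, generate candidates, prune) can be illustrated with the level-wise Python sketch below. Note that this is a plain Apriori-style search that rescans the data at every level; it is not CARMA's two-pass procedure and is meant only to show the candidate-generation and pruning steps. The function name and sample transactions are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support.

    `transactions` is a list of sets; support is a fraction of transactions.
    """
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Frequent single items.
    items = {item for t in transactions for item in t}
    level = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            level[frozenset([item])] = s

    result = {}
    size = 1
    while level:
        result.update(level)
        size += 1
        # Generate candidates of the next size by joining frequent itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Prune: every (size - 1)-subset must be frequent, then check support.
        level = {}
        for cand in candidates:
            if all(frozenset(sub) in result for sub in combinations(cand, size - 1)):
                s = support(cand)
                if s >= min_support:
                    level[cand] = s
    return result


# Hypothetical sessions as sets of requested pages, for illustration only.
transactions = [
    {"/home", "/products", "/cart"},
    {"/home", "/products"},
    {"/home", "/cart"},
    {"/products", "/cart"},
    {"/home", "/products", "/cart"},
]
frequent = frequent_itemsets(transactions, min_support=0.6)
```

With a minimum support of 0.6, the three single pages and the three page pairs are frequent, while the three-page itemset (support 0.4) is pruned.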
The flow chart in Figure 11 shows in more detail how the correlated (strong) rules are generated.
Rule support is the proportion of transactions in the data for which the pattern is true. Confidence measures the degree of trust or belief in each discovered pattern, as shown in Figure 10.
Figure 10 illustrates, with an example, how a rule is created.
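As a minimal sketch of the two measures (not the example from the paper's figure), the Python code below computes support and confidence for a rule antecedent -> consequent using the common definitions: support is the fraction of transactions containing both sides, and confidence is that count divided by the number of transactions containing the antecedent. The function names and the sample data are assumptions.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    n_antecedent = sum(1 for t in transactions if antecedent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_both / n
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


def is_strong(transactions, antecedent, consequent, min_support, min_confidence):
    """True if the rule reaches both the minimum support and confidence."""
    support, confidence = rule_metrics(transactions, antecedent, consequent)
    return support >= min_support and confidence >= min_confidence


# Hypothetical sessions as sets of requested pages, for illustration only.
transactions = [
    {"/home", "/products"},
    {"/home", "/products", "/cart"},
    {"/home"},
    {"/cart"},
]
support, confidence = rule_metrics(transactions, {"/home"}, {"/products"})
```

In this toy data the rule /home -> /products holds in 2 of 4 sessions and in 2 of the 3 sessions that contain /home.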

Experimental Results
When a user searches a web site, he or she wants to reach the target as quickly as possible. Finding correlations between user navigation paths can improve the navigation plan for requests on the Web. This is also a powerful tool when we want the user to see as many of the available products and services as possible. Furthermore, it can increase the income of e-services portals.
It also makes it possible to place part of an advertisement on one page and the rest on a correlated page, which exposes users to more advertisements.
Finally, mining frequent patterns on the server side helps make the server intelligent enough to predict future customer needs. Moreover, we can provide programming facilities on the server to help adapt the website by improving the navigational plan, design, and content according to the navigational patterns. The correlations between requested pages can be captured in a prescribed framework. With some facilities, this generated framework can be embedded into a given site to automatically reorganize the navigational plan and design and to change the content of web pages. As previously mentioned, this leads to the concept of object-oriented navigation. There are three types of linkage between URL requests: strong, medium, and weak. Tables 12 to 14 show these linkages between URL requests. In our experiment, we start with low support and confidence values of approximately 30% and increase them until satisfactory results are obtained. This may depend on domain expertise. Table 15 shows the correlated requests with their support and confidence, sorted by confidence. These results can guide us to display more ads on strongly correlated pages and to improve web catalogs for e-service sites. CARMA works at lower support and confidence levels than Apriori. Tables 16 and 17 show the difference between the two techniques. In Table 16, we use an initial support and confidence of 20% to see how many rules both techniques construct. CARMA is seen to generate more rules than Apriori. In Table 17, we use an initial support and confidence of 30% for Apriori. Apriori is observed not to operate at low support and confidence. This ratio can be increased or decreased to see how the related rules are generated.

Conclusion
This paper has addressed how to make e-services portals adaptive and intelligent based on their users' navigation patterns. There are three aspects to developing intelligent portals. The first is capturing user navigation patterns using algorithms such as CARMA and Apriori. The second is grouping similar user requests using clustering techniques such as K-means, MK-means, and SOM (self-organizing map). The third, classifying users' URLs as either malicious or benign with a proposed adaptive classification technique, is left for future work.

Figure 1. The proposed architecture of the intelligent server

Stage 1: Select c points from among the data points to initialize the cluster centers ci.
Stage 2: Determine the membership matrix U.
Stage 3: Compute the cost function. Stop if it is below a certain tolerance value.
Stage 4: Update the cluster centers. Go to Stage 2.

Figure 4. The pre-cluster step of MK-means

Figure 6. Cluster numbers
Table 11 shows each cluster's standard deviation under the Kohonen clustering algorithm and the number of accesses in each. The nature of the training data set is essential when choosing the appropriate clustering technique. Figures 7 and 8 show the differences in standard deviation between the three techniques.
Waleed M. Ead and Mohamed M. Abbassy, EAI Endorsed Transactions on Energy Web, 03 2021 - 05 2021 | Volume 8 | Issue 33 | e12

Figure 7. Difference between K-means and MK-means in 6 clusters

Figure 8. Difference between K-means and MK-means in 7 clusters

Figure 10. Demonstration of how to generate the association rules

Figure 11 illustrates the correlated pages, which help us find association rules with CARMA.

Figure 11. Flow chart for generating an association rule

Table 2. Status Code Categories

Table 3. Preprocessed Log File

Table 4. Contents of the Log File Used

In Table 5, the dataset contains about 184 URLs. Each URL address is first assigned a sequential numerical value.

Table 5. URL Addresses Assigned to Numeric Values

Table 6. URLs assigned for clustering

Table 7. URLs assigned binary codes for finding associations between requests

Table 8. Number of Accesses for Each Cluster

Table 9. Allocation of URL requests to each cluster
The selection of a certain number of clusters depends on domain expertise, as we do not know in advance how many clusters we want.

Table 10. Comparison between modified K-means and K-means