Web Log Pre-processing and Analysis for Generation of Learning Profiles in Adaptive E-learning

Adaptive E-learning Systems (AESs) enhance the efficiency of online courses by providing personalized content and user interfaces that change according to the learner's requirements and usage patterns. This paper presents an approach to generating a learning profile for each learner, which helps to identify learning styles and to provide an Adaptive User Interface that includes adaptive learning components and learning material. The proposed method analyzes captured web usage data to identify the learning profiles of the learners. The learning profiles are identified by an algorithmic approach based on the frequency of accessing the materials and the time spent on the various learning components of the portal. The captured log data is pre-processed and converted into a standard XML format to generate learner sequence data corresponding to the different sessions and time spent. The learning style model adopted in this approach is the Felder-Silverman Learning Style Model (FSLSM). This paper also presents an analysis of learners' activities, the pre-processed XML files, and the generated sequences.


Introduction
In the past decade, Adaptive E-learning Systems (AESs) have attracted much attention from researchers in the field of Educational Data Mining (EDM). Many AESs have been proposed and implemented based on Human Computer Interface (HCI) and Data Mining. Most of these AESs have focused on addressing the requirements of learners in order to improve the interaction and efficiency of the systems. E-learning systems cannot perform to the desired level if requirements handling is not an integrated part of web-based learning systems. Educators have identified that the problem with these systems is that they mainly focus on learning materials and do not consider the adaptive interface requirements of the learners. While going through online courses on e-learning portals, most learners are unsure of their actual needs, which may lead to inaccurate requests for learning content. To address these issues, it is beneficial for an e-learning system to analyze the actual needs of learners in order to improve their learning performance [9].
Many educational theories suggest that the integration of learning styles into learning activities may improve learning performance. AESs mainly focus on adaptation and personalization of the learner interface along with adaptive learning content. The integration of learner profiles with the learning style can enhance AESs by providing a learner with personalized learning guidance that is appropriate to their potential needs. The work proposed in this paper identifies learning profiles based on the analysis and pre-processing of web log data. The identified learning styles are classified based on the Felder-Silverman Learning Style Model (FSLSM) to identify the exact types of learners. Many researchers have concentrated on identifying learning styles based on FSLSM to provide learning content to specific types of learners. The described work is integrated with a web portal and can be applied to other e-learning systems. This paper is organized as follows. Section II introduces related research work in pre-processing and learner profiles with adaptive e-learning styles. Section III describes the methodology of the system. Section IV presents the algorithmic approach to the analysis and pre-processing of web log data. Section V presents the log analysis and the sequence data that result from pre-processing. Section VI gives conclusions and discusses future work.

Related Works
In the teaching-learning process, every learner has different background knowledge and learning preferences, and individual students may expect different approaches to learning courses. With this main objective, adaptive e-learning systems (AEHSs, AdaLearn, iLessons) have been developed to offer personalized learning content and improve learning outcomes [20] [2] [18]. Many researchers have concentrated on identifying learners' learning styles to implement adaptation in online learning management systems. Automatic and dynamic detection of learning style is attractive because it can be incorporated into a portal to identify types of learners quickly [1] [15] [6]. Many methods have been proposed that automatically estimate learners' learning styles with respect to the Felder-Silverman Learning Style Model based on their behavior in an online course. To bring a dynamic nature into the system and to provide an adaptive interface, the learning behavior of an individual has to be analyzed and modeled. This requires automating the detection of learning style from learning behavior and generating personalization and adaptation for different types of learners. To improve a learner's performance, these systems must know the way in which an individual student learns best [17] [3]. A learning-style model classifies students according to where they fit on a number of scales pertaining to the ways they receive and process information. Mismatches exist between the common learning styles of engineering students and the traditional teaching styles of engineering professors [7] [8]. Despite the importance of applying these learning styles to different learning systems, various problems still need to be solved, such as matching teaching content with the student's learning style [8]. Enabling teachers to know their learners' learning styles and making students aware of their own learning styles increases teachers' and learners' understanding of the learning process, allows teachers to provide better support for their students, and therefore has high potential to enhance teaching and learning [10] [5].
Web Usage Mining (WUM) is a data mining method that can be used to discover user access patterns from Web log data. WUM includes three phases: preprocessing, pattern discovery and pattern analysis. Web usage analysis (also called web usage mining, web log mining or click stream analysis) is the process of extracting useful knowledge from web server logs, database logs, user queries, client-side cookies and user profiles in order to analyze web learners' behavior. Web usage analysis requires data abstraction for pattern discovery. This data abstraction can be achieved through data pre-processing [13] [16] [11] [12].
An important research area in education and technology is how learners use e-learning. By exploring the various factors and the relationships between them, we can gain insight into learners' behavior in order to deliver the tailored e-content they require. Although many tools exist to record detailed navigational activities, they do not explore learners' usage patterns for an adaptive e-learning site [14] [19] [4].

Methodology
The approach to generating learning profiles from the usage log data of the e-learning portal is shown in Fig. 1. Detailed analysis of the log data and pre-processing of the usage data are used to generate learning sequences. The e-learning portal has been implemented using Microsoft Visual Studio 2010 and Microsoft SQL Server 2005. The portal is deployed on an IIS server to provide access to all learners over the Internet. The log file option of the IIS server is set to the W3C extended log file format, which captures usage details of the learners who access the portal.
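As an illustration of how such a log might be read, the following is a minimal sketch, not the portal's actual code. It relies only on the general W3C extended log convention that lines starting with `#` are directives and that the `#Fields:` directive names the space-separated columns. The function name `parse_w3c_log`, the sample lines, and the particular fields enabled are assumptions made for the example.

```python
from typing import Dict, Iterable, Iterator, List


def parse_w3c_log(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per request line of a W3C extended log (hypothetical helper)."""
    fields: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directive line; "#Fields:" names the columns of the entries that follow.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        values = line.split()
        # Skip malformed entries whose column count does not match the directive.
        if fields and len(values) == len(fields):
            yield dict(zip(fields, values))


# Illustrative sample; which fields are logged depends on the server configuration.
sample = [
    "#Version: 1.0",
    "#Fields: date time cs-method cs-uri-stem c-ip sc-status",
    "2016-03-01 10:15:02 GET /TopicSearch.aspx 10.0.0.7 200",
    "2016-03-01 10:15:40 GET /images/logo.png 10.0.0.7 200",
]
records = list(parse_w3c_log(sample))
print(records[0]["cs-uri-stem"])
```

Keeping each entry as a field-name dictionary means later filtering steps do not depend on column order, which can differ between server configurations.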

Evaluation Parameters based on Log File and Database
Following are the parameters used to define the log file and database.
1. Topics (T): Topics are related to the contents of the portal and are defined by the instructor of the course.

2. Actions (A): A click made by the learner on a particular content type, such as text links, video lectures or downloadable links. Following are the types of content for analysis: Action $A_1$ is the default action for any other action.

$D(TFile_i) = End_{Time} - Start_{Time}$ (1)
$File_i$ = file id assigned to each file for a specific topic. $n$ = number of files available in the portal. The total duration ($D$) spent on accessing files for a specific topic is calculated in equation 2.
(b) Frequency ($Freq_i$): Count of accesses to a type of file during visits to the portal, shown in equation 3. Frequency ($Freq$) indicates the learner's interest in specific types of content, which helps to identify the learning style on the portal.
$A_i$ = action of accessing a specific type of file during visits to the portal.

Accessing Parameters of Pages
$A_j$ = action of accessing a page during visits to the portal. Frequency ($Freq$) indicates the learner's interest in pages, which helps to identify the learning style on the portal.
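The duration and frequency parameters can be illustrated with a short sketch. It assumes that the duration of one access is the end time minus the start time, that total duration is a plain sum over accesses, and that frequency is a plain count of access actions. The record layout, the item ids and the helper `seconds_between` are hypothetical and are not taken from the portal's database.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical access records: (file or page id, start time, end time).
accesses = [
    ("File1", "10:00:00", "10:04:30"),
    ("File2", "10:05:00", "10:06:00"),
    ("File1", "10:10:00", "10:12:00"),
]


def seconds_between(start: str, end: str) -> float:
    """Duration of one access: end time minus start time, in seconds."""
    fmt = "%H:%M:%S"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


duration = defaultdict(float)  # time spent per item, summed over accesses
frequency = Counter()          # number of accesses per item
for item, start, end in accesses:
    duration[item] += seconds_between(start, end)
    frequency[item] += 1

# Total duration over all items, assumed here to be a plain sum.
total_duration = sum(duration.values())
print(dict(duration), dict(frequency), total_duration)
```

The same loop applies to pages by replacing the file ids with page ids.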

Web Log Pre-processing to identify Learning Sequences
Data pre-processing consists of various steps that are tedious and time consuming to implement in a real-time application. Also, the same steps may not be suitable for analysis in an e-learning application. The proposed methodology describes the issues to be addressed, the parameters to be considered, and the algorithms that generate the XML files for learning sequences.

Issues to be addressed during Pre-processing:
1. Collection of Web Log Data: The web log data for learners' usage needs to be captured in different formats. The IIS log data of the web server gives information related to learners' sessions along with the pages accessed. There is a requirement to capture data at the application level on the web server, which will cover the detailed usage in learners' sessions. Logs are to be generated and stored in the database along with the sequence of pages and files accessed by a specific learner on the portal.
2. Pre-processing of Web Log Data: The captured web log data needs to be converted into a standard format for analysis. Generally, several pre-processing tasks need to be done before applying web mining algorithms to the web server logs, such as Data Cleaning, Log Identification and Session Identification.
3. Pre-analysis of Web Log Data: Before the web log data is converted into the standard format, pre-analysis has to be done with different algorithmic approaches. This is useful for identifying patterns and checking whether the captured data is correct.
4. Length and Order of the activities in a session: An e-learning session is a set of activities in which a learner accesses the available learning components and contents on the portal. To identify the learning styles of a learner, logs should capture the activities with respect to time in each session. Such sessions will differ from other sessions in the length and order of activities. To combine common patterns and common learner profiles, sessions should be aligned based on the similarity between sessions.
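The text does not state which similarity measure is used to align sessions. As a stand-in only, the sketch below compares two sessions as ordered lists of accessed items using `difflib.SequenceMatcher` from the Python standard library, whose `ratio()` is twice the number of matched items divided by the combined length. The session contents and the function name `session_similarity` are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List


def session_similarity(a: List[str], b: List[str]) -> float:
    """Order-aware similarity in [0, 1] between two activity sequences.

    This is a stand-in measure chosen for illustration, not the paper's method.
    """
    return SequenceMatcher(None, a, b).ratio()


# Hypothetical sessions, each an ordered list of accessed pages or file types.
s1 = ["Login", "TopicSearch", "Video", "Exercise"]
s2 = ["Login", "TopicSearch", "PDF", "Exercise"]
s3 = ["Login", "Announcement"]

print(session_similarity(s1, s2))  # three of four activities match in order
print(session_similarity(s1, s3))  # only the first activity matches
```

Because the measure respects order and tolerates different lengths, sessions that follow a similar path through the portal score higher than sessions that merely share a page.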

Assumptions/Requirements for Pre-processing of Web Log Data:
1. Idle time spent on a specific file or page is 10 min. If a learner performs no activity on the portal for 10 min, the session terminates automatically and the learner is logged out of the portal, as per the time-oriented heuristic.
2. All sessions of one specific learner are considered up to 30 min to generate the final XML logs.
3. If a learner has not logged out or has left the session in between, those sessions are considered for 10 min, as per the idle time, for the final XML logs.
4. Unique session ids are maintained for every log record of a learner.
5. Log records are filtered to retain only GET/POST methods and .aspx pages.
6. Log records are also filtered for unique learners by removing Crawler/Spider/Robot entries.
7. Log records are sorted by session time, and the different sessions of one learner are combined to understand usage patterns.
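The filtering and idle-time assumptions above can be sketched as follows. This is a simplified illustration under stated assumptions, not the portal's implementation: the record tuple layout, the substring test for crawlers on the user agent, and the names `keep` and `sessionize` are hypothetical, and the 30-minute cap on combined sessions is deliberately left out.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

IDLE_LIMIT = timedelta(minutes=10)            # idle time from assumption 1
BOT_HINTS = ("crawler", "spider", "robot")    # assumed user-agent substrings

# Hypothetical record layout: (learner id, timestamp, method, url, user agent).
Record = Tuple[str, datetime, str, str, str]


def keep(record: Record) -> bool:
    """Retain GET/POST requests for .aspx pages that do not come from crawlers."""
    _, _, method, url, agent = record
    if method not in ("GET", "POST"):
        return False
    if not url.lower().endswith(".aspx"):
        return False
    return not any(hint in agent.lower() for hint in BOT_HINTS)


def sessionize(records: List[Record]) -> Dict[str, List[List[Record]]]:
    """Group each learner's records into sessions split at idle gaps over 10 min."""
    sessions: Dict[str, List[List[Record]]] = {}
    for rec in sorted(filter(keep, records), key=lambda r: (r[0], r[1])):
        learner_sessions = sessions.setdefault(rec[0], [])
        if learner_sessions and rec[1] - learner_sessions[-1][-1][1] <= IDLE_LIMIT:
            learner_sessions[-1].append(rec)   # still within the idle limit
        else:
            learner_sessions.append([rec])     # start a new session
    return sessions


t0 = datetime(2016, 3, 1, 10, 0)
log = [
    ("L1", t0, "GET", "/TopicSearch.aspx", "Mozilla/5.0"),
    ("L1", t0 + timedelta(minutes=4), "GET", "/Exercise.aspx", "Mozilla/5.0"),
    ("L1", t0 + timedelta(minutes=20), "GET", "/TopicSearch.aspx", "Mozilla/5.0"),
    ("L1", t0 + timedelta(minutes=21), "GET", "/logo.png", "Mozilla/5.0"),
    ("X9", t0, "GET", "/TopicSearch.aspx", "ExampleSpider/1.0"),
]
result = sessionize(log)
print({learner: [len(s) for s in sess] for learner, sess in result.items()})
```

In the sample, the 16-minute gap splits learner L1's activity into two sessions, while the image request and the spider's request are dropped before sessions are formed.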

Parameters to be considered in XML generation algorithms:
1. Session: Sequence of pages and files accessed by a learner on a particular website during a specified period of time. One sequence includes a number of sessions with a total time of 30 min.
2. Frequency: Number of times a specific file or page is accessed by learners in all sessions.
3. Time Spent: Total time spent on a file or page by learners in all sessions.
4. Page Sequence: Pages and files accessed in a specific order by a learner. The page sequence is useful for identifying the learning path of each learner.
The log data is generated in a standard XML format based on the assumptions mentioned above. The XML file can be used directly as input for clustering algorithms. Two different XML files are generated: File1 contains learner session data for pages/files, the time spent on each page/file in each session, and the frequency of accessing the pages/files. File2 contains the page and file sequence of each learner, which are called web sessions.
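To make the two outputs concrete, the sketch below builds both kinds of XML tree with Python's standard `xml.etree.ElementTree`. Only the root name `UserActivities` appears in the paper's pseudocode. The `Activity`, `Sequence` and `Page` elements, the attribute names, the row layout, and the functions `build_file1` and `build_file2` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical session rows: (session id, learner id, url, seconds spent).
rows = [
    ("S1", "L1", "/TopicSearch.aspx", 120),
    ("S1", "L1", "/files/intro.pdf", 300),
    ("S1", "L1", "/TopicSearch.aspx", 60),
]


def build_file1(rows):
    """File1 sketch: frequency and total time spent per page/file."""
    root = ET.Element("UserActivities")
    totals = {}
    for session_id, learner_id, url, secs in rows:
        key = (session_id, learner_id, url)
        count, time_spent = totals.get(key, (0, 0))
        totals[key] = (count + 1, time_spent + secs)
    for (session_id, learner_id, url), (count, time_spent) in totals.items():
        ET.SubElement(root, "Activity", SessionID=session_id, LearnerID=learner_id,
                      URL=url, Frequency=str(count), TimeSpent=str(time_spent))
    return root


def build_file2(rows):
    """File2 sketch: ordered page/file sequence per learner session."""
    root = ET.Element("UserActivities")
    sequences = {}
    for session_id, learner_id, url, _ in rows:
        sequences.setdefault((session_id, learner_id), []).append(url)
    for (session_id, learner_id), urls in sequences.items():
        seq = ET.SubElement(root, "Sequence", SessionID=session_id, LearnerID=learner_id)
        for url in urls:
            ET.SubElement(seq, "Page").text = url
    return root


file1 = build_file1(rows)
file2 = build_file2(rows)
print(ET.tostring(file1, encoding="unicode"))
print(ET.tostring(file2, encoding="unicode"))
```

File1 collapses repeated accesses into one entry with a count and a total, whereas File2 keeps every access in order, which is the form a sequence clustering step would need.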

XML File1 and File2 Generation:
A session can be described as the time spent on the portal by a learner from the moment he/she logged in to the moment he/she logged out. The sessions of each learner are combined and identified as the time spent on each page/file separately. A session also describes how many times each learner accessed the page/file. The total time spent on each page and file, as well as the frequency of accessing the page and file, is converted into XML tags for the unique learner id, as shown in Algorithm 1. A session can also be described as the sequence of pages and files accessed by each learner in a specific order, as shown in Algorithm 2.

3. Analysis done from Database: Number of times Learners Accessed Different Topics
Fig. 4 shows the number of times the learners accessed the topics. This analysis indicates the interest in specific topics and the need to provide good material on different topics. The analysis can be extended to how topics are accessed as per the learner's requirement, e.g. how many learners accessed a previous-concept or advanced-concept topic after accessing the main topic.

Number of times Learners Accessed Different Pages on Portal
Fig. 5 shows that learners have not only used TopicSearch to access different files but have also accessed other pages, such as the exercise and announcement pages. This analysis is important for understanding the behavior of learners on the portal.

Time spent on Portal by Learners
Fig. 6 shows how much time is spent by learners on the portal in all their sessions. This analysis is important for identifying the sequences of each learner based on their session ids and session time for pre-processing.

Conclusion
The usage data is useful for identifying the learning styles of the learners according to FSLSM. Four dimensions, namely Processing, Perception, Input and Understanding, and the eight categories of these dimensions, namely Active/Reflective, Sensing/Intuitive, Visual/Verbal and Sequential/Global, are used. The proposed work focuses on grouping the session details obtained from different log files. Different algorithms have been implemented to analyze the session log details for different types of learners. The method of capturing the learning styles comprises IIS log files and database entries in which important parameters of learning styles are captured. The learner's session has been considered as the total number of learning objects accessed by that learner. The captured database log data consists of the details related to pages and files as per the unique session identifier allotted to a learner. To obtain a clear analysis of the pre-processed log data, different algorithms are implemented and results are obtained. Web log pre-processing is one of the major concerns in Web Usage Mining. In this work, pre-processing of web log data is done based on different constraints that help to identify proper learner sessions. The factors considered are the Log Time and Frequency for identifying the interest of a learner. The web log data has been analyzed based on the FSLSM dimensions and mapped to learning objects. In pre-processing, the log data has been converted into the standard XML format for creating sequence files of usage patterns. These sequence files can be used directly for clustering learners' profiles in order to understand the types of learners. In future, to incorporate the use of learning styles into online courses, we plan to encapsulate the defined approach so that it can be used in any web-based e-learning system. The encapsulated approach will be useful for generating adaptive user interface components in order to achieve personalization and adaptation for learners.

Figure 1. Web Log Analysis Approach on e-Learning
03 -04 2016 | Volume 3 | Issue 10 | e4
3. Accessing Parameters of Files
(a) Duration ($D(TFile_i)$): the time spent on a specific type of file of a topic. The time is calculated from the starting time ($Start_{Time}$) and ending time ($End_{Time}$) of accessing a file in a particular session. Session Time ($Sess_{Time}$): as in equation 1, the time between the logged-in action and the logged-out action.

(a) Duration ($D(Page_j)$): the time spent on a page. The time is calculated from the starting time ($Start_{Time}$) and ending time ($End_{Time}$) of accessing a page in a particular session. Session Time ($Sess_{Time}$): as in equation 4, the time between the logged-in action and the logged-out action.
$D(Page_j) = End_{Time} - Start_{Time}$ (4)
$Page_j$ = page id assigned to each page on the portal. $m$ = number of pages available in the portal. The total duration ($D$) spent on accessing pages on the portal is calculated in equation 5.
(b) Frequency ($Freq_j$): Count of accesses to pages during visits to the portal, shown in equation 6.

1. Analysis done from Database: Frequency and Total time spent on specific file by learners in all of their sessions
Fig. 2 shows the number of times a file was accessed and the total time spent on a specific file by learners in all of their sessions. Learners who accessed the portal spent time on files in different sessions. The graph shows the most frequently accessed files by learners in the specified duration.
2. Snapshot of Report at Instructor Side: Number of times a specific type of file accessed by Learners
Fig. 3 shows the report that an instructor can generate to get a learner-wise count of accesses to different types of files. As per the implementation, the portal supports only PDF, PPT and Video files. Depending on the frequency of accessing specific types of file, learners' interest in specific material can be identified.

INPUT: A finite set of Learners L = {L_1, L_2, ..., L_N}, Sessions S = {S_1, S_2, ..., S_Q}, PageURL P = {P_1, P_2, ..., P_R} and FileURL F = {F_1, F_2, ..., F_X}
OUTPUT: XML File
initialize Session_Time ← 0, SessionLog_ID ← 0, SessionPageLog_ID ← 0, SessionFileLog_ID ← 0, IsPage ← 0
for each Session S_j where j ← 1 to Q do
    compute Session_Time = End_Session − Start_Session
    if Session_Time > 29 min then
        get Learner_ID, SessionLog_ID from SessionLog file
    end if
end for
for each Session S_j where j ← 1 to Q do
    for each PageURL P_k where k ← 1 to R do
        get SessionPageLog_ID from PageLog file
        if SessionLog_ID == SessionPageLog_ID then
            get Learner_ID, PageURL, Page_ID, LogTime from PageLog file
            set IsPage ← TRUE
        end if
    end for
    if IsPage == TRUE then
        for each PageURL P_k where k ← 1 to R do
            for each Learner L_i where i ← 1 to N do
                if "PageURL" is accessed then
                    create XMLTag for Session_ID, Learner_ID, PageURL
                    set Session_ID, Learner_ID, PageURL in "UserActivities"

Algorithm 2: XML File2 Generation Algorithm

Figure 2. Frequency and Time Spent on Specific File by Learners

Figure 3. Number of Times Learners Accessed Specific Type of File

Figure 4. Analysis of Number of Times Learners Accessed Different Topics

Figure 6. Analysis of Number of Times Learners Accessed Different Pages

Figure 11. Sequence File 1

for each Session S_j where j ← 1 to Q do
    for each PageURL P_k where k ← 1 to R do
        get SessionPageLog_ID from PageLog file
        if SessionLog_ID == SessionPageLog_ID then
            get Learner_ID, PageURL, Page_ID, LogTime from PageLog file
            set Flag ← 1
        end if
    end for
    if Flag == 1 then
        for each PageURL P_k where k ← 1 to R do
            for each Learner L_i where i ← 1 to N do
                if "PageURL" is accessed then
                    set PCount = PCount + 1
                    set PLogTime = PLogTime + LogTime
                end if
                create XMLTag for Session_ID, Learner_ID, PageURL, PLogTime, PCount
for each Session S_j where j ← 1 to Q do
    for each FileURL F_d where d ← 1 to X do
        get SessionFileLog_ID from FileLog file
        if SessionLog_ID == SessionFileLog_ID then
            get Learner_ID, FileURL, File_ID, LogTime from FileLog file
            set Flag ← 2
        end if
    end for
    if Flag == 2 then
        for each FileURL F_d where d ← 1 to X do
            for each Learner L_i where i ← 1 to N do
                if "FileURL" is accessed then
                    set FCount = FCount + 1
                    set FLogTime = FLogTime + LogTime
                end if
                create XMLTag for Session_ID, Learner_ID, FileURL, FLogTime, FCount