Context-Aware Systems and Applications. First International Conference, ICCASA 2012, Ho Chi Minh City, Vietnam, November 26-27, 2012, Revised Selected Papers

Research Article

Improving Vietnamese Web Page Classification by Combining Hybrid Feature Selection and Label Propagation with Link Information

@INPROCEEDINGS{10.1007/978-3-642-36642-0_32,
        author={Ngo Linh and Nguyen Thi Kim Anh and Cao Dat},
        title={Improving Vietnamese Web Page Classification by Combining Hybrid Feature Selection and Label Propagation with Link Information},
        proceedings={Context-Aware Systems and Applications. First International Conference, ICCASA 2012, Ho Chi Minh City, Vietnam, November 26-27, 2012, Revised Selected Papers},
        proceedings_a={ICCASA},
        year={2013},
        month={2},
        keywords={Feature Selection, Label Propagation, Web Classification, Web Mining},
        doi={10.1007/978-3-642-36642-0_32}
    }
    
Ngo Linh, Nguyen Thi Kim Anh, Cao Dat (2013). Improving Vietnamese Web Page Classification by Combining Hybrid Feature Selection and Label Propagation with Link Information. ICCASA. Springer. DOI: 10.1007/978-3-642-36642-0_32
Ngo Linh (1), Nguyen Thi Kim Anh (1, *), Cao Dat (1, *)
(1) Hanoi University of Science and Technology
(*) Contact email: anhnk@soict.hut.edu.vn, caomanhdat317@gmail.com

Abstract

Classification of web pages is essential to many information management and retrieval tasks, such as maintaining web directories and focused crawling. One problem in web page classification is that unlabeled training examples are readily available, while labeled ones are often costly to obtain. Furthermore, the uncontrolled nature of web content presents additional challenges to web page classification, whereas the interconnected structure of hypertext can provide useful information for the process. To address these problems, we propose a graph-based semi-supervised classification framework that iteratively combines hybrid semi-supervised feature selection with Label Propagation learning that uses link information, in order to improve Vietnamese web page classification. Experimental results show that the proposed method outperforms state-of-the-art methods applied to Vietnamese web page classification.
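To make the Label Propagation step more concrete, here is a minimal sketch in Python of classic Label Propagation (in the style of Zhu and Ghahramani) on a page graph whose edge weights mix content similarity with hyperlinks. This is not the authors' implementation. The linear mix controlled by the hypothetical `link_weight` parameter, the use of cosine similarity over term-weight vectors, and the function names `cosine`, `build_graph` and `label_propagation` are all assumptions made for illustration. The hybrid feature selection step, which the framework alternates with propagation, is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity of two term-weight vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def build_graph(features, links, link_weight=0.5):
    """Hypothetical edge weight: (1 - link_weight) * content cosine + link_weight * [i and j are hyperlinked].

    Links are treated as undirected; link_weight is an assumed mixing parameter.
    """
    n = len(features)
    link_set = {frozenset(edge) for edge in links}
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                linked = 1.0 if frozenset((i, j)) in link_set else 0.0
                W[i][j] = (1 - link_weight) * cosine(features[i], features[j]) + link_weight * linked
    return W


def label_propagation(W, labels, n_classes, max_iter=1000, tol=1e-6):
    """labels[i] is a class index for a labeled page, or None for an unlabeled one."""
    n = len(W)
    # Labeled pages start as one-hot distributions; unlabeled pages start uniform.
    F = [[1.0 / n_classes] * n_classes if y is None
         else [1.0 if c == y else 0.0 for c in range(n_classes)]
         for y in labels]
    for _ in range(max_iter):
        new_F, delta = [], 0.0
        for i in range(n):
            total = sum(W[i])
            if labels[i] is not None or total == 0:
                new_F.append(F[i])  # clamp labeled pages; isolated pages keep their distribution
                continue
            # Weighted average of the neighbours' distributions (row-normalized W times F).
            row = [sum(W[i][j] * F[j][c] for j in range(n)) / total for c in range(n_classes)]
            delta = max(delta, max(abs(a - b) for a, b in zip(row, F[i])))
            new_F.append(row)
        F = new_F
        if delta < tol:
            break
    # Predicted class = most probable label for each page.
    return [max(range(n_classes), key=lambda c: F[i][c]) for i in range(n)]


# Toy example: pages 0 and 1 are labeled; pages 2 and 3 are unlabeled.
features = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 0, 0], [0, 0, 0, 1]]
labels = [0, 1, None, None]
links = [(2, 0), (3, 1)]
W = build_graph(features, links)
print(label_propagation(W, labels, n_classes=2))  # [0, 1, 0, 1]
```

Labeled pages are clamped to their one-hot distributions on every iteration, so labels flow outward only through weighted edges. In the toy example, each unlabeled page inherits the class of the labeled page it both links to and shares terms with. Because the hyperlink term contributes weight even when two pages share no vocabulary, this kind of graph can also carry labels along pure link chains.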