Research Article
Improving Vietnamese Web Page Classification by Combining Hybrid Feature Selection and Label Propagation with Link Information
@INPROCEEDINGS{10.1007/978-3-642-36642-0_32,
  author={Ngo Linh and Nguyen Thi Kim Anh and Cao Dat},
  title={Improving Vietnamese Web Page Classification by Combining Hybrid Feature Selection and Label Propagation with Link Information},
  proceedings={Context-Aware Systems and Applications. First International Conference, ICCASA 2012, Ho Chi Minh City, Vietnam, November 26-27, 2012, Revised Selected Papers},
  proceedings_a={ICCASA},
  year={2013},
  month={2},
  keywords={Feature Selection, Label Propagation, Web Classification, Web Mining},
  doi={10.1007/978-3-642-36642-0_32}
}
- Ngo Linh
- Nguyen Thi Kim Anh
- Cao Dat
Year: 2013
ICCASA
Springer
DOI: 10.1007/978-3-642-36642-0_32
Abstract
Classification of web pages is essential to many information management and retrieval tasks, such as maintaining web directories and focused crawling. One problem in web page classification is that unlabeled training examples are readily available, while labeled ones are often costly to obtain. Furthermore, the uncontrolled nature of web content presents additional challenges, whereas the interconnected nature of hypertext can provide useful information for the process. To address these problems, we propose a graph-based semi-supervised classification framework that iteratively combines hybrid semi-supervised feature selection with Label Propagation learning using link information to improve Vietnamese web page classification. The experimental results show that the proposed method outperforms state-of-the-art methods applied to Vietnamese web page classification.
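To make the Label Propagation idea concrete, the following is a minimal sketch of generic label propagation over a link graph, not the authors' framework. It assumes hyperlinks are treated as undirected, unweighted edges and omits the hybrid feature selection step and any content-based similarity entirely. The function name `propagate_labels`, its parameters, and the toy graph are all hypothetical, chosen only for illustration.

```python
def propagate_labels(links, seed_labels, num_pages, num_classes, iterations=50):
    """Generic label propagation over an undirected link graph (illustrative only).

    links: iterable of (i, j) page-index pairs, treated as undirected edges.
    seed_labels: dict mapping page index -> known class index.
    Returns one predicted class index per page.
    """
    # Build neighbor sets from the link pairs, ignoring self-links.
    neighbors = [set() for _ in range(num_pages)]
    for i, j in links:
        if i != j:
            neighbors[i].add(j)
            neighbors[j].add(i)

    def one_hot(c):
        return [1.0 if k == c else 0.0 for k in range(num_classes)]

    # Labeled pages start with a one-hot score row, unlabeled pages with a uniform row.
    uniform = [1.0 / num_classes] * num_classes
    scores = [one_hot(seed_labels[p]) if p in seed_labels else list(uniform)
              for p in range(num_pages)]

    for _ in range(iterations):
        updated = []
        for p in range(num_pages):
            if p in seed_labels:
                # Clamp labeled pages to their known class on every pass.
                updated.append(one_hot(seed_labels[p]))
            elif neighbors[p]:
                # Unlabeled pages take the average of their neighbors' scores.
                n = len(neighbors[p])
                updated.append([sum(scores[q][k] for q in neighbors[p]) / n
                                for k in range(num_classes)])
            else:
                # Isolated pages receive no information and keep their scores.
                updated.append(scores[p])
        scores = updated

    # Predict the highest-scoring class for each page.
    return [max(range(num_classes), key=lambda k: row[k]) for row in scores]


# Toy graph (hypothetical): pages 0-2 form one chain, pages 3-5 another.
links = [(0, 1), (1, 2), (3, 4), (4, 5)]
seed_labels = {0: 0, 5: 1}  # only two pages carry labels
predicted = propagate_labels(links, seed_labels, num_pages=6, num_classes=2)
```

In this toy setting each unlabeled page ends up with the class of the single labeled page it can reach through links. A real system would also need edge weights and a way to handle pages with no links to any labeled page.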