Forensics in Telecommunications, Information, and Multimedia. Third International ICST Conference, e-Forensics 2010, Shanghai, China, November 11-12, 2010, Revised Selected Papers

Research Article

Text Content Filtering Based on Chinese Character Reconstruction from Radicals

Cite
BibTeX Plain Text
  • @INPROCEEDINGS{10.1007/978-3-642-23602-0_22,
        author={Wenlei He and Gongshen Liu and Jun Luo and Jiuchuan Lin},
        title={Text Content Filtering Based on Chinese Character Reconstruction from Radicals},
        proceedings={Forensics in Telecommunications, Information, and Multimedia. Third International ICST Conference, e-Forensics 2010, Shanghai, China, November 11-12, 2010, Revised Selected Papers},
        proceedings_a={E-FORENSICS},
        year={2012},
        month={10},
        keywords={Chinese character radical multi-pattern matching text filtering},
        doi={10.1007/978-3-642-23602-0_22}
    }
    
  • Wenlei He, Gongshen Liu, Jun Luo, Jiuchuan Lin (2012). Text Content Filtering Based on Chinese Character Reconstruction from Radicals. E-FORENSICS. Springer. DOI: 10.1007/978-3-642-23602-0_22
Wenlei He1, Gongshen Liu1, Jun Luo2, Jiuchuan Lin2
  • 1: Shanghai Jiao Tong University
  • 2: The Third Research Institute of Ministry of Public Security

Abstract

Content filtering through keyword matching is widely adopted in network censoring and has proven successful. However, a technique for bypassing this kind of censorship by decomposing Chinese characters has recently appeared. Chinese characters are combinations of radicals, and splitting characters into radicals poses a major obstacle to keyword filtering. To tackle this challenge, we propose the first filtering technology based on the combination of Chinese character radicals. We use a modified Rabin-Karp algorithm to reconstruct characters from radicals according to a Chinese character structure library. We then use another modified Rabin-Karp algorithm to filter keywords in massive text content. Experiments show that our approach identifies most keywords written as combinations of radicals and yields a visible improvement in filtering results over traditional keyword filtering.
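
The two-stage pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' modified Rabin-Karp algorithm, whose details are not given on this page; it uses the textbook rolling-hash Rabin-Karp for both stages. The names `rk_find_all`, `reconstruct`, and `STRUCTURE_LIBRARY`, along with the three-entry library itself, are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: textbook Rabin-Karp, not the paper's modified variant.
BASE = 257
MOD = 1_000_000_007

# Hypothetical, tiny "structure library": radical sequence -> composed character.
STRUCTURE_LIBRARY = {"女子": "好", "日月": "明", "木木": "林"}


def poly_hash(s):
    """Polynomial hash of a string over Unicode code points."""
    h = 0
    for ch in s:
        h = (h * BASE + ord(ch)) % MOD
    return h


def rk_find_all(text, patterns):
    """Multi-pattern Rabin-Karp: sorted (start, pattern) for every occurrence."""
    by_len = {}
    for p in patterns:
        by_len.setdefault(len(p), {}).setdefault(poly_hash(p), []).append(p)

    hits = []
    for m, table in by_len.items():
        if m == 0 or m > len(text):
            continue
        high = pow(BASE, m - 1, MOD)  # weight of the window's leading character
        h = poly_hash(text[:m])
        for i in range(len(text) - m + 1):
            for p in table.get(h, ()):
                if text[i:i + m] == p:  # verify to rule out hash collisions
                    hits.append((i, p))
            if i + m < len(text):
                # Roll the window: drop text[i], append text[i + m].
                h = ((h - ord(text[i]) * high) * BASE + ord(text[i + m])) % MOD
    return sorted(hits)


def reconstruct(text, library=STRUCTURE_LIBRARY):
    """Stage 1: replace radical sequences with characters, greedily left to right."""
    out, pos = [], 0
    for start, seq in rk_find_all(text, library.keys()):
        if start < pos:
            continue  # overlaps an earlier replacement; skip it
        out.append(text[pos:start])
        out.append(library[seq])
        pos = start + len(seq)
    out.append(text[pos:])
    return "".join(out)


def filter_keywords(text, keywords):
    """Stage 2: match keywords against the reconstructed text."""
    return rk_find_all(reconstruct(text), keywords)
```

For example, `filter_keywords("今天天气女子，日月天见", ["明天"])` finds the keyword in the reconstructed text, whereas `rk_find_all` on the raw input finds nothing. A sequence such as 女子 is also a legitimate word in its own right, so a naive replacement like this one would produce false positives on ordinary text; a practical system would need to resolve that ambiguity.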

Keywords
Chinese character radical, multi-pattern matching, text filtering
Published
2012-10-10
http://dx.doi.org/10.1007/978-3-642-23602-0_22
Copyright © 2010–2025 ICST