Research Article
Text Content Filtering Based on Chinese Character Reconstruction from Radicals
@INPROCEEDINGS{10.1007/978-3-642-23602-0_22,
  author={Wenlei He and Gongshen Liu and Jun Luo and Jiuchuan Lin},
  title={Text Content Filtering Based on Chinese Character Reconstruction from Radicals},
  booktitle={Forensics in Telecommunications, Information, and Multimedia. Third International ICST Conference, e-Forensics 2010, Shanghai, China, November 11-12, 2010, Revised Selected Papers},
  proceedings_a={E-FORENSICS},
  year={2012},
  month={10},
  keywords={Chinese character radical multi-pattern matching text filtering},
  doi={10.1007/978-3-642-23602-0_22}
}
- Wenlei He
- Gongshen Liu
- Jun Luo
- Jiuchuan Lin
Year: 2012
E-FORENSICS
Springer
DOI: 10.1007/978-3-642-23602-0_22
Abstract
Content filtering through keyword matching is widely adopted in network censorship and has proven successful. However, a technique that bypasses this kind of censorship by decomposing Chinese characters has recently appeared. Chinese characters are combinations of radicals, and splitting characters into their radicals poses a major obstacle to keyword filtering. To tackle this challenge, we propose the first filtering technique based on the combination of Chinese character radicals. We use a modified Rabin-Karp algorithm to reconstruct characters from radicals according to a Chinese character structure library, and then use another modified Rabin-Karp algorithm to filter keywords in massive volumes of text. Experiments show that our approach identifies most keywords written as combinations of radicals and yields a noticeable improvement in filtering results compared with traditional keyword filtering.
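The abstract describes a two-pass pipeline (reconstruct characters from radicals, then match keywords) but does not spell out how the Rabin-Karp algorithm is modified. The sketch below is a minimal illustration under stated assumptions, not the authors' method: it uses the standard multi-pattern Rabin-Karp variant that groups patterns by length, a greedy longest-match rule for reconstruction, and a tiny hypothetical `STRUCTURE_LIBRARY` in place of a real Chinese character structure library. It also assumes every radical is a single Unicode code point. The names `poly_hash`, `rabin_karp_multi`, `reconstruct`, and `filter_keywords` are hypothetical.

```python
from collections import defaultdict

BASE = 0x110000        # one more than the largest Unicode code point
MOD = (1 << 61) - 1    # large Mersenne prime modulus for the polynomial hash

# Hypothetical toy structure library: radical sequence -> reconstructed character.
# Assumes each radical is encoded as a single Unicode code point.
STRUCTURE_LIBRARY = {
    "木木木": "森",
    "木木": "林",
    "日月": "明",
    "女子": "好",
}


def poly_hash(s):
    """Polynomial hash of a whole string."""
    h = 0
    for ch in s:
        h = (h * BASE + ord(ch)) % MOD
    return h


def rabin_karp_multi(text, patterns):
    """Find all occurrences of several patterns; returns {length: {start: pattern}}."""
    by_len = defaultdict(dict)  # length -> {hash: [patterns with that hash]}
    for p in patterns:
        by_len[len(p)].setdefault(poly_hash(p), []).append(p)
    hits = defaultdict(dict)
    for L, table in by_len.items():
        if L == 0 or L > len(text):
            continue
        high = pow(BASE, L - 1, MOD)  # weight of the leftmost character in a window
        h = poly_hash(text[:L])
        for i in range(len(text) - L + 1):
            if i > 0:
                # Roll the window: remove text[i - 1], append text[i + L - 1].
                h = ((h - ord(text[i - 1]) * high) * BASE + ord(text[i + L - 1])) % MOD
            for p in table.get(h, ()):
                if text[i:i + L] == p:  # compare directly to rule out hash collisions
                    hits[L][i] = p
    return hits


def reconstruct(text, library):
    """Greedy longest match (an assumption): replace radical runs with the character they form."""
    hits = rabin_karp_multi(text, library)
    lengths = sorted(hits, reverse=True)  # try longer radical sequences first
    out, i = [], 0
    while i < len(text):
        for L in lengths:
            if i in hits[L]:
                out.append(library[hits[L][i]])
                i += L
                break
        else:  # no radical sequence starts here; keep the character unchanged
            out.append(text[i])
            i += 1
    return "".join(out)


def filter_keywords(text, keywords, library=STRUCTURE_LIBRARY):
    """Reconstruct characters first, then run keyword matching on the result."""
    restored = reconstruct(text, library)
    hits = rabin_karp_multi(restored, keywords)
    found = sorted((i, p) for table in hits.values() for i, p in table.items())
    return restored, found


if __name__ == "__main__":
    raw = "去木木木木木看看，日月天见"
    restored, found = filter_keywords(raw, ["森林", "明天"])
    print(restored)  # 去森林看看，明天见
    print(found)     # [(1, '森林'), (6, '明天')]
```

Plain keyword matching on the raw string finds neither keyword, because "森林" and "明天" only exist after the radical runs are recombined. Grouping patterns by length means one rolling-hash pass per distinct pattern length, and the longest-match rule makes "木木木" become 森 rather than 林 followed by 木. The greedy rule is also a weakness: it will merge 日 and 月 even when a writer meant the two separate characters, an ambiguity the paper's actual reconstruction step may handle differently.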