Research Article
A Method for Normalizing Non-standard Words in Online Social Network Services: A Case Study on Twitter
@INPROCEEDINGS{10.1007/978-3-319-05939-6_35,
  author={Dongjin Choi and Jeongin Kim and Pankoo Kim},
  title={A Method for Normalizing Non-standard Words in Online Social Network Services: A Case Study on Twitter},
  proceedings={Context-Aware Systems and Applications. Second International Conference, ICCASA 2013, Phu Quoc Island, Vietnam, November 25-26, 2013, Revised Selected Papers},
  proceedings_a={ICCASA},
  year={2014},
  month={6},
  keywords={Words normalization Online social network services Twitter},
  doi={10.1007/978-3-319-05939-6_35}
}
- Dongjin Choi
Jeongin Kim
Pankoo Kim
Year: 2014
ICCASA
Springer
DOI: 10.1007/978-3-319-05939-6_35
Abstract
Owing to the rapid development of smartphones and online social network services (OSNSs), people can share diverse information about what they have experienced during the day, with no constraints on time or location. This has changed the way online systems are used: we simply submit a query to a search engine or an OSNS from a smartphone. Because of this convenience, the volume of text data in OSNSs keeps growing, and it includes much noisy data, especially non-standard words. When people use a smartphone to send messages to their friends, they tend to type in shortened forms such as abbreviations and acronyms in order to save time and data usage. As a result, the number of non-standard words on the web is increasing rapidly, and they have to be normalized into standard words to improve the performance of Natural Language Processing. When plain text is analyzed to extract semantic meaning, this noisy data is usually ignored even though it carries valuable information. To overcome this problem, we propose a method for normalizing non-standard words in OSNSs, particularly in Twitter text data. We analyzed more than fifty million tweets collected by Stanford University and normalized non-standard words into standard English words using various similarity coefficients, such as Dice, Jaccard, Ochiai, and Sorgenfrei, among others. We conclude the paper by comparing those coefficient methods with our proposed one.
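To make the coefficients named in the abstract concrete, the following is a minimal sketch of how they might be used to map a non-standard token to a standard word. It is not the authors' implementation: the abstract does not say how words are represented, so comparing sets of character bigrams is an assumption, and the helper names (`bigrams`, `normalize`) and the tiny vocabulary are hypothetical. The coefficient formulas themselves are the standard set-based definitions.

```python
from math import sqrt


def bigrams(word):
    """Return the set of character bigrams of a word (assumed representation)."""
    w = word.lower()
    return {w[i:i + 2] for i in range(len(w) - 1)}


def _counts(x, y):
    """Return (|A & B|, |A|, |B|) for the bigram sets A and B of x and y."""
    a, b = bigrams(x), bigrams(y)
    return len(a & b), len(a), len(b)


def dice(x, y):
    # Dice: 2|A & B| / (|A| + |B|)
    n, na, nb = _counts(x, y)
    return 2 * n / (na + nb) if na + nb else 0.0


def jaccard(x, y):
    # Jaccard: |A & B| / |A | B|
    n, na, nb = _counts(x, y)
    union = na + nb - n
    return n / union if union else 0.0


def ochiai(x, y):
    # Ochiai: |A & B| / sqrt(|A| * |B|)
    n, na, nb = _counts(x, y)
    return n / sqrt(na * nb) if na and nb else 0.0


def sorgenfrei(x, y):
    # Sorgenfrei: |A & B|^2 / (|A| * |B|)
    n, na, nb = _counts(x, y)
    return n * n / (na * nb) if na and nb else 0.0


def normalize(token, vocabulary, coefficient=dice):
    """Pick the vocabulary word most similar to token (hypothetical helper)."""
    return max(vocabulary, key=lambda w: coefficient(token, w))


if __name__ == "__main__":
    vocab = ["tomorrow", "today", "tonight"]  # illustrative vocabulary only
    for coef in (dice, jaccard, ochiai, sorgenfrei):
        print(coef.__name__, normalize("tomorow", vocab, coef),
              round(coef("tomorow", "tomorrow"), 3))
```

Bigram overlap handles misspellings and dropped letters reasonably well under this assumption, but it would likely do poorly on acronyms and heavy abbreviations, which share few character pairs with their expansions.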