Context-Aware Systems and Applications. Second International Conference, ICCASA 2013, Phu Quoc Island, Vietnam, November 25-26, 2013, Revised Selected Papers

Research Article

A Method for Normalizing Non-standard Words in Online Social Network Services: A Case Study on Twitter

  • @INPROCEEDINGS{10.1007/978-3-319-05939-6_35,
        author={Dongjin Choi and Jeongin Kim and Pankoo Kim},
        title={A Method for Normalizing Non-standard Words in Online Social Network Services: A Case Study on Twitter},
        proceedings={Context-Aware Systems and Applications. Second International Conference, ICCASA 2013, Phu Quoc Island, Vietnam, November 25-26, 2013, Revised Selected Papers},
        proceedings_a={ICCASA},
        year={2014},
        month={6},
        keywords={Words normalization, Online social network services, Twitter},
        doi={10.1007/978-3-319-05939-6_35}
    }
    
  • Dongjin Choi
    Jeongin Kim
    Pankoo Kim
    Year: 2014
    A Method for Normalizing Non-standard Words in Online Social Network Services: A Case Study on Twitter
    ICCASA
    Springer
    DOI: 10.1007/978-3-319-05939-6_35
Dongjin Choi1,*, Jeongin Kim1,*, Pankoo Kim1,*
  • 1: Chosun University
*Contact email: dongjin.choi84@gmail.com, jungingim@gmail.com, pkkim@chosun.ac.kr

Abstract

Owing to the rapid development of smartphones and online social network services (OSNSs), people can share diverse information about what they have experienced during the day, with no constraints on time or location. This has changed the entire landscape of online systems. Users simply submit a query to a search engine or an OSNS from a smartphone. Because of this convenience, the volume of text data in OSNSs keeps growing, and it includes a great deal of noisy data, especially non-standard words. When people use a smartphone to send messages to friends, they tend to type in shortened forms such as abbreviations and acronyms in order to save time and data usage. As a result, the number of non-standard words on the web is increasing sharply, and they have to be normalized into standard words in order to improve the performance of Natural Language Processing. When plain text is analyzed to extract semantic meaning, this noisy data is typically ignored even though it carries valuable information. To overcome this problem, we present a method for normalizing non-standard words in OSNSs, particularly in Twitter text data. We analyzed more than fifty million tweets collected by Stanford University and normalized non-standard words into standard English words using various coefficient methods such as Dice, Jaccard, Ochiai, and Sorgenfrei, among others. We conclude the paper by comparing those coefficient methods with our proposed one.
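
The coefficient methods named in the abstract can be illustrated with a minimal sketch. The abstract does not say how words are represented or how candidates are chosen, so the sketch assumes each word is a set of character bigrams and that a non-standard token is mapped to the most similar entry of a small vocabulary. The function names `bigrams`, `coefficients`, and `normalize` are hypothetical, and the sketch covers only the four standard coefficients, not the authors' proposed method.

```python
from math import sqrt


def bigrams(word):
    """Return the set of character bigrams of a lowercased word (assumed representation)."""
    w = word.lower()
    return {w[i:i + 2] for i in range(len(w) - 1)}


def coefficients(x, y):
    """Compute Dice, Jaccard, Ochiai, and Sorgenfrei coefficients on bigram sets."""
    a_set, b_set = bigrams(x), bigrams(y)
    shared = len(a_set & b_set)
    na, nb = len(a_set), len(b_set)
    if na == 0 or nb == 0:
        # Words shorter than two characters have no bigrams; treat as dissimilar.
        return {"dice": 0.0, "jaccard": 0.0, "ochiai": 0.0, "sorgenfrei": 0.0}
    return {
        "dice": 2 * shared / (na + nb),
        "jaccard": shared / len(a_set | b_set),
        "ochiai": shared / sqrt(na * nb),
        "sorgenfrei": shared ** 2 / (na * nb),
    }


def normalize(token, vocabulary, measure="dice"):
    """Pick the vocabulary word most similar to the token (hypothetical selection rule)."""
    return max(vocabulary, key=lambda w: coefficients(token, w)[measure])


if __name__ == "__main__":
    print(coefficients("tomorow", "tomorrow"))
    print(normalize("tomorow", ["tomorrow", "today", "tonight"]))
```

Bigram overlap suits misspellings that keep most of the original letters. Heavily compressed forms such as "tmrw" share no bigrams with "tomorrow", so they would need a different representation or additional evidence.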