Big Data Technologies and Applications. 7th International Conference, BDTA 2016, Seoul, South Korea, November 17–18, 2016, Proceedings

Research Article

Correcting Misspelled Words in Twitter Text

  • @INPROCEEDINGS{10.1007/978-3-319-58967-1_10,
        author={Jeongin Kim and Eunji Lee and Taekeun Hong and Pankoo Kim},
        title={Correcting Misspelled Words in Twitter Text},
        proceedings={Big Data Technologies and Applications. 7th International Conference, BDTA  2016, Seoul, South Korea, November 17--18, 2016, Proceedings},
        proceedings_a={BDTA},
        year={2017},
        month={6},
        keywords={Twitter text, Misspelled word, Correcting misspelled words, Character n-gram, Word n-gram},
        doi={10.1007/978-3-319-58967-1_10}
    }
    
  • Jeongin Kim
    Eunji Lee
    Taekeun Hong
    Pankoo Kim
    Year: 2017
    Correcting Misspelled Words in Twitter Text
    BDTA
    Springer
    DOI: 10.1007/978-3-319-58967-1_10
Jeongin Kim1,*, Eunji Lee1,*, Taekeun Hong1,*, Pankoo Kim1,*
  • 1: Chosun University
*Contact email: jungingim@gmail.com, eunbesu@gmail.com, goodfax2000@naver.com, pkkim@chosun.ac.kr

Abstract

Social networking services (SNS) have become popular through computers, mobile devices, and tablets with Internet access. Among SNS, Twitter lets users post short texts and share information. Twitter texts are well suited to extracting new information, but because each post must convey its information within a limited number of words, they pose various limitations. To improve the accuracy of information extraction from Twitter texts, misspelled words should be corrected in advance. Conventional studies on correcting misspelled words in Twitter texts resolved the relationship between a misspelled word and its correction by considering two factors: the dependency between the misspelled word and the words that co-occur with it in the sentence, and morphophonemic similarity. Because those studies did not consider the frequency of the co-occurring words, they could not correct misspelled words completely. This paper proposes correcting misspelled words in Twitter texts by combining a character n-gram method, which uses spelling information, with a word n-gram method, which uses the frequency of co-occurring words.
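
The abstract does not give implementation details, so the following is only a minimal sketch of how the two ideas could fit together, not the authors' method. It assumes character bigrams compared with a Dice coefficient for the spelling step, and word bigram counts over a reference corpus for the co-occurrence step. The function names, the two-stage ranking, the `top_k` parameter, and the toy corpus are all hypothetical choices made for illustration.

```python
from collections import Counter


def char_ngrams(word, n=2):
    """Return the set of character n-grams of a word, with '#' boundary padding."""
    padded = f"#{word}#"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def spelling_similarity(a, b, n=2):
    """Dice coefficient over character n-gram sets (an assumed similarity measure)."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    if not ga or not gb:
        return 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def bigram_counts(sentences):
    """Count adjacent word pairs (word bigrams) in a list of sentences."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        counts.update(zip(tokens, tokens[1:]))
    return counts


def correct(tokens, index, vocabulary, counts, top_k=3):
    """Suggest a replacement for tokens[index].

    Step 1 shortlists the top_k vocabulary words by character n-gram similarity.
    Step 2 picks the shortlisted word that co-occurs most often with the
    neighbouring words, using spelling similarity to break ties.
    """
    word = tokens[index]
    shortlist = sorted(
        vocabulary,
        key=lambda cand: spelling_similarity(word, cand),
        reverse=True,
    )[:top_k]

    prev_word = tokens[index - 1] if index > 0 else None
    next_word = tokens[index + 1] if index + 1 < len(tokens) else None

    def context_frequency(cand):
        # Counter returns 0 for unseen pairs, including pairs containing None.
        return counts[(prev_word, cand)] + counts[(cand, next_word)]

    return max(
        shortlist,
        key=lambda cand: (context_frequency(cand), spelling_similarity(word, cand)),
    )


# Toy data, invented for this example only.
corpus = ["see you tomorrow morning", "see you tomorrow", "the tomato is red"]
vocabulary = ["see", "you", "tomorrow", "morning", "the", "tomato", "is", "red"]
counts = bigram_counts(corpus)

print(correct(["see", "you", "tomorow"], 2, vocabulary, counts))
```

In this sketch the spelling step only narrows the candidates, and the co-occurrence frequency makes the final choice. That ordering is one possible way to reflect the abstract's point that frequency information was missing from earlier approaches.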