Advances in Computer Science and Information Technology. Computer Science and Information Technology. Second International Conference, CCSIT 2012, Bangalore, India, January 2-4, 2012. Proceedings, Part III

Research Article

Fuzziness in Text Classification Using Different Similarity Metrics

  • @INPROCEEDINGS{10.1007/978-3-642-27317-9_26,
        author={M. Wajeed and T. Adilakshmi},
        title={Fuzziness in Text Classification Using Different Similarity Metrics},
        proceedings={Advances in Computer Science and Information Technology. Computer Science and Information Technology. Second International Conference, CCSIT 2012, Bangalore, India, January 2-4, 2012. Proceedings, Part III},
        proceedings_a={CCSIT PART  III},
        year={2012},
        month={11},
        keywords={Text classification data Clusters soft-hard-mixed clusters euclidean chebyshev manhattan bray-curtis canberra similarity measures},
        doi={10.1007/978-3-642-27317-9_26}
    }
    
  • M. Wajeed
    T. Adilakshmi
    Year: 2012
    Fuzziness in Text Classification Using Different Similarity Metrics
    CCSIT PART III
    Springer
    DOI: 10.1007/978-3-642-27317-9_26
M. Wajeed1,*, T. Adilakshmi2,*
  • 1: Sreenidhi Institute of Science & Technology
  • 2: Vasavi College of Engineering
*Contact email: wajeed.mtech@gmail.com, t_adilakshmi@gmail.com

Abstract

We live in an information era in which vast amounts of data, much of it textual, are generated every day. To meet future needs and to support effective decision-making, the generated data must be classified and stored in a classified repository so that it can later be retrieved efficiently with minimal effort. This paper attempts to combine supervised and unsupervised learning techniques by forming clusters that act as features, which makes feature reduction possible. Clusters are formed based on word patterns; soft, hard, and mixed clustering are considered in the process of text classification. We employ different similarity measures, such as Euclidean, squared Euclidean, Manhattan, Chebyshev, and Bray-Curtis, in the process of finding the category of a document. The results obtained were encouraging.
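
As an illustration of the measures named in the abstract and keywords, the following is a minimal Python sketch of each one using its standard textbook definition. The abstract does not give the paper's feature representation or decision rule, so the rest is assumed: documents are taken to be vectors of per-cluster feature weights, and the helper nearest_category, the sample vectors DOC and CENTROIDS, and the category names are all hypothetical.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def squared_euclidean(a, b):
    """Euclidean distance without the square root."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    """Largest absolute coordinate difference."""
    return max(abs(x - y) for x, y in zip(a, b))

def bray_curtis(a, b):
    """Sum of |x - y| divided by sum of |x + y|; 0.0 if both vectors are all zero."""
    denominator = sum(abs(x + y) for x, y in zip(a, b))
    if denominator == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / denominator

def canberra(a, b):
    """Sum of |x - y| / (|x| + |y|), skipping coordinates where both are zero."""
    total = 0.0
    for x, y in zip(a, b):
        denominator = abs(x) + abs(y)
        if denominator:
            total += abs(x - y) / denominator
    return total

def nearest_category(doc, centroids, distance):
    """Hypothetical decision rule: pick the category whose centroid is closest."""
    return min(centroids, key=lambda name: distance(doc, centroids[name]))

# Made-up data: three cluster features per document, two categories.
CENTROIDS = {"sports": [3.0, 0.0, 1.0], "finance": [0.0, 4.0, 1.0]}
DOC = [2.0, 1.0, 0.0]

if __name__ == "__main__":
    for measure in (euclidean, squared_euclidean, manhattan,
                    chebyshev, bray_curtis, canberra):
        print(measure.__name__, nearest_category(DOC, CENTROIDS, measure))
```

On this toy input all six measures select the same category. On real data the measures can disagree, which is presumably why comparing them is of interest.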