Fuzziness in Text Classification Using Different Similarity Metrics

M. Wajeed; T. Adilakshmi

Advances in Computer Science and Information Technology. Computer Science and Information Technology. Second International Conference, CCSIT 2012, Bangalore, India, January 2-4, 2012. Proceedings, Part III

Research Article


BibTeX citation:

@INPROCEEDINGS{10.1007/978-3-642-27317-9_26,
    author={M. Wajeed and T. Adilakshmi},
    title={Fuzziness in Text Classification Using Different Similarity Metrics},
    proceedings={Advances in Computer Science and Information Technology. Computer Science and Information Technology. Second International Conference, CCSIT 2012, Bangalore, India, January 2-4, 2012. Proceedings, Part III},
    proceedings_a={CCSIT PART  III},
    year={2012},
    month={11},
    keywords={Text classification data Clusters soft-hard-mixed clusters euclidean chebyshev manhattan bray-curtis canberra similarity measures},
    doi={10.1007/978-3-642-27317-9_26}
}

Springer
DOI: 10.1007/978-3-642-27317-9_26

M. Wajeed¹,*, T. Adilakshmi²,*

1: Sreenidhi Institute of Science & Technology
2: Vasavi College of Engineering

*Contact emails: wajeed.mtech@gmail.com, t_adilakshmi@gmail.com

Abstract

We live in an information era in which vast amounts of data, much of it textual, are generated every day. To meet future needs and support effective decision-making, this data must be classified and stored in a categorized repository so that it can later be retrieved efficiently and with minimal effort. This paper combines supervised and unsupervised learning techniques by forming clusters that act as features, thereby making feature reduction possible. Clusters are formed from word patterns, and soft, hard, and mixed clustering are all considered in the text classification process. We employ different similarity measures, such as Euclidean, squared Euclidean, Manhattan, Chebyshev, and Bray-Curtis, among others, to determine the category of a document. The results obtained were encouraging.
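The abstract names several distance measures without defining them. The Python sketch below gives their standard definitions and a simple rule that assigns a document to the category at the smallest distance. It assumes that each document and each category is represented as a non-negative term-frequency vector, with each category summarized by a centroid. The function names, the centroid representation, and the nearest-category rule are illustrative assumptions, not the authors' exact procedure.

```python
import math
from typing import Callable, Dict, Sequence

Vector = Sequence[float]


def euclidean(x: Vector, y: Vector) -> float:
    # Square root of the sum of squared component differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def squared_euclidean(x: Vector, y: Vector) -> float:
    # Euclidean distance without the square root.
    return sum((a - b) ** 2 for a, b in zip(x, y))


def manhattan(x: Vector, y: Vector) -> float:
    # Sum of absolute component differences (city-block distance).
    return sum(abs(a - b) for a, b in zip(x, y))


def chebyshev(x: Vector, y: Vector) -> float:
    # Largest absolute component difference.
    return max(abs(a - b) for a, b in zip(x, y))


def bray_curtis(x: Vector, y: Vector) -> float:
    # sum|a - b| / sum|a + b|; intended for non-negative vectors such as term counts.
    denom = sum(abs(a + b) for a, b in zip(x, y))
    return sum(abs(a - b) for a, b in zip(x, y)) / denom if denom else 0.0


def canberra(x: Vector, y: Vector) -> float:
    # sum |a - b| / (|a| + |b|), skipping components where both values are zero.
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom:
            total += abs(a - b) / denom
    return total


METRICS: Dict[str, Callable[[Vector, Vector], float]] = {
    "euclidean": euclidean,
    "squared_euclidean": squared_euclidean,
    "manhattan": manhattan,
    "chebyshev": chebyshev,
    "bray_curtis": bray_curtis,
    "canberra": canberra,
}


def nearest_category(doc: Vector, centroids: Dict[str, Vector],
                     metric: Callable[[Vector, Vector], float]) -> str:
    # Hypothetical rule: choose the category whose centroid is closest to the document.
    return min(centroids, key=lambda c: metric(doc, centroids[c]))


if __name__ == "__main__":
    # Toy term-frequency vectors over a three-word vocabulary (made-up data).
    doc = [2, 0, 1]
    centroids = {"sports": [3, 0, 1], "politics": [0, 2, 2]}
    for name, metric in METRICS.items():
        print(name, nearest_category(doc, centroids, metric))  # "sports" for every metric here
```

To compare measures, only the `metric` argument needs to change. Bray-Curtis and Canberra normalize differences by the magnitudes involved, so on real data they can rank categories differently from the unnormalized Euclidean, Manhattan, and Chebyshev distances.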

Keywords: Text classification, data clusters, soft-hard-mixed clusters, Euclidean, Chebyshev, Manhattan, Bray-Curtis, Canberra similarity measures

Published: 2012-11-09

URL: http://dx.doi.org/10.1007/978-3-642-27317-9_26
