Research Article
Fuzziness in Text Classification Using Different Similarity Metrics
@INPROCEEDINGS{10.1007/978-3-642-27317-9_26,
  author={M. Wajeed and T. Adilakshmi},
  title={Fuzziness in Text Classification Using Different Similarity Metrics},
  proceedings={Advances in Computer Science and Information Technology. Computer Science and Information Technology. Second International Conference, CCSIT 2012, Bangalore, India, January 2-4, 2012. Proceedings, Part III},
  proceedings_a={CCSIT PART III},
  year={2012},
  month={11},
  keywords={Text classification data Clusters soft-hard-mixed clusters eucledean chebyshev manhattan bray-curtis canberra similarity measures},
  doi={10.1007/978-3-642-27317-9_26}
}
- M. Wajeed
- T. Adilakshmi
Year: 2012
CCSIT PART III
Springer
DOI: 10.1007/978-3-642-27317-9_26
Abstract
We live in an information era in which vast amounts of data, much of it textual, are generated every day. To meet future needs and support effective decision-making, the generated data must be classified and stored in a classified repository so that it can later be retrieved efficiently with minimal effort. This paper combines supervised and unsupervised learning techniques by forming clusters that act as features, which makes feature reduction possible. Clusters are formed based on word patterns, and soft, hard, and mixed clustering are considered in the text classification process. We employ different similarity measures, such as Euclidean, squared Euclidean, Manhattan, Chebyshev, and Bray-Curtis, to find the category of a document. The results obtained were encouraging.
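The measures named in the abstract and keywords can be illustrated with a minimal Python sketch. This is not the authors' implementation. The feature vectors, the category names, and the `nearest_category` helper are hypothetical, and the nearest-centroid rule is only one plausible way of using a distance to pick a category, assumed here for illustration.

```python
from math import sqrt


def euclidean(u, v):
    """Straight-line distance between two feature vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def squared_euclidean(u, v):
    """Euclidean distance without the square root."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def manhattan(u, v):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def chebyshev(u, v):
    """Largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(u, v))


def bray_curtis(u, v):
    """Sum of |a - b| divided by sum of |a + b|; 0.0 if the denominator is zero."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(abs(a + b) for a, b in zip(u, v))
    return num / den if den else 0.0


def canberra(u, v):
    """Sum of |a - b| / (|a| + |b|), skipping coordinates where both are zero."""
    total = 0.0
    for a, b in zip(u, v):
        scale = abs(a) + abs(b)
        if scale:
            total += abs(a - b) / scale
    return total


def nearest_category(doc, centroids, distance):
    """Hypothetical helper: return the category whose centroid is closest to doc."""
    return min(centroids, key=lambda name: distance(doc, centroids[name]))


# Hypothetical data: each coordinate counts the words of a document that fall
# into one word cluster, so three clusters stand in for a larger vocabulary.
centroids = {"sports": [3, 0, 1], "finance": [0, 4, 1]}
doc = [2, 1, 0]

for measure in (euclidean, squared_euclidean, manhattan,
                chebyshev, bray_curtis, canberra):
    print(measure.__name__, nearest_category(doc, centroids, measure))
```

On this toy data all six measures select the same category. On real document vectors the measures can disagree, which is presumably why the paper compares them.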