Advances in Computer Science and Information Technology. Computer Science and Information Technology. Second International Conference, CCSIT 2012, Bangalore, India, January 2-4, 2012. Proceedings, Part III

Research Article

Fuzziness in Text Classification Using Different Similarity Metrics

Cite
BibTeX Plain Text
  • @INPROCEEDINGS{10.1007/978-3-642-27317-9_26,
        author={M. Wajeed and T. Adilakshmi},
        title={Fuzziness in Text Classification Using Different Similarity Metrics},
        proceedings={Advances in Computer Science and Information Technology. Computer Science and Information Technology. Second International Conference, CCSIT 2012, Bangalore, India, January 2-4, 2012. Proceedings, Part III},
        proceedings_a={CCSIT PART  III},
        year={2012},
        month={11},
        keywords={Text classification data Clusters soft-hard-mixed clusters euclidean chebyshev manhattan bray-curtis canberra similarity measures},
        doi={10.1007/978-3-642-27317-9_26}
    }
    
  • M. Wajeed
    T. Adilakshmi
    Year: 2012
    Fuzziness in Text Classification Using Different Similarity Metrics
    CCSIT PART III
    Springer
    DOI: 10.1007/978-3-642-27317-9_26
M. Wajeed1,*, T. Adilakshmi2,*
  • 1: Sreenidhi Institute of Science & Technology
  • 2: Vasavi College of Engineering
*Contact email: wajeed.mtech@gmail.com, t_adilakshmi@gmail.com

Abstract

We live in an information era in which vast amounts of data are generated every day, some of it in textual form. To meet future needs and support effective decision making, the generated data must be classified and stored in a classified repository so that it can later be retrieved efficiently and with minimal effort. This paper combines supervised and unsupervised learning techniques by forming clusters that act as features, which makes feature reduction possible. Clusters are formed based on word patterns, and soft, hard, and mixed clustering are considered in the text classification process. We employ different similarity measures, such as Euclidean, squared Euclidean, Manhattan, Chebyshev, and Bray-Curtis, to find the category of a document. The results obtained were encouraging.
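The similarity measures named in the abstract and keywords have standard textbook definitions, which the following Python sketch writes out for two equal-length numeric vectors. The abstract does not say how documents or clusters are represented, so the small count vectors, the category names, and the helper `nearest_category` are hypothetical and serve only as illustration; this is not the authors' implementation.

```python
import math


def squared_euclidean(u, v):
    # Sum of squared coordinate differences.
    return sum((a - b) ** 2 for a, b in zip(u, v))


def euclidean(u, v):
    # Square root of the squared Euclidean distance.
    return math.sqrt(squared_euclidean(u, v))


def manhattan(u, v):
    # Sum of absolute coordinate differences.
    return sum(abs(a - b) for a, b in zip(u, v))


def chebyshev(u, v):
    # Largest absolute coordinate difference.
    return max(abs(a - b) for a, b in zip(u, v))


def bray_curtis(u, v):
    # Sum of |a - b| divided by sum of |a + b|; defined as 0.0 here
    # when both vectors are all zeros (an assumption of this sketch).
    denominator = sum(abs(a + b) for a, b in zip(u, v))
    if denominator == 0:
        return 0.0
    return manhattan(u, v) / denominator


def canberra(u, v):
    # Sum of |a - b| / (|a| + |b|), skipping coordinates where both are zero.
    total = 0.0
    for a, b in zip(u, v):
        denominator = abs(a) + abs(b)
        if denominator:
            total += abs(a - b) / denominator
    return total


def nearest_category(doc, prototypes, distance):
    # Hypothetical helper: pick the category whose prototype vector
    # is closest to the document under the chosen distance.
    return min(prototypes, key=lambda name: distance(doc, prototypes[name]))


# Hypothetical data: counts over three features for one document
# and one prototype vector per category.
doc = [3, 0, 1]
prototypes = {"sports": [4, 0, 0], "finance": [0, 5, 1]}

for measure in (euclidean, squared_euclidean, manhattan,
                chebyshev, bray_curtis, canberra):
    print(measure.__name__, nearest_category(doc, prototypes, measure))
```

On this toy data every measure selects the same category; on real term vectors the measures can disagree, which is why comparing them is informative.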

Keywords
Text classification data Clusters soft-hard-mixed clusters euclidean chebyshev manhattan bray-curtis canberra similarity measures
Published
2012-11-09
http://dx.doi.org/10.1007/978-3-642-27317-9_26
Copyright © 2012–2025 ICST