Advances in Computer Science and Information Technology. Networks and Communications. Second International Conference, CCSIT 2012, Bangalore, India, January 2-4, 2012. Proceedings, Part I

Research Article

Adaptive K-Means Clustering to Handle Heterogeneous Data Using Basic Rough Set Theory

  • @INPROCEEDINGS{10.1007/978-3-642-27299-8_21,
        author={B. Tripathy and Adhir Ghosh and G. Panda},
        title={Adaptive K-Means Clustering to Handle Heterogeneous Data Using Basic Rough Set Theory},
        proceedings={Advances in Computer Science and Information Technology. Networks and Communications. Second International Conference, CCSIT 2012, Bangalore, India, January 2-4, 2012. Proceedings, Part I},
        proceedings_a={CCSIT PART I},
        year={2012},
        month={11},
        keywords={Classification Cluster Crisp boundaries Heterogeneous data Uncertainty},
        doi={10.1007/978-3-642-27299-8_21}
    }
    
  • B. Tripathy
    Adhir Ghosh
    G. Panda
    Year: 2012
    Adaptive K-Means Clustering to Handle Heterogeneous Data Using Basic Rough Set Theory
    CCSIT PART I
    Springer
    DOI: 10.1007/978-3-642-27299-8_21
B. Tripathy1,*, Adhir Ghosh1,*, G. Panda2,*
  • 1: VIT University
  • 2: MITS
*Contact email: tripathybk@rediffmail.com, adhir39@rediffmail.com, gkpmail@sify.com

Abstract

Many cluster analysis techniques have been developed to group objects with similar properties or characteristics. K-means clustering, proposed by MacQueen [12] in 1967, is one of the most popular statistical clustering techniques. However, this algorithm cannot handle categorical data, nor can it handle uncertainty. Rough set theory, proposed by Pawlak [15], offers an alternative way of representing sets whose exact boundary cannot be described because of incomplete information. Since rough sets have been widely used for knowledge representation, they can also be applied to classification and are helpful in clustering. In real-life data mining applications, clusters do not have crisp boundaries. To address this, Parmar et al. [14] and Tripathy et al. [16] proposed the rough-set-based algorithms MMR (2007) and MMeR (2009), respectively, but both suffer from stability problems due to multiple runs and from higher time complexity. In this paper we propose a new k-means approach based on rough sets that can handle heterogeneous data as well as uncertainty.
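The rough-set idea mentioned in the abstract, a set whose boundary cannot be described exactly, can be illustrated with the standard lower and upper approximations. The sketch below is not the algorithm proposed in the paper. It only shows the textbook definitions on a hypothetical toy table; the object names, attribute names, and helper functions are assumptions made for illustration.

```python
from collections import defaultdict


def equivalence_classes(objects, attributes):
    """Group object ids that agree on every chosen attribute (indiscernibility)."""
    classes = defaultdict(set)
    for obj_id, values in objects.items():
        key = tuple(values[a] for a in attributes)
        classes[key].add(obj_id)
    return list(classes.values())


def approximations(objects, attributes, target):
    """Return the (lower, upper) approximations of the set `target`."""
    lower, upper = set(), set()
    for block in equivalence_classes(objects, attributes):
        if block <= target:   # block lies entirely inside the target set
            lower |= block
        if block & target:    # block overlaps the target set
            upper |= block
    return lower, upper


# Hypothetical toy data with categorical attributes (not from the paper).
objects = {
    "o1": {"colour": "red", "size": "small"},
    "o2": {"colour": "red", "size": "small"},
    "o3": {"colour": "blue", "size": "large"},
    "o4": {"colour": "blue", "size": "small"},
}
target = {"o1", "o3"}

lower, upper = approximations(objects, ["colour", "size"], target)
boundary = upper - lower  # objects that cannot be classified with certainty
print(sorted(lower), sorted(upper), sorted(boundary))
```

Here o1 and o2 are indiscernible on the chosen attributes, so the target set can only be bracketed: o3 certainly belongs to it, while o1 and o2 fall in the boundary region. A non-empty boundary is the kind of non-crisp cluster edge the abstract refers to.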