1st International ICST Conference on Scalable Information Systems

Research Article

PENS: an algorithm for density-based clustering in peer-to-peer systems

  • @INPROCEEDINGS{10.1145/1146847.1146886,
        author={Mei Li and Wang-Chien Lee and Anand Sivasubramaniam and Guanling Lee},
        title={PENS: an algorithm for density-based clustering in peer-to-peer systems},
        proceedings={1st International ICST Conference on Scalable Information Systems},
        publisher={ACM},
        proceedings_a={INFOSCALE},
        year={2006},
        month={6},
        keywords={},
        doi={10.1145/1146847.1146886}
    }
    
  • Mei Li
    Wang-Chien Lee
    Anand Sivasubramaniam
    Guanling Lee
    Year: 2006
    PENS: an algorithm for density-based clustering in peer-to-peer systems
    INFOSCALE
    ACM
    DOI: 10.1145/1146847.1146886
Mei Li (1,*), Wang-Chien Lee (1,*), Anand Sivasubramaniam (1,*), Guanling Lee (2,*)
  • 1: Department of Computer Sciences and Engineering, Pennsylvania State University, University Park, Pennsylvania, 16801, USA
  • 2: Department of Computer Science and Information Engineering, National Dong Hwa University, Hualien, Taiwan, 973, R.O.C
*Contact email: meli@cse.psu.edu, wlee@cse.psu.edu, anand@cse.psu.edu, guanling@mail.ndhu.edu.tw

Abstract

Huge amounts of data are available in large-scale networks of autonomous data sources dispersed over a wide area. Data mining is an essential technology for extracting hidden and valuable knowledge from these networked data sources. In this paper, we investigate clustering, one of the most important data mining tasks, in one such networked computing environment: peer-to-peer (P2P) systems. The lack of central control and the sheer size of P2P systems make existing clustering techniques inapplicable. We propose a fully distributed clustering algorithm, called Peer dENsity-based cluStering (PENS), which overcomes the key challenge of clustering in P2P environments, namely cluster assembly. The main idea of PENS is hierarchical cluster assembly, which enables peers to collaborate in forming a global clustering model without requiring central control or message flooding. A complexity analysis of the algorithm demonstrates that PENS can discover clusters and noise efficiently in P2P systems.
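
The abstract names hierarchical cluster assembly but gives no algorithmic detail, so the following Python sketch is only a toy illustration of the general idea and should not be read as the PENS algorithm itself. It assumes each peer runs a plain DBSCAN-style pass over its own points and that partial results are then merged pairwise, level by level, so no single node ever collects all raw data. The names `EPS`, `MIN_PTS`, `local_dbscan`, `merge`, and `assemble` are hypothetical, and the merge rule (join clusters that come within `EPS` of each other, absorb nearby noise points) is a deliberate simplification.

```python
from itertools import combinations
from math import dist

EPS = 1.5     # hypothetical neighbourhood radius
MIN_PTS = 2   # hypothetical density threshold (count includes the point itself)

def local_dbscan(points):
    """DBSCAN-style clustering of one peer's points; returns (clusters, noise)."""
    neighbours = {p: [q for q in points if dist(p, q) <= EPS] for p in points}
    core = {p for p in points if len(neighbours[p]) >= MIN_PTS}
    clusters, assigned = [], set()
    for p in points:
        if p not in core or p in assigned:
            continue
        cluster, frontier = set(), [p]
        while frontier:
            q = frontier.pop()
            if q in assigned:
                continue
            assigned.add(q)
            cluster.add(q)
            if q in core:               # only core points expand the cluster
                frontier.extend(neighbours[q])
        clusters.append(cluster)
    return clusters, set(points) - assigned

def merge(a, b):
    """Combine two partial models. Simplified rule, assumed for illustration."""
    clusters = [set(c) for c in a[0] + b[0]]
    noise = a[1] | b[1]
    changed = True
    while changed:                      # join clusters that touch within EPS
        changed = False
        for i, j in combinations(range(len(clusters)), 2):
            if any(dist(p, q) <= EPS for p in clusters[i] for q in clusters[j]):
                clusters[i] |= clusters[j]
                del clusters[j]
                changed = True
                break
    for c in clusters:                  # absorb noise lying next to a cluster
        near = {n for n in noise if any(dist(n, p) <= EPS for p in c)}
        c |= near
        noise -= near
    return clusters, noise

def assemble(models):
    """Merge partial models pairwise, level by level, until one remains."""
    while len(models) > 1:
        models = [merge(models[i], models[i + 1]) if i + 1 < len(models) else models[i]
                  for i in range(0, len(models), 2)]
    return models[0]

# Four hypothetical peers, each holding a few 2-D points.
peers = [
    [(0, 0), (1, 0)],
    [(2, 0), (10, 10)],
    [(20, 0), (21, 0)],
    [(22, 0)],
]
clusters, noise = assemble([local_dbscan(p) for p in peers])
print(sorted(sorted(c) for c in clusters), noise)
```

With these sample points, the assembled model contains two clusters, one around x = 0..2 and one around x = 20..22, while (10, 10) remains noise. The pairwise merge tree is what keeps any one peer from having to see every other peer's data or broadcast to the whole network.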