Proceedings of the 13th EAI International Conference on Mobile Multimedia Communications, Mobimedia 2020, 27-28 August 2020, Cyberspace

Research Article

A Clustering Analysis Method Based on Wilcoxon-Mann-Whitney Testing

  • @INPROCEEDINGS{10.4108/eai.27-8-2020.2296732,
        author={Yuan Cheng and Weinan Jia and Ronghua Chi},
        title={A Clustering Analysis Method Based on Wilcoxon-Mann-Whitney Testing},
        proceedings={Proceedings of the 13th EAI International Conference on Mobile Multimedia Communications, Mobimedia 2020, 27-28 August 2020, Cyberspace},
        publisher={EAI},
        proceedings_a={MOBIMEDIA},
        year={2020},
        month={11},
        keywords={clustering analysis, distance measurement, nonparametric statistics, wilcoxon-mann-whitney rank sum test},
        doi={10.4108/eai.27-8-2020.2296732}
    }
    
  • Yuan Cheng
    Weinan Jia
    Ronghua Chi
    Year: 2020
    A Clustering Analysis Method Based on Wilcoxon-Mann-Whitney Testing
    MOBIMEDIA
    EAI
    DOI: 10.4108/eai.27-8-2020.2296732
Yuan Cheng1,*, Weinan Jia1, Ronghua Chi2
  • 1: Harbin University of Science and Technology
  • 2: Heilongjiang University of Science and Technology
*Contact email: changuang7@sina.com

Abstract

Distance measurement is the core step of clustering analysis, and its results directly influence clustering accuracy. Most existing measurements rely on information about cluster features. However, cluster features may be insufficient and can lose information about clusters that contain many objects. To improve measurement accuracy, we make full use of the distribution characteristics of the objects in each cluster: we use descriptive statistics and the Wilcoxon-Mann-Whitney rank sum test from nonparametric statistics to measure distances during clustering. Furthermore, a two-stage clustering method is proposed to improve the performance of clustering analysis in three respects: avoiding the need to assume the number of clusters in advance, discovering clusters of arbitrary shapes, and improving clustering accuracy. Experiments on multiple datasets, compared against other clustering algorithms, illustrate the accuracy and efficiency of the proposed clustering algorithm.
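
The abstract does not spell out how the rank sum test is turned into a distance, so the following is only an illustrative sketch and not the authors' method. It computes the standard Mann-Whitney U statistic for each feature and, as an assumption made here, maps it to a dissimilarity in [0, 1] via |2U/(n_a n_b) - 1|, averaged over features. The function names (`average_ranks`, `mann_whitney_u`, `wmw_dissimilarity`) and the normalization are hypothetical.

```python
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks of values; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Mann-Whitney U statistic of sample x against sample y."""
    ranks = average_ranks(list(chain(x, y)))
    rank_sum_x = sum(ranks[: len(x)])
    return rank_sum_x - len(x) * (len(x) + 1) / 2.0


def wmw_dissimilarity(cluster_a, cluster_b):
    """Hypothetical distance: mean over features of |2U / (n_a * n_b) - 1|.

    Assumption (not from the paper): 0 means the two rank distributions
    are balanced, 1 means one cluster lies entirely above the other.
    """
    n_a, n_b = len(cluster_a), len(cluster_b)
    n_features = len(cluster_a[0])
    total = 0.0
    for f in range(n_features):
        xa = [row[f] for row in cluster_a]
        xb = [row[f] for row in cluster_b]
        u = mann_whitney_u(xa, xb)
        total += abs(2.0 * u / (n_a * n_b) - 1.0)
    return total / n_features
```

For example, `wmw_dissimilarity([[1], [2], [3]], [[4], [5], [6]])` treats each inner list as one object with a single feature. Because the measure uses all objects in both clusters rather than a summary such as a centroid, it reflects the distribution-based idea the abstract describes, though the paper's actual formulation may differ.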