Computer Science and Education in Computer Science. 19th EAI International Conference, CSECS 2023, Boston, MA, USA, June 28–29, 2023, Proceedings

Research Article

Expository Clustering Visualizations: Keeping it Simple

@INPROCEEDINGS{10.1007/978-3-031-44668-9_4,
    author={Greg Page},
    title={Expository Clustering Visualizations: Keeping it Simple},
    proceedings={Computer Science and Education in Computer Science. 19th EAI International Conference, CSECS 2023, Boston, MA, USA, June 28--29, 2023, Proceedings},
    proceedings_a={CSECS},
    year={2023},
    month={10},
    keywords={data mining, k-means clustering, computer science education},
    doi={10.1007/978-3-031-44668-9_4}
}

Greg Page*
    *Contact email: gpage@bu.edu

    Abstract

    In this paper, the author presents a basic overview of k-means clustering, then uses statistical summaries to demonstrate how such a model separates the records in a dataset into distinct groups.
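
As a rough illustration of that idea (not the author's code), the sketch below runs a bare-bones k-means on a handful of made-up records and then prints a per-cluster summary. The data, the function names `kmeans` and `cluster_summary`, and the choice of two clusters are all assumptions made for this example.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen records
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each record joins its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(
                range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

def cluster_summary(points, labels):
    """Record count and per-variable mean for each cluster."""
    summary = {}
    for lab in sorted(set(labels)):
        members = [p for p, l in zip(points, labels) if l == lab]
        summary[lab] = {
            "n": len(members),
            "mean": tuple(sum(col) / len(members) for col in zip(*members)),
        }
    return summary

# Hypothetical toy records with two numeric variables each.
records = [
    (1.0, 2.0), (1.5, 1.8), (1.2, 2.2),
    (8.0, 9.0), (8.5, 9.5), (9.0, 8.8),
]

labels, centroids = kmeans(records, k=2)
summary = cluster_summary(records, labels)
for label, stats in summary.items():
    print(label, stats["n"], [round(m, 2) for m in stats["mean"]])
```

Comparing the per-cluster means side by side is the kind of statistical summary that shows which variables separate one group from another.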

    The author then shows how simple visualizations can effectively “tell the story” behind a clustering model, including the key distinctions that tend to differentiate one group from another.

    Next, the author explores Principal Component (PC) plots, a tool often misused by analysts seeking to convey information about the clusters identified by their models. Such plots are based not on the original variables in the data, but on linear combinations of those variables. While PC plots are colorful and impressive-looking, their meaning often eludes the students who use them in end-of-semester project presentations.
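
To make "linear combinations of those variables" concrete, here is a minimal sketch, assuming made-up two-variable data and a simple power iteration in place of a full PCA routine. The helper name `first_principal_component` is hypothetical, and the variables are only centered, not standardized, which is a simplification.

```python
def first_principal_component(rows, iterations=200):
    """Approximate the first principal component by power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(r, means)] for r in rows]
    # Sample covariance matrix of the centered variables.
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    weights = [1.0] * d
    for _ in range(iterations):
        # Multiply by the covariance matrix, then rescale to unit length.
        weights = [sum(cov[i][j] * weights[j] for j in range(d)) for i in range(d)]
        norm = sum(v * v for v in weights) ** 0.5
        weights = [v / norm for v in weights]
    # Each PC score is a weighted sum of one record's centered variables.
    scores = [sum(w * x for w, x in zip(weights, r)) for r in centered]
    return weights, scores

# Hypothetical toy records with two numeric variables each.
records = [
    (1.0, 2.0), (1.5, 1.8), (1.2, 2.2),
    (8.0, 9.0), (8.5, 9.5), (9.0, 8.8),
]

weights, scores = first_principal_component(records)
print("weights:", [round(w, 3) for w in weights])
print("scores:", [round(s, 3) for s in scores])
```

An axis on a PC plot is one of these weighted sums, so a position along it cannot be read off as a value of any single original variable.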

    PC plots have some value as a diagnostic tool for k-means modelers; however, they should not be used in an expository way by someone who wishes to convey the main findings of a clustering model. Instead, boxplots, scatterplots, barplots, and histograms can convey the major takeaways of such a model far more effectively.

    Keywords
    data mining, k-means clustering, computer science education
    Published
    2023-10-11
    Appears in
    SpringerLink
    http://dx.doi.org/10.1007/978-3-031-44668-9_4