
Research Article
Expository Clustering Visualizations: Keeping it Simple
@INPROCEEDINGS{10.1007/978-3-031-44668-9_4, author={Greg Page}, title={Expository Clustering Visualizations: Keeping it Simple}, proceedings={Computer Science and Education in Computer Science. 19th EAI International Conference, CSECS 2023, Boston, MA, USA, June 28--29, 2023, Proceedings}, proceedings_a={CSECS}, year={2023}, month={10}, keywords={data mining k-means clustering computer science education}, doi={10.1007/978-3-031-44668-9_4} }
- Greg Page
Year: 2023
CSECS
Springer
DOI: 10.1007/978-3-031-44668-9_4
Abstract
In this paper, the author presents a basic overview of k-means clustering and then uses statistical summaries to demonstrate how such a model separates the records in a dataset into distinct groups.
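For readers new to the technique, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python, so it runs without third-party libraries. It is not the author's code. The initialization (the first k records serve as starting centroids), the helper names `kmeans` and `summarize`, and the toy dataset are all illustrative assumptions. The summary step reports each cluster's size and the mean of every original variable. This is one simple form of per-cluster statistical summary.

```python
import math
from statistics import mean

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm. Assumption: the first k records are the starting centroids."""
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each record joins the cluster whose centroid is nearest.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # no record changed cluster, so the model has converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of the records assigned to it.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append(tuple(mean(col) for col in zip(*members)) if members else centroids[c])
        centroids = updated
    return labels, centroids

def summarize(points, labels, names):
    """Hypothetical helper: per-cluster size and mean of each ORIGINAL variable."""
    summary = {}
    for c in sorted(set(labels)):
        members = [p for p, lab in zip(points, labels) if lab == c]
        means = {name: mean(col) for name, col in zip(names, zip(*members))}
        summary[c] = {"n": len(members), **means}
    return summary

# Hypothetical toy data: (monthly_visits, avg_basket_size) for six shoppers.
points = [(1.0, 2.0), (9.0, 8.0), (1.5, 1.8), (8.5, 9.0), (1.2, 2.2), (9.5, 8.5)]
labels, centroids = kmeans(points, k=2)
for cluster, stats in summarize(points, labels, ["monthly_visits", "avg_basket_size"]).items():
    print(cluster, stats)
```

Production tools such as scikit-learn's `KMeans` add smarter initialization (k-means++) and multiple restarts. This sketch omits both for brevity. In practice, variables on very different scales are usually standardized before clustering.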
The author then shows how simple visualizations can effectively “tell the story” behind a clustering model, including the key distinctions that tend to differentiate one group from another.
Next, the author examines Principal Component (PC) plots, a tool that analysts often misuse when trying to convey information about the clusters their models identify. Such plots are based not on the original variables in the data but on linear combinations of those variables. While PC plots are colorful and impressive-looking, their meaning often eludes the students who use them in end-of-semester project presentations.
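The sketch below makes the phrase "linear combinations" concrete. It computes the first principal component for two variables using only the standard library. It rests on three assumptions. First, it performs covariance-based PCA on mean-centered data, although many tools also rescale each variable to unit variance first. Second, the closed-form eigenvector works only for the two-variable case. Third, the dataset is made up. The function name `first_pc` is hypothetical. The point it illustrates is that a record's position along a PC axis is a weighted sum of its original values. Reading that axis therefore requires knowing the weights.

```python
import math

def first_pc(xs, ys):
    """Return (weights, scores, eigenvalue) for PC1 of two centered variables."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Sample covariance matrix [[a, b], [b, c]].
    a = sum((x - mx) ** 2 for x in xs) / (n - 1)
    c = sum((y - my) ** 2 for y in ys) / (n - 1)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form.
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    if b != 0:
        vx, vy = lam - c, b  # satisfies (a - lam) * vx + b * vy == 0
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)  # variables already uncorrelated
    norm = math.hypot(vx, vy)
    w1, w2 = vx / norm, vy / norm
    # Each PC1 score is a linear combination of the centered original variables.
    scores = [w1 * (x - mx) + w2 * (y - my) for x, y in zip(xs, ys)]
    return (w1, w2), scores, lam

# Hypothetical data: two perfectly correlated variables.
weights, scores, lam = first_pc([1, 2, 3, 4], [2, 4, 6, 8])
print("PC1 = %.3f * x + %.3f * y (after centering)" % weights)
```

As with any PCA, the sign of the weight vector is arbitrary. Flipping both weights yields an equally valid PC1.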
PC plots have some value as a diagnostic tool for k-means modelers; however, they should not be used in an expository way by someone who wishes to convey the main findings of a clustering model. Instead, boxplots, scatterplots, barplots, and histograms convey the major takeaways of such a model much more effectively.
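Unlike PC plots, boxplots work directly on the original variables. The sketch below computes, for each cluster, the five numbers a basic boxplot draws. It uses only the standard library, and its helper names and data are hypothetical, not taken from the paper. `statistics.quantiles` with `method="inclusive"` is only one of several quartile conventions. Plotting libraries may use slightly different rules for quartiles and whiskers.

```python
from statistics import quantiles

def boxplot_stats(values):
    """Min, Q1, median, Q3, max: the five numbers behind a basic boxplot."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": med, "q3": q3, "max": max(values)}

def stats_by_cluster(records, labels, variable):
    """Group one original variable by cluster label and summarize each group."""
    groups = {}
    for record, label in zip(records, labels):
        groups.setdefault(label, []).append(record[variable])
    return {label: boxplot_stats(vals) for label, vals in sorted(groups.items())}

# Hypothetical records already labeled by a k-means model.
records = [{"spend": v} for v in (10, 100, 20, 200, 30, 300, 400, 500)]
labels = [0, 1, 0, 1, 0, 1, 1, 1]
for label, stats in stats_by_cluster(records, labels, "spend").items():
    print(label, stats)
```

Each group needs at least two values for `quantiles` on older Python versions. Because every number is in the variable's own units, the result can be read without any knowledge of the model's internals.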