Concepts in Topics. Using Word Embeddings to Leverage the Outcomes of Topic Modeling for the Exploration of Digitized Archival Collections

Mathias Coeckelbergs; Seth Van Hooland

Data and Information in Online Environments. First EAI International Conference, DIONE 2020, Florianópolis, Brazil, March 19-20, 2020, Proceedings

Research Article


Citation (BibTeX):

@INPROCEEDINGS{10.1007/978-3-030-50072-6_4,
    author={Mathias Coeckelbergs and Seth Van Hooland},
    title={Concepts in Topics. Using Word Embeddings to Leverage the Outcomes of Topic Modeling for the Exploration of Digitized Archival Collections},
    proceedings={Data and Information in Online Environments. First EAI International Conference, DIONE 2020, Florian\'{o}polis, Brazil, March 19-20, 2020, Proceedings},
    proceedings_a={DIONE},
    year={2020},
    month={6},
    keywords={Topic modeling, Word embeddings, Document classification, Information retrieval},
    doi={10.1007/978-3-030-50072-6_4}
}

Publisher: Springer

Mathias Coeckelbergs*, Seth Van Hooland

*Contact email: mcoeckel@ulb.ac.be

Abstract

Within the field of Digital Humanities, unsupervised machine learning techniques such as topic modeling have gained considerable attention in recent years as a way to explore large volumes of unstructured textual data. Although this technique is useful for capturing recurring themes across document sets that lack metadata, the literature has consistently highlighted the interpretation of topics as problematic. This paper proposes a novel method based on word embeddings that facilitates the interpretation of the terms constituting a topic by automatically discerning distinct concepts within it. To demonstrate this method, the paper uses the “Cabinet Papers” held and digitised by The National Archives (TNA) of the United Kingdom (UK). After discussing our results on the basis of coherence measures, we detail how these results can be interpreted linguistically.
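To make the general idea concrete, the sketch below shows one simple way word embeddings could be used to split the top words of a single topic into concept groups. It is not the authors' procedure: the clustering technique (single-linkage grouping at a cosine-similarity threshold), the threshold of 0.7, the helper names `split_topic` and `mean_pairwise_similarity`, and the toy vectors in `TOY_EMBEDDINGS` are all hypothetical choices made for illustration. The mean pairwise cosine similarity is likewise only a simple embedding-based coherence proxy, not necessarily one of the coherence measures used in the paper.

```python
import math
from itertools import combinations

# Hypothetical toy 3-dimensional "embeddings" for the top words of one topic.
# These vectors are invented for illustration; real ones would come from a
# word-embedding model trained on, or applied to, the collection.
TOY_EMBEDDINGS = {
    "coal":   (0.90, 0.10, 0.00),
    "mining": (0.85, 0.20, 0.05),
    "steel":  (0.80, 0.15, 0.10),
    "strike": (0.10, 0.90, 0.20),
    "union":  (0.15, 0.85, 0.10),
    "wages":  (0.20, 0.80, 0.30),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def split_topic(words, embeddings, threshold=0.7):
    """Hypothetical helper: group topic words into 'concepts' by single linkage.

    Two words land in the same group if a chain of word pairs with cosine
    similarity >= threshold connects them (union-find over all pairs).
    """
    parent = list(range(len(words)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(words)), 2):
        if cosine(embeddings[words[i]], embeddings[words[j]]) >= threshold:
            parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i, word in enumerate(words):
        groups.setdefault(find(i), []).append(word)
    # Dicts keep insertion order, so groups appear in order of their first word.
    return list(groups.values())


def mean_pairwise_similarity(words, embeddings):
    """Hypothetical coherence proxy: average cosine similarity over all word pairs."""
    pairs = list(combinations(words, 2))
    if not pairs:
        return 1.0  # convention chosen here: a single word counts as fully coherent
    return sum(cosine(embeddings[a], embeddings[b]) for a, b in pairs) / len(pairs)


topic_words = list(TOY_EMBEDDINGS)
concepts = split_topic(topic_words, TOY_EMBEDDINGS)
print(concepts)  # → [['coal', 'mining', 'steel'], ['strike', 'union', 'wages']]
print("whole topic:", round(mean_pairwise_similarity(topic_words, TOY_EMBEDDINGS), 2))
for group in concepts:
    print(group, round(mean_pairwise_similarity(group, TOY_EMBEDDINGS), 2))
```

With these toy vectors, each concept group scores a higher mean pairwise similarity than the undivided topic, which is the kind of signal a coherence-based evaluation could look for. Single linkage was picked here only because it is short and dependency-free; average linkage or k-means over the embedding vectors would be equally plausible stand-ins.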

Keywords: Topic modeling, Word embeddings, Document classification, Information retrieval

Published: 2020-06-16
Appears in: SpringerLink

DOI link: http://dx.doi.org/10.1007/978-3-030-50072-6_4
