Data and Information in Online Environments. First EAI International Conference, DIONE 2020, Florianópolis, Brazil, March 19-20, 2020, Proceedings

Research Article

Concepts in Topics. Using Word Embeddings to Leverage the Outcomes of Topic Modeling for the Exploration of Digitized Archival Collections

Cite
  • @INPROCEEDINGS{10.1007/978-3-030-50072-6_4,
        author={Mathias Coeckelbergs and Seth Van Hooland},
        title={Concepts in Topics. Using Word Embeddings to Leverage the Outcomes of Topic Modeling for the Exploration of Digitized Archival Collections},
        proceedings={Data and Information in Online Environments. First EAI International Conference, DIONE 2020, Florian\'{o}polis, Brazil, March 19-20, 2020, Proceedings},
        proceedings_a={DIONE},
        year={2020},
        month={6},
        keywords={Topic modeling, Word embeddings, Document classification, Information retrieval},
        doi={10.1007/978-3-030-50072-6_4}
    }
    
  • Mathias Coeckelbergs
    Seth Van Hooland
    Year: 2020
    Concepts in Topics. Using Word Embeddings to Leverage the Outcomes of Topic Modeling for the Exploration of Digitized Archival Collections
    DIONE
    Springer
    DOI: 10.1007/978-3-030-50072-6_4
Mathias Coeckelbergs*, Seth Van Hooland
    *Contact email: mcoeckel@ulb.ac.be

    Abstract

    Within the field of Digital Humanities, unsupervised machine learning techniques such as topic modeling have gained considerable attention in recent years as a way to explore vast volumes of unstructured textual data. Although this technique is useful for capturing recurring themes across document sets that have no metadata, the interpretation of topics has been consistently highlighted in the literature as problematic. This paper proposes a novel method based on word embeddings to facilitate the interpretation of the terms that constitute a topic, making it possible to discern different concepts automatically within a topic. To demonstrate this method, the paper uses the “Cabinet Papers” held and digitised by The National Archives (TNA) of the United Kingdom (UK). After a discussion of our results, based on coherence measures, we provide details of how these results can be interpreted linguistically.

    Keywords
    Topic modeling, Word embeddings, Document classification, Information retrieval
    Published
    2020-06-16
    Appears in
    SpringerLink
    http://dx.doi.org/10.1007/978-3-030-50072-6_4