Computer Science and Education in Computer Science. 20th EAI International Conference, CSECS 2024, Sofia, Bulgaria, June 28–30, 2024, Proceedings

Research Article

Large Language Models for Identification of Medical Data in Unstructured Records

Cite
BibTeX Plain Text
  • @INPROCEEDINGS{10.1007/978-3-031-84312-9_8,
        author={Presian Petkov and Latchezar Tomov and Emanuil Markov},
        title={Large Language Models for Identification of Medical Data in Unstructured Records},
        proceedings={Computer Science and Education in Computer Science. 20th EAI International Conference, CSECS 2024, Sofia, Bulgaria, June 28--30, 2024, Proceedings},
        proceedings_a={CSECS},
        year={2025},
        month={3},
        keywords={Large Language Models, Diabetes, Healthcare, Information extraction, Smoking},
        doi={10.1007/978-3-031-84312-9_8}
    }
    
  • Presian Petkov
    Latchezar Tomov
    Emanuil Markov
    Year: 2025
    Large Language Models for Identification of Medical Data in Unstructured Records
    CSECS
    Springer
    DOI: 10.1007/978-3-031-84312-9_8
Presian Petkov (1), Latchezar Tomov (2,*), Emanuil Markov
  • 1: B.IT., New Bulgarian University
  • 2: Department of Informatics, New Bulgarian University
*Contact email: lptomov@nbu.bg

Abstract

Bulgarian healthcare faces distinctive challenges because much of its medical data is unstructured. Key health predictors such as smoking status are recorded as free text in medical records, which makes simple retrieval impossible, with or without regular expressions, and the large number of records makes manual checking infeasible. Large Language Models are a viable alternative for automating the extraction of medical data from unstructured records because they can process natural language, including languages such as Bulgarian for which they have limited or no specific training. We develop a method and a procedure for testing multiple LLMs for this purpose and obtain promising results that demonstrate their capabilities. We analyze the results in the context of the price of the service and the computational costs.

Keywords
Large Language Models, Diabetes, Healthcare, Information extraction, Smoking
Published
2025-03-14
Appears in
SpringerLink
http://dx.doi.org/10.1007/978-3-031-84312-9_8
Copyright © 2024–2025 ICST