Large Language Models for Identification of Medical Data in Unstructured Records

Presian Petkov; Latchezar Tomov; Emanuil Markov

Computer Science and Education in Computer Science. 20th EAI International Conference, CSECS 2024, Sofia, Bulgaria, June 28–30, 2024, Proceedings

Research Article

@INPROCEEDINGS{10.1007/978-3-031-84312-9_8,
    author={Presian Petkov and Latchezar Tomov and Emanuil Markov},
    title={Large Language Models for Identification of Medical Data in Unstructured Records},
    proceedings={Computer Science and Education in Computer Science. 20th EAI International Conference, CSECS 2024, Sofia, Bulgaria, June 28--30, 2024, Proceedings},
    proceedings_a={CSECS},
    year={2025},
    month={3},
    keywords={Large Language Models Diabetes Healthcare Information extraction Smoking},
    doi={10.1007/978-3-031-84312-9_8}
}

Year: 2025
CSECS
Springer
DOI: 10.1007/978-3-031-84312-9_8

Presian Petkov¹, Latchezar Tomov²,*, Emanuil Markov

1: B.IT., New Bulgarian University
2: Department of Informatics, New Bulgarian University

*Contact email: lptomov@nbu.bg

Abstract

Bulgarian healthcare faces some unique challenges because much of its medical data is unstructured. Key health predictors, such as smoking status, are described in free text in medical records. This makes simple retrieval impossible, with or without regular expressions, and makes manual checking infeasible given the large number of records available. Large Language Models (LLMs) are a viable alternative for automating the extraction of medical data from unstructured records because they can process natural language, including languages for which they lack sufficient or specific training, such as Bulgarian. We develop a method and a procedure for testing multiple LLMs for this purpose and obtain promising results that demonstrate their capabilities. We analyze the results in terms of the price of the service and the computational cost.

Keywords: Large Language Models, Diabetes, Healthcare, Information extraction, Smoking
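The abstract describes testing several LLMs on extraction from free-text records and weighing the results against service price. The Python sketch below shows one possible way to score such a comparison; it is not the authors' procedure. The label set, the `ModelRun` fields, the model names and the per-token prices are all hypothetical assumptions, and the step that actually queries an LLM is omitted: each model's raw text answers are assumed to have been collected already, one per record.

```python
"""Hypothetical sketch: score several LLMs on a manually labelled sample of
records and weigh accuracy against API cost. Not the paper's method."""

from dataclasses import dataclass

# Hypothetical label set; the paper's actual categories are not stated here.
LABELS = ("smoker", "non-smoker", "former smoker", "unknown")


@dataclass
class ModelRun:
    name: str                 # hypothetical model identifier
    predictions: list[str]    # raw LLM answer per record, in record order
    input_tokens: int         # total prompt tokens over the sample
    output_tokens: int        # total completion tokens over the sample
    usd_per_m_input: float    # assumed price per million input tokens
    usd_per_m_output: float   # assumed price per million output tokens


def normalize(raw: str) -> str:
    """Map a free-text LLM answer onto LABELS; anything else becomes 'unknown'."""
    answer = raw.strip().lower()
    return answer if answer in LABELS else "unknown"


def accuracy(predictions: list[str], gold: list[str]) -> float:
    """Fraction of normalized predictions that match the manual (gold) labels."""
    if len(predictions) != len(gold) or not gold:
        raise ValueError("predictions and gold labels must be non-empty and aligned")
    hits = sum(normalize(p) == g for p, g in zip(predictions, gold))
    return hits / len(gold)


def cost_usd(run: ModelRun) -> float:
    """Total cost of the run from token counts and per-million-token prices."""
    return (run.input_tokens * run.usd_per_m_input
            + run.output_tokens * run.usd_per_m_output) / 1_000_000


def compare(runs: list[ModelRun], gold: list[str]) -> list[tuple[str, float, float, float]]:
    """Return (name, accuracy, cost, cost per correct answer), best accuracy first,
    ties broken by lower cost."""
    rows = []
    for run in runs:
        acc = accuracy(run.predictions, gold)
        cost = cost_usd(run)
        correct = acc * len(gold)
        per_correct = cost / correct if correct else float("inf")
        rows.append((run.name, acc, cost, per_correct))
    return sorted(rows, key=lambda r: (-r[1], r[2]))


if __name__ == "__main__":
    # Entirely made-up sample: four records, two hypothetical models.
    gold = ["smoker", "non-smoker", "former smoker", "unknown"]
    runs = [
        ModelRun("model-a", ["smoker", "non-smoker", "smoker", "unknown"],
                 4000, 40, 0.5, 1.5),
        ModelRun("model-b", ["Smoker", "non-smoker", "former smoker", "n/a"],
                 4000, 60, 5.0, 15.0),
    ]
    for name, acc, cost, per_correct in compare(runs, gold):
        print(f"{name}: accuracy={acc:.2f} cost=${cost:.5f} per_correct=${per_correct:.6f}")
```

Reporting cost per correct answer alongside raw accuracy is one way to make the price trade-off explicit: a cheaper model with slightly lower accuracy can still be preferable when the record volume is large.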

Published: 2025-03-14
Appears in: SpringerLink
