
Research Article
Large Language Models for Identification of Medical Data in Unstructured Records
@INPROCEEDINGS{10.1007/978-3-031-84312-9_8,
  author={Presian Petkov and Latchezar Tomov and Emanuil Markov},
  title={Large Language Models for Identification of Medical Data in Unstructured Records},
  proceedings={Computer Science and Education in Computer Science. 20th EAI International Conference, CSECS 2024, Sofia, Bulgaria, June 28--30, 2024, Proceedings},
  proceedings_a={CSECS},
  year={2025},
  month={3},
  keywords={Large Language Models, Diabetes, Healthcare, Information extraction, Smoking},
  doi={10.1007/978-3-031-84312-9_8}
}
- Presian Petkov
- Latchezar Tomov
- Emanuil Markov
Year: 2025
CSECS
Springer
DOI: 10.1007/978-3-031-84312-9_8
Abstract
Bulgarian healthcare faces some unique challenges because its medical data are largely unstructured. Key health predictors such as smoking status are recorded as free text in medical records. This makes simple retrieval impossible, with or without regular expressions, and makes manual review impractical given the large number of records. Large Language Models (LLMs) are a viable alternative for automating the extraction of medical data from unstructured records. They can process natural language even in languages for which they lack sufficient or specific training, such as Bulgarian. We develop a method and a procedure for testing multiple LLMs on this task and obtain promising results that demonstrate their capabilities. We analyze the results in terms of service pricing and computational cost.
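A comparison of several models that weighs extraction quality against price could be scored roughly as follows. This is a minimal sketch, assuming a small set of records with manually assigned reference labels. The label set, the `normalize` mapping from free-text replies to labels, the model names and the flat per-1,000-record prices are all hypothetical illustrations. They are not the authors' procedure and do not reflect any provider's actual pricing.

```python
from dataclasses import dataclass

# Hypothetical smoking-status label set; the paper's actual categories may differ.
LABELS = ("smoker", "former_smoker", "non_smoker", "unknown")


def normalize(raw: str) -> str:
    """Map a free-text model reply onto LABELS; anything unrecognised becomes 'unknown'."""
    label = raw.strip().lower().replace(" ", "_").replace("-", "_")
    return label if label in LABELS else "unknown"


@dataclass
class ModelRun:
    name: str                   # hypothetical model identifier
    replies: list[str]          # raw model reply for each record, in record order
    usd_per_1k_records: float   # assumed flat price, illustrative only


def n_correct(replies: list[str], gold: list[str]) -> int:
    """Count records whose normalized reply matches the reference label."""
    if len(replies) != len(gold):
        raise ValueError("replies and gold labels must have the same length")
    return sum(normalize(r) == g for r, g in zip(replies, gold))


def accuracy(run: ModelRun, gold: list[str]) -> float:
    """Fraction of records labelled correctly (0.0 for an empty record set)."""
    return n_correct(run.replies, gold) / len(gold) if gold else 0.0


def cost_per_correct(run: ModelRun, gold: list[str]) -> float:
    """Total cost of labelling all records divided by the number labelled correctly."""
    correct = n_correct(run.replies, gold)
    total_cost = run.usd_per_1k_records * len(gold) / 1000
    return float("inf") if correct == 0 else total_cost / correct


def rank(runs: list[ModelRun], gold: list[str]) -> list[tuple[str, float, float]]:
    """Sort models by accuracy (descending), breaking ties by cost per correct answer."""
    rows = [(r.name, accuracy(r, gold), cost_per_correct(r, gold)) for r in runs]
    return sorted(rows, key=lambda row: (-row[1], row[2]))


if __name__ == "__main__":
    gold = ["smoker", "non_smoker", "unknown", "former_smoker"]
    runs = [
        ModelRun("model-a", ["Smoker", "non-smoker", "smoker", "Former smoker"], 2.0),
        ModelRun("model-b", ["smoker", "Non smoker", "not mentioned", "non_smoker"], 0.5),
    ]
    for name, acc, cpc in rank(runs, gold):
        print(f"{name}: accuracy={acc:.2f}, cost per correct=${cpc:.6f}")
```

In this toy run both hypothetical models label three of four records correctly. The tie-break therefore ranks the cheaper `model-b` first. Ranking on cost per correct answer, rather than on raw price, is one simple way to fold quality and cost into a single comparison.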