
Research Article
Heuristic-Based Extraction and Unigram Analysis of Nursing Free Text Data Residing in Large EHR Clinical Notes
@INPROCEEDINGS{10.1007/978-3-031-59717-6_9,
    author={Syed Mohtashim Abbas Bokhari and Kriste Krstovski and Jennifer Withall and Rachel Lee and Patricia Dykes and Mai Tran and Kenrick Cato and Sarah Rossetti},
    title={Heuristic-Based Extraction and Unigram Analysis of Nursing Free Text Data Residing in Large EHR Clinical Notes},
    proceedings={Pervasive Computing Technologies for Healthcare. 17th EAI International Conference, PervasiveHealth 2023, Malm\"{o}, Sweden, November 27-29, 2023, Proceedings},
    proceedings_a={PERVASIVEHEALTH},
    year={2024},
    month={6},
    keywords={nursing documentation health informatics clinical notes nursing notes heuristics natural language processing information retrieval unigram analysis},
    doi={10.1007/978-3-031-59717-6_9}
}
Syed Mohtashim Abbas Bokhari
Kriste Krstovski
Jennifer Withall
Rachel Lee
Patricia Dykes
Mai Tran
Kenrick Cato
Sarah Rossetti
Year: 2024
PERVASIVEHEALTH
Springer
DOI: 10.1007/978-3-031-59717-6_9
Abstract
Free text in nurses’ notes can play an important role in clinical decision-making; however, this information remains underexplored because it is hard to extract from electronic health records (EHRs). Free text is a subset of the information recorded in nursing notes. Automated extraction of free text is challenging because of the size and structural diversity of EHRs, and understanding these structural and content-level differences is essential for extraction. Free text is embedded in other, relatively structured text, which makes it difficult to detect automatically. Moreover, there is no information indicating whether a given note is free text. As a first step toward automating the extraction process, we explore heuristic-based algorithms with the goal of establishing a baseline and developing an annotated dataset, which could then be used to train machine learning-based extraction algorithms for a more scalable solution. In this research, we analyze over 200,000 EHR notes and extract 40,000 free text notes from them. Furthermore, we use a unigram language model to analyze the differences between free and structured texts and to better understand the free text content.
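The unigram comparison mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tokenizer, the add-alpha smoothing, the log-ratio ranking, and all function names (`tokenize`, `unigram_counts`, `unigram_probs`, `log_ratio_ranking`) are assumptions chosen for illustration, and the sample notes are invented toy strings, not data from the study.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters, digits, or apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def unigram_counts(notes):
    """Count token occurrences across a list of note strings."""
    counts = Counter()
    for note in notes:
        counts.update(tokenize(note))
    return counts


def unigram_probs(counts, vocab, alpha=1.0):
    """Add-alpha smoothed unigram probabilities over a shared vocabulary."""
    total = sum(counts.values())
    denom = total + alpha * len(vocab)
    return {tok: (counts[tok] + alpha) / denom for tok in vocab}


def log_ratio_ranking(free_notes, structured_notes, alpha=1.0):
    """Rank tokens by log P(token | free) - log P(token | structured).

    Positive scores mark tokens that are relatively more frequent in the
    free text collection; negative scores mark tokens that are relatively
    more frequent in the structured collection.
    """
    free = unigram_counts(free_notes)
    structured = unigram_counts(structured_notes)
    vocab = set(free) | set(structured)
    p_free = unigram_probs(free, vocab, alpha)
    p_struct = unigram_probs(structured, vocab, alpha)
    scores = {tok: math.log(p_free[tok] / p_struct[tok]) for tok in vocab}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Invented toy notes, for illustration only.
    free_notes = [
        "patient resting comfortably",
        "patient anxious family at bedside",
    ]
    structured_notes = ["pain score 3", "pain score 5 patient"]
    for token, score in log_ratio_ranking(free_notes, structured_notes)[:5]:
        print(f"{token}\t{score:.3f}")
```

Smoothing over the shared vocabulary keeps the ratio defined for tokens that appear in only one collection; other comparison measures would serve the same purpose.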