
Research Article
Heuristic-Based Extraction and Unigram Analysis of Nursing Free Text Data Residing in Large EHR Clinical Notes
- @INPROCEEDINGS{10.1007/978-3-031-59717-6_9, author={Syed Mohtashim Abbas Bokhari and Kriste Krstovski and Jennifer Withall and Rachel Lee and Patricia Dykes and Mai Tran and Kenrick Cato and Sarah Rossetti}, title={Heuristic-Based Extraction and Unigram Analysis of Nursing Free Text Data Residing in Large EHR Clinical Notes}, proceedings={Pervasive Computing Technologies for Healthcare. 17th EAI International Conference, PervasiveHealth 2023, Malm\o{}, Sweden, November 27-29, 2023, Proceedings}, proceedings_a={PERVASIVEHEALTH}, year={2024}, month={6}, keywords={nursing documentation, health informatics, clinical notes, nursing notes, heuristics, natural language processing, information retrieval, unigram analysis}, doi={10.1007/978-3-031-59717-6_9} }
- Syed Mohtashim Abbas Bokhari
 Kriste Krstovski
 Jennifer Withall
 Rachel Lee
 Patricia Dykes
 Mai Tran
 Kenrick Cato
 Sarah Rossetti
 Year: 2024
 PERVASIVEHEALTH
 Springer
 DOI: 10.1007/978-3-031-59717-6_9
Abstract
Free text in nurses’ notes can play an important role in clinical decision-making; however, this information has not been explored to its full potential because it is hard to extract from electronic health records (EHRs). Free text is a subset of the information recorded in nursing notes. Automated extraction of free text is challenging due to the size and structural diversity of EHRs, and understanding these structural and content-level differences is essential for extraction. Free text is embedded in other, relatively structured text, which makes it difficult to detect automatically. Moreover, there is no information indicating whether a note is free text. As a first step toward automating the extraction process, we explore heuristic-based algorithms with the goal of establishing a baseline and developing an annotated dataset, which could then be used to train machine learning-based extraction algorithms for a more scalable solution. In this research, we analyze over 200,000 EHR notes and extract 40,000 free text notes from them. Furthermore, we use a unigram language model to analyze the differences between free and structured text and to better understand the free text content.
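
To make the unigram comparison mentioned in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simple regex tokenizer, add-one smoothing, and a per-word log-probability ratio as the comparison measure; the function names and the toy notes are hypothetical and are not taken from the paper or its data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphanumeric tokens; the paper does not specify a tokenizer.
    return re.findall(r"[a-z0-9']+", text.lower())


def unigram_counts(notes):
    # Count how often each token occurs across a collection of notes.
    counts = Counter()
    for note in notes:
        counts.update(tokenize(note))
    return counts


def unigram_probs(counts, vocab, alpha=1.0):
    # Smoothed unigram probabilities over a shared vocabulary (alpha=1.0 is add-one smoothing).
    total = sum(counts.values()) + alpha * len(vocab)
    return {word: (counts[word] + alpha) / total for word in vocab}


def log_ratio(free_notes, structured_notes, alpha=1.0):
    # Positive values: word is relatively more probable in free text.
    # Negative values: word is relatively more probable in structured text.
    free_counts = unigram_counts(free_notes)
    struct_counts = unigram_counts(structured_notes)
    vocab = set(free_counts) | set(struct_counts)
    p_free = unigram_probs(free_counts, vocab, alpha)
    p_struct = unigram_probs(struct_counts, vocab, alpha)
    return {word: math.log(p_free[word] / p_struct[word]) for word in vocab}


# Hypothetical toy notes, invented purely for illustration.
free_notes = [
    "Patient resting comfortably, family at bedside.",
    "Patient reports pain improved after medication.",
]
structured_notes = [
    "Pain score: 3. Pulse: 80. Temp: 37.0.",
    "Pain score: 5. Pulse: 92. Temp: 37.4.",
]

scores = log_ratio(free_notes, structured_notes)
# List the words most characteristic of the free-text notes first.
for word, score in sorted(scores.items(), key=lambda item: -item[1])[:5]:
    print(f"{word}\t{score:.2f}")
```

Sorting the scores in the opposite direction surfaces the words most characteristic of structured text. Smoothing over a shared vocabulary keeps the ratio defined for words that occur in only one of the two collections.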


