Heuristic-Based Extraction and Unigram Analysis of Nursing Free Text Data Residing in Large EHR Clinical Notes

Syed Mohtashim Abbas Bokhari; Kriste Krstovski; Jennifer Withall; Rachel Lee; Patricia Dykes; Mai Tran; Kenrick Cato; Sarah Rossetti

Pervasive Computing Technologies for Healthcare. 17th EAI International Conference, PervasiveHealth 2023, Malmö, Sweden, November 27-29, 2023, Proceedings

Research Article

@INPROCEEDINGS{10.1007/978-3-031-59717-6_9,
    author={Syed Mohtashim Abbas Bokhari and Kriste Krstovski and Jennifer Withall and Rachel Lee and Patricia Dykes and Mai Tran and Kenrick Cato and Sarah Rossetti},
    title={Heuristic-Based Extraction and Unigram Analysis of Nursing Free Text Data Residing in Large EHR Clinical Notes},
    proceedings={Pervasive Computing Technologies for Healthcare. 17th EAI International Conference, PervasiveHealth 2023, Malm\"{o}, Sweden, November 27-29, 2023, Proceedings},
    proceedings_a={PERVASIVEHEALTH},
    year={2024},
    month={6},
    keywords={nursing documentation, health informatics, clinical notes, nursing notes, heuristics, natural language processing, information retrieval, unigram analysis},
    doi={10.1007/978-3-031-59717-6_9}
}

Syed Mohtashim Abbas Bokhari, Kriste Krstovski, Jennifer Withall, Rachel Lee, Patricia Dykes, Mai Tran, Kenrick Cato, Sarah Rossetti (2024). Heuristic-Based Extraction and Unigram Analysis of Nursing Free Text Data Residing in Large EHR Clinical Notes. PERVASIVEHEALTH. Springer. DOI: 10.1007/978-3-031-59717-6_9

Syed Mohtashim Abbas Bokhari¹,*, Kriste Krstovski², Jennifer Withall¹, Rachel Lee³, Patricia Dykes⁴, Mai Tran¹, Kenrick Cato⁵, Sarah Rossetti¹

1: Department of Biomedical Informatics, Columbia University
2: Data Science Institute, Columbia University
3: School of Nursing, Columbia University
4: Harvard Medical School, Brigham and Women’s Hospital
5: University of Pennsylvania

*Contact email: mohtashim_abbas@yahoo.com

Abstract

Free text in nurses’ notes can play an important role in clinical decision-making; however, this information remains underexplored because it is hard to extract from electronic health records (EHRs). Free text is a subset of the information recorded in nursing notes. Automated extraction of free text is challenging because of the size and structural diversity of EHRs, so understanding these structural and content-level differences is essential for extraction. Free text is embedded in other, relatively structured text, which makes it difficult to detect automatically. Moreover, notes carry no indicator of whether they are free text. As a first step toward automating the extraction process, we explore heuristic-based algorithms with the goal of establishing a baseline and developing an annotated dataset, which could then be used to develop machine learning-based extraction algorithms for a more scalable solution. In this research, we analyze over 200,000 EHR notes and extract 40,000 free text notes from them. Furthermore, we use a unigram language model to analyze the differences between free and structured texts and to better understand the free text content.

Keywords: nursing documentation, health informatics, clinical notes, nursing notes, heuristics, natural language processing, information retrieval, unigram analysis
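
The abstract mentions comparing free and structured texts with a unigram language model but does not describe the details on this page. The following is only a minimal illustrative sketch of that general idea, not the authors' method: the toy notes, the tokenizer, the add-one smoothing, and all function names are assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic tokens only (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def unigram_model(notes, vocab, alpha=1.0):
    """Add-alpha smoothed unigram probabilities over a shared vocabulary."""
    counts = Counter(tok for note in notes for tok in tokenize(note))
    total = sum(counts.values()) + alpha * len(vocab)
    return {word: (counts[word] + alpha) / total for word in vocab}


def log_ratio(p_a, p_b):
    """Per-word log(p_a / p_b); positive values lean toward corpus a."""
    return {word: math.log(p_a[word] / p_b[word]) for word in p_a}


# Toy notes invented for illustration; they are not from the study's data.
free_notes = [
    "Patient anxious overnight, family at bedside",
    "Patient resting, pain improved after medication",
]
structured_notes = [
    "Pain score: 4 Location: abdomen",
    "Pain score: 2 Location: abdomen",
]

# Shared vocabulary so both models assign probability to every word.
vocab = sorted(
    {tok for note in free_notes + structured_notes for tok in tokenize(note)}
)
p_free = unigram_model(free_notes, vocab)
p_struct = unigram_model(structured_notes, vocab)
ratios = log_ratio(p_free, p_struct)

# Words most characteristic of the free text corpus under this toy model.
print(sorted(ratios, key=ratios.get, reverse=True)[:3])
```

Smoothing over a shared vocabulary keeps every ratio finite even when a word appears in only one of the two corpora, which is the usual situation when template fields and narrative text are compared.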

Published: 2024-06-04
Appears in: SpringerLink

URL: http://dx.doi.org/10.1007/978-3-031-59717-6_9
