
Research Article
TextRank – Based Keyword Extraction for Constructing a Domain-Specific Dictionary
@INPROCEEDINGS{10.1007/978-3-031-48888-7_29,
    author={Sridevi Bonthu and Hema Sankar Sai Ganesh Babu Muddam and Koushik Varma Mudunuri and Abhinav Dayal and V. V. R. Maheswara Rao and Bharat Kumar Bolla},
    title={TextRank -- Based Keyword Extraction for Constructing a Domain-Specific Dictionary},
    proceedings={Cognitive Computing and Cyber Physical Systems. 4th EAI International Conference, IC4S 2023, Bhimavaram, Andhra Pradesh, India, August 4-6, 2023, Proceedings, Part I},
    proceedings_a={IC4S},
    year={2024},
    month={1},
    keywords={Extraction TextRank POS tagging Text mining domain-specific dictionary Natural Language Processing},
    doi={10.1007/978-3-031-48888-7_29}
}
- Sridevi Bonthu
- Hema Sankar Sai Ganesh Babu Muddam
- Koushik Varma Mudunuri
- Abhinav Dayal
- V. V. R. Maheswara Rao
- Bharat Kumar Bolla
Year: 2024
Conference: IC4S
Publisher: Springer
DOI: 10.1007/978-3-031-48888-7_29
Abstract
Extracting domain-related keywords from text documents is a crucial task in both Information Retrieval and Natural Language Processing (NLP). This paper presents an approach that combines the TextRank algorithm with various NLP techniques to identify domain-specific keywords. The method combines unsupervised graph-based ranking with the semantic understanding of NLP models to extract key terms that are highly relevant to a specific domain. The work is carried out on a dataset of arXiv research abstracts. It preprocesses the input text to capture linguistic features, extracts keywords using TextRank and POS filtering, extracts their definitions, and finally evaluates the results. The extracted keywords are evaluated against manually annotated labels, and the proposed method obtains 83% accuracy. The approach is flexible and adaptable to different domains, as it can be trained on domain-specific data to further improve its performance.
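The keyword-extraction step described in the abstract (POS filtering followed by TextRank) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the tag set, window size, damping factor, or iteration count, so the values below are assumptions, as are the function names `build_graph`, `textrank`, and `extract_keywords`. The input is assumed to be already POS-tagged with Penn Treebank tags by an external tagger; the tags in the sample are written by hand.

```python
from collections import defaultdict

# Assumption: keep nouns and adjectives (Penn Treebank NN* and JJ* tags).
# The abstract only says "POS filtering" and does not name the tags used.
KEEP_TAG_PREFIXES = ("NN", "JJ")


def build_graph(tagged_tokens, window=3):
    """Link candidate words that co-occur within `window` tokens of each other."""
    graph = defaultdict(set)
    tokens = [(word.lower(), tag) for word, tag in tagged_tokens]
    for i, (word, tag) in enumerate(tokens):
        if not tag.startswith(KEEP_TAG_PREFIXES):
            continue
        # Look ahead at the next (window - 1) tokens in the original text.
        for other, other_tag in tokens[i + 1 : i + window]:
            if other != word and other_tag.startswith(KEEP_TAG_PREFIXES):
                graph[word].add(other)
                graph[other].add(word)
    return graph


def textrank(graph, damping=0.85, iterations=50):
    """Score nodes with an unweighted PageRank-style update, fixed iteration count."""
    scores = {node: 1.0 for node in graph}
    for _ in range(iterations):
        updated = {}
        for node in graph:
            # Each neighbour shares its score equally among its own neighbours.
            rank = sum(scores[nb] / len(graph[nb]) for nb in graph[node])
            updated[node] = (1 - damping) + damping * rank
        scores = updated
    return scores


def extract_keywords(tagged_tokens, top_k=5, window=3):
    """Return the `top_k` highest-scoring candidate words."""
    scores = textrank(build_graph(tagged_tokens, window=window))
    # Sort by descending score, then alphabetically so ties are deterministic.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


# Hand-tagged sample sentence (tags are illustrative, not tagger output).
sample_tokens = [
    ("Graph", "NN"), ("based", "VBN"), ("ranking", "NN"), ("algorithms", "NNS"),
    ("extract", "VBP"), ("domain", "NN"), ("keywords", "NNS"), ("from", "IN"),
    ("research", "NN"), ("abstracts", "NNS"),
]
print(extract_keywords(sample_tokens, top_k=3))
```

Words that never co-occur with another candidate inside the window get no node in the graph and are therefore never returned. A fuller pipeline would also merge adjacent top-ranked words into multi-word terms, which this sketch omits.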