
Research Article
JARAD: An Approach for Java API Mention Recognition and Disambiguation in Stack Overflow
@INPROCEEDINGS{10.1007/978-3-031-54521-4_15,
  author={Qingmi Liang and Yi Jin and Qi Xie and Li Kuang and Yu Sheng},
  title={JARAD: An Approach for Java API Mention Recognition and Disambiguation in Stack Overflow},
  proceedings={Collaborative Computing: Networking, Applications and Worksharing. 19th EAI International Conference, CollaborateCom 2023, Corfu Island, Greece, October 4-6, 2023, Proceedings, Part I},
  proceedings_a={COLLABORATECOM},
  year={2024},
  month={2},
  keywords={Java API Mention Recognition API Disambiguation Research Analysis Dataset Stack Overflow},
  doi={10.1007/978-3-031-54521-4_15}
}
Qingmi Liang
Yi Jin
Qi Xie
Li Kuang
Yu Sheng
Year: 2024
COLLABORATECOM
Springer
DOI: 10.1007/978-3-031-54521-4_15
Abstract
Invoking APIs is a common way to improve the efficiency of software development. Developers often discuss problems they encounter, or share their experience of using an API, in communities such as Stack Overflow and GitHub. To avoid duplicate discussion of issues and to support downstream tasks such as API recommendation and API mining, it is necessary to recognize the APIs mentioned in these communities and link them to their fully qualified names. This task, often referred to as API mention recognition and disambiguation in informal text, is the main focus of our paper. Starting from Java posts on Stack Overflow, we analyze the proportion of posts that discuss an API (API Posts for short), whether by short name or fully qualified name, and the characteristics of API Posts. We also extract the APIs associated with more than 30,000 Stack Overflow posts and automatically establish <post, APIs> pairs to construct the dataset JAPD. Finally, we propose a novel approach, JARAD, to infer the APIs associated with a post. In our approach, we first use BiLSTM and CRF to fuse context information from text and code snippets and obtain a set of associated API candidates. Each candidate API is then scored by the frequency with which its API type appears in the post, in order to infer the API’s fully qualified name. Our evaluation experiments demonstrate that JARAD achieves 71.58%, 76.84% and 74.12% on Precision, Recall and F1, respectively.
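The frequency-based scoring step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the candidate table, the function name `disambiguate`, and the tokenization rule are all hypothetical, and the BiLSTM and CRF recognition step that would produce the candidates is omitted. The sketch only shows the general idea of ranking candidate fully qualified names by how often each candidate's type name occurs in the post.

```python
import re
from collections import Counter

# Hypothetical candidate table (an assumption for illustration):
# short API mention -> possible fully qualified names.
CANDIDATES = {
    "add": [
        "java.util.List.add",
        "java.util.Set.add",
        "java.math.BigDecimal.add",
    ],
}


def disambiguate(post_text, mention, candidates=CANDIDATES):
    """Rank candidate fully qualified names for `mention`.

    Each candidate is scored by how many times its type's simple name
    (e.g. "List" for java.util.List.add) occurs as a token in the post.
    Returns a list of (fully_qualified_name, score), highest score first.
    """
    # Count identifier-like tokens in the post text and code snippets.
    token_counts = Counter(re.findall(r"[A-Za-z_]\w*", post_text))

    scored = []
    for fqn in candidates.get(mention, []):
        type_name = fqn.split(".")[-2]  # segment just before the member name
        scored.append((fqn, token_counts[type_name]))

    # Stable sort: ties keep the order of the candidate table.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


if __name__ == "__main__":
    post = "I have a List<String> and call add() on the List, but the Set version works."
    for fqn, score in disambiguate(post, "add"):
        print(fqn, score)
```

In this toy post, `List` occurs more often than `Set`, so `java.util.List.add` ranks first. A real system would need a much larger candidate table and a way to handle ties and absent type names.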