
Research Article
CUTE: A Collaborative Fusion Representation-Based Fine-Tuning and Retrieval Framework for Code Search
@INPROCEEDINGS{10.1007/978-3-031-54521-4_19,
  author={Qihong Song and Jianxun Liu and Haize Hu},
  title={CUTE: A Collaborative Fusion Representation-Based Fine-Tuning and Retrieval Framework for Code Search},
  proceedings={Collaborative Computing: Networking, Applications and Worksharing. 19th EAI International Conference, CollaborateCom 2023, Corfu Island, Greece, October 4-6, 2023, Proceedings, Part I},
  proceedings_a={COLLABORATECOM},
  year={2024},
  month={2},
  keywords={Code search; Collaborative fusion representation; Fine-tuning; Hard negative sample; Data augmentation},
  doi={10.1007/978-3-031-54521-4_19}
}
Qihong Song
Jianxun Liu
Haize Hu
Year: 2024
COLLABORATECOM
Springer
DOI: 10.1007/978-3-031-54521-4_19
Abstract
Code search aims to retrieve semantically relevant code snippets from a large-scale codebase given a natural-language query. Fine-tuning pre-trained models for code search has recently emerged as a new trend. However, most studies fine-tune models using metric learning alone, overlooking the benefits of the collaborative relationship between code and query. In this paper, we introduce an effective fine-tuning and retrieval framework called CUTE. In the fine-tuning component, we propose a Collaborative Fusion Representation (CFR) consisting of three stages: pre-representation, collaborative representation, and residual fusion. CFR enhances the representations of code and query by modeling token-level collaborative features between them. Furthermore, we apply augmentation techniques to generate vector-level hard negative samples for training, further improving the pre-trained model's ability to discriminate and represent features during fine-tuning. In the retrieval component, we introduce a two-stage retrieval architecture comprising pre-retrieval and refined ranking, which significantly reduces time and computational resource consumption. We evaluate CUTE with three advanced pre-trained models on CodeSearchNet, a benchmark covering six programming languages. Extensive experiments demonstrate the fine-tuning effectiveness and retrieval efficiency of CUTE.
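
To make the mechanisms named in the abstract concrete, here is a minimal, self-contained PyTorch sketch of how a collaborative-fusion re-ranker, vector-level hard-negative mixing, and two-stage retrieval could fit together. Everything in it is an assumption drawn only from the abstract: the names (CollaborativeFusionReranker, mix_hard_negative, two_stage_search) are hypothetical, and cross-attention with a residual add and linear interpolation stand in for the paper's actual fusion and augmentation recipes.

import torch
import torch.nn.functional as F


class CollaborativeFusionReranker(torch.nn.Module):
    """Illustrative re-ranker: cross-attention captures token-level
    collaborative features between query and code ('collaborative
    representation'), which are added back to the independent encodings
    via a residual connection ('residual fusion')."""

    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.cross_attn = torch.nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, q_tokens, c_tokens):
        # Query tokens attend over code tokens (collaborative representation).
        attended, _ = self.cross_attn(q_tokens, c_tokens, c_tokens)
        # Residual fusion of collaborative features with the pre-representation.
        fused_q = q_tokens + attended
        # Mean-pool each sequence to one vector; score by cosine similarity.
        q_vec = F.normalize(fused_q.mean(dim=1), dim=-1)
        c_vec = F.normalize(c_tokens.mean(dim=1), dim=-1)
        return (q_vec * c_vec).sum(dim=-1)


def mix_hard_negative(anchor_vec, negative_vec, alpha=0.3):
    # Vector-level augmentation (an assumed mixing recipe, not the paper's):
    # interpolating a negative toward the anchor yields a harder negative
    # for contrastive fine-tuning.
    return alpha * anchor_vec + (1.0 - alpha) * negative_vec


def two_stage_search(query_vec, query_tokens, code_vecs, code_token_bank,
                     reranker, k=100):
    # Stage 1 (pre-retrieval): cheap dense similarity over the whole corpus.
    sims = F.normalize(code_vecs, dim=-1) @ F.normalize(query_vec, dim=-1)
    candidates = sims.topk(k).indices
    # Stage 2 (refined ranking): costly fusion scoring on the top-k only.
    cand_tokens = code_token_bank[candidates]            # (k, seq_len, dim)
    scores = reranker(query_tokens.expand(k, -1, -1), cand_tokens)
    return candidates[scores.argsort(descending=True)]


# Toy usage with random tensors standing in for encoder outputs.
dim, corpus, seqlen = 64, 1000, 32
reranker = CollaborativeFusionReranker(dim, heads=4)
ranking = two_stage_search(torch.randn(dim), torch.randn(1, seqlen, dim),
                           torch.randn(corpus, dim),
                           torch.randn(corpus, seqlen, dim), reranker, k=10)

Under this reading, the efficiency claim follows from the architecture: corpus embeddings for the pre-retrieval stage can be computed once offline, so the expensive token-level fusion runs only on the k pre-retrieved candidates rather than the whole database.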