Collaborative Computing: Networking, Applications and Worksharing. 19th EAI International Conference, CollaborateCom 2023, Corfu Island, Greece, October 4-6, 2023, Proceedings, Part I

Research Article

CUTE: A Collaborative Fusion Representation-Based Fine-Tuning and Retrieval Framework for Code Search

  • @INPROCEEDINGS{10.1007/978-3-031-54521-4_19,
        author={Qihong Song and Jianxun Liu and Haize Hu},
        title={CUTE: A Collaborative Fusion Representation-Based Fine-Tuning and Retrieval Framework for Code Search},
        proceedings={Collaborative Computing: Networking, Applications and Worksharing. 19th EAI International Conference, CollaborateCom 2023, Corfu Island, Greece, October 4-6, 2023, Proceedings, Part I},
        proceedings_a={COLLABORATECOM},
        year={2024},
        month={2},
        keywords={Code search, Collaborative fusion representation, Fine tuning, Hard negative sample, Data augmentation},
        doi={10.1007/978-3-031-54521-4_19}
    }
    
  • Qihong Song
    Jianxun Liu
    Haize Hu
    Year: 2024
    CUTE: A Collaborative Fusion Representation-Based Fine-Tuning and Retrieval Framework for Code Search
    COLLABORATECOM
    Springer
    DOI: 10.1007/978-3-031-54521-4_19
Qihong Song1, Jianxun Liu1,*, Haize Hu1
  • 1: School of Computer Science and Engineering
*Contact email: 904500672@qq.com

Abstract

Code search aims to retrieve semantically relevant code snippets from a large-scale database given a natural-language query. Fine-tuning pre-trained models for code search has recently emerged as a trend. However, most studies fine-tune models using metric learning alone, overlooking the benefit of the collaborative relationship between code and query. In this paper, we introduce an effective fine-tuning and retrieval framework called CUTE. In the fine-tuning component, we propose a Collaborative Fusion Representation (CFR) consisting of three stages: pre-representation, collaborative representation, and residual fusion. CFR enhances the representations of code and query by considering token-level collaborative features between them. Furthermore, we apply augmentation techniques to generate vector-level hard negative samples for training, which further improves the ability of the pre-trained model to distinguish and represent features during fine-tuning. In the retrieval component, we introduce a two-stage retrieval architecture that includes pre-retrieval and refined ranking, significantly reducing time and computational resource consumption. We evaluate CUTE with three advanced pre-trained models on CodeSearchNet, which covers six programming languages. Extensive experiments demonstrate the fine-tuning effectiveness and retrieval efficiency of CUTE.
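
The two-stage retrieval idea in the abstract (a cheap pre-retrieval pass followed by refined ranking of a short candidate list) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy two-dimensional vectors, and the token-overlap scorer are all hypothetical stand-ins. In a real system the vectors would come from a pre-trained encoder and the second-stage scorer would be the fine-tuned model.

```python
import math
import re


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pre_retrieve(query_vec, code_vecs, k):
    """Stage 1: rank all precomputed code vectors cheaply, keep the top-k indices."""
    order = sorted(range(len(code_vecs)),
                   key=lambda i: cosine(query_vec, code_vecs[i]),
                   reverse=True)
    return order[:k]


def token_overlap(query, snippet):
    """Hypothetical stand-in for an expensive scorer: count shared lowercase tokens."""
    q = set(re.findall(r"[a-z]+", query.lower()))
    s = set(re.findall(r"[a-z]+", snippet.lower()))
    return len(q & s)


def two_stage_search(query, query_vec, snippets, code_vecs, scorer, k=2):
    """Stage 2: apply the costlier scorer only to the stage-1 shortlist."""
    shortlist = pre_retrieve(query_vec, code_vecs, k)
    reranked = sorted(shortlist,
                      key=lambda i: scorer(query, snippets[i]),
                      reverse=True)
    return [snippets[i] for i in reranked]


# Toy data (assumed for illustration only).
snippets = [
    "def add(a, b): return a + b",
    "def read_file(path): return open(path).read()",
    "def sort_list(xs): return sorted(xs)",
]
code_vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]

results = two_stage_search("sort a list", [0.9, 0.4],
                           snippets, code_vecs, token_overlap, k=2)
for snippet in results:
    print(snippet)
```

The saving comes from running the expensive scorer on only `k` candidates rather than on the whole database; the choice of `k` trades recall in the first stage against cost in the second.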

Keywords
Code search, Collaborative fusion representation, Fine tuning, Hard negative sample, Data augmentation
Published
2024-02-23
Appears in
SpringerLink
http://dx.doi.org/10.1007/978-3-031-54521-4_19
Copyright © 2023–2025 ICST