sis 22(1): e5

Research Article

Developing a hyperparameter optimization method for classification of code snippets and questions of stack overflow: HyperSCC

Cite
BibTeX Plain Text
  • @ARTICLE{10.4108/eai.27-5-2022.174084,
        author={Muhammed Maruf {\"O}zt{\"u}rk},
        title={Developing a hyperparameter optimization method for classification of code snippets and questions of stack overflow: HyperSCC},
        journal={EAI Endorsed Transactions on Scalable Information Systems},
        volume={10},
        number={1},
        publisher={EAI},
        journal_a={SIS},
        year={2022},
        month={5},
        keywords={Multi-label classification, hyperparameter optimization, programming language prediction},
        doi={10.4108/eai.27-5-2022.174084}
    }
    
  • Muhammed Maruf Öztürk
    Year: 2022
    Developing a hyperparameter optimization method for classification of code snippets and questions of stack overflow: HyperSCC
    SIS
    EAI
    DOI: 10.4108/eai.27-5-2022.174084
Muhammed Maruf Öztürk1,*
  • 1: Department of Computer Engineering, Suleyman Demirel University, West Campus, Isparta, 32040, Turkey
*Contact email: muhammedozturk@sdu.edu.tr

Abstract

Although various machine learning and text mining techniques exist for identifying the programming language of complete code files, multi-label prediction for code snippets has not been considered by the research community. This work devises a tuner for multi-label programming language prediction of Stack Overflow posts. To that end, a Hyper Source Code Classifier (HyperSCC) is proposed along with rule-based automatic labeling that takes the bottlenecks of multi-label classification into account. The proposed method is evaluated on seven multi-label predictors to conduct an extensive analysis, and is further compared with three competitive alternatives in terms of one-label programming language prediction. HyperSCC outperformed the other methods in terms of the F1 score. Preprocessing yields a large reduction (50%) in training time when ensemble multi-label predictors are employed. In one-label programming language prediction, the Gradient Boosting Machine (gbm) yields the highest accuracy (0.99) when predicting R posts, which contain many distinctive words that determine labels. The findings support the hypothesis that multi-label predictors can be strengthened with sophisticated feature selection and labeling approaches.
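To make the two ideas in the abstract concrete, the following is a minimal sketch of rule-based automatic labeling and of a micro-averaged F1 score for multi-label predictions. It is not the HyperSCC implementation: the abstract does not state the paper's rules, so the `RULES` patterns, the function names `label_post` and `micro_f1`, and the choice of micro averaging are all assumptions made for illustration.

```python
import re

# Hypothetical keyword rules (assumption): the paper's real labeling
# rules are not given in the abstract.
RULES = {
    "python": [r"\bdef\s+\w+\s*\(", r"\bimport\s+numpy\b"],
    "r": [r"<-", r"\blibrary\s*\("],
    "java": [r"\bpublic\s+static\s+void\b", r"System\.out\.println"],
}


def label_post(text):
    """Return the set of languages for which at least one rule matches."""
    return {
        lang
        for lang, patterns in RULES.items()
        if any(re.search(p, text) for p in patterns)
    }


def micro_f1(true_sets, pred_sets):
    """Micro-averaged F1 over per-post label sets."""
    pairs = list(zip(true_sets, pred_sets))
    tp = sum(len(t & p) for t, p in pairs)  # labels predicted and correct
    fp = sum(len(p - t) for t, p in pairs)  # labels predicted but wrong
    fn = sum(len(t - p) for t, p in pairs)  # labels missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    post = "def f(x):\n    return x\nSystem.out.println(1)"
    print(sorted(label_post(post)))
    print(micro_f1([{"r"}, {"python", "java"}], [{"r"}, {"python"}]))
```

A post can receive several labels at once, which is what distinguishes the multi-label setting from the one-label comparison mentioned in the abstract. A tuner would then search predictor hyperparameters to maximize a score such as this one.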

Keywords: Multi-label classification, hyperparameter optimization, programming language prediction
Received: 2022-03-21
Accepted: 2022-05-26
Published: 2022-05-27
Publisher: EAI
http://dx.doi.org/10.4108/eai.27-5-2022.174084

Copyright © 2022 Muhammed Maruf Öztürk, licensed to EAI. This is an open access article distributed under the terms of the Creative Commons Attribution license, which permits unlimited use, distribution and reproduction in any medium so long as the original work is properly cited.
