
Research Article
Enhancing the Prediction of IL-4 Inducing Peptides Using Stacking Ensemble Model
@ARTICLE{10.4108/airo.9867,
  author={Rajib Mia and Tawfiqul Hasan and Abu Kowshir Bitto and Mohammad Mahadi Hassan and Mohammed Shamsul Alam and Abdul Kadar Muhammad Masum},
  title={Enhancing the Prediction of IL-4 Inducing Peptides Using Stacking Ensemble Model},
  journal={EAI Endorsed Transactions on AI and Robotics},
  volume={4},
  number={1},
  publisher={EAI},
  journal_a={AIRO},
  year={2025},
  month={10},
  keywords={Immunoinformatic, Peptides, Interleukin-4, Artificial Intelligence, Machine Learning, Stacking Ensemble},
  doi={10.4108/airo.9867}
}
Rajib Mia
Tawfiqul Hasan
Abu Kowshir Bitto
Mohammad Mahadi Hassan
Mohammed Shamsul Alam
Abdul Kadar Muhammad Masum
Year: 2025
Journal: AIRO
Publisher: EAI
DOI: 10.4108/airo.9867
Abstract
Interleukin-4 (IL-4) plays a critical role in immune regulation and inflammation suppression, so precise prediction of IL-4-inducing peptides is important for immunotherapy and vaccine design. In this work, we present a novel stacking ensemble-based predictive model for discovering IL-4-inducing peptides. The method combines several feature extraction techniques, namely Amino Acid Composition (AAC), Amphiphilic Pseudo Amino Acid Composition (APAAC), and their combination, and prunes the resulting features using SHAP (SHapley Additive exPlanations) so that only the most relevant features are retained. To address the class imbalance inherent in the peptide data, the ADASYN (Adaptive Synthetic Sampling) algorithm was applied for synthetic oversampling. We evaluated eight machine learning classifiers on both the imbalanced and balanced datasets: Logistic Regression, Random Forest, Support Vector Classifier, Decision Tree, K-Nearest Neighbors, XGBoost, LightGBM, and a stacking ensemble model. Our evaluation shows that the stacking model performs best on both the imbalanced and the balanced dataset. Notably, with the combined features, the stacking model achieved an accuracy of 89.97% and a Matthews Correlation Coefficient (MCC) of 0.79 on the independent test set. Detailed performance comparisons across the AAC and APAAC feature spaces indicate that the stacking model outperforms the other classifiers in every case, and more markedly on the balanced data, which points to the need for data rebalancing. This research not only highlights the precision of stacking ensembles in peptide classification tasks but also argues for integrating interpretable feature selection and data balancing into future immunoinformatics pipelines.
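As a rough illustration of two quantities named in the abstract, the sketch below computes an AAC feature vector for a peptide and the MCC from confusion-matrix counts, in plain Python. The function names `aac` and `mcc`, the choice to ignore non-standard residues, and the example inputs are illustrative assumptions and are not taken from the paper. The authors' actual pipeline (APAAC, SHAP-based selection, ADASYN, and the stacking ensemble) is not reproduced here.

```python
import math

# The 20 standard amino acids, in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(peptide):
    """Return the 20-dimensional amino acid composition of a peptide.

    Each entry is the fraction of residues equal to that amino acid.
    Residues outside the 20 standard letters are ignored
    (an assumption made for this sketch).
    """
    seq = [r for r in peptide.upper() if r in AMINO_ACIDS]
    if not seq:
        raise ValueError("peptide contains no standard residues")
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom
```

For the hypothetical peptide `"ACCA"`, `aac` returns 0.5 for both A and C and 0 for every other residue. A classifier with 45 true positives, 45 true negatives, 5 false positives, and 5 false negatives has an MCC of 0.8 under this definition.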
Copyright © Rajib Mia et al., licensed to EAI. This is an open access article distributed under the terms of the CC BY-NC-SA 4.0, which permits copying, redistributing, remixing, transformation, and building upon the material in any medium so long as the original work is properly cited.


