
Research Article
Multi Cancer Classification using Efficientnet-B3 Feature Extraction and UMAP-Optimized Traditional Machine Learning Models
@INPROCEEDINGS{10.4108/eai.28-4-2025.2357975,
    author={Mahendra Naga Venkat Vangara and Chaitanya Krishna Gogineni and Koteswara Rao Athota and Veera Venkata Abhinav Katika and Jawad Ahmad Dar},
    title={Multi Cancer Classification using Efficientnet-B3 Feature Extraction and UMAP-Optimized Traditional Machine Learning Models},
    proceedings={Proceedings of the 4th International Conference on Information Technology, Civil Innovation, Science, and Management, ICITSM 2025, 28-29 April 2025, Tiruchengode, Tamil Nadu, India, Part II},
    publisher={EAI},
    proceedings_a={ICITSM PART II},
    year={2025},
    month={10},
    keywords={cancer classification deep learning convolutional neural networks transfer learning multi-cancer classification histopathological images machine learning classifiers support vector machines (svm) random forest logistic regression k-nearest neighbors (knn) xgboost efficientnetb3 umap dimensionality reduction feature extraction},
    doi={10.4108/eai.28-4-2025.2357975}
}
Mahendra Naga Venkat Vangara
Chaitanya Krishna Gogineni
Koteswara Rao Athota
Veera Venkata Abhinav Katika
Jawad Ahmad Dar
Year: 2025
ICITSM PART II
EAI
DOI: 10.4108/eai.28-4-2025.2357975
Abstract
This work presents the development and evaluation of a multi-cancer classification system for histopathological images, covering four cancer groups: Cervical Cancer, Acute Lymphoblastic Leukemia (ALL), Brain Cancer, and Lung and Colon Cancer (handled as one group). Each cancer group, which comprises several stages or subclasses, was processed and modeled separately. For each group, the corresponding image dataset was loaded and pre-processed, and features were then extracted using a pre-trained EfficientNetB3 model. To reduce dimensionality, UMAP (Uniform Manifold Approximation and Projection) was used to project the feature space down to 128 features. The resulting lower-dimensional embeddings were then used to train and test a collection of machine learning classifiers: K-Nearest Neighbors (KNN), Random Forest, Logistic Regression (LR), Support Vector Machines (SVM), and XGBoost. The test accuracies per cancer category were as follows. For Cervical Cancer, KNN performed best with an accuracy of 95.84%, followed by SVM at 93.44%, XGBoost at 92.30%, Random Forest at 90.98%, and Logistic Regression at 90.70%. For ALL, XGBoost performed best with an accuracy of 93.73%, closely followed by Random Forest at 93.47%, KNN at 93.20%, SVM at 91.37%, and Logistic Regression at 91.17%.
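The pipeline described in the abstract (pre-trained EfficientNetB3 features, UMAP reduction, then classical classifiers) can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the function names, the 80/20 stratified split, every hyperparameter, and the synthetic stand-in features are assumptions, since the abstract specifies only the backbone, the 128-dimensional UMAP target, and the five classifier families. If `umap-learn` or `xgboost` is not installed, the sketch falls back to PCA or skips XGBoost; PCA is a stand-in only and is not the method the paper reports.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

try:
    import umap  # provided by the umap-learn package
except ImportError:
    umap = None
try:
    from xgboost import XGBClassifier
except ImportError:
    XGBClassifier = None


def extract_features(images):
    """Embed images of shape (n, 300, 300, 3), pixel range 0-255, with EfficientNetB3.

    Hypothetical helper: TensorFlow is imported lazily so the rest of the
    sketch runs without it. Global average pooling yields 1536 features.
    """
    import tensorflow as tf
    backbone = tf.keras.applications.EfficientNetB3(
        include_top=False, weights="imagenet", pooling="avg")
    inputs = tf.keras.applications.efficientnet.preprocess_input(images)
    return backbone.predict(inputs, verbose=0)


def make_reducer(n_components, seed=0):
    """Return UMAP when available; otherwise PCA as a clearly labeled stand-in."""
    if umap is not None:
        return umap.UMAP(n_components=n_components, random_state=seed)
    return PCA(n_components=n_components, random_state=seed)


def make_classifiers(seed=0):
    """The classifier families named in the abstract; settings are assumptions."""
    models = {
        "KNN": KNeighborsClassifier(n_neighbors=5),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "SVM": SVC(kernel="rbf"),
    }
    if XGBClassifier is not None:
        # XGBoost expects integer class labels 0..K-1.
        models["XGBoost"] = XGBClassifier(n_estimators=100, random_state=seed)
    return models


def evaluate(features, labels, n_components=128, seed=0):
    """Fit the reducer on the training split only, then score each classifier."""
    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.2, stratify=labels, random_state=seed)
    reducer = make_reducer(n_components, seed)
    Z_train = reducer.fit_transform(X_train)
    Z_test = reducer.transform(X_test)
    scores = {}
    for name, model in make_classifiers(seed).items():
        model.fit(Z_train, y_train)
        scores[name] = model.score(Z_test, y_test)  # test accuracy
    return scores


def synthetic_features(n_samples=400, n_features=1536, n_classes=4, seed=0):
    """Random class-clustered vectors standing in for real EfficientNetB3 features."""
    rng = np.random.default_rng(seed)
    labels = np.arange(n_samples) % n_classes
    centers = rng.normal(size=(n_classes, n_features))
    noise = rng.normal(scale=2.0, size=(n_samples, n_features))
    return centers[labels] + noise, labels
```

With real data, one would call `evaluate(extract_features(images), labels)` once per cancer group, mirroring the separate modeling of each group described above. Fitting the reducer on the training split and only transforming the test split is a design choice in this sketch that keeps test images from influencing the embedding; the abstract does not say how the authors handled this.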