From Foundation to Field: LISA Fine-Tuning for Mine Open-Vocabulary Segmentation

JiBo Wang; Libin Jiao; Zhen Bao; Wenchao Gao; Lianzhi Huo

Proceedings of the 13th International Conference on Identification, Information and Knowledge in the Internet of Things, IIKI 2025, 18-21 December 2025, Chengdu, China

Research Article

@INPROCEEDINGS{10.4108/eai.18-12-2025.2365260,
    author={JiBo Wang and Libin Jiao and Zhen Bao and Wenchao Gao and Lianzhi Huo},
    title={From Foundation to Field: LISA Fine-Tuning for Mine Open-Vocabulary Segmentation},
    proceedings={Proceedings of the 13th International Conference on Identification, Information and Knowledge in the Internet of Things, IIKI 2025, 18-21 December 2025, Chengdu, China},
    publisher={EAI},
    proceedings_a={IIKI},
    year={2026},
    month={6},
    keywords={Multimodal large models, open-vocabulary semantic segmentation, underground coal-mine applications, LoRA fine-tuning},
    doi={10.4108/eai.18-12-2025.2365260}
}

JiBo Wang¹, Libin Jiao¹, Zhen Bao², Wenchao Gao¹, Lianzhi Huo³,*

1: School of Artificial Intelligence, China University of Mining and Technology-Beijing, Beijing, China
2: CHN Energy Science and Technology and Environment Co., Ltd., China; CHN Energy Zhi Shen Control Technology Co., Ltd., China
3: Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing, China

*Contact email: huolz@aircas.ac.cn

Abstract

Underground mining environments are characterized by dim illumination, cluttered man-made structures, and frequent occlusions, yet most existing underground segmentation methods still perform closed-set pixel classification over a fixed label set, so they cannot accept natural-language instructions or dynamically segment context-specific targets. In this work, we present MineLISA, an instruction-guided segmentation framework adapted from LISA (Large Language Instructed Segmentation Assistant) for industrial underground mining applications. MineLISA takes natural-language prompts as input and generates pixel-level masks for underground mining objects in the MUSeg multimodal semantic-segmentation dataset. To adapt LISA under realistic resource constraints, we apply parameter-efficient Low-Rank Adaptation (LoRA) fine-tuning to the vision-language alignment modules and the lightweight segmentation decoder, and we re-weight the segmentation loss to emphasize thin, safety-critical structures such as cables and pipelines. This design improves the alignment between textual instructions and underground visual patterns while remaining suitable for hardware with limited GPU memory. Experiments on MUSeg show that, compared with the original LISA, MineLISA produces substantially better instruction-conditioned masks and more stable segmentation across diverse underground object categories, indicating its potential for real-world coal-mine deployment.

Keywords: Multimodal large models, open-vocabulary semantic segmentation, underground coal-mine applications, LoRA fine-tuning
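The abstract names two adaptation techniques without spelling them out. As a rough illustration only, the sketch below shows (1) a LoRA adapter, in which a frozen weight W is augmented by a trainable low-rank product B A scaled by alpha/r, and (2) a segmentation loss re-weighted per class so that thin structures count more. It is written in plain Python with no deep-learning framework. The names LoRALinear and weighted_bce, the rank, alpha, and class weights are hypothetical choices for this example, not MineLISA's code, and the paper's actual loss terms and LoRA settings may differ.

```python
# Hypothetical sketch, not the MineLISA implementation: a LoRA-adapted linear
# layer and a class-re-weighted binary cross-entropy, in plain Python.
import math
from typing import Dict, List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """y = W x + (alpha / r) * B (A x), with W frozen and only A, B trained.

    Shapes: W is d_out x d_in, A is r x d_in, B is d_out x r.
    B is zero-initialised, so before training the layer reproduces W x exactly.
    """

    def __init__(self, weight: Matrix, a_init: Matrix, alpha: float):
        self.weight = weight                           # frozen pretrained weight
        self.rank = len(a_init)                        # r = number of rows of A
        self.a = a_init                                # trainable down-projection
        self.b = [[0.0] * self.rank for _ in weight]   # trainable up-projection
        self.scale = alpha / self.rank

    def forward(self, x: Vector) -> Vector:
        base = matvec(self.weight, x)                  # frozen path: W x
        delta = matvec(self.b, matvec(self.a, x))      # low-rank path: B (A x)
        return [y + self.scale * d for y, d in zip(base, delta)]

    def trainable_params(self) -> int:
        # Only A and B are updated: r * (d_in + d_out) values, not d_in * d_out.
        return sum(len(row) for row in self.a) + sum(len(row) for row in self.b)


def weighted_bce(probs: Vector, targets: Vector, class_ids: List[int],
                 class_weights: Dict[int, float], eps: float = 1e-7) -> float:
    """Weighted mean of per-pixel binary cross-entropy.

    Each pixel's loss is multiplied by the weight of its semantic class
    (default 1.0), so up-weighting thin classes such as cables raises their
    share of the total loss.
    """
    total, norm = 0.0, 0.0
    for p, t, c in zip(probs, targets, class_ids):
        p = min(max(p, eps), 1.0 - eps)                # clamp to avoid log(0)
        w = class_weights.get(c, 1.0)
        total += -w * (t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
        norm += w
    return total / norm


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]           # d_out = 3, d_in = 2
    layer = LoRALinear(W, a_init=[[0.5, -0.5]], alpha=2.0)  # rank r = 1
    print(layer.forward([2.0, 3.0]))                   # equals W x while B is zero
    probs, targets, classes = [0.9, 0.2], [1.0, 0.0], [0, 1]
    print(weighted_bce(probs, targets, classes, {}))          # plain mean BCE
    print(weighted_bce(probs, targets, classes, {1: 3.0}))    # class 1 up-weighted
```

Running the script first prints [2.0, 3.0, 5.0], because the zero-initialised B leaves the pretrained output unchanged; fine-tuning therefore starts from the base model's behaviour. Normalising the weighted loss by the sum of weights keeps its scale comparable to the unweighted mean while shifting emphasis toward the up-weighted class.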

Published: 2026-06-17
Publisher: EAI

DOI link: http://dx.doi.org/10.4108/eai.18-12-2025.2365260
