
Research Article
CAD-guided 6D pose estimation with deep learning in digital twin for industrial collaborative robot manipulation
Quang Huan Dong
The Thinh Pham
Khanh Nguyen
Chi-Cuong Tran
Hoang Huy Tran
Duy Tan Do
Khang Hoang Vinh Nguyen
Quang-Chien Nguyen
Year: 2025
AIRO
EAI
DOI: 10.4108/airo.9676
Abstract
6D pose estimation for the bin-picking task has attracted increasing attention from researchers. CAD model-based methods have been proposed and have demonstrated their effectiveness. However, most existing research relies on point cloud registration from RGB-D cameras, which is often not robust to noise and low-light conditions; the resulting degradation in point cloud quality significantly reduces accuracy. Moreover, correct object detection plays a vital role in scenes with multiple objects. Supervised deep learning has been applied to this task, but it typically requires a large amount of labeled data, and in industrial environments sample collection and model retraining are limited. To address these challenges, we introduce an approach that integrates the zero-shot YOLOE detector with the DEFOM-Stereo model. YOLOE detects and localizes objects without requiring object-specific training, while DEFOM-Stereo generates point clouds for CAD model-based pose estimation. Extensive experiments demonstrate that the proposed approach achieves high pose estimation accuracy, which is essential for grasp planning and manipulation tasks in robotics. Furthermore, the proposed approach is applied in a Unity3D-based digital twin, enabling an enhanced virtual representation of a physical pickup target with its estimated pose. The results therefore support more accurate and responsive digital twins for robotics and the development of smart manufacturing systems.
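
To make the detect-then-register idea in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation: it assumes an object mask and a depth map have already been produced (in the paper these would come from YOLOE and DEFOM-Stereo), back-projects the masked pixels into a point cloud, and registers a CAD model to it with Open3D ICP to obtain a 6D pose. All function names and parameter values here are illustrative assumptions.

# Illustrative sketch only (not the authors' implementation): a generic
# detect -> depth -> point cloud -> CAD registration pipeline. The detection
# and stereo steps are represented only by their assumed outputs (an object
# mask and a depth map); pose estimation uses Open3D point-to-plane ICP.
import numpy as np
import open3d as o3d


def mask_to_cloud(depth, mask, K):
    """Back-project the masked depth pixels into a camera-frame point cloud."""
    v, u = np.nonzero(mask)
    z = depth[v, u]
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    cloud = o3d.geometry.PointCloud()
    cloud.points = o3d.utility.Vector3dVector(np.stack([x, y, z], axis=1))
    return cloud


def estimate_pose(scene_cloud, cad_mesh_path, voxel=0.003, init=np.eye(4)):
    """Register a CAD model to the observed object cloud; returns a 4x4 pose."""
    cad = o3d.io.read_triangle_mesh(cad_mesh_path).sample_points_poisson_disk(5000)
    cad = cad.voxel_down_sample(voxel)
    scene = scene_cloud.voxel_down_sample(voxel)
    cad.estimate_normals()
    scene.estimate_normals()
    result = o3d.pipelines.registration.registration_icp(
        cad, scene, 5 * voxel, init,
        o3d.pipelines.registration.TransformationEstimationPointToPlane())
    return result.transformation  # homogeneous CAD-to-camera transform (R | t)

In a pipeline like the one described in the abstract, the resulting 4x4 transform would then be streamed to the Unity3D-based digital twin to update the virtual representation of the pickup target for grasp planning.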
Copyright © 2025 Quang Huan Dong, et al., licensed to EAI. This is an open access article distributed under the terms of the CC BY-NC-SA 4.0, which permits copying, redistributing, remixing, transformation, and building upon the material in any medium so long as the original work is properly cited.


