
Research Article
Fine-Grained Head Pose Estimation Based on a 6D Rotation Representation with Multiregression Loss
@INPROCEEDINGS{10.1007/978-3-031-24386-8_13,
  author={Jin Chen and Huahu Xu and Minjie Bian and Jiangang Shi and Yuzhe Huang and Chen Cheng},
  title={Fine-Grained Head Pose Estimation Based on a 6D Rotation Representation with Multiregression Loss},
  proceedings={Collaborative Computing: Networking, Applications and Worksharing. 18th EAI International Conference, CollaborateCom 2022, Hangzhou, China, October 15-16, 2022, Proceedings, Part II},
  proceedings_a={COLLABORATECOM PART 2},
  year={2023},
  month={1},
  keywords={Head pose estimation, 6D rotation, Fine-grained image analysis, Multiregression loss, Landmark-free method},
  doi={10.1007/978-3-031-24386-8_13}
}
Jin Chen
Huahu Xu
Minjie Bian
Jiangang Shi
Yuzhe Huang
Chen Cheng
Year: 2023
COLLABORATECOM PART 2
Springer
DOI: 10.1007/978-3-031-24386-8_13
Abstract
Head pose estimation is vital in action evaluation, with applications that include automobile driver-assistance systems, performance evaluation of athletes, and measuring customers’ attention in retail stores. Accurately predicting head orientation from an RGB image with deep learning remains difficult. We propose 6DHPENet, a fine-grained 6D head pose estimation network, to estimate the 3D rotation of the head. First, the model adopts a 6D rotation representation of 3D rotations as its training objective to guarantee effective learning; the 6D representation is a continuous, one-to-one mapping for 3D rotations. Second, because obtaining 3D facial landmarks during real-time activity is time-consuming and restricted to frontal views, we drop the 3D facial landmarks to improve adaptability and generalization across application scenes. Third, after the last convolutional feature-extraction layer, a squeeze-and-excitation module is introduced to capture both local spatial and global channel-wise facial feature information by explicitly modeling the interdependencies between feature channels. Finally, a multiregression loss function is presented to improve accuracy and stability for full-range head pose estimation. In addition, our method is compact and efficient enough for mobile devices because of its lightweight CNN backbone. Quantitative experiments with models trained on the 300W-LP dataset show the superior performance of our 6D rotation representation-based multiregression fine-grained method on the AFLW2000 and BIWI datasets.
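
To make the 6D rotation representation mentioned in the abstract concrete, the following is a minimal sketch of the Gram-Schmidt construction commonly used with this representation: a network outputs six numbers, read as two 3D vectors, which are orthonormalized into the first two columns of a rotation matrix, with the third column given by their cross product. The abstract does not describe the authors' implementation, so the function names (`rotation_from_6d`, `six_d_from_rotation`) and the column-wise layout are assumptions made for illustration only.

```python
import math


def _normalize(v):
    """Return v scaled to unit length; reject the degenerate zero vector."""
    n = math.sqrt(sum(x * x for x in v))
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in v]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def rotation_from_6d(six):
    """Map a 6D vector (a1, a2) to a 3x3 rotation matrix (list of rows).

    Assumed layout (hypothetical): six[0:3] is a1 and six[3:6] is a2.
    The columns of the result are b1, b2, b3 from Gram-Schmidt.
    """
    a1, a2 = list(six[:3]), list(six[3:6])
    b1 = _normalize(a1)
    # Remove the component of a2 along b1, then normalize.
    proj = _dot(b1, a2)
    b2 = _normalize([y - proj * x for x, y in zip(b1, a2)])
    # The third axis completes a right-handed orthonormal frame.
    b3 = _cross(b1, b2)
    return [[b1[i], b2[i], b3[i]] for i in range(3)]


def six_d_from_rotation(R):
    """Inverse direction: keep the first two columns of a rotation matrix."""
    return [R[0][0], R[1][0], R[2][0], R[0][1], R[1][1], R[2][1]]
```

In this sketch any six numbers whose two halves are not parallel (and whose first half is nonzero) map to a valid rotation, so a regression network can be trained on the unconstrained 6D output. Applying `rotation_from_6d` to `six_d_from_rotation(R)` returns `R` for any rotation matrix `R`.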