Deep 3D human pose estimation: A review


Three-dimensional (3D) human pose estimation involves estimating the articulated 3D joint locations of a human body from an image or video. Due to its widespread applications in a great variety of areas, such as human motion analysis, human–computer interaction, robots, 3D human pose estimation has recently attracted increasing attention in the computer vision community, however, it is a challenging task due to depth ambiguities and the lack of in-the-wild datasets. A large number of approaches, with many based on deep learning, have been developed over the past decade, largely advancing the performance on existing benchmarks. To guide future development, a comprehensive literature review is highly desired in this area. However, existing surveys on 3D human pose estimation mainly focus on traditional methods and a comprehensive review on deep learning based methods remains lacking in the literature. In this paper, we provide a thorough review of existing deep learning based works for 3D pose estimation, summarize the advantages and disadvantages of these methods and provide an in-depth understanding of this area. Furthermore, we also explore the commonly-used benchmark datasets on which we conduct a comprehensive study for comparison and analysis. Our study sheds light on the state of research development in 3D human pose estimation and provides insights that can facilitate the future design of models and algorithms.

CVIU 2021