The aim of our thesis is to develop a neural-network-based 3D human pose estimation pipeline that takes three-dimensional data (in the form of a point cloud or a depth map) as input and outputs the 3D coordinates of the skeletal joints. By inferring from 3D input data, we intend to overcome the limitations associated with the high non-linearity of direct regression from a 2D image to a 3D pose, and thereby increase the estimation accuracy. Our goal is to introduce our own approach, in addition to implementing several well-performing models proposed in existing papers. Finally, we aim to evaluate the methods on multiple benchmark datasets and compare the results to the current state of the art.
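To make the intended input/output contract concrete, the sketch below shows the interface such a pipeline would expose. All names and sizes here are hypothetical placeholders (1024 points, 21 joints), and the trivial centroid computation merely stands in for the neural network that the thesis will develop:

```python
import numpy as np

def estimate_pose(point_cloud: np.ndarray, num_joints: int = 21) -> np.ndarray:
    """Hypothetical interface for the pose estimation pipeline:
    maps an (N, 3) point cloud to (num_joints, 3) joint coordinates.
    The body is a trivial stand-in (centroid replicated per joint)
    marking where the learned model would go."""
    assert point_cloud.ndim == 2 and point_cloud.shape[1] == 3
    centroid = point_cloud.mean(axis=0)        # (3,) mean over all points
    return np.tile(centroid, (num_joints, 1))  # (num_joints, 3)

# Dummy point cloud, e.g. as back-projected from a depth map.
cloud = np.random.rand(1024, 3)
joints = estimate_pose(cloud)
print(joints.shape)  # (21, 3)
```

A depth map can be converted to such a point cloud by back-projecting each pixel through the camera intrinsics, so both input modalities reduce to the same (N, 3) representation.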