3D Change Detection and Human Pose Estimation In Lidar Perception

In this thesis, I propose solutions for two research problems in the 3D perception of terrestrial mobile laser scanners.

In the first part of the dissertation, a novel deep neural network-based change detection approach is introduced, which can robustly extract changes between sparse and weakly registered point clouds obtained in a complex street-level environment, tolerating up to 1 m translation and 10° rotation misalignment between the corresponding 3D point cloud frames. In the proposed ChangeGAN model, the input point clouds are represented by range images, enabling the use of 2D convolutional neural networks. The result is a pair of binary masks indicating the change regions on each input range image, which can be back-projected to the input point clouds without loss of information. The proposed method utilizes a generative adversarial network (GAN)-like architecture, combining Siamese-style feature extraction, U-Net-like multiscale feature usage, and Spatial Transformer Network blocks for optimal transformation estimation. I have evaluated the proposed method on various challenging scenarios, including a new dataset I created, demonstrating its superiority over state-of-the-art change detection methods.
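The range-image representation and its lossless back-projection can be sketched as follows. This is a generic spherical projection, not the exact ChangeGAN preprocessing: the resolution and field-of-view parameters (`h`, `w`, `fov_up_deg`, `fov_down_deg`) are illustrative assumptions. Keeping the index of the winning point per pixel is what allows a pixel-level change mask to be transferred back to the 3D points without information loss.

```python
import numpy as np

def point_cloud_to_range_image(points, h=64, w=1024,
                               fov_up_deg=15.0, fov_down_deg=-15.0):
    """Project an (N, 3) point cloud onto an (h, w) range image via
    spherical projection, also storing the point index per pixel so
    that pixel-level results (e.g. change masks) can be back-projected."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(y, x)                                  # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(z / np.maximum(r, 1e-9), -1.0, 1.0))

    fov_up = np.radians(fov_up_deg)
    fov_down = np.radians(fov_down_deg)
    u = ((yaw + np.pi) / (2 * np.pi)) * w                   # column coordinate
    v = ((fov_up - pitch) / (fov_up - fov_down)) * h        # row coordinate

    u = np.clip(np.floor(u), 0, w - 1).astype(int)
    v = np.clip(np.floor(v), 0, h - 1).astype(int)

    range_img = np.zeros((h, w), dtype=np.float32)
    index_img = np.full((h, w), -1, dtype=int)
    order = np.argsort(-r)          # draw far points first, near points overwrite
    range_img[v[order], u[order]] = r[order]
    index_img[v[order], u[order]] = order
    return range_img, index_img

def back_project_mask(mask, index_img, n_points):
    """Lift a binary (h, w) pixel mask back to per-point boolean labels."""
    labels = np.zeros(n_points, dtype=bool)
    labels[index_img[mask & (index_img >= 0)]] = True
    return labels
```

With this bookkeeping, the binary change masks produced on the range images map one-to-one onto the measured 3D points of each input cloud.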

The second part of the thesis focuses on 3D human pose estimation in Lidar point clouds. While Lidar sensors are generally expensive, I demonstrated that with a new and affordable Lidar sensor (Livox Avia) featuring a unique Non-Repetitive Circular Scanning (NRCS) pattern, human pose estimation tasks can be solved efficiently despite the sparseness of the point cloud measurements.

My proposed solution involves two challenging steps. The first one concerns foreground-background segmentation of the recorded 3D Lidar point cloud frames. To this end, I proposed a novel point-level foreground-background separation technique for measurement sequences of an NRCS Lidar sensor mounted in a fixed surveillance position. Here, the main challenge has been to balance the spatial and temporal resolution of the recorded range data efficiently. To address this, a very high-resolution background model of the sensor’s Field of View is automatically generated and maintained. For real-time analysis of dynamic objects, a low integration time is used. Consequently, laser reflections from foreground objects provide sparse but geometrically accurate samples of moving objects. These samples are valuable for higher-level shape description, object detection, and pose estimation. I demonstrate the efficiency of this new approach using various realistic NRCS Lidar measurement sequences from my new dataset.
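A minimal sketch of this idea, assuming a fixed sensor: a high-resolution spherical grid accumulates the farthest range observed per viewing direction over a long integration window (the background model), while points of a short, low-integration-time frame are labeled foreground when they fall significantly closer than that background. The grid resolution and the `margin` threshold below are illustrative placeholders, not the thesis settings.

```python
import numpy as np

class NRCSBackgroundModel:
    """Toy per-direction range background model for a fixed NRCS Lidar:
    long-term integration fills a dense spherical grid; short frames are
    segmented against it in real time."""

    def __init__(self, h=512, w=2048, margin=0.3):
        self.h, self.w = h, w
        self.margin = margin                      # metres closer than background
        self.bg_range = np.zeros((h, w))          # 0 = direction not yet observed

    def _bin(self, points):
        """Map (N, 3) points to spherical grid cells and ranges."""
        x, y, z = points[:, 0], points[:, 1], points[:, 2]
        r = np.linalg.norm(points, axis=1)
        yaw = np.arctan2(y, x)
        pitch = np.arcsin(np.clip(z / np.maximum(r, 1e-9), -1.0, 1.0))
        u = np.clip(((yaw + np.pi) / (2 * np.pi) * self.w).astype(int), 0, self.w - 1)
        v = np.clip(((np.pi / 2 - pitch) / np.pi * self.h).astype(int), 0, self.h - 1)
        return v, u, r

    def update(self, points):
        """Integrate a frame into the background model, keeping the
        farthest range seen per direction."""
        v, u, r = self._bin(points)
        np.maximum.at(self.bg_range, (v, u), r)

    def segment(self, points):
        """Label points foreground if they are more than `margin` metres
        closer than the modeled background in their direction bin."""
        v, u, r = self._bin(points)
        bg = self.bg_range[v, u]
        return (bg > 0) & (r < bg - self.margin)
```

The non-repetitive scanning pattern is what lets long integration densely cover the grid, while each short foreground frame remains sparse but geometrically accurate.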

The second step of my proposed Lidar surveillance approach addresses 3D human pose estimation based purely on the NRCS Lidar measurements. Here, I proposed a novel, vision transformer-based pose estimation method called LidPose for real-time 3D human skeleton detection in NRCS Lidar point clouds, exploiting my previously introduced foreground segmentation approach. To train and evaluate the LidPose method, I created a novel, real-world, multimodal dataset containing camera images and Lidar point clouds from a Livox Avia sensor, with annotated 2D and 3D human skeleton ground truth. Using this dataset, I demonstrated that the proposed method can efficiently and accurately estimate 3D human poses using only NRCS Lidar point clouds.
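For intuition only, the following untrained toy sketch shows the shape of a transformer-style pose regressor over segmented foreground points: each Lidar point becomes a token, a single self-attention block mixes them, and a pooled feature is mapped to K joint coordinates. All dimensions, layers, and weights here are placeholders; the actual LidPose architecture is defined in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class ToyPointTransformerPose:
    """Untrained illustration of attention-based pose regression:
    (N, 3) foreground points in, (n_joints, 3) joint estimates out."""

    def __init__(self, d=32, n_joints=17):
        self.d, self.k = d, n_joints
        self.W_embed = rng.normal(0, 0.1, (3, d))       # point -> token
        self.W_q = rng.normal(0, 0.1, (d, d))
        self.W_k = rng.normal(0, 0.1, (d, d))
        self.W_v = rng.normal(0, 0.1, (d, d))
        self.W_out = rng.normal(0, 0.1, (d, n_joints * 3))

    def forward(self, points):
        tok = points @ self.W_embed                     # (N, d) tokens
        q, k, v = tok @ self.W_q, tok @ self.W_k, tok @ self.W_v
        attn = softmax(q @ k.T / np.sqrt(self.d))       # (N, N) attention
        tok = tok + attn @ v                            # residual attention block
        pooled = tok.mean(axis=0)                       # order-invariant pooling
        return (pooled @ self.W_out).reshape(self.k, 3)
```

The permutation-invariant pooling reflects why token-based models suit unordered, variably sparse NRCS point sets.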