Abstract:
This thesis explores object detection technologies, with particular emphasis on their application in autonomous driving and related settings. Autonomous driving depends critically on a system’s ability to perceive and understand its surroundings. A central component of this perception is 3D object detection, which enables accurate and reliable identification and localization of objects in a vehicle’s vicinity. Spanning two semesters, the thesis aims to explore and enhance 3D object detection methods using state-of-the-art neural network architectures tailored to point cloud data from LiDAR sensors.

During the first semester, our research centered on the implementation and analysis of VoxelNeXt, a neural network architecture known for its efficiency in processing 3D point clouds. VoxelNeXt uses sparse convolutional networks to address the computational and memory inefficiencies typical of traditional dense voxel-based approaches. By concentrating computation on non-empty voxels, the architecture reduces computational overhead while preserving, and even enhancing, the ability to detect smaller and more distant objects.

The thesis also addresses error analysis by adapting the TIDE (Toolbox for Identifying Detection Errors) framework to 3D object detection, an extension termed 3D TIDE. It categorizes detection errors in detail: localization errors, classification errors, combined localization and classification errors, duplicate detections, false positives, and missed ground truths. The adaptation introduces matching criteria and error thresholds suited to three-dimensional data, offering insight into the failure modes of 3D detection models.
For example, an analysis of VoxelNeXt’s performance on the nuScenes dataset revealed nuanced distinctions between localization and classification errors, which guided model optimization strategies.

The initial phase of this research involved setting up the VoxelNeXt framework and integrating it with the nuScenes dataset. Preliminary results showed promising improvements in both speed and accuracy over conventional methods. The 3D TIDE analysis further highlighted areas needing improvement, enabling targeted architectural and training refinements. The next phases will evaluate model robustness under varied environmental conditions, thereby assessing real-world applicability and refining the 3D object detection pipeline. The ultimate goal of this thesis is to contribute a well-rounded analysis of, and potential enhancements to, autonomous driving technologies, bridging the gap between theoretical innovation and practical deployment.
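
To make the error categories named above concrete, the following is a minimal sketch of TIDE-style error assignment for a single detection. It is not the 3D TIDE implementation developed in this thesis: the function name `categorize_detection`, its parameters, and the use of one overlap score in [0, 1] (for instance a 3D IoU) are assumptions made for illustration, and the thresholds 0.5 and 0.1 are the defaults of the original 2D TIDE rather than the 3D thresholds this work adopts.

```python
def categorize_detection(same_cls_overlap: float,
                         other_cls_overlap: float,
                         same_cls_gt_taken: bool,
                         t_fg: float = 0.5,
                         t_bg: float = 0.1) -> str:
    """Assign one detection to a TIDE-style category (illustrative sketch).

    same_cls_overlap:  best overlap with a ground truth of the predicted class
    other_cls_overlap: best overlap with a ground truth of any other class
    same_cls_gt_taken: True if that same-class ground truth is already matched
                       by a higher-scoring detection
    t_fg, t_bg:        assumed foreground/background thresholds (2D TIDE defaults)
    """
    if same_cls_overlap >= t_fg:
        # Overlap is sufficient; it is an error only if the ground truth is taken.
        return "duplicate" if same_cls_gt_taken else "true_positive"
    if same_cls_overlap >= t_bg:
        # Correct class, but the box is not well enough aligned.
        return "localization"
    if other_cls_overlap >= t_fg:
        # Well aligned, but with a ground truth of a different class.
        return "classification"
    if other_cls_overlap >= t_bg:
        # Wrong class and poorly aligned.
        return "both"
    # No meaningful overlap with any ground truth: false positive on background.
    return "background"


if __name__ == "__main__":
    print(categorize_detection(0.3, 0.8, False))
    print(categorize_detection(0.05, 0.3, False))
```

Missed ground truths cannot be decided per detection. In this sketch they would be counted afterwards as the ground truth objects that no true positive, localization error, or classification error accounts for.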