Abstract:
This thesis investigates how the inherent geometric structure of road scenes can be leveraged to solve two central problems in visual traffic monitoring: sensor calibration and vehicle speed estimation. Rather than relying on manual calibration procedures or artificial targets, this work builds on the insight that road environments themselves provide rich geometric cues, such as planar surfaces, recurring man-made structures, and vanishing lines, that can be exploited for robust geometric computation across sensing modalities.

The first part addresses LiDAR–camera cross-calibration. Planar surfaces commonly found in road scenes, such as the ground, walls, and signboards, are extracted from LiDAR point clouds and camera images. These planar geometries are matched using graph-based techniques, and a robust model-fitting step estimates the extrinsic transformation between the sensors. This enables automatic calibration in dynamic, unstructured environments, making the approach suitable for real-world traffic applications.

The second part focuses on monocular vehicle speed estimation. A homography between the image plane and the road surface is estimated by combining depth prediction, road segmentation, and intrinsic parameter approximation based on either vanishing-point geometry or learning-based models. Once established, this homography maps vehicle motion observed in the image onto metric distances on the road, allowing accurate speed computation from a single camera without additional sensors or per-frame depth.

By unifying both problems under the principle of exploiting road-scene geometry, this thesis presents a cohesive and practical framework for vision-based traffic monitoring. The methods are validated on multiple challenging datasets and maintain robust performance even when calibration conditions are imperfect or sensor measurements are noisy, enabling scalable deployment in real-world traffic systems.
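The homography-based speed computation summarized above can be illustrated with a minimal sketch: given a plane-to-plane homography H from image coordinates to metric road-plane coordinates, two detections of the same vehicle taken dt seconds apart yield a speed estimate. The function names and the synthetic diagonal homography below are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

def pixel_to_road(H, pt):
    """Map an image point (u, v) to road-plane metric coordinates via homography H."""
    u, v = pt
    x, y, w = H @ np.array([u, v, 1.0])
    return np.array([x / w, y / w])  # dehomogenize

def estimate_speed(H, p1, p2, dt):
    """Speed (m/s) from two image-space detections of the same vehicle, dt seconds apart."""
    d = np.linalg.norm(pixel_to_road(H, p2) - pixel_to_road(H, p1))
    return d / dt

# Illustrative only: a pure-scale homography (20 px per metre, no perspective).
H = np.diag([0.05, 0.05, 1.0])
speed = estimate_speed(H, (100, 100), (100, 300), dt=0.4)  # 10 m in 0.4 s -> 25 m/s
```

In practice the homography would encode real perspective distortion recovered from depth, segmentation, and intrinsics, but the mapping-and-divide structure is the same.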