Depth Refinement in 3D Mapping of Construction Sites Using a Stereo Camera
Ryunosuke Ishiguro (a*), Junichi Susaki (a), Yoshie Ishii (a)

Department of Civil and Earth Resources Engineering, Graduate School of Engineering, Kyoto University
*ishiguro.ryunosuke.62w[at]st.kyoto-u.ac.jp


Abstract

The construction industry in Japan faces a critical challenge due to a severe labor shortage and a decline in the number of skilled crane operators, creating a pressing need for automated 3D environmental mapping to support crane operations. Conventional stereo matching methods, such as Semi-Global Block Matching (SGBM), are prone to failure in texture-poor regions and at occlusions, both of which are common at construction sites. Meanwhile, deep learning models, typified by PSMNet, require large-scale labeled training data that are difficult to acquire. To overcome these challenges, this research proposes a practical and robust method for generating high-fidelity 3D maps that avoids the need for large-scale training data and compensates for the weaknesses of SGBM. The core of the proposed 3D reconstruction pipeline is a depth refinement process based on the PatchMatch MVS framework, which incorporates three key enhancements. First, it is initialized with a dense depth map and normal vectors derived directly from a high-quality stereo disparity map to establish a robust initial state. Second, instead of conventional alternating propagation, it employs a priority-based propagation scheme that expands outwards from the pixels with the lowest initial depth error. This approach suppresses error propagation and enables a systematic refinement process. Third, during the random search, the search range is adaptively adjusted for each pixel according to its estimated depth error, focusing computational resources where refinement is most needed. To validate the effectiveness of the method, we used a synthetic dataset generated from a detailed 3D model that simulates a crane work site. A quantitative evaluation against the ground-truth data confirmed that the proposed method generates more accurate depth maps, reducing the Mean Absolute Error (MAE) from 0.41 for standalone SGBM to 0.17.
Our method produces a high-fidelity, dense point cloud while avoiding the heavy data requirements and potential generalization issues associated with machine learning models. This provides an essential foundation for the future development of sophisticated automated crane control systems.
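
The following is a minimal Python sketch of the ideas named in the abstract, not the authors' implementation. It assumes the standard pinhole stereo relation (depth = focal length x baseline / disparity) for initialization, a 4-connected lowest-error-first expansion as one possible reading of "priority-based propagation", and a linear rule for the adaptive search range. All function names, parameters, and default values (`focal_px`, `baseline_m`, `base`, `gain`, `max_range`) are hypothetical, and the per-pixel error estimate is taken as given because the abstract does not say how it is computed.

```python
import heapq


def disparity_to_depth(disparity, focal_px, baseline_m):
    """Standard stereo relation: depth = f * B / d (None for invalid d)."""
    if disparity <= 0:
        return None
    return focal_px * baseline_m / disparity


def priority_order(errors):
    """Visit pixels outwards from the lowest-error pixel.

    `errors` is a 2D list of per-pixel depth-error estimates (assumed given).
    A min-heap always expands the reachable pixel with the smallest error.
    """
    h, w = len(errors), len(errors[0])
    seed = min((errors[r][c], r, c) for r in range(h) for c in range(w))
    heap = [seed]
    seen = {(seed[1], seed[2])}
    order = []
    while heap:
        _, r, c = heapq.heappop(heap)
        order.append((r, c))
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and (nr, nc) not in seen:
                seen.add((nr, nc))
                heapq.heappush(heap, (errors[nr][nc], nr, nc))
    return order


def search_half_range(error, base=0.05, gain=1.0, max_range=2.0):
    """Hypothetical rule: random-search half-width grows with estimated error."""
    return min(max_range, base + gain * error)


def mean_absolute_error(pred, truth):
    """MAE over pixels that have a valid predicted depth."""
    pairs = [(p, t) for p, t in zip(pred, truth) if p is not None]
    return sum(abs(p - t) for p, t in pairs) / len(pairs)
```

In this sketch, pixels with small estimated error are refined first and searched over a narrow depth interval, while uncertain pixels are reached later and searched more widely, which is the behaviour the abstract describes in prose.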

Keywords: Depth Map, PatchMatch MVS, Stereo Camera, Construction Sites, Stereo Matching

Topic: Topic A: General Remote Sensing

ACRS 2025 Conference | Conference Management System