A Robust Structure-from-Motion Framework under Appearance Variation
Jiwoo Kang (a), Kanghyeok Choi (b*)

a) Department of Geoinformatic Engineering, Inha University, Incheon 22212, Republic of Korea
b) Department of Geoinformatic Engineering, Inha University, Incheon 22212, Republic of Korea
*cwsurgy[at]inha.ac.kr


Abstract

Structure from Motion (SfM) is a technique that reconstructs the three-dimensional (3D) structure of a scene from images captured at different viewpoints. Over the past few decades, it has been widely applied in domains such as digital twins and virtual environments. Despite ongoing advances, challenges remain, notably high computational cost in large-scale scenes and reduced performance under significant appearance variation. A typical SfM pipeline involves feature matching between pairs of images, followed by iterative optimization of camera parameters and 3D points via bundle adjustment. In large-scale scenes, the number of image pairs and the number of parameters to be estimated increase rapidly, leading to computational overhead. Furthermore, significant visual discrepancies between images can degrade feature matching performance, ultimately reducing reconstruction quality. To address these issues, we propose a database-driven framework that enables robust and efficient localization under such conditions. The proposed framework consists of an offline map construction phase and an online localization phase. In the offline phase, accurate camera poses and 3D scene geometry are reconstructed using SfM, and a database is built from them. In the online phase, the pose of a query image captured at a different time is estimated by referencing the pre-built database. Because the query is localized against an existing reconstruction, this approach not only enables accurate localization under appearance variation but also reduces the number of parameters involved in bundle adjustment, thereby improving computational efficiency. The proposed framework thus enables efficient and accurate 3D localization even in large-scale scenes and under considerable appearance variation. It can be applied to a wide range of real-world applications, including the localization of dynamic objects such as pedestrians and the management of large-scale spatial information.
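
The abstract does not specify how the online phase references the database. As an illustration only, the sketch below assumes that the database stores one descriptor per reconstructed 3D point and that query features are matched to it by nearest-neighbour search with a distance ratio test. The function name `match_query_to_map`, the data layout, and the `ratio` threshold are hypothetical and are not taken from the proposed framework.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def match_query_to_map(query_features, map_entries, ratio=0.8):
    """Build 2D-3D correspondences between a query image and a pre-built map.

    query_features: list of (pixel_xy, descriptor) from the query image.
    map_entries:    list of (point_xyz, descriptor) from the offline phase
                    (assumed layout: one descriptor per 3D point).
    ratio:          assumed ratio-test threshold; lower values are stricter.
    """
    if len(map_entries) < 2:
        return []  # the ratio test needs at least two candidates

    correspondences = []
    for pixel_xy, query_desc in query_features:
        # Rank all map entries by descriptor distance to the query feature.
        ranked = sorted(map_entries, key=lambda entry: dist(query_desc, entry[1]))
        best, second = ranked[0], ranked[1]
        best_d = dist(query_desc, best[1])
        second_d = dist(query_desc, second[1])
        # Keep the match only if the best candidate is clearly better
        # than the runner-up; ambiguous features are discarded.
        if best_d < ratio * second_d:
            correspondences.append((pixel_xy, best[0]))
    return correspondences


# Toy data: three map points with 3-dimensional descriptors.
map_entries = [
    ((0.0, 0.0, 5.0), (1.0, 0.0, 0.0)),
    ((1.0, 0.0, 5.0), (0.0, 1.0, 0.0)),
    ((0.0, 1.0, 6.0), (0.0, 0.0, 1.0)),
]
query_features = [
    ((320.0, 240.0), (0.9, 0.1, 0.0)),  # close to the first map descriptor
    ((400.0, 240.0), (0.5, 0.5, 0.0)),  # equally close to two descriptors
]

matches = match_query_to_map(query_features, map_entries)
```

In this toy example only the first query feature survives the ratio test; the second is equally close to two map descriptors and is rejected. The resulting 2D-3D pairs would typically be passed to a perspective-n-point solver to recover the six pose parameters of the query camera while the map points stay fixed, which is one plausible reading of how the online phase reduces the number of parameters to be optimized.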

Keywords: structure from motion; appearance variation; database-driven framework

Topic: Topic D: Geospatial Data Integration

ACRS 2025 Conference | Conference Management System