Deep Learning Based Semantic Segmentation and Explainability Analysis for Building Footprint Extraction Using High Resolution Remote Sensing Imagery

Kavzoglu, T., Yilmaz, E.O., Teke, A.

Gebze Technical University, Turkey

Abstract

Building extraction from remote sensing imagery is pivotal across multiple research domains, including urban planning, land management, transportation planning, and disaster monitoring. Although deep learning has shown promise in building extraction, significant challenges remain due to variations in building types and the presence of complex backgrounds. Progress in building footprint extraction directly supports two Sustainable Development Goals: Sustainable Cities and Communities (SDG 11), which promotes planned and sustainable urban development, and Climate Action (SDG 13), for which identifying buildings located in disaster-prone areas is critical. In this study, building footprints were extracted using two widely used deep learning-based semantic segmentation models, DeepLabV3plus and PSPNet. A high-resolution SPOT dataset covering a study site in the Pyrenees-Orientales region of France was constructed and used. The performance of the pretrained models was compared using the IoU, F-score, accuracy, precision, and recall metrics. Additionally, the decision-making processes of the models were analyzed for explainability using the GradientSHAP technique. The DeepLabV3plus model achieved an IoU of 0.9541 and an accuracy of 0.9762, whereas the PSPNet model achieved an IoU of 0.9463 and an accuracy of 0.9720. The DeepLabV3plus architecture was therefore found to be more effective for large and regular building types. Moreover, the GradientSHAP maps produced for the DeepLabV3plus model showed greater sensitivity to building boundaries, and its decisions focused more specifically on the building structures, whereas the attributions of the PSPNet model were more scattered and widespread. In summary, DeepLabV3plus produced more reliable outputs from both the quantitative performance and the explainability perspectives.
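The five evaluation metrics named above are standard pixel-wise quantities derived from a binary confusion matrix. The following is a minimal Python sketch, assuming binary masks (1 = building, 0 = background) and IoU computed for the building class only; the abstract does not state whether the reported scores are per-class or averaged over classes, and the function names `confusion_counts` and `segmentation_metrics` are hypothetical, not taken from the authors' code.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, FN, TN over two equally shaped 2-D binary masks (1 = building)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of rows")
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        if len(pred_row) != len(truth_row):
            raise ValueError("masks must have the same shape")
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1  # building predicted as building
            elif p:
                fp += 1  # background predicted as building
            elif t:
                fn += 1  # building missed by the model
            else:
                tn += 1  # background correctly left as background
    return tp, fp, fn, tn


def _safe_div(num, den):
    # Assumed convention: return 0.0 when the denominator is zero
    # (e.g. a tile with no buildings in either mask).
    return num / den if den else 0.0


def segmentation_metrics(pred, truth):
    """Pixel-wise IoU, F-score (F1), accuracy, precision and recall for the building class."""
    tp, fp, fn, tn = confusion_counts(pred, truth)
    return {
        "iou": _safe_div(tp, tp + fp + fn),
        "f_score": _safe_div(2 * tp, 2 * tp + fp + fn),
        "accuracy": _safe_div(tp + tn, tp + fp + fn + tn),
        "precision": _safe_div(tp, tp + fp),
        "recall": _safe_div(tp, tp + fn),
    }


if __name__ == "__main__":
    pred = [[1, 1, 0], [0, 1, 0]]
    truth = [[1, 0, 0], [0, 1, 1]]
    # TP = 2, FP = 1, FN = 1, TN = 2, so iou = 0.5 and the other four metrics are 2/3
    print(segmentation_metrics(pred, truth))
```

IoU penalizes false positives and false negatives together in a single ratio, which is why it is typically lower than pixel accuracy when background pixels dominate a tile.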

Keywords: Building segmentation, SPOT satellite image, DeepLabV3plus, PSPNet, deep learning

Topic B: Applications of Remote Sensing

ACRS 2025 Conference