Disentangled Latent Transformer for Interpretable Monocular Height Estimation

Zhitong Xiong; Sining Chen; Yilei Shi; Xiao Xiang Zhu

Monocular height estimation (MHE) from remote sensing imagery has high potential for efficiently generating 3D city models, enabling a quick response to natural disasters. Most existing works focus on improving performance; however, little research has explored the interpretability of MHE networks. In this paper, we aim to explore how deep neural networks predict height from a single monocular image. Towards a comprehensive understanding of MHE networks, we propose to interpret them at multiple levels: 1) Neurons: unit-level dissection, exploring the semantic and height selectivity of the learned internal deep representations; 2) Instances: object-level interpretation, studying the effects of different semantic classes, scales, and spatial contexts on height estimation; 3) Attribution: pixel-level analysis, understanding which input pixels are important for height estimation. Based on this multi-level interpretation, we propose a disentangled latent Transformer network as a more compact, reliable, and explainable deep model for monocular height estimation. Furthermore, we introduce, for the first time, a novel unsupervised semantic segmentation task based on height estimation. Additionally, we construct a new dataset for joint semantic segmentation and height estimation. Our work provides novel insights for both understanding and designing MHE models.
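The abstract does not state which attribution method the authors use for the pixel-level analysis. To illustrate what "which input pixels are important for height estimation" can mean in practice, here is a minimal sketch of occlusion (perturbation) attribution, a standard technique swapped in for illustration: each pixel is replaced by a baseline value, and its score is the resulting drop in predicted height. `occlusion_attribution` and `toy_height_at_center` are hypothetical names; the toy predictor is a stand-in for an MHE network, not the proposed disentangled latent Transformer.

```python
from typing import Callable, List

Image = List[List[float]]


def occlusion_attribution(
    predict: Callable[[Image], float],
    image: Image,
    baseline: float = 0.0,
) -> List[List[float]]:
    """Attribute a scalar height prediction to input pixels by occlusion.

    Each pixel is replaced by `baseline` in turn; its attribution score is
    the drop in the predicted height caused by removing that pixel.
    """
    reference = predict(image)
    rows, cols = len(image), len(image[0])
    scores = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            occluded = [row[:] for row in image]  # copy so the input stays untouched
            occluded[r][c] = baseline
            scores[r][c] = reference - predict(occluded)
    return scores


def toy_height_at_center(image: Image) -> float:
    """Hypothetical stand-in for an MHE network (not the paper's model).

    Predicts the height of the centre pixel as a weighted sum of its 3x3
    neighbourhood; the centre pixel itself gets the largest weight.
    """
    rows, cols = len(image), len(image[0])
    cr, cc = rows // 2, cols // 2
    total = 0.0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            weight = 2.0 if (dr, dc) == (0, 0) else 0.5
            total += weight * image[cr + dr][cc + dc]
    return total


if __name__ == "__main__":
    img = [[1.0] * 5 for _ in range(5)]
    for row in occlusion_attribution(toy_height_at_center, img):
        print(row)
```

On a uniform 5x5 input, only the 3x3 neighbourhood of the centre receives non-zero scores, with the centre pixel scoring highest; pixels outside the toy model's receptive field score zero. Occlusion is chosen here only because it needs no gradients; gradient-based saliency methods are common alternatives.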

updated: Mon Jan 17 2022 11:42:30 GMT+0000 (UTC)

published: Mon Jan 17 2022 11:42:30 GMT+0000 (UTC)

arXiv
