Latent Graph Attention for Enhanced Spatial Context

Ayush Singh; Yash Bhambhu; Himanshu Buckchash; Deepak K. Gupta; Dilip K. Prasad

Global context in images is highly valuable in image-to-image translation problems. Conventional attention-based and graph-based models capture the global context to a large extent, but they are computationally expensive. Moreover, existing approaches are limited to learning only the pairwise semantic relation between any two points in the image. In this paper, we present Latent Graph Attention (LGA), a computationally inexpensive (linear in the number of nodes), stable, and modular framework for incorporating global context into existing architectures. It especially empowers small-scale architectures to perform closer to large architectures, making lightweight architectures more useful for edge devices with lower compute power and lower energy needs. LGA propagates information spatially using a network of locally connected graphs, thereby facilitating the construction of a semantically coherent relation between any two spatially distant points that also takes into account the influence of the intermediate pixels. Moreover, the depth of the graph network can be used to adapt the extent of contextual spread to the target dataset, which gives explicit control over the added computational cost. To enhance the learning mechanism of LGA, we also introduce a novel contrastive loss term that helps the LGA module couple well with the original architecture at the expense of minimal additional computational load. We show that incorporating LGA improves performance on three challenging applications: transparent object segmentation, image restoration for dehazing, and optical flow estimation.
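The abstract's central idea, that stacking locally connected graph layers spreads context further with each layer at a cost linear in the number of nodes, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the neighbourhood, the attention form, or how the latent graph is built. The 8-neighbour grid, the unlearned dot-product attention, and the name `local_graph_attention` are all assumptions made for illustration.

```python
import numpy as np

def local_graph_attention(x, depth):
    """Hypothetical sketch: x is an (H, W, C) latent feature map whose cells
    are graph nodes. Each layer lets a node attend to itself and its 8 grid
    neighbours (an assumed neighbourhood), so one layer costs O(9 * H * W * C)."""
    h, w, c = x.shape
    offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    # valid[y, x, k] is False where neighbour k lies outside the grid.
    ones = np.pad(np.ones((h, w)), 1)
    valid = np.stack(
        [ones[1 + dy:1 + dy + h, 1 + dx:1 + dx + w] for dy, dx in offsets],
        axis=-1,
    ) > 0
    for _ in range(depth):
        padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
        # neigh[y, x, k] holds the features of neighbour k of node (y, x).
        neigh = np.stack(
            [padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w] for dy, dx in offsets],
            axis=2,
        )
        # Scaled dot-product scores between each node and its neighbours.
        scores = np.einsum("hwc,hwkc->hwk", x, neigh) / np.sqrt(c)
        scores = np.where(valid, scores, -np.inf)
        scores = scores - scores.max(axis=-1, keepdims=True)
        weights = np.exp(scores)
        weights = weights / weights.sum(axis=-1, keepdims=True)
        # Each node becomes an attention-weighted mix of its neighbourhood.
        x = np.einsum("hwk,hwkc->hwc", weights, neigh)
    return x

# A single active node on a 9x9 grid: count how many nodes it reaches.
impulse = np.zeros((9, 9, 1))
impulse[4, 4, 0] = 1.0
for d in (1, 2, 3):
    reached = np.count_nonzero(local_graph_attention(impulse, d))
    print(f"depth={d}: {reached} nodes influenced")
```

In this sketch the influenced region grows as a (2·depth + 1)-wide square, so depth acts as the knob that trades contextual spread against compute, and information between two distant nodes necessarily passes through the intermediate nodes.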

updated: Sun Jul 09 2023 10:56:44 GMT+0000 (UTC)

published: Sun Jul 09 2023 10:56:44 GMT+0000 (UTC)

arXiv
