Semantic Attention Flow Fields for Dynamic Scene Decomposition

Yiqing Liang; Eliot Laidlaw; Alexander Meyerowitz; Srinath Sridhar; James Tompkin

We present SAFF: a dynamic neural volume reconstruction of a casual monocular video that consists of time-varying color, density, scene flow, semantics, and attention information. The semantics and attention let us identify salient foreground objects separately from the background in arbitrary spacetime views. We add two network heads to represent the semantic and attention information. For optimization, we design semantic attention pyramids from DINO-ViT outputs that trade off detail against whole-image context. After optimization, we perform a saliency-aware clustering to decompose the scene. To evaluate real-world dynamic scene decomposition across spacetime, we annotate object masks in the NVIDIA Dynamic Scene Dataset. We demonstrate that SAFF can decompose dynamic scenes without affecting RGB or depth reconstruction quality, that volume-integrated SAFF outperforms 2D baselines, and that SAFF improves foreground/background segmentation over recent static/dynamic split methods. Project webpage: https://visual.cs.brown.edu/saff
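The abstract does not spell out how the saliency-aware clustering works, so the following is only a minimal sketch of one plausible reading, not the authors' method: group per-pixel semantic features with plain k-means, then mark a cluster as foreground when its mean attention exceeds a threshold. The function names (`kmeans`, `saliency_aware_foreground`), the number of clusters `k`, and the `threshold` value are all assumptions made for illustration, and the inputs are synthetic stand-ins for rendered semantic and attention values.

```python
import numpy as np


def kmeans(features, k, iters=20, seed=0):
    """Plain Lloyd's k-means on an (N, D) array; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    # Initialise centers from k distinct data points.
    centers = features[rng.choice(len(features), size=k, replace=False)].copy()
    labels = np.zeros(len(features), dtype=int)
    for _ in range(iters):
        # Squared Euclidean distance from every point to every center: (N, k).
        dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:  # leave empty clusters where they are
                centers[j] = members.mean(axis=0)
    return labels, centers


def saliency_aware_foreground(features, attention, k=4, threshold=0.5):
    """Cluster semantic features, then keep clusters whose mean attention is high.

    features:  (N, D) per-pixel semantic features (hypothetical input).
    attention: (N,) per-pixel saliency values in [0, 1] (hypothetical input).
    Returns (labels, foreground_mask).
    """
    labels, _ = kmeans(features, k)
    foreground = np.zeros(len(features), dtype=bool)
    for j in range(k):
        members = labels == j
        # Assumed rule: a cluster is salient if its average attention is high.
        if members.any() and attention[members].mean() > threshold:
            foreground[members] = True
    return labels, foreground


# Synthetic example: a low-attention "background" blob and a
# high-attention "object" blob that are well separated in feature space.
rng = np.random.default_rng(1)
background = rng.normal(loc=0.0, scale=0.5, size=(50, 2))
salient = rng.normal(loc=10.0, scale=0.5, size=(50, 2))
features = np.concatenate([background, salient])
attention = np.concatenate([np.full(50, 0.1), np.full(50, 0.9)])

labels, foreground = saliency_aware_foreground(features, attention, k=2)
```

In this toy setup the first 50 points should be labelled background and the last 50 foreground. A real pipeline would take its features and attention from the volume-rendered network outputs rather than from synthetic blobs.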

updated: Thu Mar 02 2023 19:00:05 GMT+0000 (UTC)

published: Thu Mar 02 2023 19:00:05 GMT+0000 (UTC)

arXiv
