The Surprising Effectiveness of Diffusion Models for Optical Flow and Monocular Depth Estimation

Saurabh Saxena; Charles Herrmann; Junhwa Hur; Abhishek Kar; Mohammad Norouzi; Deqing Sun; David J. Fleet



Denoising diffusion probabilistic models have transformed image generation with their impressive fidelity and diversity. We show that they also excel at estimating optical flow and monocular depth, surprisingly without the task-specific architectures and loss functions that are predominant for these tasks. Compared to the point estimates of conventional regression-based methods, diffusion models also enable Monte Carlo inference, e.g., capturing uncertainty and ambiguity in flow and depth. With self-supervised pre-training, the combined use of synthetic and real data for supervised training, technical innovations (infilling and step-unrolled denoising diffusion training) to handle noisy, incomplete training data, and a simple form of coarse-to-fine refinement, one can train state-of-the-art diffusion models for depth and optical flow estimation. Extensive experiments focus on quantitative performance against benchmarks, ablations, and the model's ability to capture uncertainty and multimodality and to impute missing values. Our model, DDVM (Denoising Diffusion Vision Model), obtains a state-of-the-art relative depth error of 0.074 on the indoor NYU benchmark and an Fl-all outlier rate of 3.26% on the KITTI optical flow benchmark, about 25% better than the best published method. For an overview, see https://diffusion-vision.github.io.
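For readers unfamiliar with the two headline metrics, the following is a minimal sketch of how relative depth error and Fl-all are conventionally computed. It is not taken from the paper's code. The function names `abs_rel_error` and `fl_all`, and the flat-list input layout, are assumptions made for illustration. The Fl-all thresholds (3 pixels and 5% of the ground-truth flow magnitude) follow the standard KITTI 2015 outlier definition.

```python
import math


def abs_rel_error(pred, gt):
    """Mean of |pred - gt| / gt over pixels with valid (positive) ground truth.

    pred, gt: flat sequences of per-pixel depths (hypothetical layout).
    """
    ratios = [abs(p - g) / g for p, g in zip(pred, gt) if g > 0]
    if not ratios:
        raise ValueError("no valid ground-truth pixels")
    return sum(ratios) / len(ratios)


def fl_all(pred_flow, gt_flow, abs_thresh=3.0, rel_thresh=0.05):
    """Percentage of pixels counted as flow outliers.

    A pixel is an outlier when its endpoint error exceeds abs_thresh pixels
    AND exceeds rel_thresh times the ground-truth flow magnitude.
    pred_flow, gt_flow: flat sequences of (u, v) tuples (hypothetical layout).
    """
    if not gt_flow:
        raise ValueError("no ground-truth flow vectors")
    outliers = 0
    for (pu, pv), (gu, gv) in zip(pred_flow, gt_flow):
        epe = math.hypot(pu - gu, pv - gv)   # endpoint error in pixels
        mag = math.hypot(gu, gv)             # ground-truth flow magnitude
        if epe > abs_thresh and epe > rel_thresh * mag:
            outliers += 1
    return 100.0 * outliers / len(gt_flow)


if __name__ == "__main__":
    # Toy values, not benchmark data.
    print(abs_rel_error([1.1, 2.0], [1.0, 2.0]))
    print(fl_all([(4.0, 0.0), (0.0, 0.0)], [(0.0, 0.0), (0.0, 0.0)]))
```

Lower is better for both metrics. Benchmark evaluation scripts typically apply additional masking and cropping conventions that this sketch omits.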

updated: Wed Dec 06 2023 04:19:29 GMT+0000 (UTC)

published: Fri Jun 02 2023 21:26:20 GMT+0000 (UTC)

arXiv

