Deep Video Deblurring: The Devil is in the Details

Jochen Gast; Stefan Roth

Video deblurring for hand-held cameras is a challenging task, since the underlying blur is caused by both camera shake and object motion. State-of-the-art deep networks exploit temporal information from neighboring frames, either by means of spatio-temporal transformers or by recurrent architectures. In contrast to these involved models, we found that a simple baseline CNN can perform astonishingly well when particular care is taken with respect to the details of the model and training procedure. To that end, we conduct a comprehensive study of these crucial details, uncovering extreme differences in quantitative and qualitative performance. Exploiting these details allows us to boost the architecture and training procedure of a simple baseline CNN by a staggering 3.15 dB, such that it becomes highly competitive with cutting-edge networks. This raises the question of whether the reported accuracy difference between models is always due to technical contributions, or whether it is also subject to such orthogonal but crucial details.

updated: Thu Sep 26 2019 15:35:29 GMT+0000 (UTC)

published: Thu Sep 26 2019 15:35:29 GMT+0000 (UTC)

arXiv
