Unlocking Masked Autoencoders as Loss Function for Image and Video Restoration

Man Zhou; Naishan Zheng; Jie Huang; Chunle Guo; Chongyi Li



Image and video restoration has made a remarkable leap with the advent of deep learning. The success of the deep learning paradigm rests on three key components: data, model, and loss. Many efforts have been devoted to the first two, while few studies focus on the loss function. Starting from the question "are the de facto optimization functions, e.g., L_1, L_2, and perceptual losses, optimal?", we explore the potential of the loss and put forward our belief that a "learned loss function empowers the learning capability of neural networks for image and video restoration". Concretely, we stand on the shoulders of the masked autoencoder (MAE) and formulate it as a "learned loss function", owing to the fact that a pre-trained MAE innately inherits the prior of image reasoning. We investigate the efficacy of our belief from three perspectives: 1) from task-customized MAE to native MAE, 2) from image tasks to video tasks, and 3) from transformer structures to convolutional neural network structures. Extensive experiments across multiple image and video tasks, including image denoising, image super-resolution, image enhancement, guided image super-resolution, video denoising, and video enhancement, demonstrate consistent performance improvements introduced by the learned loss function. Moreover, the learned loss function is attractive because it can be plugged directly into existing networks during training and adds no computation at the inference stage. Code will be publicly available.
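
The abstract does not give the exact formulation, so the following is only a minimal sketch of the general idea of a frozen network acting as an extra training loss. It assumes the learned term is an L_1 distance between features of the restored and reference images, added to a pixel-wise L_1 term. `FrozenEncoder` (a fixed random linear map standing in for a pre-trained MAE), `restoration_loss`, and the `weight` value are all hypothetical names and choices, not taken from the paper.

```python
import random


def l1(a, b):
    """Mean absolute difference between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


class FrozenEncoder:
    """Hypothetical stand-in for a pre-trained MAE: a fixed linear map.

    Its weights are set once and never updated, mirroring a frozen network.
    """

    def __init__(self, in_dim, feat_dim, seed=0):
        rng = random.Random(seed)
        self.weights = [
            [rng.uniform(-1.0, 1.0) for _ in range(in_dim)]
            for _ in range(feat_dim)
        ]

    def __call__(self, x):
        # One feature per weight row: a plain dot product with the input.
        return [sum(w * v for w, v in zip(row, x)) for row in self.weights]


def restoration_loss(pred, target, encoder, weight=0.1):
    """Pixel-wise L_1 plus an assumed feature-space term from the frozen encoder."""
    pixel_term = l1(pred, target)
    learned_term = l1(encoder(pred), encoder(target))
    return pixel_term + weight * learned_term


# Toy "images" flattened to four pixels each.
encoder = FrozenEncoder(in_dim=4, feat_dim=3)
target = [0.2, 0.4, 0.6, 0.8]
pred = [0.25, 0.35, 0.6, 0.9]

print(restoration_loss(pred, target, encoder))
```

In this sketch the encoder appears only inside `restoration_loss`, so it would be needed while training and could be dropped at inference, which is the property the abstract highlights.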

updated: Wed Mar 29 2023 02:41:08 GMT+0000 (UTC)

published: Wed Mar 29 2023 02:41:08 GMT+0000 (UTC)

arXiv
