A Systematic Performance Analysis of Deep Perceptual Loss Networks Breaks Transfer Learning Conventions

Gustav Grund Pihlgren; Konstantina Nikolaidou; Prakash Chandra Chhipa; Nosheen Abid; Rajkumar Saini; Fredrik Sandin; Marcus Liwicki

Deep perceptual loss is a type of loss function in computer vision that aims to mimic human perception by using deep features extracted from neural networks. In recent years, the method has been applied to great effect on a host of interesting computer vision tasks, especially tasks with image or image-like outputs. Many applications of the method use pretrained networks, often convolutional networks, for loss calculation. Despite the increased interest and broader use, more effort is needed toward exploring which networks to use for calculating deep perceptual loss and from which layers to extract the features. This work aims to rectify this by systematically evaluating a host of commonly used and readily available pretrained networks, at a number of different feature extraction points, on four existing use cases of deep perceptual loss. The four use cases are implementations of previous works in which the selected networks and extraction points are evaluated in place of the networks and extraction points used in the original work. The experimental tasks are dimensionality reduction, image segmentation, super-resolution, and perceptual similarity. The performance on these four tasks, the attributes of the networks, and the extraction points are then used as a basis for an in-depth analysis. This analysis uncovers essential information regarding which architectures provide superior performance for deep perceptual loss and how to choose an appropriate extraction point for a particular task and dataset. Furthermore, the work discusses the implications of the results for deep perceptual loss and the broader field of transfer learning. The results break commonly held assumptions in transfer learning, which implies either that deep perceptual loss deviates from most transfer learning settings or that these assumptions need a thorough re-evaluation.
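As a rough illustration of the idea described in the abstract, the sketch below compares two inputs in the feature space of a frozen network instead of pixel space, with the extraction point as an explicit parameter. Everything here is an assumption made for illustration: the "network" is a hypothetical stack of small dense layers with fixed random weights standing in for a pretrained convolutional network, and the function names, layer sizes, and the mean squared error over features are not taken from the paper.

```python
import math
import random


def make_frozen_network(layer_sizes, seed=0):
    """Build fixed dense layers; a hypothetical stand-in for a pretrained network."""
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
         for _ in range(n_out)]
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    ]


def extract_features(network, image, extraction_point):
    """Apply the first `extraction_point` layers (linear + ReLU) and return the activations."""
    features = list(image)
    for weights in network[:extraction_point]:
        features = [max(0.0, sum(w * v for w, v in zip(row, features)))
                    for row in weights]
    return features


def deep_perceptual_loss(network, output, target, extraction_point):
    """Mean squared error between the features of `output` and `target`."""
    out_features = extract_features(network, output, extraction_point)
    target_features = extract_features(network, target, extraction_point)
    squared_errors = [(a - b) ** 2 for a, b in zip(out_features, target_features)]
    return sum(squared_errors) / len(squared_errors)


if __name__ == "__main__":
    # Flattened 4-"pixel" inputs; the loss network is never trained or updated.
    network = make_frozen_network([4, 8, 6])
    prediction = [0.1, 0.5, 0.9, 0.3]
    reference = [0.2, 0.4, 0.8, 0.3]
    # Extraction point 0 is plain pixel-wise MSE; deeper points compare deeper features.
    for point in range(len(network) + 1):
        print(point, deep_perceptual_loss(network, prediction, reference, point))
```

In this toy setup, changing `extraction_point` changes which representation the two inputs are compared in, which is the choice (together with the choice of network) that the work evaluates across tasks. In practice the frozen layers would come from a pretrained model rather than random weights.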

updated: Wed Feb 08 2023 13:08:51 GMT+0000 (UTC)

published: Wed Feb 08 2023 13:08:51 GMT+0000 (UTC)

arXiv
