Contextually Guided Convolutional Neural Networks for Learning Most Transferable Representations

Olcay Kursun; Semih Dinc; Oleg V. Favorov

Deep Convolutional Neural Networks (CNNs), trained extensively on very large labeled datasets, learn to recognize inferentially powerful features in their input patterns and to represent their objective content efficiently. Such objectivity of their internal representations enables deep CNNs to readily transfer these representations and apply them successfully to new classification tasks. Deep CNNs develop their internal representations through a challenging process of error backpropagation-based supervised training. In contrast, deep neural networks of the cerebral cortex develop their even more powerful internal representations in an unsupervised process, apparently guided at a local level by contextual information. Implementing such local contextual guidance principles in a single-layer CNN architecture, we propose an efficient algorithm for developing broad-purpose representations (i.e., representations transferable to new tasks without additional training) in shallow CNNs trained on limited-size datasets. A contextually guided CNN (CG-CNN) is trained on groups of neighboring image patches picked at random image locations in the dataset. Such neighboring patches are likely to share a common context and are therefore treated, for the purposes of training, as belonging to the same class. Across multiple iterations of such training on different context-sharing groups of image patches, the CNN features optimized in one iteration are transferred to the next iteration for further optimization. In this process, CNN features acquire higher pluripotency, or inferential utility for any arbitrary classification task, which we quantify as a transfer utility. In our application to natural images, we find that CG-CNN features show transfer utility and classification accuracy that match or exceed those of comparable transferable features in the first CNN layer of well-known deep networks.
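The patch-grouping step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `sample_context_groups` and the parameters `num_groups`, `patches_per_group`, `patch_size`, and `max_shift` are hypothetical, and the abstract does not state the patch size, the neighborhood radius, or the number of groups. The sketch only shows how patches drawn near one random location can share a pseudo-label without any human annotation.

```python
import numpy as np


def sample_context_groups(images, num_groups, patches_per_group,
                          patch_size, max_shift, rng):
    """Sample groups of neighboring patches and give each group one pseudo-label.

    All names and parameters here are illustrative assumptions, not values
    taken from the paper. Each image must be at least
    patch_size + 2 * max_shift pixels along both spatial axes.
    """
    patches, labels = [], []
    for group in range(num_groups):
        # Pick a random image and a random anchor location inside it.
        img = images[rng.integers(len(images))]
        height, width = img.shape[:2]
        # Keep the anchor far enough from the border that every shifted patch fits.
        top = rng.integers(max_shift, height - patch_size - max_shift + 1)
        left = rng.integers(max_shift, width - patch_size - max_shift + 1)
        for _ in range(patches_per_group):
            # Neighboring patches: small random shifts around the anchor.
            dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
            y, x = top + dy, left + dx
            patches.append(img[y:y + patch_size, x:x + patch_size])
            # Patches from the same neighborhood share the group index as label.
            labels.append(group)
    return np.stack(patches), np.array(labels)


# Usage with random stand-in "images" (assumption: 32x32 RGB arrays).
rng = np.random.default_rng(0)
images = [rng.random((32, 32, 3)) for _ in range(5)]
patches, labels = sample_context_groups(
    images, num_groups=4, patches_per_group=6,
    patch_size=8, max_shift=4, rng=rng)
```

In a full training loop, a classifier would be fit on `(patches, labels)`, and a fresh set of groups would be sampled for the next iteration while the convolutional features are carried over, as the abstract describes. That loop is omitted here because the abstract gives no further detail on it.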

updated: Wed Mar 24 2021 21:19:00 GMT+0000 (UTC)

published: Tue Mar 02 2021 08:41:12 GMT+0000 (UTC)

arXiv
