On the role of feedback in visual processing: a predictive coding perspective

Andrea Alamia; Milad Mozafari; Bhavin Choksi; Rufin VanRullen

Brain-inspired machine learning is attracting growing attention, particularly in computer vision. Several studies have investigated the inclusion of top-down feedback connections in convolutional networks; however, it remains unclear how and when these connections are functionally helpful. Here we address this question in the context of object recognition under noisy conditions. We consider deep convolutional neural networks (CNNs) as models of feed-forward visual processing and implement Predictive Coding (PC) dynamics through feedback connections (predictive feedback) trained for reconstruction or classification of clean images. To directly assess the computational role of predictive feedback in various experimental situations, we optimize and interpret the hyper-parameters controlling the network's recurrent dynamics. That is, we let the optimization process determine whether top-down connections and predictive coding dynamics are functionally beneficial. Across different model depths and architectures (3-layer CNN, ResNet18, and EfficientNetB0) and against various types of noise (CIFAR100-C), we find that the network increasingly relies on top-down predictions as the noise level increases; in deeper networks, this effect is most prominent at lower layers. In addition, the accuracy of the network implementing PC dynamics increases significantly over time-steps compared to its equivalent feed-forward network. Overall, our results provide novel insights relevant to neuroscience, by confirming the computational role of feedback connections in sensory systems, and to machine learning, by revealing how these connections can improve the robustness of current vision models.
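The abstract does not state the update rule, so the following is only a minimal sketch of how hyper-parameters might balance bottom-up input against top-down prediction in a recurrent layer. It assumes a simple convex mixture: the names `pc_step`, `run_dynamics`, `beta` (feed-forward weight), and `lam` (feedback weight) are hypothetical and are not taken from the paper, and the fixed vectors stand in for the outputs of real feed-forward and feedback connections.

```python
def pc_step(state, feedforward, feedback, beta, lam):
    """One hypothetical recurrent update for a single layer.

    state, feedforward, feedback: equal-length lists of floats.
    beta: weight on the bottom-up (feed-forward) drive (assumed name).
    lam:  weight on the top-down (feedback) prediction (assumed name).
    The remaining weight, 1 - beta - lam, retains the previous state.
    """
    if beta < 0 or lam < 0 or beta + lam > 1:
        raise ValueError("beta and lam must be non-negative and sum to at most 1")
    memory = 1.0 - beta - lam
    return [beta * ff + lam * fb + memory * s
            for s, ff, fb in zip(state, feedforward, feedback)]


def run_dynamics(noisy_input, prediction, beta, lam, steps):
    """Iterate the update, starting from the feed-forward response."""
    state = list(noisy_input)
    for _ in range(steps):
        state = pc_step(state, noisy_input, prediction, beta, lam)
    return state
```

In this toy setting, a larger `lam` relative to `beta` pulls the layer's state toward the top-down prediction over successive time-steps, which is one way to picture a network "relying more on top-down predictions" as noise increases. With `lam = 0` the layer reduces to its feed-forward response.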

updated: Tue Jun 08 2021 10:07:23 GMT+0000 (UTC)

published: Tue Jun 08 2021 10:07:23 GMT+0000 (UTC)

arXiv
