A Dynamic Graph CNN with Cross-Representation Distillation for Event-Based Recognition

Yongjian Deng; Hao Chen; Bochen Xie; Hai Liu; Youfu Li

Recent advances in event-based research prioritize sparsity and temporal precision. Approaches that process dense frame-based representations with well-pretrained CNNs are being replaced by sparse point-based representations learned through graph CNNs (GCNs). Yet these graph methods still lag far behind their frame-based counterparts, owing to two limitations: (i) biased graph construction that does not carefully integrate the varied attributes of each vertex (i.e., semantic, spatial, and temporal cues), which leads to imprecise graph representations; and (ii) deficient learning caused by the lack of well-pretrained models. We address the first problem by proposing a new event-based GCN (EDGCN) with a dynamic aggregation module that adaptively integrates all vertex attributes. To address the second problem, we introduce a novel learning framework called cross-representation distillation (CRD), which uses the dense representation of events as a cross-representation auxiliary to provide additional supervision and prior knowledge for the event graph. This frame-to-graph distillation lets the graph model benefit from the large-scale priors provided by CNNs while retaining the advantages of graph-based models. Extensive experiments show that our model and learning framework are effective and generalize well across multiple vision tasks.
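
To make the idea of frame-to-graph distillation concrete, the following is a minimal sketch of a generic logit-based knowledge distillation loss, in which a frame-based CNN plays the teacher and a graph model plays the student. It is not the CRD formulation from the paper, which the abstract does not specify; the function names, the `temperature` and `alpha` parameters, and the choice of a KL term on softened logits are all assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy of the predicted distribution against a hard label."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Hypothetical combined loss: hard-label term plus teacher-matching term.

    student_logits: outputs of the graph model (assumed to be the student).
    teacher_logits: outputs of the frame-based CNN (assumed to be the teacher).
    alpha weights the soft term; temperature softens both distributions.
    """
    hard = cross_entropy(student_logits, label)
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # The temperature**2 factor is the usual rescaling in logit distillation.
    soft = kl_divergence(p_teacher, p_student) * temperature ** 2
    return (1 - alpha) * hard + alpha * soft
```

When the student already matches the teacher, the soft term vanishes and only the weighted hard-label loss remains; any disagreement with the teacher adds a non-negative penalty.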

updated: Sun Apr 16 2023 08:14:29 GMT+0000 (UTC)

published: Wed Feb 08 2023 16:35:39 GMT+0000 (UTC)

arXiv
