A Dynamic Graph CNN with Cross-Representation Distillation for Event-Based Recognition

Yongjian Deng; Hao Chen; Bochen Xie; Hai Liu; Youfu Li

Converting events into dense frame-based representations in order to reuse well-pretrained CNNs is a popular solution. Although this line of work achieves appealing performance, it sacrifices the sparsity and temporal precision of events and usually requires heavyweight models, which largely weakens the advantages and real-life application potential of event cameras. A more application-friendly approach is to design deep graph models that learn sparse point-based representations from events. Yet the efficacy of these graph models lags far behind that of their frame-based counterparts because of two key limitations: (i) simple graph construction strategies that do not carefully integrate the varied attributes of each vertex (i.e., semantics, spatial coordinates, and temporal coordinates), leading to biased graph representations; and (ii) deficient learning owing to the lack of suitable pretrained models. Here we address the first problem by introducing a new event-based graph CNN (EDGCN) with a dynamic aggregation module that adaptively integrates all vertex attributes. To alleviate the learning difficulty, we propose to leverage the dense representation counterpart of events as a cross-representation auxiliary that supplies additional supervision and prior knowledge for the event graph. To this end, we form a frame-to-graph transfer learning framework with a customized hybrid distillation loss that respects the varying cross-representation gaps across layers. Extensive experiments on multiple vision tasks validate the effectiveness and high generalization ability of the proposed model and distillation strategy. (Core components of our code are submitted with the supplementary material and will be made publicly available upon acceptance.)
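To make the contrast between the two representations concrete, the following is a minimal NumPy sketch. It is not the authors' EDGCN pipeline. It assumes events arrive as rows of (x, y, t, polarity). The dense path uses a per-polarity event-count image, and the sparse path uses a plain k-nearest-neighbour graph over (x, y, t). The function names `events_to_frame` and `events_to_knn_graph` and the `time_scale` parameter are hypothetical choices made for this illustration.

```python
import numpy as np

def events_to_frame(events, height, width):
    """Dense path: accumulate events into a 2-channel count image.

    Channel 0 counts negative-polarity events, channel 1 positive ones.
    Timestamps are discarded, which is the loss of temporal precision
    that the abstract attributes to frame-based representations.
    """
    frame = np.zeros((2, height, width), dtype=np.float32)
    x = events[:, 0].astype(int)
    y = events[:, 1].astype(int)
    p = (events[:, 3] > 0).astype(int)
    # np.add.at handles repeated (p, y, x) indices correctly.
    np.add.at(frame, (p, y, x), 1.0)
    return frame

def events_to_knn_graph(events, k, time_scale=1.0):
    """Sparse path: link each event to its k nearest neighbours in (x, y, t).

    time_scale (an assumed knob) balances temporal against spatial distance.
    Requires k < number of events. Returns a (2, N * k) array of directed
    edges given as (source, target) index pairs.
    """
    pts = events[:, :3].astype(np.float64)
    pts[:, 2] *= time_scale
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)  # squared distances
    np.fill_diagonal(dist, np.inf)               # exclude self-loops
    nbrs = np.argsort(dist, axis=1)[:, :k]
    src = np.repeat(np.arange(len(pts)), k)
    return np.stack([src, nbrs.reshape(-1)], axis=0)

# Four toy events: (x, y, t, polarity)
events = np.array([
    [0, 0, 0.0,  1],
    [1, 0, 0.1, -1],
    [0, 0, 0.2,  1],
    [3, 3, 0.3,  1],
])
frame = events_to_frame(events, height=4, width=4)
edges = events_to_knn_graph(events, k=2)
```

The brute-force distance matrix costs O(N^2) memory and is only practical for small event sets. A spatial index such as a k-d tree would be the usual substitute at scale.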

updated: Wed Feb 08 2023 16:35:39 GMT+0000 (UTC)

published: Wed Feb 08 2023 16:35:39 GMT+0000 (UTC)

arXiv
