From Chaos Comes Order: Ordering Event Representations for Object Detection

Nikola Zubić; Daniel Gehrig; Mathias Gehrig; Davide Scaramuzza

Today, state-of-the-art deep neural networks that process events first convert them into dense, grid-like input representations before using an off-the-shelf network. However, selecting the appropriate representation for a task traditionally requires training a neural network for each representation and selecting the best one based on the validation score, which is very time-consuming. In this work, we eliminate this bottleneck by selecting the best representation based on the Gromov-Wasserstein Discrepancy (GWD) between the raw events and their representation. The GWD is approximately 200 times faster to compute than training a neural network, and it preserves the task performance ranking of event representations across multiple representations, network backbones, and datasets. This means that finding a representation with a high task score is equivalent to finding a representation with a low GWD. Using this insight, we perform, for the first time, a hyperparameter search over a large family of event representations, revealing new and powerful representations that exceed the state of the art. On object detection, our optimized representation outperforms existing representations by 1.9% mAP on the 1 Mpx dataset and 8.6% mAP on the Gen1 dataset. It even outperforms the state of the art by 1.8% mAP on Gen1 and state-of-the-art feed-forward methods by 6.0% mAP on the 1 Mpx dataset. This work opens a new, unexplored field of explicit representation optimization for event-based learning methods.
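The selection rule in the abstract (prefer the representation with the lowest GWD to the raw events) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the similarity functions or the solver, so the sketch assumes Euclidean distance matrices, a squared loss, and a fixed one-to-one coupling instead of minimizing over couplings. The value it computes is therefore only an upper bound on the true discrepancy. The event tuples, the two candidate "representations", and all function names are hypothetical.

```python
import math


def pairwise_distances(points):
    """Euclidean distance matrix within one set of points."""
    return [[math.dist(p, q) for q in points] for p in points]


def gw_objective(c1, c2, coupling):
    """Squared-loss Gromov-Wasserstein objective for a FIXED coupling.

    c1 is n x n, c2 is m x m, coupling is n x m with entries summing to 1.
    The actual discrepancy is the minimum of this value over all valid
    couplings, so a fixed coupling only gives an upper bound.
    """
    n, m = len(c1), len(c2)
    total = 0.0
    for i in range(n):
        for j in range(m):
            if coupling[i][j] == 0.0:
                continue
            for k in range(n):
                for l in range(m):
                    diff = c1[i][k] - c2[j][l]
                    total += diff * diff * coupling[i][j] * coupling[k][l]
    return total


# Hypothetical raw events as (x, y, t) tuples.
events = [(0, 0, 0.0), (1, 0, 0.1), (0, 1, 0.2), (1, 1, 0.3)]

# Two hypothetical per-event features: one keeps the timestamp
# (a pure translation, so all distances are preserved), one drops it.
candidates = {
    "keeps_time": [(x + 5, y + 5, t) for x, y, t in events],
    "drops_time": [(x, y) for x, y, _ in events],
}

# Assumption: event i is matched to feature i with uniform mass 1/n.
n = len(events)
identity_coupling = [
    [1.0 / n if i == j else 0.0 for j in range(n)] for i in range(n)
]

event_distances = pairwise_distances(events)
scores = {
    name: gw_objective(event_distances, pairwise_distances(feats), identity_coupling)
    for name, feats in candidates.items()
}

# Selection rule from the abstract: the lowest discrepancy wins.
best = min(scores, key=scores.get)
print(scores, best)
```

In this toy setup the candidate that keeps the timestamp preserves every pairwise distance and scores zero, while the one that discards time scores strictly higher. A real computation would optimize the coupling with an optimal-transport solver and work on far larger event sets.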

updated: Wed Apr 26 2023 11:27:34 GMT+0000 (UTC)

published: Wed Apr 26 2023 11:27:34 GMT+0000 (UTC)

arXiv
