Learning from Event Cameras with Sparse Spiking Convolutional Neural Networks

Loïc Cordone; Benoît Miramond; Sonia Ferrante

Convolutional neural networks (CNNs) are now the de facto solution for computer vision problems thanks to their impressive results and ease of learning. These networks are composed of layers of connected units called artificial neurons, which loosely model the neurons in a biological brain. However, their implementation on conventional hardware (CPU/GPU) results in high power consumption, which makes them difficult to integrate into embedded systems. In a car, for example, embedded algorithms face very tight constraints in terms of energy, latency, and accuracy. To design more efficient computer vision algorithms, we propose an end-to-end biologically inspired approach that combines event cameras and spiking neural networks (SNNs). Event cameras output asynchronous and sparse events, providing a highly efficient data source, but processing these events with synchronous and dense algorithms such as CNNs does not yield any significant benefit. To address this limitation, we use SNNs, which are more biologically realistic neural networks whose units communicate using discrete spikes. Due to the nature of their operations, they are hardware-friendly and energy-efficient, but training them remains a challenge. Our method enables the training of sparse spiking convolutional neural networks directly on event data, using the popular deep learning framework PyTorch. Its performance in terms of accuracy, sparsity, and training time on the popular DVS128 Gesture dataset makes it possible to use this bio-inspired approach for the future embedding of real-time applications on low-power neuromorphic hardware.
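
To make the idea of units that "communicate using discrete spikes" concrete, here is a minimal sketch of a single spiking neuron in plain Python. The abstract does not state which neuron model the authors use, so the leaky integrate-and-fire dynamics below are an assumption, and the names `lif_run`, `leak`, and `threshold` are hypothetical choices made for illustration only. It is not the paper's PyTorch implementation.

```python
def lif_run(inputs, leak=0.9, threshold=1.0):
    """Simulate one leaky integrate-and-fire neuron over discrete time steps.

    Assumed model (not taken from the paper): the membrane potential decays
    by a factor `leak` each step, accumulates the input, and emits a binary
    spike with a hard reset to zero when it reaches `threshold`.
    """
    potential = 0.0
    spikes = []
    for current in inputs:
        # Leak the previous potential, then integrate the new input.
        potential = leak * potential + current
        if potential >= threshold:
            spikes.append(1)   # discrete spike sent to downstream units
            potential = 0.0    # hard reset after firing
        else:
            spikes.append(0)   # silent step: nothing to transmit
    return spikes


def spike_rate(spikes):
    """Fraction of time steps on which the neuron fired."""
    return sum(spikes) / len(spikes) if spikes else 0.0


if __name__ == "__main__":
    out = lif_run([0.6, 0.6, 0.0, 0.0, 1.2])
    print(out, spike_rate(out))
```

The output is a binary train that is zero on most steps, and a neuron that receives no input never fires. That sparsity is the property the abstract relies on when it pairs sparse event-camera data with SNNs.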

updated: Mon Apr 26 2021 13:52:01 GMT+0000 (UTC)

published: Mon Apr 26 2021 13:52:01 GMT+0000 (UTC)

arXiv
