P2M: A Processing-in-Pixel-in-Memory Paradigm for Resource-Constrained TinyML Applications

Gourav Datta; Souvik Kundu; Zihan Yin; Ravi Teja Lakkireddy; Joe Mathai; Ajey Jacob; Peter A. Beerel; Akhilesh R. Jaiswal

The demand to process the vast amounts of data generated by state-of-the-art high-resolution cameras has motivated novel energy-efficient on-device AI solutions. Visual data in such cameras are usually captured as analog voltages by a sensor pixel array and then converted to the digital domain by analog-to-digital converters (ADCs) for subsequent AI processing. Recent research has tried to exploit massively parallel, low-power analog/digital computing in the form of near- and in-sensor processing, in which the AI computation is performed partly in the periphery of the pixel array and partly in a separate on-board CPU/accelerator. Unfortunately, high-resolution input images still need to be streamed between the camera and the AI processing unit, frame by frame, causing energy, bandwidth, and security bottlenecks. To mitigate this problem, we propose a novel Processing-in-Pixel-in-Memory (P2M) paradigm that customizes the pixel array by adding support for analog multi-channel, multi-bit convolution, batch normalization, and ReLU (Rectified Linear Units). Our solution includes a holistic algorithm-circuit co-design approach, and the resulting P2M paradigm can be used as a drop-in replacement for embedding the memory-intensive first few layers of convolutional neural network (CNN) models within foundry-manufacturable CMOS image sensor platforms. Our experimental results indicate that P2M reduces the data transfer bandwidth from sensors and the number of analog-to-digital conversions by ~21x, and reduces the energy-delay product (EDP) incurred in processing a MobileNetV2 model on a TinyML use case for the Visual Wake Words (VWW) dataset by up to ~11x compared to standard near-sensor or in-sensor implementations, without any significant drop in test accuracy.
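
As a rough software illustration of the three operations named above (strided multi-channel convolution, batch normalization, and ReLU), the following minimal sketch computes them for one small frame. It is not the authors' analog circuit: the function name `conv_bn_relu`, the nested-list data layout, the toy sizes, and the folding of batch normalization into a per-channel scale and shift are all assumptions made for illustration only.

```python
import math

def conv_bn_relu(image, weights, gamma, beta, mean, var, stride, eps=1e-5):
    """Strided convolution (no padding), inference-time batch norm, then ReLU.

    image:   nested list of shape [H][W][C_in]
    weights: nested list of shape [k][k][C_in][C_out]
    gamma, beta, mean, var: per-output-channel batch-norm parameters
    All names and layouts here are hypothetical, chosen for this sketch.
    """
    k = len(weights)
    c_in = len(weights[0][0])
    c_out = len(weights[0][0][0])
    h_out = (len(image) - k) // stride + 1
    w_out = (len(image[0]) - k) // stride + 1

    # Fold batch normalization into one scale and one shift per channel.
    scale = [gamma[c] / math.sqrt(var[c] + eps) for c in range(c_out)]
    shift = [beta[c] - mean[c] * scale[c] for c in range(c_out)]

    out = []
    for i in range(h_out):
        row = []
        for j in range(w_out):
            pixel = []
            for c in range(c_out):
                acc = 0.0
                # Multiply-accumulate over the k x k x C_in receptive field.
                for di in range(k):
                    for dj in range(k):
                        for ci in range(c_in):
                            acc += (image[i * stride + di][j * stride + dj][ci]
                                    * weights[di][dj][ci][c])
                # ReLU clamps negative activations to zero.
                pixel.append(max(acc * scale[c] + shift[c], 0.0))
            row.append(pixel)
        out.append(row)
    return out

# Toy input (assumed values): a 4x4 single-channel frame of ones,
# a 2x2 kernel with stride 2, and two output channels.
image = [[[1.0] for _ in range(4)] for _ in range(4)]
weights = [[[[1.0, -1.0]] for _ in range(2)] for _ in range(2)]
out = conv_bn_relu(image, weights,
                   gamma=[1.0, 1.0], beta=[0.0, 0.0],
                   mean=[0.0, 0.0], var=[1.0, 1.0],
                   stride=2, eps=0.0)
n_in = 4 * 4 * 1
n_out = len(out) * len(out[0]) * len(out[0][0])
```

In this toy case the 16 input values become 8 output activations, which shows how a strided first layer shrinks the number of values that would need to be converted and transferred. The sizes are arbitrary and are not meant to reproduce the ~21x figure reported above.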

updated: Thu Mar 17 2022 01:55:36 GMT+0000 (UTC)

published: Mon Mar 07 2022 04:15:29 GMT+0000 (UTC)

arXiv
