Rethinking Generalization in American Sign Language Prediction for Edge Devices with Extremely Low Memory Footprint

Aditya Jyoti Paul; Puranjay Mohan; Stuti Sehgal

Due to the boom in technical computing over the last few years, the world has seen massive advances in artificially intelligent systems that solve diverse real-world problems. However, a major roadblock to the ubiquitous acceptance of these models is their enormous computational complexity and memory footprint. Hence, efficient architectures and training techniques are required for deployment on extremely low-resource inference endpoints. This paper proposes an architecture for detecting the letters of the American Sign Language alphabet on an ARM Cortex-M7 microcontroller with just 496 KB of framebuffer RAM. Parameter quantization is a common technique for shrinking models, but it can cause varying drops in test accuracy. This paper proposes using interpolation as augmentation, among other techniques, as an efficient way of reducing this drop; it also helps the model generalize well to previously unseen noisy data. The proposed model is about 185 KB after quantization and runs inference at 20 frames per second.
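
The abstract does not spell out the augmentation procedure. As a rough illustration only, the sketch below assumes that "interpolation as augmentation" means downscaling a training frame by a random factor and resizing it back to its original size with a randomly chosen interpolation method (nearest-neighbour or bilinear), which mimics the resampling artifacts a low-resolution camera feed may introduce. The function names (resize_nearest, resize_bilinear, interpolation_augment), the scale range, the use of 2-D grayscale frames, and the 96x96 frame size in the usage lines are hypothetical and not taken from the paper. It depends only on NumPy.

```python
import numpy as np

# Hypothetical sketch of interpolation-based augmentation; NOT the authors' code.
# Frames are assumed to be 2-D grayscale arrays of shape (H, W).


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2-D array."""
    in_h, in_w = img.shape
    # Map each output row/column back to the source pixel it falls in.
    rows = (np.arange(out_h) * in_h / out_h).astype(int)
    cols = (np.arange(out_w) * in_w / out_w).astype(int)
    return img[rows[:, None], cols[None, :]]


def resize_bilinear(img, out_h, out_w):
    """Bilinear resize of a 2-D array (corner pixels aligned)."""
    in_h, in_w = img.shape
    y = np.linspace(0, in_h - 1, out_h)
    x = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(y).astype(int)
    x0 = np.floor(x).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (y - y0)[:, None]  # vertical blend weights
    wx = (x - x0)[None, :]  # horizontal blend weights
    top = img[y0[:, None], x0[None, :]] * (1 - wx) + img[y0[:, None], x1[None, :]] * wx
    bottom = img[y1[:, None], x0[None, :]] * (1 - wx) + img[y1[:, None], x1[None, :]] * wx
    return top * (1 - wy) + bottom * wy


def interpolation_augment(img, rng, min_scale=0.25, max_scale=0.75):
    """Downscale by a random factor, then upscale back to the original size
    using a randomly chosen interpolation method (assumed recipe)."""
    h, w = img.shape
    scale = rng.uniform(min_scale, max_scale)
    small_h = max(1, int(round(h * scale)))
    small_w = max(1, int(round(w * scale)))
    down = resize_nearest(img, small_h, small_w)
    resize_up = resize_nearest if rng.random() < 0.5 else resize_bilinear
    return resize_up(down.astype(np.float32), h, w)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.integers(0, 256, size=(96, 96)).astype(np.float32)  # hypothetical frame
    augmented = interpolation_augment(frame, rng)
    print(augmented.shape)
```

Because each call draws a new scale and a new upsampling method, the same training frame yields a differently degraded copy every epoch. Whether this matches the authors' exact recipe can only be checked against the full paper.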

updated: Sat Feb 13 2021 10:24:01 GMT+0000 (UTC)

published: Fri Nov 27 2020 14:05:42 GMT+0000 (UTC)

arXiv