Towards Unsupervised Fine-Tuning for Edge Video Analytics

Daniel Rivas; Francesc Guim; Jordà Polo; Josep Ll. Berral; Pubudu M. Silva; David Carrera

Judging by popular, generic computer vision challenges such as ImageNet and PASCAL VOC, neural networks have proven to be exceptionally accurate in recognition tasks. However, state-of-the-art accuracy often comes at a high computational price, requiring equally state-of-the-art, high-end hardware acceleration to achieve anything near real-time performance. At the same time, use cases such as smart cities or autonomous vehicles require automated, real-time analysis of images from fixed cameras. Because these streams would consume a huge and constant amount of network bandwidth, we cannot rely on offloading compute to the omnipresent and omnipotent cloud. Therefore, a distributed Edge Cloud must be in charge of processing images locally. However, the Edge Cloud is, by nature, resource-constrained, which limits the computational complexity of the models executed at the edge. Nonetheless, there is a need for a meeting point between the Edge Cloud and accurate real-time video analytics. In this paper, we propose a method for improving the accuracy of edge models without any extra compute cost by means of automatic model specialization. First, we show how the sole assumption of static cameras allows us to make a series of considerations that greatly simplify the scope of the problem. Then, we present Edge AutoTuner, a framework that implements and brings these considerations together to automate the end-to-end fine-tuning of models. Finally, we show that complex neural networks, which generalize better, can be used effectively as teachers to annotate datasets for fine-tuning lightweight neural networks and tailoring them to the specific edge context. This boosts accuracy at constant computational cost and requires no human interaction. Results show that our method can automatically improve the accuracy of pre-trained models by an average of 21%.
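The teacher-annotation step described in the abstract can be illustrated with a minimal sketch. It is not the Edge AutoTuner implementation: the `Detection` type, the `build_pseudo_labeled_dataset` function, and the `min_score` confidence filter are all hypothetical names and choices introduced here for illustration, and the abstract does not state whether teacher predictions are filtered at all. The teacher is passed in as a plain callable so the sketch stays independent of any particular framework.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Detection:
    """One teacher prediction for a frame (hypothetical structure)."""
    label: str
    score: float                      # teacher confidence in [0, 1]
    box: Tuple[int, int, int, int]    # (x_min, y_min, x_max, y_max) in pixels


def build_pseudo_labeled_dataset(
    frames: Iterable[Any],
    teacher: Callable[[Any], List[Detection]],
    min_score: float = 0.5,           # assumed threshold, not from the paper
) -> List[Tuple[Any, List[Detection]]]:
    """Annotate frames from a static camera using a large teacher model.

    Each frame is paired with the teacher detections whose confidence is
    at least `min_score`. Frames with no surviving detection are skipped,
    so the result contains only annotated samples.
    """
    dataset = []
    for frame in frames:
        kept = [d for d in teacher(frame) if d.score >= min_score]
        if kept:
            dataset.append((frame, kept))
    return dataset
```

The resulting list of (frame, annotations) pairs would then serve as the training set for fine-tuning the lightweight edge model, with no human labeling involved. The fine-tuning step itself is omitted because the abstract gives no details about it.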

updated: Wed Apr 14 2021 12:57:40 GMT+0000 (UTC)

published: Wed Apr 14 2021 12:57:40 GMT+0000 (UTC)

arXiv
