Towards Automatic Model Specialization for Edge Video Analytics

Daniel Rivas; Francesc Guim; Jordà Polo; Pubudu M. Silva; Josep Ll. Berral; David Carrera

Judging by popular, generic computer vision challenges such as ImageNet or PASCAL VOC, neural networks have proven to be exceptionally accurate in recognition tasks. However, state-of-the-art accuracy often comes at a high computational price and requires hardware acceleration to achieve real-time performance. Meanwhile, use cases such as smart cities require images from fixed cameras to be analyzed in real time. Due to the amount of network bandwidth these streams would generate, we cannot rely on offloading compute to a centralized cloud. Thus, a distributed edge cloud is expected to process images locally. However, the edge is, by nature, resource-constrained, which limits the computational complexity of what it can execute. Yet, there is a need for a meeting point between the edge and accurate real-time video analytics. Specializing lightweight models on a per-camera basis may help, but it quickly becomes unfeasible as the number of cameras grows unless the process is automated. In this paper, we present and evaluate COVA (Contextually Optimized Video Analytics), a framework to assist in the automatic specialization of models for video analytics in edge cameras. COVA automatically improves the accuracy of lightweight models through their specialization. Moreover, we discuss and review each step involved in the process to understand the trade-offs that each one entails. Additionally, we show how the sole assumption of static cameras allows us to make a series of considerations that greatly reduce the scope of the problem. Finally, experiments show that state-of-the-art models, i.e., those able to generalize to unseen environments, can be effectively used as teachers to tailor smaller networks to a specific context, boosting accuracy at a constant computational cost. Results show that COVA can automatically improve the accuracy of pre-trained models by an average of 21%.

updated: Mon Dec 13 2021 10:22:05 GMT+0000 (UTC)

published: Wed Apr 14 2021 12:57:40 GMT+0000 (UTC)

arXiv
