FrameHopper: Selective Processing of Video Frames in Detection-driven Real-Time Video Analytics

Md Adnan Arefeen; Sumaiya Tabassum Nimi; Md Yusuf Sarwar Uddin

Detection-driven real-time video analytics require continuous detection of objects contained in the video frames using deep learning models such as YOLOv3 and EfficientDet. However, running these detectors on every frame on resource-constrained edge devices is computationally intensive. Because consecutive video frames are temporally correlated, detection outputs tend to overlap across successive frames. Eliminating similar consecutive frames leads to a negligible drop in detection accuracy while offering significant efficiency gains by reducing overall computation and communication costs. The key technical questions are, therefore, (a) how to identify which frames should be processed by the object detector, and (b) how many successive frames can be skipped (called the skip-length) once a frame is selected for processing. The overall goal is to keep the error due to skipping frames as small as possible. We introduce a novel error vs. processing rate optimization problem for the object detection task that balances the error rate against the fraction of frames filtered out. Subsequently, we propose an offline Reinforcement Learning (RL)-based algorithm that learns these skip-lengths as the state-action policy of an RL agent from a recorded video, and then deploy the agent online for live video streams. To this end, we develop FrameHopper, an edge-cloud collaborative video analytics framework that runs a lightweight trained RL agent on the camera and passes filtered frames to the server, where the object detection model runs for a set of applications. We have tested our approach on a number of live videos captured from real-life scenarios and show that FrameHopper processes only a handful of frames yet produces detection results close to the oracle solution, and outperforms recent state-of-the-art solutions in most cases.
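
The skip-length idea and the error vs. processing rate trade-off can be illustrated with a minimal Python sketch. This is not the FrameHopper implementation: the function names (`select_frames`, `reuse_detections`, `error_and_rate`) are hypothetical, a fixed skip-length stands in for the trained RL policy, per-frame object counts stand in for detector output, and the assumption that skipped frames reuse the most recent processed result is ours rather than something the abstract states.

```python
from typing import Callable, List, Sequence, Tuple


def select_frames(num_frames: int, skip_policy: Callable[[int], int]) -> List[int]:
    """Return the indices of frames that would be sent to the detector.

    skip_policy(i) gives the skip-length chosen after processing frame i
    (a stand-in for the RL agent's state-action policy).
    """
    selected = []
    i = 0
    while i < num_frames:
        selected.append(i)
        skip = max(0, skip_policy(i))  # never skip a negative number of frames
        i += skip + 1
    return selected


def reuse_detections(oracle: Sequence[int], selected: Sequence[int]) -> List[int]:
    """Assumption: each skipped frame reuses the latest processed frame's result."""
    chosen = set(selected)
    approx = []
    last = oracle[0]
    for i, value in enumerate(oracle):
        if i in chosen:
            last = value  # the detector ran on this frame
        approx.append(last)
    return approx


def error_and_rate(oracle: Sequence[int], selected: Sequence[int]) -> Tuple[float, float]:
    """Return (fraction of frames with a wrong result, fraction of frames processed)."""
    approx = reuse_detections(oracle, selected)
    mismatches = sum(1 for a, b in zip(approx, oracle) if a != b)
    return mismatches / len(oracle), len(selected) / len(oracle)


# Toy "oracle": object count per frame if the detector ran on every frame.
oracle_counts = [2, 2, 2, 3, 3, 3, 3, 1, 1, 1]
selected = select_frames(len(oracle_counts), lambda i: 2)  # fixed skip-length of 2
error, rate = error_and_rate(oracle_counts, selected)
print(selected, error, rate)
```

In this toy run, a larger fixed skip-length lowers the processing rate but risks missing the change in object count; a learned policy would aim to choose short skips near such changes and long skips elsewhere.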

updated: Tue Mar 22 2022 07:05:57 GMT+0000 (UTC)

published: Tue Mar 22 2022 07:05:57 GMT+0000 (UTC)

arXiv
