Task-Oriented Communication for Edge Video Analytics

Jiawei Shao; Xinjie Zhang; Jun Zhang

With the development of artificial intelligence (AI) techniques and the increasing popularity of camera-equipped devices, many edge video analytics applications are emerging, calling for the deployment of computation-intensive AI models at the network edge. Edge inference is a promising way to move computation-intensive workloads from low-end devices to a powerful edge server for video analytics, but device-server communication remains a bottleneck because of limited bandwidth. This paper proposes a task-oriented communication framework for edge video analytics, where multiple devices collect visual sensory data and transmit informative features to an edge server for processing. To enable low-latency inference, the framework removes video redundancy in the spatial and temporal domains and transmits only the minimal information that is essential for the downstream task, rather than reconstructing the videos at the edge server. Specifically, it extracts compact task-relevant features based on the deterministic information bottleneck (IB) principle, which characterizes a tradeoff between the informativeness of the features and the communication cost. As the features of consecutive frames are temporally correlated, we propose a temporal entropy model (TEM) that reduces the bitrate by taking the previous features as side information in feature encoding. To further improve the inference performance, we build a spatial-temporal fusion module at the server that integrates features of the current and previous frames for joint inference. Extensive experiments on video analytics tasks demonstrate that the proposed framework effectively encodes task-relevant information of video data and achieves a better rate-performance tradeoff than existing methods.
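The intuition behind using previous features as side information can be illustrated with a toy calculation. The sketch below is not the paper's temporal entropy model, which is a learned model; it only compares the empirical entropy of a quantized feature coded on its own with its entropy conditioned on the previous frame's value. The synthetic feature trace, the function names, and all parameters (`levels`, `change_prob`, `num_frames`) are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def entropy_bits(symbols):
    """Empirical Shannon entropy, in bits per symbol, of a sequence."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def conditional_entropy_bits(current, previous):
    """Empirical H(current | previous) = H(current, previous) - H(previous)."""
    return entropy_bits(list(zip(current, previous))) - entropy_bits(previous)


def simulate_feature_trace(num_frames=20000, levels=8, change_prob=0.1, seed=0):
    """Hypothetical stand-in for one quantized feature element over time.

    The value is kept from the previous frame with probability
    1 - change_prob and otherwise resampled uniformly, which gives
    the temporal correlation the abstract refers to.
    """
    rng = random.Random(seed)
    value = rng.randrange(levels)
    trace = []
    for _ in range(num_frames):
        if rng.random() < change_prob:
            value = rng.randrange(levels)
        trace.append(value)
    return trace


trace = simulate_feature_trace()
previous, current = trace[:-1], trace[1:]

# Cost of coding each frame's feature independently of the past.
rate_independent = entropy_bits(current)
# Cost when the previous frame's feature is available as side information.
rate_with_side_info = conditional_entropy_bits(current, previous)

print(f"independent coding: {rate_independent:.3f} bits/symbol")
print(f"with side information: {rate_with_side_info:.3f} bits/symbol")
```

Because conditioning never increases entropy, the second figure is at most the first, and the gap widens as consecutive frames become more strongly correlated (smaller `change_prob` in this toy setup).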

updated: Mon Apr 01 2024 14:38:13 GMT+0000 (UTC)

published: Fri Nov 25 2022 12:09:12 GMT+0000 (UTC)

arXiv
