Wireless Deep Video Semantic Transmission

Sixian Wang; Jincheng Dai; Zijian Liang; Kai Niu; Zhongwei Si; Chao Dong; Xiaoqi Qin; Ping Zhang


In this paper, we design a new class of high-efficiency deep joint source-channel coding methods for end-to-end video transmission over wireless channels. The proposed methods use a nonlinear transform and a conditional coding architecture to adaptively extract semantic features across video frames, and transmit the resulting feature-domain representations over wireless channels via deep joint source-channel coding. We call this framework deep video semantic transmission (DVST). In particular, thanks to the strong temporal prior provided by the feature-domain context, the learned nonlinear transform becomes temporally adaptive, which yields a richer and more accurate entropy model to guide the transmission of the current frame. Building on this entropy model, a novel rate-adaptive transmission mechanism is developed to customize deep joint source-channel coding for video sources. It learns to allocate the limited channel bandwidth within and among video frames to maximize the overall transmission performance. The whole DVST design is formulated as an optimization problem whose goal is to optimize the end-to-end transmission rate-distortion trade-off under perceptual quality metrics or machine vision task performance metrics. Across standard video test sequences and a range of communication scenarios, experiments show that DVST generally outperforms traditional wireless video coded transmission schemes. Because it is video-content-aware and can integrate machine vision tasks, the proposed DVST framework is well suited to support future semantic communications.
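
The bandwidth allocation idea in the abstract can be illustrated with a small sketch. The abstract does not specify the allocation rule, so the code below assumes a simple one: each feature receives a share of the channel-symbol budget proportional to its estimated entropy, with rounding handled by the largest-remainder method. The function name `allocate_symbols` and its parameters are hypothetical and are not taken from the paper; in DVST the allocation is learned end to end rather than computed by a fixed formula.

```python
import math


def allocate_symbols(entropies_bits, total_symbols):
    """Split a channel-symbol budget across features in proportion to entropy.

    Illustrative assumption only: a proportional rule, not the learned
    allocation described in the paper.
    """
    total_entropy = sum(entropies_bits)
    if total_entropy <= 0:
        raise ValueError("at least one feature must have positive entropy")

    # Ideal (fractional) share of the budget for each feature.
    shares = [total_symbols * h / total_entropy for h in entropies_bits]

    # Round down first, then hand out the leftover symbols to the features
    # with the largest fractional remainders (largest-remainder method).
    alloc = [math.floor(s) for s in shares]
    leftover = total_symbols - sum(alloc)
    by_remainder = sorted(
        range(len(shares)), key=lambda i: shares[i] - alloc[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        alloc[i] += 1
    return alloc


if __name__ == "__main__":
    # A feature with three times the entropy gets three times the symbols.
    print(allocate_symbols([1.0, 3.0], 8))
```

Under this assumed rule, features that the entropy model considers more informative receive more channel symbols, while the total always matches the available budget.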

updated: Wed Nov 02 2022 06:36:56 GMT+0000 (UTC)

published: Thu May 26 2022 03:26:43 GMT+0000 (UTC)

arXiv
