UVid-Net: Enhanced Semantic Segmentation of UAV Aerial Videos by Embedding Temporal Information

Girisha S; Ujjwal Verma; Manohara Pai M M; Radhika Pai


Semantic segmentation of aerial videos is widely used to support decision making in environmental monitoring, urban planning, and disaster management. The reliability of these decision support systems depends on the accuracy of the underlying video semantic segmentation algorithms. Existing CNN-based video semantic segmentation methods extend image semantic segmentation methods with an additional module, such as an LSTM or optical flow, to compute the temporal dynamics of the video, which adds computational overhead. The proposed work instead modifies the CNN architecture itself to incorporate temporal information, improving the efficiency of video semantic segmentation. Specifically, an enhanced encoder-decoder CNN architecture, UVid-Net, is proposed for UAV video semantic segmentation. The encoder embeds temporal information for temporally consistent labelling. The decoder is enhanced with a feature-refiner module, which aids accurate localization of the class labels. UVid-Net is quantitatively evaluated on the extended ManipalUAVid dataset, where it achieves a mean intersection over union (mIoU) of 0.79, significantly higher than that of other state-of-the-art algorithms. Further, a UVid-Net model pre-trained on urban street scenes produced promising results on UAV aerial videos after fine-tuning the final layer.
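For readers unfamiliar with the mIoU metric reported above, the following is a minimal sketch of how mean intersection over union is commonly computed from predicted and ground-truth label sequences. It is not the authors' evaluation code: the function name `mean_iou`, the flat-list input format, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that occur in pred or target.

    pred, target: flat sequences of integer class labels, one per pixel
    (a hypothetical input format chosen for this sketch).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes  # pixels where pred == target == c
    pred_count = [0] * num_classes    # pixels predicted as class c
    target_count = [0] * num_classes  # pixels labelled as class c

    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # assumption: classes absent from both are skipped
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0
```

For a video, per-frame label maps would be flattened and either pooled across frames or averaged per frame; which convention the paper uses is not stated in the abstract.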

updated: Thu May 27 2021 13:04:56 GMT+0000 (UTC)

published: Sun Nov 29 2020 05:01:39 GMT+0000 (UTC)

arXiv

