I2V-GAN: Unpaired Infrared-to-Visible Video Translation

Shuang Li; Bingfeng Han; Zhenjie Yu; Chi Harold Liu; Kai Chen; Shuigen Wang

Human vision is often adversely affected by complex environmental factors, especially in night vision scenarios. Infrared cameras are therefore often used to enhance visibility by detecting infrared radiation in the surrounding environment, but infrared videos are unsatisfactory because they lack detailed semantic information. An effective video-to-video translation method from the infrared domain to the visible light domain is thus strongly needed, one that overcomes the large intrinsic gap between the two domains. To address this challenging problem, we propose I2V-GAN, an infrared-to-visible (I2V) video translation method that generates fine-grained and spatio-temporally consistent visible light videos from unpaired infrared videos. Technically, our model capitalizes on three types of constraints: 1) an adversarial constraint to generate synthetic frames that are similar to the real ones, 2) cyclic consistency with an introduced perceptual loss for effective content conversion as well as style preservation, and 3) similarity constraints across and within domains to enhance content and motion consistency in both the spatial and temporal spaces at a fine-grained level. Furthermore, the currently publicly available infrared and visible light datasets are mainly used for object detection or tracking, and some are composed of discontinuous images that are not suitable for video tasks. We therefore provide a new dataset for I2V video translation, named IRVI. Specifically, it has 12 consecutive video clips of vehicle and monitoring scenes, and both the infrared and visible light videos can be split into 24352 frames. Comprehensive experiments validate that I2V-GAN is superior to the compared SOTA methods in the translation of I2V videos, with higher fluency and finer semantic details. The code and IRVI dataset are available at https://github.com/BIT-DA/I2V-GAN.
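
To make the three constraint types more concrete, the following is a minimal Python sketch of how such terms might be combined into one generator objective. It is an illustration under stated assumptions, not the authors' implementation: frames are represented as flat lists of floats, the adversarial term uses a least-squares form, the perceptual part of the cyclic term is omitted because it would need a pretrained feature network, and `motion_consistency_loss` is a simplified stand-in for the similarity constraints. All function names and the weights `lambda_cyc` and `lambda_sim` are hypothetical.

```python
from typing import Callable, Sequence

Frame = Sequence[float]  # hypothetical representation: one flattened frame


def l1_distance(a: Frame, b: Frame) -> float:
    """Mean absolute difference between two equally sized frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of pixels")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def adversarial_loss(disc_scores: Sequence[float]) -> float:
    """Least-squares generator loss (an assumed form): the generator is
    rewarded when the discriminator scores its synthetic frames near 1."""
    return sum((s - 1.0) ** 2 for s in disc_scores) / len(disc_scores)


def cycle_loss(
    infrared: Frame,
    to_visible: Callable[[Frame], Frame],
    to_infrared: Callable[[Frame], Frame],
) -> float:
    """L1 error after translating infrared -> visible -> infrared.
    The perceptual term mentioned in the abstract is not modeled here."""
    return l1_distance(infrared, to_infrared(to_visible(infrared)))


def motion_consistency_loss(
    source_clip: Sequence[Frame], translated_clip: Sequence[Frame]
) -> float:
    """Simplified stand-in for the similarity constraints: the amount of
    change between consecutive translated frames should match the amount
    of change between the corresponding source frames."""
    if len(source_clip) != len(translated_clip) or len(source_clip) < 2:
        raise ValueError("clips must have equal length of at least 2 frames")
    gaps = []
    for t in range(len(source_clip) - 1):
        src_motion = l1_distance(source_clip[t], source_clip[t + 1])
        out_motion = l1_distance(translated_clip[t], translated_clip[t + 1])
        gaps.append(abs(src_motion - out_motion))
    return sum(gaps) / len(gaps)


def total_loss(
    adv: float, cyc: float, sim: float,
    lambda_cyc: float = 10.0, lambda_sim: float = 1.0,
) -> float:
    """Weighted sum of the three terms; the weights are illustrative only."""
    return adv + lambda_cyc * cyc + lambda_sim * sim
```

With identity functions passed as both generators, `cycle_loss` evaluates to zero, which gives a quick sanity check of the round trip before real networks are plugged in.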

updated: Mon Aug 02 2021 14:04:19 GMT+0000 (UTC)

published: Mon Aug 02 2021 14:04:19 GMT+0000 (UTC)

arXiv
