VALHALLA: Visual Hallucination for Machine Translation

Yi Li; Rameswar Panda; Yoon Kim; Chun-Fu Chen; Rogerio Feris; David Cox; Nuno Vasconcelos

Designing better machine translation systems by considering auxiliary inputs such as images has attracted much attention in recent years. While existing methods show promising performance over conventional text-only translation systems, they typically require paired text and image as input during inference, which limits their applicability to real-world scenarios. In this paper, we introduce a visual hallucination framework, called VALHALLA, which requires only source sentences at inference time and instead uses hallucinated visual representations for multimodal machine translation. In particular, given a source sentence, an autoregressive hallucination transformer predicts a discrete visual representation from the input text, and the combined text and hallucinated representations are used to obtain the target translation. We train the hallucination transformer jointly with the translation transformer using standard backpropagation with cross-entropy losses, guided by an additional loss that encourages consistency between predictions made with either ground-truth or hallucinated visual representations. Extensive experiments on three standard translation datasets with a diverse set of language pairs demonstrate the effectiveness of our approach over both text-only baselines and state-of-the-art methods. Project page: http://www.svcl.ucsd.edu/projects/valhalla.
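The training objective described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only states that cross-entropy losses are combined with a consistency loss, so the KL-divergence form of the consistency term, the equal weighting of the cross-entropy terms, and every name below (`joint_loss`, `consistency_weight`, and so on) are assumptions made for illustration. Logits are plain Python lists, one list per predicted token.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability of the target index under softmax(logits)."""
    return -math.log(softmax(logits)[target])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions, assuming q has full support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def joint_loss(halluc_logits, visual_tokens,
               logits_with_real, logits_with_halluc, target_tokens,
               consistency_weight=1.0):
    """Hypothetical combined objective (all weights are assumptions).

    halluc_logits      : per-position logits over the discrete visual vocabulary
    visual_tokens      : ground-truth discrete visual token ids
    logits_with_real   : translation logits when ground-truth visual tokens are used
    logits_with_halluc : translation logits when hallucinated visual tokens are used
    target_tokens      : ground-truth target-sentence token ids
    """
    # Cross-entropy for predicting visual tokens from the source text.
    hallucination = mean(cross_entropy(l, t)
                         for l, t in zip(halluc_logits, visual_tokens))
    # Cross-entropy for translation under each kind of visual input.
    translation_real = mean(cross_entropy(l, t)
                            for l, t in zip(logits_with_real, target_tokens))
    translation_halluc = mean(cross_entropy(l, t)
                              for l, t in zip(logits_with_halluc, target_tokens))
    # Consistency between the two translation distributions (assumed KL form).
    consistency = mean(kl_divergence(softmax(a), softmax(b))
                       for a, b in zip(logits_with_real, logits_with_halluc))
    total = (hallucination + translation_real + translation_halluc
             + consistency_weight * consistency)
    return {
        "hallucination": hallucination,
        "translation_real": translation_real,
        "translation_halluc": translation_halluc,
        "consistency": consistency,
        "total": total,
    }
```

In this sketch the consistency term is zero whenever the translation model produces the same distribution with ground-truth and with hallucinated visual tokens, which is the behaviour needed at inference time, when only the source sentence is available.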

updated: Tue May 31 2022 20:25:15 GMT+0000 (UTC)

published: Tue May 31 2022 20:25:15 GMT+0000 (UTC)

arXiv
