Globetrotter: Connecting Languages by Connecting Images

Dídac Surís; Dave Epstein; Carl Vondrick



Machine translation between many languages at once is highly challenging, since training with ground truth requires supervision between all language pairs, which is difficult to obtain. Our key insight is that, while languages may vary drastically, the underlying visual appearance of the world remains consistent. We introduce a method that uses visual observations to bridge the gap between languages, rather than relying on parallel corpora or topological properties of the representations. We train a model that aligns segments of text from different languages if and only if the images associated with them are similar and each image in turn is well-aligned with its textual description. We train our model from scratch on a new dataset of text in over fifty languages with accompanying images. Experiments show that our method outperforms previous work on unsupervised word and sentence translation using retrieval.
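The alignment rule in the abstract (two text segments should align if and only if their images are similar and each image matches its own text) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the use of clipped cosine similarity, and the choice to combine the three scores by multiplication are all assumptions made for illustration. The only point it shows is that a product of scores is high only when every factor is high.

```python
import math


def clipped_cosine(u, v):
    """Cosine similarity clipped to [0, 1] (an assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return max(0.0, dot / (norm_u * norm_v))


def target_text_alignment(text_i, image_i, text_j, image_j):
    """Hypothetical target score for aligning two text segments.

    The score is the product of three factors, so it is large only when
    all of them are large:
      1. the two images are similar,
      2. text_i matches image_i,
      3. text_j matches image_j.
    Inputs are embedding vectors assumed to live in one shared space.
    """
    image_similarity = clipped_cosine(image_i, image_j)
    alignment_i = clipped_cosine(text_i, image_i)
    alignment_j = clipped_cosine(text_j, image_j)
    return image_similarity * alignment_i * alignment_j


# Toy 2-D embeddings: both captions describe the same kind of image.
same_scene = target_text_alignment([1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0])

# Each caption matches its own image, but the two images are unrelated.
different_scene = target_text_alignment([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0])
```

In this toy setup `same_scene` is high and `different_scene` is zero, even though each caption matches its own image in both cases. In a trained model such a score would act as a soft target for cross-lingual text similarity, rather than being computed from fixed vectors as it is here.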

updated: Thu Mar 17 2022 22:37:07 GMT+0000 (UTC)

published: Tue Dec 08 2020 18:50:40 GMT+0000 (UTC)

arXiv

