Multimodal Representation Learning With Text and Images

Aishwarya Jayagopal; Ankireddy Monica Aiswarya; Ankita Garg; Srinivasan Kolumam Nandakumar

Multimodal AI has grown in recent years as researchers integrate different types of data, such as text, images, and speech, into their models to improve results. This project uses multimodal AI and matrix factorization techniques to learn representations from text and image data simultaneously, drawing on widely used techniques from Natural Language Processing (NLP) and Computer Vision. The learnt representations are evaluated on downstream classification and regression tasks. Because the methodology uses Auto-Encoders for unsupervised representation learning, it can be extended beyond the scope of this project.
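The abstract does not give the model details, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that text and image inputs have already been turned into fixed-length feature vectors (here replaced by synthetic data), and it swaps in a truncated SVD of the concatenated feature matrix as a simple stand-in for matrix factorization and for a linear auto-encoder. The names `joint_representation` and `reconstruct`, the dimensions, and the synthetic regression target are all hypothetical.

```python
import numpy as np

def joint_representation(text_feats, image_feats, k):
    """Factorize the concatenated [text | image] matrix with a rank-k SVD.

    Returns Z (n x k shared representation), W (k x d decoder), and the
    per-feature mean removed before factorization.
    """
    X = np.hstack([text_feats, image_feats])
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    Z = U[:, :k] * s[:k]   # encoder output: one k-dim vector per sample
    W = Vt[:k]             # decoder: maps the representation back to features
    return Z, W, mean

def reconstruct(Z, W, mean):
    """Decode the shared representation back into both modalities."""
    return Z @ W + mean

# Synthetic stand-ins for precomputed text and image features (assumption).
rng = np.random.default_rng(0)
n, d_text, d_image, k = 200, 30, 50, 5
latent = rng.normal(size=(n, k))
text_feats = latent @ rng.normal(size=(k, d_text)) + 0.01 * rng.normal(size=(n, d_text))
image_feats = latent @ rng.normal(size=(k, d_image)) + 0.01 * rng.normal(size=(n, d_image))

Z, W, mean = joint_representation(text_feats, image_feats, k)
X = np.hstack([text_feats, image_feats])
rel_err = np.linalg.norm(X - reconstruct(Z, W, mean)) / np.linalg.norm(X)

# Downstream regression on the learnt representation (in-sample, for brevity).
y = latent @ rng.normal(size=k)
Zb = np.hstack([Z, np.ones((n, 1))])           # add a bias column
coef = np.linalg.lstsq(Zb, y, rcond=None)[0]
r2 = 1.0 - np.sum((y - Zb @ coef) ** 2) / np.sum((y - y.mean()) ** 2)

print("representation shape:", Z.shape)
print("relative reconstruction error:", rel_err)
print("downstream R^2:", r2)
```

A non-linear auto-encoder would replace the SVD with trained encoder and decoder networks, but the evaluation pattern stays the same: learn the representation without labels, then fit a simple supervised model on top of it.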

updated: Sat Apr 30 2022 03:25:01 GMT+0000 (UTC)

published: Sat Apr 30 2022 03:25:01 GMT+0000 (UTC)

arXiv
