VieCap4H-VLSP 2021: ObjectAoA -- Enhancing performance of Object Relation Transformer with Attention on Attention for Vietnamese image captioning

Nghia Hieu Nguyen; Duong T. D. Vo; Minh-Quan Ha

Image captioning is currently a challenging task that requires both the ability to understand visual information and the ability to describe that information in human language. In this paper, we propose an efficient way to improve the image understanding ability of transformer-based methods by extending the Object Relation Transformer architecture with the Attention on Attention mechanism. Experiments on the VieCap4H dataset show that our proposed method significantly outperforms the original architecture on both the public test and the private test of the Image Captioning shared task held by VLSP.
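The abstract does not spell out how Attention on Attention (AoA) works. The sketch below illustrates the generic AoA formulation from Huang et al. (ICCV 2019). A query q first attends over a set of features to produce an attended vector v_hat. AoA then builds an "information" vector i = W_qi q + W_vi v_hat + b_i and a sigmoid "gate" g = sigmoid(W_qg q + W_vg v_hat + b_g), and returns g * i element-wise. This is not the ObjectAoA authors' code, and the abstract does not say where they insert AoA into the Object Relation Transformer. The code makes several simplifying assumptions: it uses single-head scaled dot-product attention, plain Python lists instead of tensors, toy random weights, and random vectors standing in for detected object region features. All function names (`attention_on_attention`, `scaled_dot_attention`, and so on) are hypothetical.

```python
import math
import random

# Sketch of the generic AoA formulation (Huang et al., 2019); NOT the ObjectAoA
# authors' implementation. Single-head attention and toy weights are assumptions.

Vector = list[float]
Matrix = list[list[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def scaled_dot_attention(q: Vector, keys: list[Vector], values: list[Vector]) -> Vector:
    """Single-head scaled dot-product attention; returns the attended vector v_hat."""
    d = len(q)
    scores = [sum(q_i * k_i for q_i, k_i in zip(q, k)) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    dim_v = len(values[0])
    return [sum(a * v[j] for a, v in zip(weights, values)) for j in range(dim_v)]


def attention_on_attention(q: Vector, keys: list[Vector], values: list[Vector],
                           w_qi: Matrix, w_vi: Matrix, b_i: Vector,
                           w_qg: Matrix, w_vg: Matrix, b_g: Vector) -> Vector:
    """Gate the attention result with an element-wise sigmoid.

    i      = W_qi q + W_vi v_hat + b_i            (information vector)
    g      = sigmoid(W_qg q + W_vg v_hat + b_g)   (attention gate)
    output = g * i                                (element-wise product)
    """
    v_hat = scaled_dot_attention(q, keys, values)
    info = [a + b + c for a, b, c in zip(matvec(w_qi, q), matvec(w_vi, v_hat), b_i)]
    gate = [sigmoid(a + b + c)
            for a, b, c in zip(matvec(w_qg, q), matvec(w_vg, v_hat), b_g)]
    return [g * i for g, i in zip(gate, info)]


if __name__ == "__main__":
    rng = random.Random(0)
    d = 4  # toy feature size (hypothetical)

    def rand_matrix(rows: int, cols: int) -> Matrix:
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    # Three random vectors standing in for object region features from a detector.
    regions = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(3)]
    query = [rng.uniform(-1, 1) for _ in range(d)]
    out = attention_on_attention(
        query, regions, regions,
        rand_matrix(d, d), rand_matrix(d, d), [0.0] * d,
        rand_matrix(d, d), rand_matrix(d, d), [0.0] * d,
    )
    print(out)
```

Because every gate value lies strictly between 0 and 1, AoA can down-weight components of the attended vector that are not useful for the current query. With all gate parameters set to zero, the gate is exactly 0.5 everywhere, which makes a convenient sanity check.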

updated: Sat Nov 12 2022 08:10:13 GMT+0000 (UTC)

published: Thu Nov 10 2022 08:19:44 GMT+0000 (UTC)

arXiv
