Semi-Autoregressive Image Captioning

Xu Yan; Zhengcong Fei; Zekang Li; Shuhui Wang; Qingming Huang; Qi Tian

Current state-of-the-art approaches for image captioning typically adopt an autoregressive manner, i.e., generating descriptions word by word. This suffers from slow decoding and becomes a bottleneck in real-time applications. Non-autoregressive image captioning with continuous iterative refinement, which eliminates the sequential dependence in sentence generation, can achieve performance comparable to its autoregressive counterparts with considerable acceleration. Nevertheless, based on a well-designed experiment, we empirically show that the number of refinement iterations can be effectively reduced when sufficient prior knowledge is provided to the language decoder. To that end, we propose a novel two-stage framework, referred to as Semi-Autoregressive Image Captioning (SAIC), to make a better trade-off between performance and speed. The proposed SAIC model maintains the autoregressive property globally but relaxes it locally. Specifically, the SAIC model first generates an intermittent sequence autoregressively, skipping ahead group by group: it predicts the first word of every word group in order. Then, with the help of this partially deterministic prior information and the image features, the SAIC model non-autoregressively fills in all the skipped words in a single iteration. Experimental results on the MS COCO benchmark demonstrate that our SAIC model outperforms the preceding non-autoregressive image captioning models while obtaining a competitive inference speedup. Code is available at https://github.com/feizc/SAIC.
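The two-stage decoding schedule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that): `predict_next_head` and `fill_masks` are hypothetical stand-ins for the language decoder in its autoregressive and non-autoregressive modes, and the group size `K`, the `<mask>`/`<pad>`/`<eos>` tokens, and the padding convention are assumptions made for illustration. The toy "oracle" functions simply copy a fixed reference caption so that only the decoding schedule and the number of decoder passes are shown.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical special tokens (assumption, not taken from the paper).
MASK = "<mask>"
PAD = "<pad>"
EOS = "<eos>"


def semi_autoregressive_decode(
    predict_next_head: Callable[[Sequence[str]], str],
    fill_masks: Callable[[Sequence[str]], List[str]],
    group_size: int,
    max_groups: int = 20,
) -> Tuple[List[str], int]:
    """Sketch of two-stage decoding; returns (caption tokens, decoder passes)."""
    passes = 0

    # Stage 1 (autoregressive, global): predict the first word of each group in order.
    heads: List[str] = []
    for _ in range(max_groups):
        passes += 1
        word = predict_next_head(heads)
        if word == EOS:
            break
        heads.append(word)

    # Build the intermittent sequence: each head followed by group_size - 1 masks.
    skeleton: List[str] = []
    for head in heads:
        skeleton.append(head)
        skeleton.extend([MASK] * (group_size - 1))

    # Stage 2 (non-autoregressive, local): fill every skipped slot in one parallel pass.
    passes += 1
    filled = fill_masks(skeleton)

    # Drop padding that fills slots past the end of the caption.
    return [tok for tok in filled if tok != PAD], passes


# Toy usage with an oracle "decoder" that copies a fixed reference caption.
reference = "a man riding a wave on top of a surfboard".split()
K = 2  # hypothetical group size


def oracle_head(heads: Sequence[str]) -> str:
    # The next head sits at position len(heads) * K in the full caption.
    idx = len(heads) * K
    return reference[idx] if idx < len(reference) else EOS


def oracle_fill(skeleton: Sequence[str]) -> List[str]:
    # Keep the heads; replace each mask with the reference word, or PAD past the end.
    return [
        tok if tok != MASK else (reference[i] if i < len(reference) else PAD)
        for i, tok in enumerate(skeleton)
    ]


caption, passes = semi_autoregressive_decode(oracle_head, oracle_fill, group_size=K)
print(" ".join(caption), passes)
```

For an N-word caption with group size K, this schedule uses ceil(N/K) + 1 sequential passes in stage 1 (the extra one produces the end token) plus a single parallel pass in stage 2. Word-by-word decoding needs N + 1 passes. With N = 10 and K = 2, that is 7 passes instead of 11. Larger K shortens stage 1 further but leaves more words to be filled in without sequential context, which is the performance/speed trade-off the framework targets.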

updated: Mon Oct 11 2021 15:11:54 GMT+0000 (UTC)

published: Mon Oct 11 2021 15:11:54 GMT+0000 (UTC)

arXiv
