ExpansionNet v2: Block Static Expansion in fast end to end training for Image Captioning

Jia Cheng Hu; Roberto Cavicchioli; Alessandro Capotondi

Expansion methods explore the possibility of performance bottlenecks in the input length of Deep Learning methods. In this work, we introduce the Block Static Expansion, which distributes and processes the input over a heterogeneous and arbitrarily large collection of sequences whose lengths differ from that of the input. Adopting this method, we introduce a model called ExpansionNet v2, trained with our novel training strategy, which is designed to be not only effective but also 6 times faster than the standard approach of recent works in Image Captioning. The model achieves state-of-the-art performance on the MS-COCO 2014 captioning challenge, with scores of 143.7 CIDEr-D on the offline test split and 140.8 CIDEr-D on the online evaluation server, and 72.9 All-CIDEr on the nocaps validation set. Source code is available at: https://github.com/jchenghu/ExpansionNet_v2
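The abstract describes the core idea only at a high level: an input of length N is mapped onto several sequences of other lengths, processed there, and brought back. As a rough intuition only, here is a minimal sketch of that "expand, then restore" pattern, assuming plain scaled dot-product cross-attention with no learned projections, hand-fixed query matrices standing in for learned ones, and simple averaging to merge the branches. This is not the authors' Block Static Expansion; its actual formulation is in the paper and the linked repository. All names (`attend`, `block_expand_sketch`, the query matrices) are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def softmax_rows(a: Matrix) -> Matrix:
    """Numerically stable softmax applied to each row."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out


def attend(queries: Matrix, keys_values: Matrix) -> Matrix:
    """Scaled dot-product attention where keys == values (no learned projections)."""
    d = len(queries[0])
    scores = [[v / math.sqrt(d) for v in row]
              for row in matmul(queries, transpose(keys_values))]
    return matmul(softmax_rows(scores), keys_values)


def block_expand_sketch(x: Matrix, expansion_queries: List[Matrix]) -> Matrix:
    """Hypothetical sketch: route an N x d input through one sequence of length L_i
    per query matrix (L_i x d), map each back to length N, and average the results."""
    n, d = len(x), len(x[0])
    out = [[0.0] * d for _ in range(n)]
    for q in expansion_queries:
        expanded = attend(q, x)         # forward step: length N -> L_i
        restored = attend(x, expanded)  # backward step: length L_i -> N
        for i in range(n):
            for j in range(d):
                out[i][j] += restored[i][j] / len(expansion_queries)
    return out


if __name__ == "__main__":
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]        # N = 3 tokens, d = 2
    queries = [
        [[0.5, -0.5]],                                 # L_1 = 1
        [[1.0, 0.0], [0.0, 1.0], [0.3, 0.3],
         [-1.0, 0.2], [0.1, 0.9]],                     # L_2 = 5
    ]
    y = block_expand_sketch(x, queries)
    print(len(y), len(y[0]))  # 3 2
```

The output keeps the input's length N even though each branch passes through a sequence of a different length, which is the property the abstract highlights; in a trained model the query matrices would be learned parameters rather than fixed values.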

updated: Fri Aug 19 2022 19:44:50 GMT+0000 (UTC)

published: Sat Aug 13 2022 02:50:35 GMT+0000 (UTC)

arXiv
