Image Captioning with Sparse Recurrent Neural Network

Jia Huei Tan; Chee Seng Chan; Joon Huang Chuah

Recurrent Neural Networks (RNNs) have been widely used to tackle a variety of language generation problems and are capable of attaining state-of-the-art (SOTA) performance. However, despite their impressive results, the large number of parameters in RNN models makes deployment to mobile and embedded devices infeasible. Driven by this problem, many works have proposed pruning methods to reduce the size of RNN models. In this work, we propose an end-to-end pruning method for image captioning models equipped with visual attention. Our proposed method is able to achieve sparsity levels of up to 97.5% without significant performance loss relative to the baseline (roughly 2% loss at 40x compression after fine-tuning). Our method is also simple to use and tune, facilitating faster development times for neural network practitioners. We perform extensive experiments on the popular MS-COCO dataset to empirically validate the efficacy of our proposed method.
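
The 97.5% sparsity and 40x compression figures quoted above are two views of the same quantity: if only 2.5% of the weights remain nonzero, the dense model holds 1 / (1 - 0.975) = 40 times as many nonzero weights. The sketch below illustrates only that arithmetic. It is not the authors' pruning method, the function names and the toy weight vector are hypothetical, and it assumes compression is measured by the count of nonzero weights, ignoring any index overhead of a sparse storage format.

```python
def compression_ratio(sparsity: float) -> float:
    """Ratio of total weights to remaining nonzero weights."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    return 1.0 / (1.0 - sparsity)


def sparsity_of(weights) -> float:
    """Fraction of entries that are exactly zero."""
    weights = list(weights)
    return sum(1 for w in weights if w == 0) / len(weights)


# Hypothetical toy weight vector: 39 of 40 entries pruned to zero.
toy = [0.0] * 39 + [0.7]
print(sparsity_of(toy))                    # 0.975
print(round(compression_ratio(0.975), 6))  # 40.0
```

Under the same assumption, a sparsity of 90% would correspond to 10x compression, so the ratio grows quickly as sparsity approaches 100%.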

updated: Mon Oct 28 2019 15:51:13 GMT+0000 (UTC)

published: Wed Aug 28 2019 15:53:13 GMT+0000 (UTC)

arXiv
