Rethinking Recurrent Neural Networks and Other Improvements for Image Classification

Nguyen Huu Phong; Bernardete Ribeiro

Over the long history of machine learning, which dates back several decades, recurrent neural networks (RNNs) have been used mainly for sequential data and time series and generally with 1D information. Even in some rare studies on 2D images, these networks are used merely to learn and generate data sequentially rather than for image recognition tasks. In this study, we propose integrating an RNN as an additional layer when designing image recognition models. We also develop end-to-end multimodel ensembles that produce expert predictions using several models. In addition, we extend the training strategy so that our model performs comparably to leading models and can even match the state-of-the-art models on several challenging datasets (e.g., SVHN (0.99), Cifar-100 (0.9027) and Cifar-10 (0.9852)). Moreover, our model sets a new record on the Surrey dataset (0.949). The source code of the methods provided in this article is available at https://github.com/leonlha/e2e-3m and http://nguyenhuuphong.me.
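
The two ideas in the abstract, an RNN used as an additional layer on top of image features and an ensemble that combines several models, can be illustrated with a small sketch. It is not the authors' implementation (their code is at the repository linked above); it only shows the data flow under stated assumptions. Assumed here and not taken from the source: a convolutional feature map of shape H x W x C is read as a sequence of H*W vectors, the RNN is a plain tanh (Elman) cell with random untrained weights, the ensemble has three members, and their class probabilities are combined by simple averaging. All function names are hypothetical, and only the Python standard library is used.

```python
import math
import random


def feature_map_to_sequence(fmap):
    """Flatten an H x W x C feature map into H*W vectors of length C (row-major)."""
    return [pixel for row in fmap for pixel in row]


def rnn_last_state(seq, w_in, w_rec, bias):
    """Run a plain tanh RNN over seq and return the final hidden state."""
    hidden = [0.0] * len(bias)
    for x in seq:
        # The comprehension reads the previous hidden state before it is rebound.
        hidden = [
            math.tanh(
                sum(w_in[j][i] * x[i] for i in range(len(x)))
                + sum(w_rec[j][k] * hidden[k] for k in range(len(hidden)))
                + bias[j]
            )
            for j in range(len(bias))
        ]
    return hidden


def dense(vec, weights, bias):
    """Fully connected layer: weights has one row per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rng, rows, cols, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def make_model(rng, channels, hidden_size, num_classes):
    """Untrained toy model: RNN layer followed by a softmax classifier."""
    return {
        "w_in": random_matrix(rng, hidden_size, channels),
        "w_rec": random_matrix(rng, hidden_size, hidden_size),
        "b_h": [0.0] * hidden_size,
        "w_out": random_matrix(rng, num_classes, hidden_size),
        "b_out": [0.0] * num_classes,
    }


def predict(model, fmap):
    """Feature map -> sequence -> RNN -> class probabilities."""
    seq = feature_map_to_sequence(fmap)
    state = rnn_last_state(seq, model["w_in"], model["w_rec"], model["b_h"])
    return softmax(dense(state, model["w_out"], model["b_out"]))


def ensemble_predict(models, fmap):
    """Average the class probabilities of all members (an assumed combination rule)."""
    all_probs = [predict(m, fmap) for m in models]
    return [sum(column) / len(models) for column in zip(*all_probs)]


rng = random.Random(0)
H, W, C, NUM_CLASSES = 4, 4, 8, 10  # toy sizes, chosen only for illustration
fmap = [[[rng.random() for _ in range(C)] for _ in range(W)] for _ in range(H)]
models = [make_model(rng, C, hidden_size=16, num_classes=NUM_CLASSES) for _ in range(3)]
probs = ensemble_predict(models, fmap)
predicted_class = max(range(NUM_CLASSES), key=probs.__getitem__)
```

In practice the feature map would come from a trained convolutional backbone, and the recurrent layer and classifier would be trained together with it. The abstract describes the ensemble as end-to-end, which suggests joint training rather than the fixed averaging assumed here.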

updated: Thu Mar 04 2021 04:21:48 GMT+0000 (UTC)

published: Thu Jul 30 2020 00:40:50 GMT+0000 (UTC)

arXiv
