Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan; Andrew Zisserman

In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.
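
The appeal of very small (3x3) filters can be illustrated with a little arithmetic: a stack of stride-1 3x3 convolutions covers the same effective receptive field as a single larger filter while using fewer weights. The sketch below is only an illustration under stated assumptions (stride 1, no dilation, equal input and output channel counts, biases ignored); the function names and the channel width of 64 are hypothetical choices made for this example, not values taken from the abstract.

```python
def stacked_receptive_field(num_layers, kernel=3):
    """Side length of the effective receptive field of `num_layers`
    stacked stride-1, undilated convolutions with a square kernel."""
    return 1 + num_layers * (kernel - 1)


def conv_weights(kernel, channels, num_layers=1):
    """Weight count (biases ignored) for `num_layers` conv layers that
    each map `channels` input channels to `channels` output channels."""
    return num_layers * kernel * kernel * channels * channels


channels = 64  # illustrative width, assumed for this example

# Three stacked 3x3 layers see a 7x7 window of the input.
stack_rf = stacked_receptive_field(num_layers=3)

# Compare the weights of the 3x3 stack with one 7x7 layer.
stack_w = conv_weights(kernel=3, channels=channels, num_layers=3)
single_w = conv_weights(kernel=7, channels=channels)

print(f"receptive field of three 3x3 layers: {stack_rf}x{stack_rf}")
print(f"weights, three 3x3 layers: {stack_w}")
print(f"weights, one 7x7 layer:    {single_w}")
```

Under these assumptions the stack needs 27C^2 weights against 49C^2 for the single 7x7 layer (C being the channel count), and it interleaves additional non-linearities, which is one way to see why increasing depth with small filters is attractive.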

updated: Fri Apr 10 2015 16:25:04 GMT+0000 (UTC)

published: Thu Sep 04 2014 19:48:04 GMT+0000 (UTC)

arXiv
