Geo-Aware Networks for Fine-Grained Recognition

Grace Chu; Brian Potetz; Weijun Wang; Andrew Howard; Yang Song; Fernando Brucher; Thomas Leung; Hartwig Adam

Fine-grained recognition distinguishes among categories with subtle visual differences. To differentiate between these challenging visual categories, it is helpful to leverage additional information. Geolocation is a rich source of additional information that can be used to improve fine-grained classification accuracy, but it has been understudied. Our contributions to this field are twofold. First, to the best of our knowledge, this is the first paper that systematically examines various ways of incorporating geolocation information into fine-grained image classification through the use of geolocation priors, post-processing, or feature modulation. Second, because no fine-grained dataset has complete geolocation information, we release two fine-grained datasets with geolocation by providing complementary information to two existing popular datasets, iNaturalist and YFCC100M. By leveraging geolocation information, we improve top-1 accuracy on iNaturalist from 70.1% to 79.0% for a strong image-only baseline model. Comparing several models, we found that the best performance was achieved by a post-processing model that consumes the output of the image-only baseline alongside geolocation. However, for a resource-constrained model (MobileNetV2), performance was better with a feature modulation model that trains jointly over pixels and geolocation: accuracy increased from 59.6% to 72.2%. Our work makes a strong case for incorporating geolocation information into fine-grained recognition models, both on servers and on devices.
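The abstract names geolocation priors as one of the ways to combine location with an image classifier, without giving details. As a rough illustration only, the sketch below assumes the simplest reading of that idea: the class probabilities of an image-only model are multiplied by a location-conditioned prior and renormalized. The function name `apply_geo_prior`, the smoothing constant `eps`, and all numbers are hypothetical and are not taken from the paper.

```python
def apply_geo_prior(image_probs, geo_prior, eps=1e-8):
    """Reweight image-only class probabilities by a location prior.

    image_probs: per-class probabilities from an image-only model.
    geo_prior:   hypothetical estimate of P(class | location), same order.
    eps:         small constant (an assumption) so that a zero prior does
                 not remove a class entirely.
    """
    combined = [p * (g + eps) for p, g in zip(image_probs, geo_prior)]
    total = sum(combined)
    return [c / total for c in combined]


# Toy example: two visually similar classes and one unrelated class.
image_probs = [0.48, 0.46, 0.06]  # image-only model slightly prefers class 0
geo_prior = [0.05, 0.90, 0.05]    # class 1 is far more common at this location

posterior = apply_geo_prior(image_probs, geo_prior)
best_class = max(range(len(posterior)), key=posterior.__getitem__)
```

In this toy case the location prior moves the prediction from class 0 to class 1, which is the kind of disambiguation between visually similar categories that the abstract describes. The post-processing and feature modulation models mentioned above are learned models and are not covered by this sketch.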

updated: Wed Sep 04 2019 21:56:58 GMT+0000 (UTC)

published: Tue Jun 04 2019 21:53:07 GMT+0000 (UTC)

arXiv
