GeoCLR: Georeference Contrastive Learning for Efficient Seafloor Image Interpretation

Takaki Yamada; Adam Prügel-Bennett; Stefan B. Williams; Oscar Pizarro; Blair Thornton

This paper describes Georeference Contrastive Learning of visual Representation (GeoCLR) for efficient training of deep-learning Convolutional Neural Networks (CNNs). The method leverages georeference information by generating similar image pairs from images taken at nearby locations and contrasting these with image pairs taken far apart. The underlying assumption is that images gathered within a short distance of each other are more likely to have a similar visual appearance. This assumption is reasonably satisfied in seafloor robotic imaging applications, where image footprints are limited to edge lengths of a few metres and images are taken so that they overlap along a vehicle's trajectory, whereas seafloor substrates and habitats have patch sizes that are far larger. A key advantage of the method is that it is self-supervised and does not require any human input for CNN training. The method is also computationally efficient: results can be generated between dives during multi-day Autonomous Underwater Vehicle (AUV) missions using computational resources that would be accessible during most oceanic field trials. We apply GeoCLR to habitat classification on a dataset of ~86k images gathered using an AUV. We demonstrate how the latent representations generated by GeoCLR can be used to efficiently guide human annotation efforts; the resulting semi-supervised framework improves classification accuracy by an average of 11.8% compared with state-of-the-art transfer learning using the same CNN and an equivalent number of human annotations for training.
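The pairing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the pairing rule or the loss, so the distance threshold `radius`, the helper names `sample_nearby_pairs` and `nt_xent_loss`, and the use of a SimCLR-style NT-Xent loss are all assumptions made for illustration. The embeddings stand in for CNN outputs, and every other image in the batch is treated as a negative.

```python
import numpy as np


def sample_nearby_pairs(positions, radius, rng):
    """For each image, pick a partner image located within `radius` metres.

    Hypothetical helper: falls back to the image itself when no other
    image is close enough.
    """
    positions = np.asarray(positions, dtype=float)
    # Pairwise Euclidean distances between image locations, shape (N, N).
    diffs = positions[:, None, :] - positions[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    indices = np.arange(len(positions))
    partners = np.empty(len(positions), dtype=int)
    for i, row in enumerate(dists):
        candidates = np.flatnonzero((row <= radius) & (indices != i))
        partners[i] = rng.choice(candidates) if candidates.size else i
    return partners


def nt_xent_loss(z_a, z_b, temperature=0.5):
    """SimCLR-style NT-Xent loss for embedding pairs (z_a[i], z_b[i]).

    Assumed loss: each embedding's positive is its paired embedding and
    all other embeddings in the batch act as negatives.
    """
    n = len(z_a)
    z = np.concatenate([z_a, z_b], axis=0).astype(float)
    z /= np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # an embedding is never its own negative
    # Row i's positive sits n rows away in the concatenated batch.
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-softmax over each row.
    shifted = sim - sim.max(axis=1, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), positives].mean()


# Toy example: two clusters of image locations 100 m apart (made-up data).
rng = np.random.default_rng(0)
positions = [[0.0, 0.0], [1.0, 0.0], [100.0, 0.0], [101.0, 0.0]]
partners = sample_nearby_pairs(positions, radius=2.0, rng=rng)

# Stand-in embeddings; in practice these would come from the CNN.
embeddings = np.eye(4)
loss = nt_xent_loss(embeddings, embeddings[partners])
```

In this sketch the loss falls as paired embeddings become more similar than unpaired ones, which is the behaviour the nearby-versus-distant contrast relies on. A practical version would also have to decide how to handle nearby images that land in the same batch as negatives.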

updated: Fri Aug 13 2021 22:42:34 GMT+0000 (UTC)

published: Fri Aug 13 2021 22:42:34 GMT+0000 (UTC)

arXiv
