Ensembles of Vision Transformers as a New Paradigm for Automated Classification in Ecology

S. Kyathanahally; T. Hardeman; M. Reyes; E. Merz; T. Bulas; F. Pomati; M. Baity-Jesi

Monitoring biodiversity is paramount for managing and protecting natural resources, particularly in times of global change. Collecting images of organisms over large temporal or spatial scales is a promising way to monitor and study biodiversity change in natural ecosystems, providing large amounts of data with minimal interference with the environment. Deep learning models are currently used to automate the classification of organisms into taxonomic units. However, imprecision in these classifiers introduces measurement noise that is difficult to control and can significantly hinder the analysis and interpretation of data. In our study, we show that this limitation can be overcome by ensembles of Data-efficient image Transformers (DeiTs), which significantly outperform the previous state of the art (SOTA). We validate our results on a large number of ecological imaging datasets of diverse origin, with organisms of study ranging from plankton to insects, birds, dog breeds, animals in the wild, and corals. On all the datasets we test, we achieve a new SOTA, with a reduction of the error with respect to the previous SOTA ranging from 18.48% to 87.50%, depending on the dataset, and often achieving performance very close to perfect classification. Ensembles of DeiTs perform better not because of the single-model performance of DeiTs, but because predictions by independent models have a smaller overlap, which maximizes the gain from ensembling. This positions DeiT ensembles as the best candidate for image classification in biodiversity monitoring.
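As a toy illustration of why a small overlap between the models' predictions helps, the sketch below averages class probabilities over three classifiers that each misclassify a different sample. It assumes arithmetic averaging of probabilities followed by an argmax; the abstract does not state the averaging rule, and the function names and toy numbers are hypothetical, not taken from the paper. It also shows how a relative error reduction such as the quoted 18.48% to 87.50% can be computed from two error rates.

```python
from typing import List, Sequence


def ensemble_predict(model_probs: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Average class probabilities over models and return the argmax per sample.

    model_probs[m][i][c] is the probability that model m assigns to class c
    for sample i. Arithmetic averaging is an assumption of this sketch.
    """
    n_models = len(model_probs)
    n_samples = len(model_probs[0])
    n_classes = len(model_probs[0][0])
    predictions = []
    for i in range(n_samples):
        mean = [
            sum(model_probs[m][i][c] for m in range(n_models)) / n_models
            for c in range(n_classes)
        ]
        predictions.append(max(range(n_classes), key=mean.__getitem__))
    return predictions


def error_rate(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of samples whose predicted class differs from the label."""
    return sum(p != y for p, y in zip(predictions, labels)) / len(labels)


def relative_error_reduction(old_error: float, new_error: float) -> float:
    """Percentage by which new_error is lower than old_error."""
    return 100.0 * (old_error - new_error) / old_error


# Hypothetical toy data: 3 models, 4 samples, 2 classes.
# Each model is wrong on a different sample, so their errors do not overlap.
labels = [0, 1, 0, 1]
toy_probs = [
    [[0.4, 0.6], [0.2, 0.8], [0.8, 0.2], [0.2, 0.8]],  # wrong on sample 0
    [[0.8, 0.2], [0.6, 0.4], [0.8, 0.2], [0.2, 0.8]],  # wrong on sample 1
    [[0.8, 0.2], [0.2, 0.8], [0.4, 0.6], [0.2, 0.8]],  # wrong on sample 2
]

single_errors = [error_rate(ensemble_predict([p]), labels) for p in toy_probs]
ensemble_error = error_rate(ensemble_predict(toy_probs), labels)

print("single-model error rates:", single_errors)
print("ensemble error rate:", ensemble_error)
print("relative reduction (%):", relative_error_reduction(single_errors[0], ensemble_error))
```

In this toy case each model's mistake is outvoted by the two models that get that sample right. If all three models failed on the same sample, averaging could not recover it, which is the sense in which a smaller overlap leaves more to gain from ensembling.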

updated: Thu Mar 03 2022 14:16:22 GMT+0000 (UTC)

published: Thu Mar 03 2022 14:16:22 GMT+0000 (UTC)

arXiv
