Fair Comparison: Quantifying Variance in Results for Fine-grained Visual Categorization

Matthew Gwilliam; Adam Teuscher; Connor Anderson; Ryan Farrell

For the task of image classification, researchers work arduously to develop the next state-of-the-art (SOTA) model, each benchmarking their own performance against that of their predecessors and peers. Unfortunately, the metric used most frequently to describe a model's performance, average categorization accuracy, is often used in isolation. As the number of classes increases, as it does in fine-grained visual categorization (FGVC), the amount of information conveyed by average accuracy alone dwindles. While its most glaring weakness is its failure to describe the model's performance on a class-by-class basis, average accuracy also fails to describe how performance may vary from one trained model to another of the same architecture on the same dataset, both averaged across all categories and at the per-class level. We first demonstrate the magnitude of these variations across models and across class distributions based on attributes of the data, comparing results on different visual domains and different per-class image distributions, including long-tailed distributions and few-shot subsets. We then analyze the impact various FGVC methods have on overall and per-class variance. From this analysis, we both highlight the importance of reporting and comparing methods based on information beyond overall accuracy and point out techniques that mitigate variance in FGVC results.
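To make the distinction between overall accuracy and run-to-run, per-class variation concrete, the following is a minimal Python sketch. It is not the authors' evaluation code: the function names (`per_class_accuracy`, `summarize_runs`), the toy labels, and the choice of population standard deviation as the spread measure are all assumptions made for illustration. Each "run" stands for the predictions of one independently trained model of the same architecture on the same test set.

```python
from statistics import mean, pstdev


def per_class_accuracy(y_true, y_pred, num_classes):
    """Accuracy for each class; None for a class with no test samples."""
    correct = [0] * num_classes
    total = [0] * num_classes
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return [c / n if n else None for c, n in zip(correct, total)]


def summarize_runs(y_true, runs, num_classes):
    """Summarize several trained models evaluated on the same test labels.

    `runs` is a list of prediction lists, one per trained model.
    Spread is reported as population standard deviation (an assumption;
    a sample standard deviation would be an equally reasonable choice).
    """
    # Overall accuracy of each run.
    overall = [
        sum(t == p for t, p in zip(y_true, preds)) / len(y_true)
        for preds in runs
    ]
    # Per-class accuracy of each run.
    per_class = [per_class_accuracy(y_true, preds, num_classes) for preds in runs]
    # Spread of each class's accuracy across runs.
    per_class_std = []
    for c in range(num_classes):
        accs = [run[c] for run in per_class if run[c] is not None]
        per_class_std.append(pstdev(accs) if accs else None)
    return {
        "overall_mean": mean(overall),
        "overall_std": pstdev(overall),
        "per_class_std": per_class_std,
    }


# Toy example (hypothetical data): 3 classes, 3 trained models.
y_true = [0, 0, 1, 1, 2, 2]
runs = [
    [0, 0, 1, 1, 2, 2],
    [0, 0, 1, 0, 2, 1],
    [0, 0, 1, 1, 1, 2],
]
summary = summarize_runs(y_true, runs, num_classes=3)
print(summary)
```

In this toy data, class 0 is classified identically by every run while classes 1 and 2 fluctuate, which is exactly the kind of information a single averaged accuracy figure hides.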

updated: Tue Sep 07 2021 15:47:27 GMT+0000 (UTC)

published: Tue Sep 07 2021 15:47:27 GMT+0000 (UTC)

arXiv
