Class-Difficulty Based Methods for Long-Tailed Visual Recognition

Saptarshi Sinha; Hiroki Ohashi; Katsuyuki Nakamura

Long-tailed datasets are frequently encountered in real-world use cases, where a few classes or categories (known as majority or head classes) have a higher number of data samples than the other classes (known as minority or tail classes). Training deep neural networks on such datasets gives results biased towards the head classes. So far, researchers have proposed multiple weighted loss and data re-sampling techniques to reduce this bias. However, most such techniques assume that the tail classes are always the most difficult classes to learn and therefore need more weight or attention. Here, we argue that this assumption might not always hold true. Therefore, we propose a novel approach to dynamically measure the instantaneous difficulty of each class during the training phase of the model. Further, we use the difficulty measures of each class to design a novel weighted loss technique called "class-wise difficulty based weighted (CDB-W) loss" and a novel data sampling technique called "class-wise difficulty based sampling (CDB-S)". To verify the broad applicability of our CDB methods, we conducted extensive experiments on multiple tasks such as image classification, object detection, instance segmentation and video-action classification. The results verified that CDB-W loss and CDB-S could achieve state-of-the-art results on many class-imbalanced datasets that resemble real-world use cases, such as ImageNet-LT, LVIS and EGTEA.
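
The abstract does not state how class difficulty is computed, so the following is only a minimal sketch under stated assumptions: difficulty is taken to be one minus the per-class accuracy on some held-out predictions, loss weights are that difficulty raised to a hypothetical exponent `tau`, and sampling probabilities are the normalized difficulties. All function names and the `tau` parameter are illustrative, not the authors' code.

```python
from collections import Counter


def class_difficulty(labels, predictions, num_classes):
    """Per-class difficulty, ASSUMED here to be 1 - per-class accuracy."""
    total = Counter(labels)
    correct = Counter(y for y, p in zip(labels, predictions) if y == p)
    # Assumption: a class with no evaluated samples is treated as maximally hard.
    return [
        1.0 - correct[c] / total[c] if total[c] else 1.0
        for c in range(num_classes)
    ]


def cdb_weights(difficulty, tau=1.0):
    """Illustrative loss weights: difficulty ** tau (tau is hypothetical)."""
    return [d ** tau for d in difficulty]


def cdb_sampling_probs(difficulty):
    """Illustrative sampling distribution: difficulties normalized to sum to 1."""
    total = sum(difficulty)
    n = len(difficulty)
    if total == 0:
        # Every class is perfectly learned; fall back to uniform sampling.
        return [1.0 / n] * n
    return [d / total for d in difficulty]
```

With labels `[0, 0, 0, 0, 1, 1, 2, 2]` and predictions `[0, 0, 0, 1, 1, 1, 2, 0]`, class 0 has the most samples yet a difficulty of 0.25, while the smaller class 1 has a difficulty of 0.0. This illustrates the point that sample count and difficulty need not coincide. Recomputing the difficulties periodically during training would make the weights and sampling probabilities dynamic.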

updated: Mon Aug 22 2022 06:54:24 GMT+0000 (UTC)

published: Fri Jul 29 2022 06:33:22 GMT+0000 (UTC)

arXiv
