ESPT: A Self-Supervised Episodic Spatial Pretext Task for Improving Few-Shot Learning

Yi Rong; Xiongbo Lu; Zhaoyang Sun; Yaxiong Chen; Shengwu Xiong

Self-supervised learning (SSL) techniques have recently been integrated into the few-shot learning (FSL) framework and have shown promising results in improving few-shot image classification performance. However, existing SSL approaches used in FSL typically derive their supervision signals from the global embedding of each individual image. As a result, during the episodic training of FSL, these methods cannot capture and fully utilize the local visual information in image samples or the data structure information of the whole episode, both of which are beneficial to FSL. To this end, we propose to augment the few-shot learning objective with a novel self-supervised Episodic Spatial Pretext Task (ESPT). Specifically, for each few-shot episode, we generate its corresponding transformed episode by applying a random geometric transformation to all the images in it. Given this pair of episodes, our ESPT objective is defined as maximizing the local spatial relationship consistency between the original episode and the transformed one. With this definition, the ESPT-augmented FSL objective promotes learning more transferable feature representations that capture the local spatial features of different images and their inter-relational structural information in each input episode, thus enabling the model to generalize better to new categories with only a few samples. Extensive experiments indicate that our ESPT method achieves new state-of-the-art performance for few-shot image classification on three mainstream benchmark datasets. The source code will be available at: https://github.com/Whut-YiRong/ESPT.
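
The two steps described in the abstract (building a transformed episode, then comparing local spatial relationships across the two episodes) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not say which geometric transformations are used or how consistency is measured, so the sketch assumes rotations by multiples of 90 degrees and a mean squared difference between cosine-similarity matrices of local features. The function names `transform_episode`, `local_relations`, and `consistency_loss` are hypothetical.

```python
import numpy as np


def transform_episode(episode, rng):
    """Apply ONE randomly drawn rotation to every image in the episode.

    episode: array of shape (num_images, height, width, channels).
    Assumption: a rotation by a multiple of 90 degrees stands in for the
    "random geometric transformation" mentioned in the abstract.
    Returns the transformed episode and the number of quarter turns k.
    """
    k = int(rng.integers(0, 4))
    return np.rot90(episode, k=k, axes=(1, 2)), k


def local_relations(feature_maps):
    """Cosine similarity between all local feature vectors of an episode.

    feature_maps: (num_images, h, w, d).
    Returns a relation matrix of shape (num_images*h*w, num_images*h*w).
    """
    n, h, w, d = feature_maps.shape
    flat = feature_maps.reshape(n * h * w, d)
    flat = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-8)
    return flat @ flat.T


def consistency_loss(feats_orig, feats_trans, k):
    """Mean squared difference between the two local relation matrices.

    The rotation is undone on the transformed feature maps first, so that
    corresponding spatial locations line up. This loss form is an
    assumption; minimizing it corresponds to maximizing consistency.
    """
    aligned = np.rot90(feats_trans, k=-k, axes=(1, 2))
    diff = local_relations(feats_orig) - local_relations(aligned)
    return float(np.mean(diff ** 2))


rng = np.random.default_rng(0)
episode = rng.random((5, 4, 4, 3))           # toy episode: 5 images of 4x4x3
transformed, k = transform_episode(episode, rng)

# Toy "encoder": raw pixels used as local features (rotation-equivariant),
# so the relation structure of both episodes matches after alignment.
loss = consistency_loss(episode, transformed, k)
```

In a real pipeline, the feature maps would come from a learned encoder, and a term of this kind would be added to the few-shot classification loss. How the two are weighted is not stated in the abstract.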

updated: Wed Apr 26 2023 04:52:08 GMT+0000 (UTC)

published: Wed Apr 26 2023 04:52:08 GMT+0000 (UTC)

arXiv
