SeiT: Storage-Efficient Vision Training with Tokens Using 1% of Pixel Storage

Song Park; Sanghyuk Chun; Byeongho Heo; Wonjae Kim; Sangdoo Yun

Achieving more generalizable, ground-breaking vision models requires billion-scale image collections, along with massive dataset storage to ship the images (e.g., the LAION-4B dataset needs 240TB of storage space). However, handling ever-growing dataset storage with limited storage infrastructure has become challenging. A number of storage-efficient training methods have been proposed to tackle the problem, but they are rarely scalable or they suffer from severe performance degradation. In this paper, we propose a storage-efficient training strategy for vision classifiers on large-scale datasets (e.g., ImageNet) that uses only 1024 tokens per instance and no raw pixels; our token storage needs less than 1% of the storage of the original JPEG-compressed raw pixels. We also propose token augmentations and a Stem-adaptor module so that our approach can use the same architecture as pixel-based approaches, with only minimal modifications to the stem layer and carefully tuned optimization settings. Our experimental results on ImageNet-1k show that our method outperforms other storage-efficient training methods by a large margin. We further show the effectiveness of our method in two other practical scenarios: storage-efficient pre-training and continual learning. Code is available at https://github.com/naver-ai/seit
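To make the storage claim concrete, the following is a minimal back-of-the-envelope sketch, not the authors' storage format. It assumes that each of the 1024 tokens is stored as a fixed-width, bit-packed index into a codebook. The codebook size used here (8192) is a hypothetical placeholder: the abstract does not state the tokenizer's codebook size or whether further compression is applied, so the real footprint may differ. The function names are illustrative only.

```python
import math


def token_bytes_per_image(num_tokens: int, codebook_size: int) -> int:
    """Bytes needed to store one image as fixed-width, bit-packed token indices."""
    bits_per_token = math.ceil(math.log2(codebook_size))
    return math.ceil(num_tokens * bits_per_token / 8)


def break_even_pixel_bytes(token_bytes: int, percent: int = 1) -> int:
    """Smallest pixel-file size for which token_bytes is at most `percent`% of it."""
    return token_bytes * 100 // percent


NUM_TOKENS = 1024              # from the abstract
ASSUMED_CODEBOOK_SIZE = 8192   # hypothetical; not stated in the abstract

per_image = token_bytes_per_image(NUM_TOKENS, ASSUMED_CODEBOOK_SIZE)
print(f"token storage per image: {per_image} bytes")
print(f"JPEG size at which tokens equal 1%: {break_even_pixel_bytes(per_image)} bytes")
```

Under these assumptions, a smaller codebook or an additional entropy-coding step would lower the per-image footprint, which is why the codebook size is kept as an explicit parameter rather than hard-coded.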

updated: Mon Sep 11 2023 06:04:24 GMT+0000 (UTC)

published: Mon Mar 20 2023 13:55:35 GMT+0000 (UTC)

arXiv
