LowDINO -- A Low Parameter Self Supervised Learning Model

Sai Krishna Prathapaneni; Shvejan Shashank; Srikar Reddy K

This research explores the design of a neural network architecture that allows small networks to adopt the properties of the huge networks that have succeeded in self-supervised learning (SSL), across downstream tasks such as image classification and segmentation. Previous studies have shown that convolutional neural networks (ConvNets) can provide an inherent inductive bias, which is crucial for learning representations in deep learning models. To keep the parameter count low, attention is incorporated through MobileViT blocks, yielding a model with fewer than 5 million parameters. The model is trained by self-distillation with a momentum encoder in a student-teacher setup, where the teacher weights come from vision transformers (ViTs) of recent state-of-the-art (SOTA) SSL models. The model is trained on the ImageNet-1k dataset. This research offers an approach for designing smaller, more efficient neural network architectures that can perform SSL tasks comparably to heavy models.
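The abstract names self-distillation with a momentum encoder but gives no equations. The following is a minimal sketch, assuming the DINO-style formulation that the model's name suggests: the student's temperature-scaled softmax is trained to match the teacher's centered, sharpened softmax via cross-entropy, the center is an exponential moving average (EMA) of teacher outputs, and a momentum encoder updates teacher weights as an EMA of student weights. All function names and the default temperatures and momenta (0.1, 0.04, 0.9, 0.996) are hypothetical, borrowed from common DINO defaults rather than taken from LowDINO. How LowDINO pairs a ViT teacher with a smaller MobileViT-based student is not specified here.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float], temperature: float) -> Vector:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def log_softmax(logits: Sequence[float], temperature: float) -> Vector:
    """Temperature-scaled log-softmax computed with the log-sum-exp trick."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def distillation_loss(student_out: List[Vector], teacher_out: List[Vector],
                      center: Vector, student_temp: float = 0.1,
                      teacher_temp: float = 0.04) -> float:
    """Batch-averaged cross-entropy H(P_teacher, P_student) (hypothetical name).

    Teacher logits are centered (center subtracted) and sharpened by a low
    temperature; in DINO-style training the teacher branch gets no gradient.
    """
    total = 0.0
    for s, t in zip(student_out, teacher_out):
        p_t = softmax([ti - ci for ti, ci in zip(t, center)], teacher_temp)
        log_p_s = log_softmax(s, student_temp)
        total += -sum(pt * lps for pt, lps in zip(p_t, log_p_s))
    return total / len(student_out)


def update_center(center: Vector, teacher_out: List[Vector],
                  momentum: float = 0.9) -> Vector:
    """EMA of the batch-mean teacher output, used to discourage collapse."""
    n = len(teacher_out)
    batch_mean = [sum(row[k] for row in teacher_out) / n for k in range(len(center))]
    return [momentum * c + (1 - momentum) * b for c, b in zip(center, batch_mean)]


def ema_update(teacher_params: Vector, student_params: Vector,
               momentum: float = 0.996) -> Vector:
    """Momentum-encoder step: teacher <- m * teacher + (1 - m) * student.

    Only applicable when teacher and student have identical parameter shapes.
    """
    return [momentum * t + (1 - momentum) * s
            for t, s in zip(teacher_params, student_params)]


# Toy usage on a batch of two 3-dimensional projection outputs.
student = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]
teacher = [[1.5, 0.2, -0.5], [0.0, 0.4, 0.1]]
center = [0.0, 0.0, 0.0]
loss = distillation_loss(student, teacher, center)
center = update_center(center, teacher)  # approximately [0.075, 0.03, -0.02]
```

Centering and sharpening pull in opposite directions (centering flattens the teacher distribution, a low teacher temperature sharpens it); in the DINO formulation this balance is what keeps self-distillation from collapsing to a constant output.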

updated: Sun May 28 2023 18:34:59 GMT+0000 (UTC)

published: Sun May 28 2023 18:34:59 GMT+0000 (UTC)

arXiv
