Model Composition: Can Multiple Neural Networks Be Combined into a Single Network Using Only Unlabeled Data?

Amin Banitalebi-Dehkordi; Xinyu Kang; Yong Zhang

The diversity of deep learning applications, datasets, and neural network architectures necessitates careful selection of the architecture and data that best match a target application. To mitigate this dilemma, this paper investigates the idea of combining multiple trained neural networks into a single network using only unlabeled data. Combining multiple models into one can also speed up inference, yield stronger and more capable models, and allow us to choose an efficient, device-friendly target network architecture. To this end, the proposed method generates, filters, and aggregates reliable pseudo-labels collected from unlabeled data. Our method supports an arbitrary number of input models with arbitrary architectures and categories. Extensive performance evaluations demonstrate the effectiveness of our method. For example, for object detection and without using any ground-truth labels, an EfficientDet-D0 trained on Pascal-VOC and an EfficientDet-D1 trained on COCO can be combined into a RetinaNet-ResNet50 model whose mAP is similar to that of supervised training. When fine-tuned in a semi-supervised setting, the combined model achieves mAP improvements of +18.6%, +12.6%, and +8.1% over supervised training with 1%, 5%, and 10% of the labels, respectively.
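The generate, filter, and aggregate steps can be illustrated with a minimal sketch for image classification. It is not the authors' implementation, and it does not cover object detection (which would additionally require box-level fusion). It assumes that each input model outputs softmax scores over its own category set, that the target model's label space is the union of those sets, that filtering uses a hypothetical fixed confidence threshold (`threshold=0.7`), and that aggregation is a simple soft vote that sums the confidences of agreeing models. The function names `merge_label_spaces` and `pseudo_label` are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def merge_label_spaces(model_classes: Sequence[Sequence[str]]) -> List[str]:
    """Union of all input models' categories, in first-seen order.

    Assumption: the combined (target) model predicts over this merged set.
    """
    merged: List[str] = []
    for classes in model_classes:
        for name in classes:
            if name not in merged:
                merged.append(name)
    return merged


def pseudo_label(
    scores_per_model: Sequence[Sequence[float]],
    model_classes: Sequence[Sequence[str]],
    threshold: float = 0.7,  # hypothetical confidence cut-off
) -> Optional[str]:
    """Produce one pseudo-label for a single unlabeled sample, or None."""
    votes: Dict[str, float] = {}
    for scores, classes in zip(scores_per_model, model_classes):
        if not scores:
            continue
        # Generation: each model proposes its top-scoring category.
        top = max(range(len(scores)), key=lambda i: scores[i])
        label, conf = classes[top], scores[top]
        # Filtering: discard proposals below the confidence threshold.
        if conf < threshold:
            continue
        # Aggregation (assumed soft vote): sum confidences per category.
        votes[label] = votes.get(label, 0.0) + conf
    if not votes:
        return None  # no reliable pseudo-label; the sample is skipped
    return max(votes, key=lambda name: votes[name])


if __name__ == "__main__":
    # Toy category sets standing in for two models trained on different data.
    voc_like = ["person", "dog", "car"]
    coco_like = ["person", "car", "bicycle", "dog"]
    print(merge_label_spaces([voc_like, coco_like]))
    # Both models confidently predict "dog", so the soft vote returns it.
    print(pseudo_label([[0.1, 0.85, 0.05], [0.05, 0.1, 0.05, 0.8]],
                       [voc_like, coco_like]))
```

In this sketch, samples for which no model is confident enough receive no pseudo-label, which mirrors the idea of keeping only reliable pseudo-labels; the threshold and voting rule are illustrative choices rather than the paper's.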

updated: Wed Oct 20 2021 04:17:25 GMT+0000 (UTC)

published: Wed Oct 20 2021 04:17:25 GMT+0000 (UTC)

arXiv