Spirit Distillation: Precise Real-time Semantic Segmentation of Road Scenes with Insufficient Data

Zhiyuan Wu; Yu Jiang; Chupeng Cui; Zongmin Yang; Xinhui Xue; Hong Qi

Semantic segmentation of road scenes is one of the key technologies for realizing autonomous driving scene perception, and the effectiveness of deep Convolutional Neural Networks (CNNs) for this task has been demonstrated. State-of-the-art CNNs for semantic segmentation suffer from excessive computation as well as large-scale training data requirements. Inspired by the ideas of Fine-tuning-based Transfer Learning (FTT) and feature-based knowledge distillation, we propose Spirit Distillation (SD), a new knowledge distillation method for cross-domain knowledge transfer and efficient network training with insufficient data. SD allows the student network to mimic the teacher network in extracting general features, so that a compact and accurate student network can be trained for real-time semantic segmentation of road scenes. To further alleviate the problem of insufficient data and improve the robustness of the student, we then propose an Enhanced Spirit Distillation (ESD) method, which aims to exploit a more comprehensive general feature extraction capability by taking images from both the target and the proximity domains as input. To our knowledge, this paper is a pioneering work on the application of knowledge distillation to few-shot learning. Experiments conducted on Cityscapes semantic segmentation, with prior knowledge transferred from COCO2017 and KITTI, demonstrate that our methods can train a better student network (mIoU and high-precision accuracy improve by 1.4% and 8.2% respectively, with 78.2% segmentation variance) with only 41.8% of the FLOPs (see Fig. 1).
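The abstract does not give the loss function or the network architectures, so the following is only a toy sketch of the general idea of feature-based distillation: a student is trained so that its features match those of a fixed teacher, on a batch drawn from both target-domain and proximity-domain images. Everything in it is an assumption made for illustration: the linear "feature extractors", the mean-squared-error mimicking loss, the plain gradient step, the pooled sampling, and all names (`features`, `mimic_loss`, `mimic_step`, `mixed_batch`). It should not be read as the authors' implementation.

```python
import random

def features(weights, image):
    """Toy feature extractor (assumption): one linear layer over a flat image."""
    return [sum(w * x for w, x in zip(row, image)) for row in weights]

def mimic_loss(student_w, teacher_w, images):
    """Mean squared error between student and teacher features over a batch."""
    total, count = 0.0, 0
    for image in images:
        s = features(student_w, image)
        t = features(teacher_w, image)
        total += sum((a - b) ** 2 for a, b in zip(s, t))
        count += len(t)
    return total / count

def mimic_step(student_w, teacher_w, images, lr=0.1):
    """One gradient-descent step on mimic_loss; the teacher stays fixed."""
    n = len(images) * len(teacher_w)
    grads = [[0.0] * len(row) for row in student_w]
    for image in images:
        s = features(student_w, image)
        t = features(teacher_w, image)
        for i, (a, b) in enumerate(zip(s, t)):
            for j, x in enumerate(image):
                grads[i][j] += 2.0 * (a - b) * x / n
    return [[w - lr * g for w, g in zip(row, grad_row)]
            for row, grad_row in zip(student_w, grads)]

def mixed_batch(target_images, proximity_images, batch_size, rng):
    """Assumed ESD-style input: sample from target and proximity images pooled together."""
    return rng.sample(target_images + proximity_images, batch_size)

rng = random.Random(0)

def rand_image():
    # Hypothetical stand-in for a flattened image with 4 pixels in [0, 1).
    return [rng.random() for _ in range(4)]

target_images = [rand_image() for _ in range(3)]      # scarce target-domain data
proximity_images = [rand_image() for _ in range(8)]   # more plentiful proximity-domain data
teacher_w = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
student_w = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]

batch = mixed_batch(target_images, proximity_images, 4, rng)
loss_before = mimic_loss(student_w, teacher_w, batch)
for _ in range(50):
    student_w = mimic_step(student_w, teacher_w, batch)
loss_after = mimic_loss(student_w, teacher_w, batch)
```

In this sketch the mimicking loss needs no segmentation labels, only images, which is one plausible reason why drawing extra images from a proximity domain could help when target-domain data is scarce.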

updated: Sat Apr 17 2021 00:40:53 GMT+0000 (UTC)

published: Thu Mar 25 2021 10:23:30 GMT+0000 (UTC)

arXiv
