Orderly Dual-Teacher Knowledge Distillation for Lightweight Human Pose Estimation

Zhong-Qiu Zhao; Yao Gao; Yuchen Ge; Weidong Tian

Although deep convolutional neural networks (DCNNs) have achieved excellent performance in human pose estimation, they often have many parameters and a high computational cost, which leads to slow inference. An effective remedy is knowledge distillation, which transfers knowledge from a large pre-trained network (the teacher) to a small network (the student). However, existing approaches have several defects: (I) only a single teacher is adopted, neglecting the possibility that a student can learn from multiple teachers; (II) the human segmentation mask, which can serve as additional prior information to constrain keypoint locations, is never utilized; (III) a student with few parameters cannot fully imitate the heatmaps provided by datasets and teachers; (IV) heatmaps generated by teachers contain noise, which degrades the model. To overcome these defects, we propose an orderly dual-teacher knowledge distillation (ODKD) framework, which consists of two teachers with different capabilities. Specifically, the weaker one (primary teacher, PT) teaches keypoint information, while the stronger one (senior teacher, ST) transfers both segmentation and keypoint information by adding the human segmentation mask. With the two teachers combined, an orderly learning strategy is proposed to promote knowledge absorption. Moreover, we employ a binarization operation that further improves the learning ability of the student and reduces noise in heatmaps. Experimental results on the COCO and OCHuman keypoint datasets show that the proposed ODKD improves the performance of different lightweight models by a large margin, and HRNet-W16 equipped with ODKD achieves state-of-the-art performance for lightweight human pose estimation.
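The abstract gives no formulas, so the following is only a minimal sketch of how heatmap binarization and a two-stage teacher schedule might fit together. It is not the authors' implementation. The threshold of 0.5, the mean-squared-error loss, the weight `alpha`, the PT-then-ST ordering, and every function name below are assumptions made for illustration.

```python
import numpy as np


def binarize_heatmap(heatmap, threshold=0.5):
    """Map values >= threshold to 1 and all others to 0.

    The threshold value is an assumption; the abstract does not state one.
    """
    return (heatmap >= threshold).astype(np.float32)


def mse(a, b):
    """Mean squared error between two arrays of the same shape."""
    return float(np.mean((a - b) ** 2))


def teacher_for_epoch(epoch, switch_epoch):
    """Hypothetical orderly schedule: the primary teacher (PT) first,
    then the senior teacher (ST). The ordering is an assumption."""
    return "PT" if epoch < switch_epoch else "ST"


def distillation_loss(student, ground_truth, teacher, alpha=0.5, threshold=0.5):
    """Weighted sum of a ground-truth term and a teacher term.

    The teacher heatmap is binarized before comparison, which discards
    low-confidence responses. Weighting and loss type are assumptions.
    """
    teacher_target = binarize_heatmap(teacher, threshold)
    return alpha * mse(student, ground_truth) + (1.0 - alpha) * mse(student, teacher_target)
```

In a training loop under these assumptions, `teacher_for_epoch` would select which teacher's heatmaps are passed to `distillation_loss` at each epoch. The segmentation channel that the ST additionally transfers is omitted here for brevity.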

updated: Mon Jun 14 2021 15:28:36 GMT+0000 (UTC)

published: Wed Apr 21 2021 08:50:36 GMT+0000 (UTC)

arXiv
