Free Lunch for Surgical Video Understanding by Distilling Self-Supervisions

Xinpeng Ding; Ziwei Liu; Xiaomeng Li

Self-supervised learning has made great progress in vision and NLP; recently, it has also attracted much attention in medical imaging modalities such as X-ray, CT, and MRI. Existing methods mostly focus on building new pretext self-supervision tasks, such as reconstruction, orientation, and masking identification, according to the properties of medical images. However, publicly available self-supervised models are not fully exploited. In this paper, we present a powerful yet efficient self-supervision framework for surgical video understanding. Our key insight is to distill knowledge from publicly available models trained on large generic datasets to facilitate the self-supervised learning of surgical videos. To this end, we first introduce a semantic-preserving training scheme to obtain our teacher model, which not only retains semantics from the publicly available models but also produces accurate knowledge for surgical data. In addition to the contrastive learning objective, we introduce a distillation objective to transfer the rich learned information from the teacher model to self-supervised learning on surgical data. Extensive experiments on two surgical phase recognition benchmarks show that our framework can significantly improve the performance of existing self-supervised learning methods. Notably, our framework demonstrates a compelling advantage in the low-data regime. Our code is available at https://github.com/xmed-lab/DistillingSelf.
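
To make the idea of pairing a contrastive objective with a distillation objective concrete, the following is a minimal NumPy sketch. It is not the authors' implementation (see the repository linked above for that). It assumes an InfoNCE-style contrastive loss over two augmented views and a distillation term that matches the student's in-batch similarity distribution to the teacher's with a KL divergence. All function names, the temperature, and the loss weight are hypothetical choices made for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits):
    # Numerically stable row-wise log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def similarity_log_probs(a, b, temperature):
    # Row i holds the log-probabilities of sample i in `a` matching each sample in `b`.
    return log_softmax(l2_normalize(a) @ l2_normalize(b).T / temperature)


def info_nce(z1, z2, temperature=0.1):
    # Contrastive term (assumed form): row i of z1 should match row i of z2.
    return -np.mean(np.diag(similarity_log_probs(z1, z2, temperature)))


def distillation_loss(s1, s2, t1, t2, temperature=0.1):
    # Distillation term (assumed form): KL(teacher || student) between
    # in-batch similarity distributions of the two views.
    s_logp = similarity_log_probs(s1, s2, temperature)
    t_logp = similarity_log_probs(t1, t2, temperature)
    return np.mean(np.sum(np.exp(t_logp) * (t_logp - s_logp), axis=1))


def total_loss(s1, s2, t1, t2, weight=1.0):
    # Hypothetical weighting between the two objectives.
    return info_nce(s1, s2) + weight * distillation_loss(s1, s2, t1, t2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    student_v1 = rng.normal(size=(8, 16))   # student embeddings, view 1
    student_v2 = rng.normal(size=(8, 16))   # student embeddings, view 2
    teacher_v1 = rng.normal(size=(8, 32))   # teacher embeddings, view 1
    teacher_v2 = rng.normal(size=(8, 32))   # teacher embeddings, view 2
    print(total_loss(student_v1, student_v2, teacher_v1, teacher_v2))
```

Because the distillation term compares similarity distributions rather than raw features, the student and teacher embeddings in this sketch may have different dimensions; only the batch size has to agree.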

updated: Thu May 19 2022 02:46:44 GMT+0000 (UTC)

published: Thu May 19 2022 02:46:44 GMT+0000 (UTC)

arXiv
