Effective Sample Pair Generation for Ultrasound Video Contrastive Representation Learning

Yixiong Chen; Chunhui Zhang; Li Liu; Cheng Feng; Changfeng Dong; Yongfang Luo; Xiang Wan

Most deep neural network (DNN) based ultrasound (US) medical image analysis models use backbones pretrained on natural-image datasets (e.g., ImageNet) for better model generalization. However, the domain gap between natural and medical images causes an inevitable performance bottleneck when such backbones are applied to US image analysis. Our idea is to pretrain DNNs on US images directly to avoid this bottleneck. Because annotated large-scale datasets of US images are lacking, we first construct a new large-scale US video-based image dataset named US-4, containing over 23,000 high-resolution images from four US video sub-datasets, two of which are newly collected by our local experienced doctors. To make full use of this dataset, we then propose a US semi-supervised contrastive learning (USCL) method to effectively learn feature representations of US images, with a new sample pair generation (SPG) scheme that tackles the problem that US images extracted from the same video are highly similar. Moreover, USCL treats the contrastive loss as a consistent regularization, which boosts the performance of pretrained backbones by combining it with the supervised loss in a mutually reinforcing way. Extensive fine-tuning experiments on downstream tasks show the superiority of our approach over ImageNet pretraining and over pretraining with previous state-of-the-art semi-supervised learning approaches. In particular, our pretrained backbone reaches a fine-tuning accuracy of over 94% on the widely used POCUS dataset, 9 percentage points higher than the 85% of the ImageNet pretrained model. The constructed US-4 dataset and the source code of this work will be made public.
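The two ideas named in the abstract, generating sample pairs from videos and adding a contrastive term to a supervised loss, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not spell out how SPG builds its pairs, so the sketch simply assumes that two frames drawn from the same video form a positive pair and that frames from other videos act as negatives. It also assumes an NT-Xent-style contrastive loss, a plain cross-entropy supervised loss, and a hypothetical `weight` parameter balancing the two. Frames are represented by small embedding vectors; a real pipeline would first pass image frames through a backbone.

```python
import math
import random


def sample_pairs(videos, rng):
    """Draw two distinct frames per video as one positive pair.

    Assumption for illustration only; the abstract does not describe
    how the SPG scheme actually constructs its pairs.
    """
    pairs = []
    for frames in videos:
        i, j = rng.sample(range(len(frames)), 2)
        pairs.append((frames[i], frames[j]))
    return pairs


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def contrastive_loss(pairs, temperature=0.5):
    """NT-Xent-style loss (assumed form): each embedding should be closer
    to its pair partner than to embeddings from other videos."""
    views = [v for pair in pairs for v in pair]  # partners sit at adjacent indices
    total = 0.0
    for a in range(len(views)):
        partner = a ^ 1  # index of the other frame from the same video
        logits = [cosine(views[a], views[b]) / temperature
                  for b in range(len(views)) if b != a]
        positive = cosine(views[a], views[partner]) / temperature
        total += math.log(sum(math.exp(x) for x in logits)) - positive
    return total / len(views)


def cross_entropy(logits, label):
    """Numerically stable cross-entropy for one example."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[label]


def combined_loss(pairs, class_logits, labels, weight=1.0):
    """Contrastive term plus weighted supervised term (hypothetical weighting)."""
    supervised = sum(cross_entropy(l, y)
                     for l, y in zip(class_logits, labels)) / len(labels)
    return contrastive_loss(pairs) + weight * supervised


# Two toy "videos", each a list of 2-D frame embeddings.
videos = [
    [[1.0, 0.0], [0.9, 0.1], [1.0, 0.05]],
    [[0.0, 1.0], [0.1, 0.9], [0.05, 1.0]],
]
pairs = sample_pairs(videos, random.Random(0))
loss = combined_loss(pairs, class_logits=[[2.0, 0.0], [0.0, 2.0]], labels=[0, 1])
print(f"combined loss: {loss:.3f}")
```

In this toy setup the contrastive term is small when pair partners come from the same video and grows when partners are mismatched across videos, which is the behaviour a pairing scheme relies on.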

updated: Wed Nov 25 2020 23:44:38 GMT+0000 (UTC)

published: Wed Nov 25 2020 23:44:38 GMT+0000 (UTC)

arXiv
