ViPTT-Net: Video pretraining of spatio-temporal model for tuberculosis type classification from chest CT scans

Hasib Zunair; Aimon Rahman; Nabeel Mohammed

Pretraining has sparked a groundswell of interest in deep learning workflows as a way to learn from limited data and improve generalization. While this is common for 2D image classification tasks, its application to 3D medical imaging tasks such as chest CT interpretation is limited. We explore whether pretraining a model on realistic videos, rather than training it from scratch, can improve performance on tuberculosis type classification from chest CT scans. To incorporate both spatial and temporal features, we develop a hybrid convolutional neural network (CNN) and recurrent neural network (RNN) model: a CNN extracts features from each axial slice of the CT scan, and the resulting sequence of image features is input to an RNN that classifies the scan. Our model, termed ViPTT-Net, was trained on over 1300 video clips labeled with human activities and then fine-tuned on chest CT scans labeled with tuberculosis type. We find that pretraining the model on videos leads to better representations and significantly improves model validation performance, from a kappa score of 0.17 to 0.35, especially for under-represented class samples. Our best method achieved 2nd place in the ImageCLEF 2021 Tuberculosis - TBT classification task with a kappa score of 0.20 on the final test set, using only image information (without clinical metadata). All code and models are made available.
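
The abstract reports results as kappa scores. As a rough illustration of what such a score measures, here is a minimal sketch that assumes the metric is unweighted Cohen's kappa (the abstract does not say which variant is used). The function name `cohen_kappa` and the integer labels are hypothetical and are not taken from the authors' code or data.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Unweighted Cohen's kappa between two equal-length label sequences.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from the label marginals.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")

    n = len(y_true)

    # Observed agreement: fraction of samples where the two labels match.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # Chance agreement: sum over classes of the product of marginal frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)

    # Kappa is undefined when chance agreement is total (a single class on
    # both sides). Returning 0.0 here is a choice made for this sketch only.
    if p_e == 1.0:
        return 0.0
    return (p_o - p_e) / (1.0 - p_e)


# Made-up labels for six scans and three classes, purely for illustration.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
kappa = cohen_kappa(y_true, y_pred)
print(round(kappa, 2))
```

In this toy case five of six predictions agree (p_o = 5/6) while chance agreement is p_e = 1/3, so kappa is 0.75. A value of 0 corresponds to chance-level agreement and 1 to perfect agreement, which is why kappa is often preferred over plain accuracy when some classes are under-represented.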

updated: Wed May 26 2021 20:00:31 GMT+0000 (UTC)

published: Wed May 26 2021 20:00:31 GMT+0000 (UTC)

arXiv
