Spatiotemporal Feature Learning Based on Two-Step LSTM and Transformer for CT Scans

Chih-Chung Hsu; Chi-Han Tsai; Guan-Lin Chen; Sin-Di Ma; Shen-Chieh Tai

Computed tomography (CT) imaging can be very practical for diagnosing various diseases. However, CT data are highly heterogeneous, since the resolution and the number of slices of a CT scan are determined by the machine and its settings. Conventional deep learning models struggle to handle such diverse data because deep neural networks fundamentally require input data of a consistent shape. In this paper, we propose a novel and effective two-step approach that tackles this issue thoroughly for COVID-19 symptom classification. First, a semantic feature embedding of each slice of a CT scan is extracted by a conventional backbone network. Then, a long short-term memory (LSTM) and Transformer-based sub-network handles temporal feature learning across the slices, yielding a spatiotemporal feature representation. In this fashion, the proposed two-step LSTM model can prevent overfitting as well as improve performance. Comprehensive experiments reveal that the proposed two-step models not only show excellent performance but also complement each other. More specifically, the two-step LSTM model has a lower false-negative rate, while the two-step Swin model has a lower false-positive rate. In summary, the results suggest that a model ensemble could be adopted for more stable and promising performance in real-world applications.
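The two-step idea in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a toy per-slice feature extractor (`slice_embedding`, a hypothetical stand-in for a real CNN backbone) and a tiny, untrained LSTM cell written in plain Python (`TinyLSTM`, also hypothetical). The only point it demonstrates is that scans with different resolutions and slice counts end up as feature vectors of the same fixed size.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def slice_embedding(slice_pixels):
    """Step 1 (stand-in): map one 2D slice of any resolution to a fixed-length vector.

    A real system would use a CNN backbone here; this toy version uses the
    mean, standard deviation, and maximum of the pixel intensities.
    """
    flat = [p for row in slice_pixels for p in row]
    mean = sum(flat) / len(flat)
    var = sum((p - mean) ** 2 for p in flat) / len(flat)
    return [mean, math.sqrt(var), max(flat)]


class TinyLSTM:
    """Step 2 (stand-in): a single untrained LSTM cell with random weights."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        n = input_size + hidden_size
        # One weight matrix and bias per gate: input, forget, output, candidate.
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(n)]
                      for _ in range(hidden_size)] for g in "ifoc"}
        self.b = {g: [0.0] * hidden_size for g in "ifoc"}
        self.hidden_size = hidden_size

    def _gate(self, g, xh, act):
        return [act(sum(w * v for w, v in zip(row, xh)) + b)
                for row, b in zip(self.W[g], self.b[g])]

    def encode(self, sequence):
        """Run over a sequence of any length and return the final hidden state."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in sequence:
            xh = list(x) + h
            i = self._gate("i", xh, sigmoid)
            f = self._gate("f", xh, sigmoid)
            o = self._gate("o", xh, sigmoid)
            g = self._gate("c", xh, math.tanh)
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h


def make_scan(num_slices, height, width, seed):
    """Synthetic scan: a list of slices, each a height x width grid of intensities."""
    rng = random.Random(seed)
    return [[[rng.random() for _ in range(width)] for _ in range(height)]
            for _ in range(num_slices)]


def encode_scan(scan, lstm):
    # Step 1 per slice, then step 2 over the resulting sequence of embeddings.
    return lstm.encode([slice_embedding(s) for s in scan])


lstm = TinyLSTM(input_size=3, hidden_size=4)
scan_a = make_scan(num_slices=5, height=8, width=8, seed=1)
scan_b = make_scan(num_slices=12, height=16, width=10, seed=2)
feat_a = encode_scan(scan_a, lstm)
feat_b = encode_scan(scan_b, lstm)
print(len(feat_a), len(feat_b))
```

Both scans yield a 4-dimensional vector despite differing in slice count and resolution. In a practical setting a classification head would sit on top of this vector, and the backbone and sequence model would be trained rather than randomly initialised.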

updated: Fri Jul 08 2022 07:49:36 GMT+0000 (UTC)

published: Mon Jul 04 2022 16:59:05 GMT+0000 (UTC)

arXiv
