Surface Vision Transformers: Flexible Attention-Based Modelling of Biomedical Surfaces

Simon Dahan; Hao Xu; Logan Z. J. Williams; Abdulah Fawaz; Chunhui Yang; Timothy S. Coalson; Michelle C. Williams; David E. Newby; A. David Edwards; Matthew F. Glasser; Alistair A. Young; Daniel Rueckert; Emma C. Robinson

Recent state-of-the-art performance of Vision Transformers (ViTs) in computer vision tasks demonstrates that a general-purpose architecture implementing long-range self-attention could replace the local feature learning operations of convolutional neural networks. In this paper, we extend ViTs to surfaces by proposing patching mechanisms for general surface meshes, which reformulates surface learning as a sequence-to-sequence learning problem. Sequences of patches are then processed by a transformer encoder and used for classification or regression. We validate our method on a range of biomedical surface domains and tasks: brain age prediction in the developing Human Connectome Project (dHCP), fluid intelligence prediction in the Human Connectome Project (HCP), and coronary artery calcium score classification using surfaces from the Scottish Computed Tomography of the Heart (SCOT-HEART) dataset. We also investigate the impact of pretraining and data augmentation on model performance. Results suggest that Surface Vision Transformers (SiT) consistently improve over geometric deep learning methods for brain age and fluid intelligence prediction, and achieve performance on calcium score classification comparable to standard metrics used in clinical practice. Furthermore, analysis of transformer attention maps offers clear and individualised predictions of the features driving each task. Code is available on GitHub: https://github.com/metrics-lab/surface-vision-transformers
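The pipeline described in the abstract (patch a surface mesh, treat the patches as a sequence, encode with self-attention, pool for regression) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the mesh size, the contiguous toy patch partition, the random weights, and every function name below are hypothetical, and a real transformer encoder would also include positional embeddings, multiple heads, layer normalisation, and MLP blocks, all omitted here.

```python
import math
import random


def patchify(features, patch_indices):
    """Flatten the feature vectors of each patch's vertices into one token."""
    return [[value for v in patch for value in features[v]]
            for patch in patch_indices]


def linear(x, w):
    """Multiply row vectors x (n x d_in) by a weight matrix w (d_in x d_out)."""
    return [[sum(xi * wi for xi, wi in zip(row, col)) for col in zip(*w)]
            for row in x]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(r - m) for r in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over the patch sequence."""
    q, k, v = linear(tokens, w_q), linear(tokens, w_k), linear(tokens, w_v)
    scale = math.sqrt(len(q[0]))
    weights = [softmax([sum(a * b for a, b in zip(qi, kj)) / scale for kj in k])
               for qi in q]
    return linear(weights, v), weights


def random_matrix(rows, cols, rng):
    """Small random weights standing in for learned parameters."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
# Hypothetical toy sizes: vertices, channels, patches, vertices per patch, embedding dim.
V, C, P, Vp, D = 60, 4, 12, 5, 8

# Per-vertex surface features (random stand-ins for real surface metrics).
features = [[rng.gauss(0.0, 1.0) for _ in range(C)] for _ in range(V)]
# Toy partition: contiguous vertex ranges, NOT a real mesh patching scheme.
patch_indices = [list(range(p * Vp, (p + 1) * Vp)) for p in range(P)]

# Sequence of patch tokens, linearly projected to the embedding dimension.
tokens = linear(patchify(features, patch_indices), random_matrix(Vp * C, D, rng))
encoded, attention = self_attention(
    tokens,
    random_matrix(D, D, rng),
    random_matrix(D, D, rng),
    random_matrix(D, D, rng),
)

# Mean-pool the encoded patches and apply a linear head for scalar regression.
pooled = [sum(col) / P for col in zip(*encoded)]
head = [rng.gauss(0.0, 0.1) for _ in range(D)]
prediction = sum(p * h for p, h in zip(pooled, head))
```

Each row of `attention` is a distribution over patches, which is the kind of quantity an attention-map analysis would inspect; with untrained random weights the values here carry no meaning.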

updated: Thu Apr 07 2022 12:45:54 GMT+0000 (UTC)

published: Thu Apr 07 2022 12:45:54 GMT+0000 (UTC)

arXiv
