BB-GCN: A Bi-modal Bridged Graph Convolutional Network for Multi-label Chest X-Ray Recognition

Guoli Wang; Pingping Wang; Jinyu Cong; Kunmeng Liu; Benzheng Wei

Multi-label chest X-ray (CXR) recognition involves simultaneously diagnosing and identifying multiple labels for different pathologies. Because pathological labels carry rich information about their relationships to one another, modeling the co-occurrence dependencies between them is essential for improving recognition performance. However, previous methods rely on state-variable coding and attention mechanisms to model local label information and do not learn the global co-occurrence relationships between labels. Furthermore, these methods integrate image features and label embeddings only coarsely, ignoring the alignment and compactness problems in cross-modal vector fusion. To solve these problems, a Bi-modal Bridged Graph Convolutional Network (BB-GCN) model is proposed. The model consists mainly of a backbone module, a pathology Label Co-occurrence relationship Embedding (LCE) module, and a Transformer Bridge Graph (TBG) module. Specifically, the backbone module extracts visual feature representations from the image. The LCE module uses a graph to model the global co-occurrence relationships among labels and employs graph convolutional networks to learn and reason over them. The TBG module bridges the cross-modal vectors more compactly and efficiently through the GroupSum method. We evaluated the proposed BB-GCN on two large-scale CXR datasets (ChestX-Ray14 and CheXpert). Our model achieved state-of-the-art performance: the mean AUC scores for the 14 pathologies were 0.835 and 0.813, respectively. The proposed LCE and TBG modules jointly improve the recognition performance of BB-GCN. Our model also achieves satisfactory results in multi-label chest X-ray recognition and exhibits highly competitive generalization performance.
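
The idea behind the LCE module, a label graph built from co-occurrence statistics and propagated through a graph convolution, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not say how the graph is constructed, so the conditional-probability adjacency, the symmetric normalization, and the names `cooccurrence_adjacency`, `normalize_adjacency`, and `gcn_layer` are all assumptions made for illustration. The TBG module and GroupSum are not modeled here.

```python
import numpy as np


def cooccurrence_adjacency(labels: np.ndarray) -> np.ndarray:
    """Build a label graph from a binary (n_samples, n_labels) matrix.

    Assumption: entry [i, j] is the fraction of samples with label i
    that also carry label j. The abstract does not specify this choice.
    """
    labels = labels.astype(np.float64)
    counts = labels.T @ labels                # counts[i, j]: samples with both i and j
    occurrences = np.diag(counts).copy()      # occurrences[i]: samples with label i
    safe = np.where(occurrences > 0, occurrences, 1.0)  # avoid division by zero
    adj = counts / safe[:, None]
    np.fill_diagonal(adj, 1.0)                # self-loops keep each label's own embedding
    return adj


def normalize_adjacency(adj: np.ndarray) -> np.ndarray:
    """Symmetric normalization D^{-1/2} A D^{-1/2} (a common GCN convention)."""
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_hat: np.ndarray, h: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One graph-convolution step: ReLU(A_hat @ H @ W)."""
    return np.maximum(a_hat @ h @ w, 0.0)


# Toy data: 4 images, 3 hypothetical pathology labels.
labels = np.array([[1, 1, 0],
                   [1, 0, 0],
                   [0, 1, 1],
                   [1, 1, 0]])
adj = cooccurrence_adjacency(labels)

rng = np.random.default_rng(0)
h = rng.normal(size=(3, 5))   # initial label embeddings (random stand-ins)
w = rng.normal(size=(5, 4))   # layer weights (would be learned in practice)
out = gcn_layer(normalize_adjacency(adj), h, w)   # shape (3, 4)
```

In this toy example the adjacency is asymmetric: label 2 always appears together with label 1, while label 1 appears with label 2 only a third of the time. That asymmetry is one reason a conditional-probability graph can capture global label dependencies that a purely local attention mechanism may miss.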

updated: Wed Feb 22 2023 01:03:53 GMT+0000 (UTC)

published: Wed Feb 22 2023 01:03:53 GMT+0000 (UTC)

arXiv
