AdaGCN: Adaptive Boosting Algorithm for Graph Convolutional Networks on Imbalanced Node Classification

S. Shi; Kai Qiao; Shuai Yang; L. Wang; J. Chen; Bin Yan

Graph Neural Networks (GNNs) have achieved remarkable success in graph data representation. However, previous work has mostly considered ideal, balanced datasets and rarely the imbalanced datasets found in practice, even though the latter matter more for real applications of GNNs. Traditional methods for handling imbalanced datasets, such as resampling, reweighting, and synthetic samples, are no longer applicable to GNNs. Ensemble models handle imbalanced datasets better than a single estimator, and ensemble learning also achieves higher estimation accuracy and better reliability. In this paper, we propose an ensemble model called AdaGCN, which uses a Graph Convolutional Network (GCN) as the base estimator during adaptive boosting. In AdaGCN, training samples that were not properly classified by the previous classifier are given a higher weight, and transfer learning is used to reduce computational cost and increase fitting capability. Experiments show that the proposed AdaGCN model outperforms GCN, GraphSAGE, GAT, N-GCN, and most of the advanced reweighting and resampling methods on synthetic imbalanced datasets, with an average improvement of 4.3%. Our model also improves on state-of-the-art baselines on all of the challenging node classification tasks we consider: Cora, Citeseer, Pubmed, and NELL.
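The reweighting idea in the abstract, where samples misclassified by the previous classifier receive a higher weight, can be illustrated with a minimal sketch of one boosting round. The abstract does not say which AdaBoost variant AdaGCN uses, so the multi-class SAMME update below is an assumption, and the function name `samme_round` is hypothetical. The base classifier's predictions are passed in as plain lists; training the GCN itself and the transfer-learning step are not shown.

```python
import math


def samme_round(weights, y_true, y_pred, num_classes):
    """One multi-class AdaBoost (SAMME) reweighting round (illustrative only).

    Assumption: SAMME is used here as a generic stand-in; the abstract does
    not specify the exact update rule used by AdaGCN.

    Returns (alpha, new_weights): the vote weight of the current base
    classifier and the renormalized per-sample weights.
    """
    total = sum(weights)
    # Weighted error rate of the current base classifier.
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p) / total
    # Clamp so the logarithm stays finite for a perfect or always-wrong classifier.
    err = min(max(err, 1e-10), 1 - 1e-10)
    # Vote weight: larger when the classifier makes fewer weighted errors.
    alpha = math.log((1 - err) / err) + math.log(num_classes - 1)
    # Scale up the weights of misclassified samples; leave the others unchanged.
    new_w = [
        w * math.exp(alpha) if t != p else w
        for w, t, p in zip(weights, y_true, y_pred)
    ]
    # Renormalize so the weights sum to 1.
    norm = sum(new_w)
    return alpha, [w / norm for w in new_w]


# Toy usage: four nodes, two classes, the last node is misclassified.
alpha, new_weights = samme_round(
    weights=[0.25, 0.25, 0.25, 0.25],
    y_true=[0, 0, 1, 1],
    y_pred=[0, 0, 1, 0],
    num_classes=2,
)
```

In this toy case the single misclassified node ends up carrying half of the total weight, so the next base classifier is pushed to fit it. In an imbalanced setting, minority-class nodes tend to be misclassified more often and would be emphasized in the same way.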

updated: Tue May 25 2021 02:43:31 GMT+0000 (UTC)

published: Tue May 25 2021 02:43:31 GMT+0000 (UTC)

arXiv
