FairGen: Towards Fair Graph Generation

Lecheng Zheng; Dawei Zhou; Hanghang Tong; Jiejun Xu; Yada Zhu; Jingrui He

There have been tremendous efforts over the past decades dedicated to the generation of realistic graphs in a variety of domains, ranging from social networks to computer networks, from gene regulatory networks to online transaction networks. Despite the remarkable success, the vast majority of these works are unsupervised in nature and are typically trained to minimize the expected graph reconstruction loss, which results in representation disparity in the generated graphs, i.e., the protected groups (often minorities) contribute less to the objective and thus suffer from systematically higher errors. In this paper, we aim to tailor graph generation to downstream mining tasks by leveraging label information and user-preferred parity constraints. In particular, we start from the investigation of representation disparity in the context of graph generative models. To mitigate the disparity, we propose a fairness-aware graph generative model named FairGen. Our model jointly trains a label-informed graph generation module and a fair representation learning module by progressively learning the behaviors of the protected and unprotected groups, from the 'easy' concepts to the 'hard' ones. In addition, we propose a generic context sampling strategy for graph generative models, which is proven to be capable of fairly capturing the contextual information of each group with high probability. Experimental results on seven real-world data sets, including web-based graphs, demonstrate that FairGen (1) obtains performance on par with state-of-the-art graph generative models across six network properties, (2) mitigates the representation disparity issues in the generated graphs, and (3) substantially boosts the model performance by up to 17% in downstream tasks via data augmentation.

updated: Thu Mar 30 2023 23:30:42 GMT+0000 (UTC)

published: Thu Mar 30 2023 23:30:42 GMT+0000 (UTC)

arXiv
