NAR-Former V2: Rethinking Transformer for Universal Neural Network Representation Learning

Yun Yi; Haokui Zhang; Rong Xiao; Nannan Wang; Xiaoyu Wang

As more deep learning models are applied in real-world applications, there is a growing need to model and learn representations of neural networks themselves. An efficient representation can be used to predict target attributes of a network without actual training or deployment, which facilitates efficient network design and deployment. Recently, inspired by the success of the Transformer, several Transformer-based representation learning frameworks have been proposed and have achieved promising performance on cell-structured models. However, graph neural network (GNN) based approaches still dominate representation learning for entire networks. In this paper, we revisit the Transformer and compare it with the GNN to analyse their different architectural characteristics. We then propose NAR-Former V2, a modified Transformer-based universal neural network representation learning model. It can learn efficient representations from both cell-structured networks and entire networks. Specifically, we first treat the network as a graph and design a straightforward tokenizer to encode the network into a sequence. Then, we incorporate the inductive representation learning capability of the GNN into the Transformer, enabling the Transformer to generalize better when it encounters unseen architectures. Additionally, we introduce a series of simple yet effective modifications to enhance the Transformer's ability to learn representations from graph structures. Our proposed method surpasses the GNN-based method NNLP by a significant margin in latency estimation on the NNLQP dataset. Furthermore, for accuracy prediction on the NASBench101 and NASBench201 datasets, our method achieves performance highly comparable to other state-of-the-art methods.
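The abstract does not spell out the tokenizer or how GNN-style inductive behaviour is built into the Transformer. As an illustrative sketch only, and not the authors' implementation, the general idea can be pictured as follows: each node of a network graph becomes one token, and attention is restricted so that a node aggregates only from itself and its direct predecessors, much as a GNN layer would. The toy network, the one-hot operation encoding, the choice of Q = K = V, and all function names below are assumptions made for this example.

```python
import math

# Hypothetical toy network: node i -> (operation name, list of predecessor indices).
NETWORK = [
    ("input", []),
    ("conv3x3", [0]),
    ("relu", [1]),
    ("add", [0, 2]),
]
OP_VOCAB = {"input": 0, "conv3x3": 1, "relu": 2, "add": 3}


def tokenize(network):
    """Encode each node as a one-hot operation vector, keeping the given node order."""
    tokens = []
    for op, _ in network:
        vec = [0.0] * len(OP_VOCAB)
        vec[OP_VOCAB[op]] = 1.0
        tokens.append(vec)
    return tokens


def adjacency_mask(network):
    """mask[i][j] is True when node i may attend to node j (itself or a predecessor)."""
    n = len(network)
    mask = [[i == j for j in range(n)] for i in range(n)]
    for i, (_, preds) in enumerate(network):
        for j in preds:
            mask[i][j] = True
    return mask


def masked_attention(tokens, mask):
    """Single-head dot-product attention with Q = K = V = tokens, limited by the mask."""
    d = len(tokens[0])
    out = []
    for i, q in enumerate(tokens):
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        allowed = [j for j in range(len(tokens)) if mask[i][j]]
        top = max(scores[j] for j in allowed)  # subtract the max for a stable softmax
        weights = {j: math.exp(scores[j] - top) for j in allowed}
        total = sum(weights.values())
        out.append(
            [sum(weights[j] / total * tokens[j][c] for j in allowed) for c in range(d)]
        )
    return out


tokens = tokenize(NETWORK)
mask = adjacency_mask(NETWORK)
mixed = masked_attention(tokens, mask)
```

Because the mask depends only on local connectivity, the same function applies unchanged to graphs of any size, which is the sense in which this kind of aggregation is inductive. A real model would use learned projections, multiple heads, and richer node features.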

updated: Mon Jun 19 2023 09:11:04 GMT+0000 (UTC)

published: Mon Jun 19 2023 09:11:04 GMT+0000 (UTC)

arXiv
