Nested Hierarchical Transformer: Towards Accurate, Data-Efficient and Interpretable Visual Understanding

Zizhao Zhang; Han Zhang; Long Zhao; Ting Chen; Sercan O. Arik; Tomas Pfister

Hierarchical structures are popular in recent vision transformers; however, they require sophisticated designs and massive datasets to work well. In this paper, we explore the idea of nesting basic local transformers on non-overlapping image blocks and aggregating them in a hierarchical way. We find that the block aggregation function plays a critical role in enabling cross-block non-local information communication. This observation leads us to design a simplified architecture that requires only minor code changes to the original vision transformer. The benefits of the proposed judiciously selected design are threefold: (1) NesT converges faster and requires much less training data to achieve good generalization on both ImageNet and small datasets like CIFAR; (2) when extending our key ideas to image generation, NesT leads to a strong decoder that is 8× faster than previous transformer-based generators; and (3) we show that decoupling the feature learning and abstraction processes via this nested hierarchy in our design enables constructing a novel method (named GradCAT) for visually interpreting the learned model. Source code is available at https://github.com/google-research/nested-transformer.
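To make the nesting idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It splits a feature map into non-overlapping blocks, runs a parameter-free self-attention inside each block independently, and then merges blocks between levels. All function names (`blockify`, `unblockify`, `local_self_attention`, `aggregate`, `nest_forward`) are hypothetical, and a plain 2×2 max pooling stands in for the block aggregation function, whose exact form the abstract does not specify.

```python
import numpy as np


def blockify(x, block_size):
    """Split an (H, W, C) map into non-overlapping blocks.

    Returns shape (num_blocks, block_size * block_size, C).
    """
    h, w, c = x.shape
    assert h % block_size == 0 and w % block_size == 0
    gh, gw = h // block_size, w // block_size
    x = x.reshape(gh, block_size, gw, block_size, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (gh, gw, bs, bs, c)
    return x.reshape(gh * gw, block_size * block_size, c)


def unblockify(blocks, block_size, height, width):
    """Inverse of blockify: reassemble blocks into an (H, W, C) map."""
    gh, gw = height // block_size, width // block_size
    c = blocks.shape[-1]
    x = blocks.reshape(gh, gw, block_size, block_size, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (gh, bs, gw, bs, c)
    return x.reshape(height, width, c)


def local_self_attention(blocks):
    """Softmax self-attention within each block (no learned weights).

    Tokens only attend to tokens of the same block.
    """
    d = blocks.shape[-1]
    scores = blocks @ blocks.transpose(0, 2, 1) / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ blocks


def aggregate(x):
    """Assumed stand-in for block aggregation: 2x2 max pooling.

    It operates on the full image plane, so values from
    neighbouring blocks can mix at block borders.
    """
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))


def nest_forward(x, block_size, num_levels):
    """Alternate block-local attention and aggregation."""
    for level in range(num_levels):
        h, w, _ = x.shape
        blocks = local_self_attention(blockify(x, block_size))
        x = unblockify(blocks, block_size, h, w)
        if level < num_levels - 1:
            x = aggregate(x)  # fewer blocks at the next level
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.normal(size=(16, 16, 8))
    out = nest_forward(features, block_size=4, num_levels=3)
    print(out.shape)
```

In this sketch a 16×16 map with 4×4 blocks goes from 16 blocks to 4 blocks to a single block over three levels, which is the hierarchy the abstract describes. Because attention never crosses a block boundary, the aggregation step between levels is the only place where information from different blocks can mix.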

updated: Thu Dec 30 2021 17:37:57 GMT+0000 (UTC)

published: Wed May 26 2021 17:56:48 GMT+0000 (UTC)

arXiv
