SepTr: Separable Transformer for Audio Spectrogram Processing

Nicolae-Catalin Ristea; Radu Tudor Ionescu; Fahad Shahbaz Khan

Following the successful application of vision transformers in multiple computer vision tasks, these models have drawn the attention of the signal processing community. This is because signals are often represented as spectrograms (e.g. through the discrete Fourier transform), which can be directly provided as input to vision transformers. However, naively applying transformers to spectrograms is suboptimal. Since the axes represent distinct dimensions, i.e. frequency and time, we argue that a better approach is to separate the attention dedicated to each axis. To this end, we propose the Separable Transformer (SepTr), an architecture that employs two transformer blocks in a sequential manner, the first attending to tokens within the same frequency bin, and the second attending to tokens within the same time interval. We conduct experiments on three benchmark data sets, showing that our separable architecture outperforms conventional vision transformers and other state-of-the-art methods. Unlike in standard transformers, the number of trainable parameters in SepTr scales linearly with the input size, resulting in a lower memory footprint. Our code is available as open source at https://github.com/ristea/septr.
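The two-stage attention described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that). It assumes a spectrogram that has already been split into a grid of tokens indexed by frequency bin and time step, and it uses single-head attention with identity query/key/value projections. The learned projections, multiple heads, residual connections and feed-forward layers of a real transformer block are left out. The function names `self_attention`, `separable_attention` and `attention_pairs` are hypothetical and chosen only for this illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head self-attention over a sequence of d-dimensional tokens.

    Assumption for brevity: queries, keys and values are the tokens
    themselves (identity projections), so there are no learned weights.
    """
    dim = len(tokens[0])
    output = []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        output.append(
            [sum(w * value[i] for w, value in zip(weights, tokens))
             for i in range(dim)]
        )
    return output


def separable_attention(grid):
    """Apply attention along time, then along frequency.

    grid[f][t] is the token for frequency bin f and time step t.
    """
    n_freq, n_time = len(grid), len(grid[0])
    # Stage 1: tokens within the same frequency bin attend to each other.
    grid = [self_attention(row) for row in grid]
    # Stage 2: tokens within the same time interval attend to each other.
    columns = [
        self_attention([grid[f][t] for f in range(n_freq)])
        for t in range(n_time)
    ]
    # Transpose back to the grid[f][t] layout.
    return [[columns[t][f] for t in range(n_time)] for f in range(n_freq)]


def attention_pairs(n_freq, n_time):
    """Count query-key pairs for joint vs. axis-wise attention."""
    joint = (n_freq * n_time) ** 2
    separable = n_freq * n_time ** 2 + n_time * n_freq ** 2
    return joint, separable


if __name__ == "__main__":
    # Toy 4 x 6 grid of 2-dimensional tokens (made-up values).
    toy = [[[float(f + t), float(f - t)] for t in range(6)] for f in range(4)]
    result = separable_attention(toy)
    print(len(result), len(result[0]), len(result[0][0]))
    print(attention_pairs(4, 6))  # (576, 240)
```

In this sketch, a 4 x 6 token grid needs 240 query-key comparisons with axis-wise attention against 576 when all 24 tokens attend to one another jointly. That count concerns attention scores in the simplified setting above. It is separate from the abstract's statement about trainable parameters, which depends on details of the full architecture not given here.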

updated: Wed May 18 2022 18:59:47 GMT+0000 (UTC)

published: Thu Mar 17 2022 19:48:43 GMT+0000 (UTC)

arXiv
