TORE: Token Recycling in Vision Transformers for Efficient Active Visual Exploration

Jan Olszewski; Dawid Rymarczyk; Piotr Wójcik; Mateusz Pach; Bartosz Zieliński

Active Visual Exploration (AVE) optimizes the utilization of robotic resources in real-world scenarios by sequentially selecting the most informative observations. However, modern methods require a high computational budget due to processing the same observations multiple times through the autoencoder transformers. As a remedy, we introduce a novel approach to AVE called TOken REcycling (TORE). It divides the encoder into extractor and aggregator components. The extractor processes each observation separately, enabling the reuse of tokens passed to the aggregator. Moreover, to further reduce the computations, we decrease the decoder to only one block. Through extensive experiments, we demonstrate that TORE outperforms state-of-the-art methods while reducing computational overhead by up to 90%.

updated: Mon Nov 25 2024 07:08:38 GMT+0000 (UTC)

published: Sun Nov 26 2023 15:39:57 GMT+0000 (UTC)

arXiv

参考文献 (このサイトで利用可能なもの) / References (only if available on this site)

被参照文献 (このサイトで利用可能なものを新しい順に) / Citations (only if available on this site, in order of most recent)

Amazon.co.jpアソシエイト