LongShortNet: Exploring Temporal and Semantic Features Fusion in Streaming Perception

Chenyang Li; Zhi-Qi Cheng; Jun-Yan He; Pengyu Li; Bin Luo; Hanyuan Chen; Yifeng Geng; Jin-Peng Lan; Xuansong Xie

Streaming perception is a critical task in autonomous driving that requires balancing the latency and accuracy of the autopilot system. However, current methods for streaming perception are limited as they only rely on the current and adjacent two frames to learn movement patterns. This restricts their ability to model complex scenes, often resulting in poor detection results. To address this limitation, we propose LongShortNet, a novel dual-path network that captures long-term temporal motion and integrates it with short-term spatial semantics for real-time perception. LongShortNet is notable as it is the first work to extend long-term temporal modeling to streaming perception, enabling spatiotemporal feature fusion. We evaluate LongShortNet on the challenging Argoverse-HD dataset and demonstrate that it outperforms existing state-of-the-art methods with almost no additional computational cost.

updated: Thu Mar 30 2023 04:02:18 GMT+0000 (UTC)

published: Thu Oct 27 2022 14:57:14 GMT+0000 (UTC)

arXiv
