LongShortNet: Exploring Temporal and Semantic Features Fusion in Streaming Perception

Chenyang Li; Zhi-Qi Cheng; Jun-Yan He; Pengyu Li; Bin Luo; Han-Yuan Chen; Yifeng Geng; Jin-Peng Lan; Xuansong Xie

Streaming perception is the task of reporting the current state of autonomous driving while coherently considering the latency and accuracy of autopilot systems. However, existing streaming perception methods use only the current and adjacent two frames as input to learn movement patterns; this cannot model complex real-world scenes and leads to failed detections. To solve this problem, we propose an end-to-end dual-path network dubbed LongShortNet, which captures long-term temporal motion and calibrates it with short-term spatial semantics for real-time perception. Moreover, we investigate a Long-Short Fusion Module (LSFM) to explore spatiotemporal feature fusion; this is the first work to extend long-term temporal modeling to streaming perception. We evaluate LongShortNet and compare it with existing methods on the benchmark dataset Argoverse-HD. The results show that LongShortNet outperforms the other state-of-the-art methods at almost no extra computational cost.
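
The dual-path idea can be illustrated with a toy sketch. The abstract does not specify how LSFM combines features, so everything below is an assumption made for illustration: the function names `short_path`, `long_path`, and `fuse_long_short` are hypothetical, the long path is reduced to an element-wise mean over past frame features, and the fusion is a plain concatenation. It is not the authors' implementation.

```python
from typing import List, Sequence

Feature = List[float]


def short_path(current: Feature) -> Feature:
    """Hypothetical short path: stands in for the spatial-semantic
    features of the current frame (identity copy in this sketch)."""
    return list(current)


def long_path(history: Sequence[Feature]) -> Feature:
    """Hypothetical long path: summarizes several past frames with an
    element-wise mean (an assumption, not the paper's operator)."""
    if not history:
        raise ValueError("history must contain at least one frame")
    n = len(history)
    return [sum(column) / n for column in zip(*history)]


def fuse_long_short(current: Feature, history: Sequence[Feature]) -> Feature:
    """Assumed fusion: concatenate short-path and long-path features."""
    return short_path(current) + long_path(history)


if __name__ == "__main__":
    current_frame = [1.0, 2.0]
    past_frames = [[0.0, 2.0], [2.0, 4.0], [4.0, 6.0]]
    print(fuse_long_short(current_frame, past_frames))
```

In this sketch the history can hold any number of past frames, which is the point of contrast with methods that see only the current and adjacent frames.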

updated: Wed Nov 23 2022 13:26:22 GMT+0000 (UTC)

published: Thu Oct 27 2022 14:57:14 GMT+0000 (UTC)

arXiv
