One-stage Action Detection Transformer

Lijun Li; Li'an Zhuo; Bang Zhang

In this work, we introduce our solution to the EPIC-KITCHENS-100 2022 Action Detection challenge. We propose a One-stage Action Detection Transformer (OADT) to model the temporal connections between video segments. With OADT, both the action category and its temporal boundaries are recognized simultaneously. After ensembling multiple OADT models trained on different features, our model reaches 21.28% action mAP and ranks 1st on the test set of the Action Detection challenge.

updated: Tue Jun 21 2022 02:24:22 GMT+0000 (UTC)

published: Tue Jun 21 2022 02:24:22 GMT+0000 (UTC)

arXiv
