Object State Change Classification in Egocentric Videos using the Divided Space-Time Attention Mechanism

Md Mohaiminul Islam; Gedas Bertasius

This report describes our submission, called "TarHeels", to the Ego4D: Object State Change Classification Challenge. We use a transformer-based video recognition model and leverage the Divided Space-Time Attention mechanism to classify object state changes in egocentric videos. Our submission achieves the second-best performance in the challenge. Furthermore, we perform an ablation study showing that identifying object state changes in egocentric videos requires temporal modeling. Lastly, we present several positive and negative examples to visualize our model's predictions. The code is publicly available at: https://github.com/md-mohaiminul/ObjectStateChange
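For readers unfamiliar with the term, the following is a minimal sketch of the general idea behind divided space-time attention: each patch token first attends across frames at its own spatial location, then across patches within its own frame. It is not the authors' implementation (see the repository linked above for that). The function names, the single attention head, the identity query/key/value projections, and the omission of a classification token, layer normalization, and MLP are all simplifying assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(tokens):
    """Single-head self-attention over a list of d-dimensional vectors.

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections); a real model learns these projections.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out


def residual(inputs, updates):
    """Element-wise sum of two equally shaped lists of vectors."""
    return [[x + y for x, y in zip(u, v)] for u, v in zip(inputs, updates)]


def divided_space_time_attention(video):
    """Apply temporal attention, then spatial attention.

    video[t][s] is the token vector for patch s of frame t (hypothetical layout).
    """
    num_frames, num_patches = len(video), len(video[0])

    # Temporal step: each spatial location attends across all frames.
    after_time = [[None] * num_patches for _ in range(num_frames)]
    for s in range(num_patches):
        column = [video[t][s] for t in range(num_frames)]
        mixed = residual(column, attend(column))
        for t in range(num_frames):
            after_time[t][s] = mixed[t]

    # Spatial step: each frame attends across its own patches.
    return [residual(frame, attend(frame)) for frame in after_time]


if __name__ == "__main__":
    # Toy input: 2 frames, 3 patches per frame, 2-dimensional tokens.
    toy = [[[0.1 * (t + 1), 0.2 * (s + 1)] for s in range(3)] for t in range(2)]
    result = divided_space_time_attention(toy)
    print(len(result), len(result[0]), len(result[0][0]))
```

The output keeps the input layout of frames by patches by token dimension. In this sketch each token is compared with roughly T + S others (T frames, S patches per frame) rather than the T × S comparisons of joint space-time attention, which is the usual motivation for dividing the two steps.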

updated: Sun Jul 24 2022 20:53:36 GMT+0000 (UTC)

published: Sun Jul 24 2022 20:53:36 GMT+0000 (UTC)

arXiv
