Technical Report: Temporal Aggregate Representations

Fadime Sener; Dibyadip Chatterjee; Angela Yao

This technical report extends our work presented in [9] with more experiments. In [9], we tackle long-term video understanding, which requires reasoning from current and past or future observations and raises several fundamental questions. How should temporal or sequential relationships be modelled? What temporal extent of information and context needs to be processed? At what temporal scale should they be derived? [9] addresses these questions with a flexible multi-granular temporal aggregation framework. In this report, we conduct further experiments with this framework on different tasks and a new dataset, EPIC-KITCHENS-100.

updated: Tue Jun 15 2021 07:11:24 GMT+0000 (UTC)

published: Sun Jun 06 2021 15:27:47 GMT+0000 (UTC)

arXiv
