Trailers12k: Improving Transfer Learning with a Dual Image and Video Transformer for Multi-label Movie Trailer Genre Classification

Ricardo Montalvo-Lezama; Berenice Montalvo-Lezama; Gibran Fuentes-Pineda

In this paper, we propose the Dual Image and Video Transformer Architecture (DIViTA) for multi-label movie trailer genre classification. DIViTA performs an input adaptation stage that uses shot detection to segment the trailer into highly correlated clips, providing a more cohesive input that makes it possible to leverage pretrained ImageNet and/or Kinetics backbones. We introduce Trailers12k, a movie trailer dataset with manually verified title-trailer pairs, and present a transferability study of representations learned from ImageNet and Kinetics to Trailers12k. Our results show that DIViTA can reduce the gap between the spatio-temporal structure of the source and target datasets, thus improving transferability. Moreover, representations learned on either ImageNet or Kinetics are comparatively transferable to Trailers12k, although they provide complementary information that can be combined to improve classification performance. Interestingly, pretrained lightweight ConvNets provide competitive classification performance while using a fraction of the computing resources required by heavier ConvNets and Transformers.
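To illustrate the idea of segmenting a trailer into highly correlated clips, the following is a minimal sketch of threshold-based shot segmentation. It is not the shot detector used by DIViTA, which the abstract does not specify. The function names, the per-frame feature vectors (for example, normalized color histograms), and the threshold value are all assumptions made for illustration.

```python
from typing import List, Sequence


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def segment_into_clips(
    frame_features: Sequence[Sequence[float]], threshold: float
) -> List[List[int]]:
    """Group consecutive frame indices into clips.

    A new clip starts whenever the distance between adjacent frames
    exceeds `threshold` (a hypothetical, hand-picked value).
    """
    if not frame_features:
        return []
    clips = [[0]]
    for i in range(1, len(frame_features)):
        if l1_distance(frame_features[i - 1], frame_features[i]) > threshold:
            clips.append([i])      # large change: treat as a shot boundary
        else:
            clips[-1].append(i)    # small change: same shot continues
    return clips


# Toy input: five frames described by 2-bin histograms (made-up values).
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.1, 0.9]]
clips = segment_into_clips(features, threshold=0.5)
print(clips)
```

In this toy input the only large change is between the second and third frames, so the frames fall into two clips. Each clip would then be passed to a pretrained image or video backbone; that step is omitted here.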

updated: Wed Oct 19 2022 20:28:19 GMT+0000 (UTC)

published: Fri Oct 14 2022 17:27:56 GMT+0000 (UTC)

arXiv
