Key Frame Extraction with Attention Based Deep Neural Networks

Samed Arslan; Senem Tanberk

Automatic keyframe detection selects the scenes that best summarize the content of a long video. Providing such a summary is an important task because it enables quick browsing and content summarization. The resulting keyframes are used for automated tasks in various industries, for example summarizing security footage or detecting the different scenes used in music clips. In addition, processing large volumes of video with advanced machine learning methods is costly in resources; the extracted keyframes can instead serve as input features to downstream methods and models. In this study, we propose a deep learning-based approach to keyframe detection that uses a deep autoencoder model with an attention layer. The proposed method first extracts features from the video frames using the encoder part of the autoencoder, then applies the k-means clustering algorithm to group frames with similar features. A keyframe is then selected from each cluster by choosing the frame closest to the cluster center. The method was evaluated on the TVSUM video dataset and achieved a classification accuracy of 0.77, indicating a higher success rate than many existing methods. The proposed method offers a promising solution for keyframe extraction in video analysis and can be applied to tasks such as video summarization and video retrieval.
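
The clustering-and-selection stage of the pipeline can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes each frame has already been reduced to a feature vector; in the paper this is done by the attention-based encoder, which is not reproduced here, and the toy 2-D vectors below merely stand in for its output. It uses a plain Lloyd's k-means whose initial centroids are frames spaced evenly in time, a deterministic choice made only for this sketch. The helper names `assign`, `kmeans` and `select_keyframes` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(features: Sequence[Vector], centroids: List[List[float]]) -> List[int]:
    """Label each frame with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(f, centroids[c]))
        for f in features
    ]


def kmeans(features: Sequence[Vector], k: int,
           max_iter: int = 100) -> Tuple[List[List[float]], List[int]]:
    """Lloyd's k-means (hypothetical helper). Initial centroids are k frames
    spaced evenly in time, an assumption made here for reproducibility."""
    step = len(features) / k
    centroids = [list(features[int(i * step)]) for i in range(k)]
    for _ in range(max_iter):
        labels = assign(features, centroids)
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        updated = []
        for c in range(k):
            members = [f for f, label in zip(features, labels) if label == c]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[c])
        if updated == centroids:  # converged: centroids stopped moving
            break
        centroids = updated
    return centroids, assign(features, centroids)


def select_keyframes(features: Sequence[Vector], k: int) -> List[int]:
    """Cluster the frame features, then keep, per cluster, the member frame
    closest to the cluster centroid. Returns frame indices in temporal order."""
    centroids, labels = kmeans(features, k)
    keyframes = []
    for c, centroid in enumerate(centroids):
        members = [i for i, label in enumerate(labels) if label == c]
        if members:
            keyframes.append(
                min(members, key=lambda i: squared_distance(features[i], centroid))
            )
    return sorted(keyframes)


if __name__ == "__main__":
    # Toy 2-D "encoder features" for nine frames covering three scenes.
    frames = [
        [0.0, 0.1], [0.1, 0.0], [0.05, 0.05],    # scene A: frames 0-2
        [5.0, 5.1], [5.1, 5.0], [4.9, 4.95],     # scene B: frames 3-5
        [10.0, 0.0], [10.1, 0.1], [9.95, 0.05],  # scene C: frames 6-8
    ]
    print(select_keyframes(frames, k=3))  # [2, 3, 6]
```

Restricting the choice to each cluster's own members guarantees at most one keyframe per cluster and no duplicates. A practical version would more likely rely on an existing k-means implementation, such as scikit-learn's `KMeans` with several initializations, rather than this hand-written loop.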

updated: Wed Jun 21 2023 15:09:37 GMT+0000 (UTC)

published: Wed Jun 21 2023 15:09:37 GMT+0000 (UTC)

arXiv
