Less is More: Sparse Sampling for Dense Reaction Predictions

Kezhou Lin; Xiaohan Wang; Zhedong Zheng; Linchao Zhu; Yi Yang

Obtaining viewer responses from videos can help creators and streaming platforms analyze video performance and improve the future user experience. In this report, we present our method for the 2021 Evoked Expression from Videos Challenge. In particular, our model uses both audio and image modalities as inputs to predict viewers' emotion changes. To model long-range emotion changes, we use a GRU-based model to predict a sparse signal sampled at 1 Hz. We observe that the emotion changes are smooth. Therefore, the final dense prediction is obtained by linearly interpolating the sparse signal, which makes it robust to fluctuations in the prediction. Albeit simple, the proposed method achieved a Pearson's correlation score of 0.04430 on the final private test set.
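The sparse-to-dense step and the evaluation metric can be illustrated with a minimal sketch. It assumes the model emits one value per second (1 Hz) for a single emotion channel, with sample `k` belonging to time `k` seconds. The dense timestamps, the rule of clamping timestamps outside the sampled range to the nearest endpoint, and the helper names `interpolate_sparse_to_dense` and `pearson` are illustrative assumptions, not the authors' code. Pearson's correlation is included because it is the score reported above.

```python
import math
from typing import List, Sequence


def interpolate_sparse_to_dense(sparse: Sequence[float], sparse_hz: float,
                                dense_times: Sequence[float]) -> List[float]:
    """Linearly interpolate a uniformly sampled sparse signal at dense timestamps.

    Assumption: sparse[k] is the prediction at time k / sparse_hz seconds.
    Timestamps before the first or after the last sample are clamped
    to that endpoint value (a hypothetical boundary rule).
    """
    if not sparse:
        raise ValueError("sparse signal must be non-empty")
    last = len(sparse) - 1
    dense = []
    for t in dense_times:
        pos = t * sparse_hz  # fractional index into the sparse signal
        if pos <= 0:
            dense.append(float(sparse[0]))
        elif pos >= last:
            dense.append(float(sparse[last]))
        else:
            i = int(math.floor(pos))
            frac = pos - i  # weight of the right-hand neighbour
            dense.append((1.0 - frac) * sparse[i] + frac * sparse[i + 1])
    return dense


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined for a constant sequence
    return cov / math.sqrt(vx * vy)


# Usage: three 1 Hz predictions, queried at a hypothetical 6 Hz dense rate.
sparse_pred = [0.0, 1.0, 0.5]
dense_times = [k / 6 for k in range(13)]  # 0 s to 2 s inclusive
dense_pred = interpolate_sparse_to_dense(sparse_pred, 1.0, dense_times)
```

Because consecutive dense values are convex combinations of neighbouring 1 Hz predictions, frame-to-frame jitter cannot exceed the change between adjacent sparse samples. This matches the smoothness argument in the abstract.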

updated: Thu Jun 03 2021 11:33:59 GMT+0000 (UTC)

published: Thu Jun 03 2021 11:33:59 GMT+0000 (UTC)

arXiv
