Which One Are You Referring To? Multimodal Object Identification in Situated Dialogue

Holy Lovenia; Samuel Cahyawijaya; Pascale Fung

The demand for multimodal dialogue systems has been rising in various domains, emphasizing the importance of interpreting multimodal inputs from conversational and situational contexts. We explore three methods to tackle this problem and evaluate them on SIMMC 2.1, the largest situated dialogue dataset. Our best method, scene-dialogue alignment, improves performance by up to ~20% in F1-score compared to the SIMMC 2.1 baselines. We provide analysis and discussion regarding the limitations of our methods and potential directions for future work. Our code is publicly available at https://github.com/holylovenia/multimodal-object-identification.
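As a rough illustration of what an object-identification F1-score measures, the sketch below compares a predicted set of object IDs against the gold set for each dialogue turn and pools the counts across turns (micro-averaging). The function name `object_f1` and the toy turns are hypothetical, and the pooling scheme is an assumption for illustration; it is not taken from the authors' repository or the official SIMMC 2.1 evaluation script.

```python
from typing import Iterable, Set, Tuple


def object_f1(turns: Iterable[Tuple[Set[int], Set[int]]]) -> Tuple[float, float, float]:
    """Hypothetical micro-averaged precision, recall and F1 over (predicted, gold) object-ID sets."""
    n_correct = n_pred = n_gold = 0
    for predicted, gold in turns:
        n_correct += len(predicted & gold)  # objects both predicted and referenced
        n_pred += len(predicted)            # all predicted objects
        n_gold += len(gold)                 # all referenced (gold) objects
    precision = n_correct / n_pred if n_pred else 0.0
    recall = n_correct / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy turns (made up): (predicted object IDs, gold object IDs)
turns = [
    ({3, 7}, {3}),     # one correct, one spurious prediction
    ({12}, {12, 15}),  # one correct, one missed object
    (set(), {4}),      # nothing predicted, one missed object
]
p, r, f = object_f1(turns)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # P=0.667 R=0.500 F1=0.571
```

Pooling counts across turns means turns that mention many objects weigh more than turns that mention one; a per-turn (macro) average would weight every turn equally and can give a different score.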

updated: Wed Mar 15 2023 14:38:21 GMT+0000 (UTC)

published: Tue Feb 28 2023 15:45:20 GMT+0000 (UTC)

arXiv
