QR-CLIP: Introducing Explicit Open-World Knowledge for Location and Time Reasoning

Weimin Shi; Mingchen Zhuge; Dehong Gao; Zhong Zhou; Ming-Ming Cheng; Deng-Ping Fan

Everyday images may convey abstract meanings that require us to recall and infer deeper information from them. To encourage such human-like reasoning, in this work we teach machines to predict where and when an image was taken, rather than performing basic tasks such as traditional segmentation or classification. Inspired by Horn's QR theory, we design a novel QR-CLIP model consisting of two components: 1) the Quantity module first retrospects more open-world knowledge as candidate language inputs; 2) the Relevance module carefully estimates vision and language cues and infers the location and time. Experiments show QR-CLIP's effectiveness: it outperforms the previous SOTA on each task, by an average of about 10% and 130% relative lift for location and time reasoning, respectively. This study lays a technical foundation for location and time reasoning and suggests that effectively introducing open-world knowledge is one of the keys to these tasks.
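For readers unfamiliar with the term, "relative lift" is usually read as the improvement over a baseline expressed as a fraction of that baseline. The following is a minimal sketch of that arithmetic, assuming this standard definition. The function name `relative_lift` and the input numbers are hypothetical and are not taken from the paper.

```python
def relative_lift(baseline: float, new: float) -> float:
    """Relative improvement of `new` over `baseline`, as a percentage."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline * 100.0


# Hypothetical scores chosen only to illustrate the arithmetic;
# they are NOT results reported in the paper.
hypothetical_baseline = 2.0
hypothetical_new = 4.6

lift = relative_lift(hypothetical_baseline, hypothetical_new)
print(f"{lift:.0f}% relative lift")
```

Under this reading, a 130% relative lift means the new score is roughly 2.3 times the baseline, while a 10% relative lift means it is roughly 1.1 times the baseline.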

updated: Wed Jun 28 2023 09:41:25 GMT+0000 (UTC)

published: Thu Feb 02 2023 08:44:12 GMT+0000 (UTC)

arXiv
