Neural Naturalist: Generating Fine-Grained Image Comparisons

Maxwell Forbes; Christine Kaeser-Chen; Piyush Sharma; Serge Belongie

We introduce the new Birds-to-Words dataset of 41k sentences describing fine-grained differences between photographs of birds. The language collected is highly detailed while remaining understandable to the everyday observer (e.g., "heart-shaped face," "squat body"). Paragraph-length descriptions naturally adapt, with the appropriate level of detail, to varying levels of taxonomic and visual distance (drawn from a novel stratified sampling approach). We propose a new model called Neural Naturalist that uses a joint image encoding and a comparative module to generate comparative language, and we evaluate the results with humans who must use the descriptions to distinguish real images. Our results indicate promising potential for neural models to explain differences in visual embedding space using natural language, as well as a concrete path for machine learning to aid citizen scientists in their effort to preserve biodiversity.
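The abstract does not say how its stratified sampling works. The following is only a generic illustration, not the authors' procedure. It assumes each candidate image pair carries a taxonomic-distance stratum label (the labels `same_species`, `same_genus`, and `same_family` are hypothetical placeholders). Drawing a fixed number of pairs from each stratum keeps both close and distant comparisons in the sample, instead of letting the most common stratum dominate.

```python
import random
from collections import defaultdict


def stratified_sample(pairs, per_stratum, seed=0):
    """Draw up to `per_stratum` image pairs from each stratum.

    `pairs` is a list of (image_a, image_b, stratum) tuples. The stratum
    labels are hypothetical stand-ins for taxonomic-distance levels.
    Returns a dict mapping each stratum to its sampled (image_a, image_b) pairs.
    """
    rng = random.Random(seed)  # seeded for reproducible draws
    by_stratum = defaultdict(list)
    for a, b, stratum in pairs:
        by_stratum[stratum].append((a, b))

    sample = {}
    for stratum, members in sorted(by_stratum.items()):
        # Small strata contribute every pair they have.
        k = min(per_stratum, len(members))
        sample[stratum] = rng.sample(members, k)
    return sample


if __name__ == "__main__":
    pairs = [
        ("img1", "img2", "same_species"),
        ("img3", "img4", "same_species"),
        ("img5", "img6", "same_species"),
        ("img1", "img7", "same_genus"),
        ("img2", "img8", "same_family"),
        ("img3", "img9", "same_family"),
    ]
    for stratum, chosen in stratified_sample(pairs, per_stratum=2).items():
        print(stratum, len(chosen))
```

Here a stratum with fewer pairs than `per_stratum` (such as `same_genus`) simply contributes all of them. A real pipeline might instead stratify on visual distance as well, or weight strata differently.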

updated: Thu Nov 14 2019 01:19:36 GMT+0000 (UTC)

published: Mon Sep 09 2019 18:54:40 GMT+0000 (UTC)

arXiv