TOIST: Task Oriented Instance Segmentation Transformer with Noun-Pronoun Distillation

Pengfei Li; Beiwen Tian; Yongliang Shi; Xiaoxue Chen; Hao Zhao; Guyue Zhou; Ya-Qin Zhang

Current referring expression comprehension algorithms can effectively detect or segment objects indicated by nouns, but how to understand verb references remains under-explored. We therefore study the challenging problem of task-oriented detection, which aims to find the objects that best afford an action indicated by a verb phrase such as "sit comfortably on". Towards finer localization that better serves downstream applications such as robot interaction, we extend the problem to task-oriented instance segmentation. A unique requirement of this task is to select preferred candidates among possible alternatives. We thus resort to the transformer architecture, which naturally models pairwise query relationships with attention, leading to the TOIST method. To leverage pre-trained noun referring expression comprehension models, and the fact that privileged noun ground truth is accessible during training, we propose a novel noun-pronoun distillation framework. Noun prototypes are generated in an unsupervised manner, and contextual pronoun features are trained to select prototypes. As a result, the network remains noun-agnostic during inference. We evaluate TOIST on the large-scale task-oriented dataset COCO-Tasks and achieve +10.9% higher mAP^box than the best reported results. The proposed noun-pronoun distillation boosts mAP^box and mAP^mask by +2.8% and +3.8%, respectively. Code and models are publicly available at https://github.com/AIR-DISCOVER/TOIST.
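The prototype-selection idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes that "unsupervised" prototype generation can be approximated by plain k-means over noun feature vectors, and that "selecting" a prototype means picking the nearest one and penalizing the squared distance to it. All function names (`build_prototypes`, `distillation_loss`) and the toy 2-D features are hypothetical; a real system would use learned transformer features and backpropagate the loss.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_index(vector, prototypes):
    """Index of the prototype closest to `vector`."""
    return min(range(len(prototypes)),
               key=lambda i: squared_distance(vector, prototypes[i]))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def build_prototypes(noun_features, k, iterations=10, seed=0):
    """Assumed stand-in for unsupervised prototype generation: k-means."""
    rng = random.Random(seed)
    prototypes = [list(v) for v in rng.sample(noun_features, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in noun_features:
            clusters[nearest_index(v, prototypes)].append(v)
        # Keep the old prototype if a cluster ends up empty.
        prototypes = [mean_vector(c) if c else p
                      for c, p in zip(clusters, prototypes)]
    return prototypes


def distillation_loss(pronoun_feature, prototypes):
    """Select the nearest noun prototype and return (index, squared distance)."""
    index = nearest_index(pronoun_feature, prototypes)
    return index, squared_distance(pronoun_feature, prototypes[index])


# Toy usage: hypothetical 2-D noun features and one pronoun feature.
noun_features = [[0.0, 0.1], [0.1, 0.0], [1.0, 0.9], [0.9, 1.0]]
prototypes = build_prototypes(noun_features, k=2)
selected, loss = distillation_loss([0.8, 0.8], prototypes)
print(selected, loss)
```

Because the prototypes are only needed to compute the training loss, a model trained this way would not require noun input at inference time, which is consistent with the noun-agnostic property the abstract describes.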

updated: Wed Oct 19 2022 17:59:56 GMT+0000 (UTC)

published: Wed Oct 19 2022 17:59:56 GMT+0000 (UTC)

arXiv
