Detecting Human-Object Interaction with Mixed Supervision

Suresh Kirthi Kumaraswamy; Miaojing Shi; Ewa Kijak

Human-object interaction (HOI) detection is an important task in image understanding and reasoning. Its output takes the form of an HOI triplet, which requires bounding boxes for the human and the object together with the action between them. The task therefore requires strong supervision for training, which is hard to procure. A natural way to overcome this is weakly-supervised learning, where we know only that certain HOI triplets are present in an image, not where they are located. Most weakly-supervised methods make no provision for leveraging strongly supervised data when it is available, and indeed a naïve combination of these two paradigms in HOI detection fails to make them benefit each other. We therefore propose a mixed-supervised HOI detection pipeline, built on a specific design of momentum-independent learning that learns seamlessly across the two types of supervision. Moreover, in light of the annotation insufficiency in mixed supervision, we introduce an HOI element swapping technique that synthesizes diverse and hard negatives across images and improves the robustness of the model. Our method is evaluated on the challenging HICO-DET dataset. Using a mix of strong and weak annotations, it performs close to or even better than many fully-supervised methods; furthermore, it outperforms representative state-of-the-art weakly- and fully-supervised methods under the same supervision.

updated: Thu Nov 12 2020 14:14:21 GMT+0000 (UTC)

published: Tue Nov 10 2020 08:42:31 GMT+0000 (UTC)

arXiv
