Real-time Human-Centric Segmentation for Complex Video Scenes

Ran Yu; Chenyu Tian; Weihao Xia; Xinyuan Zhao; Haoqian Wang; Yujiu Yang

Most existing human-related video tasks focus on segmenting salient humans and ignore the other, unspecified people in the video. Few studies have addressed segmenting and tracking all humans in a complex video, including pedestrians and humans in other states (e.g., seated, riding, or occluded). In this paper, we propose a novel framework, abbreviated as HVISNet, that segments and tracks all people present in a given video, based on a one-stage detector. To better evaluate complex scenes, we offer a new benchmark called HVIS (Human Video Instance Segmentation), which comprises 1447 human instance masks in 805 high-resolution videos of diverse scenes. Extensive experiments show that the proposed HVISNet outperforms state-of-the-art methods in accuracy at a real-time inference speed (30 FPS), especially on complex video scenes. We also observe that using the center of the bounding box to distinguish different individuals severely degrades segmentation accuracy, especially under heavy occlusion. We refer to this common phenomenon as the ambiguous positive samples problem. To alleviate it, we propose a mechanism named Inner Center Sampling that improves the accuracy of instance segmentation. This plug-and-play mechanism can be incorporated into any instance segmentation model based on a one-stage detector to improve its performance. In particular, it yields a 4.1 mAP improvement over the state-of-the-art method on occluded humans. Code and data are available at https://github.com/IIGROUP/HVISNet.
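The ambiguity behind the ambiguous positive samples problem can be illustrated with a toy example. The following is only a minimal sketch and not the authors' Inner Center Sampling, since the abstract does not say how the inner center is computed. The helper names `bbox_center` and `inner_center`, the toy masks, and the rule "take the mask pixel nearest to the mask centroid" are all assumptions made for illustration.

```python
# Toy illustration (hypothetical, not the HVISNet implementation):
# the bounding-box center of an occluded person can land on another person,
# while a point chosen inside the mask cannot.

def bbox_center(mask):
    """Return the (row, col) center of the tight bounding box of a binary mask."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return ((rows[0] + rows[-1]) // 2, (cols[0] + cols[-1]) // 2)

def inner_center(mask):
    """Return the mask pixel closest to the mask centroid (assumed rule)."""
    pixels = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    mean_r = sum(r for r, _ in pixels) / len(pixels)
    mean_c = sum(c for _, c in pixels) / len(pixels)
    return min(pixels, key=lambda p: (p[0] - mean_r) ** 2 + (p[1] - mean_c) ** 2)

# Person A is partly hidden behind person B, so A's visible region is L-shaped.
mask_a = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
]
mask_b = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

box_r, box_c = bbox_center(mask_a)   # center of A's bounding box
in_r, in_c = inner_center(mask_a)    # point guaranteed to lie on A

box_center_on_a = bool(mask_a[box_r][box_c])   # False: the box center misses A
box_center_on_b = bool(mask_b[box_r][box_c])   # True: it lands on B instead
inner_center_on_a = bool(mask_a[in_r][in_c])   # True by construction
```

In this toy case, a positive sample placed at the bounding-box center of A would actually sit on B's pixels, which is the kind of ambiguity the abstract describes. A center constrained to lie inside the instance mask avoids that.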

updated: Mon Aug 16 2021 16:07:51 GMT+0000 (UTC)

published: Mon Aug 16 2021 16:07:51 GMT+0000 (UTC)

arXiv

