Accelerating the creation of instance segmentation training sets through bounding box annotation

Niels Sayez; Christophe De Vleeschouwer

Collecting image annotations remains a significant burden when deploying CNNs in a specific application context. This is especially true when the annotations consist of binary masks covering object instances. Our work proposes to delineate instances in three steps, following a semi-automatic approach: (1) the extreme points of an object (its left-most, right-most, top, and bottom pixels) are manually defined, thereby providing the object bounding box; (2) a universal automatic segmentation tool such as Deep Extreme Cut is used to turn the bounded object into a segmentation mask that matches the extreme points; and (3) the predicted mask is manually corrected. Various strategies are then investigated to balance human annotation resources between bounding-box definition and mask correction, including strategies in which the correction of instance masks is prioritized based on their overlap with other instance bounding boxes, or on the output of an instance segmentation model trained on a partially annotated dataset. Our experimental study considers a team-sport player segmentation task and measures how the accuracy of the Panoptic-DeepLab instance segmentation model depends on the strategy used to allocate human annotation resources. It reveals that defining the extreme points alone yields a model accuracy that would require up to 10 times more resources if the masks were obtained by fully manual delineation of instances. When targeting higher accuracies, prioritizing mask correction among the training-set instances is also shown to save up to 80% of the correction annotation resources compared to systematic frame-by-frame correction of instances, for the same trained instance segmentation model accuracy.
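
Two of the bookkeeping parts of this pipeline can be sketched in a few lines: turning the four extreme clicks of step (1) into a bounding box, and ranking instances so that mask correction is spent first on players whose boxes overlap others. This is a minimal, hypothetical sketch, not the authors' implementation. It assumes inclusive (x, y) pixel coordinates, and it assumes that "overlap" means the maximum IoU between an instance's box and any other box in the same frame, since the abstract does not specify the exact measure. All function names are illustrative, and step (2), the Deep Extreme Cut network itself, is not reproduced.

```python
from typing import List, Tuple

# (x, y) pixel coordinates of one annotator click.
Point = Tuple[int, int]
# (x_min, y_min, x_max, y_max), inclusive pixel coordinates.
Box = Tuple[int, int, int, int]


def box_from_extreme_points(left: Point, right: Point, top: Point, bottom: Point) -> Box:
    """Step (1): the four extreme clicks fully determine the tight bounding box."""
    return (left[0], top[1], right[0], bottom[1])


def box_area(b: Box) -> int:
    # Inclusive coordinates, hence the +1; empty boxes clamp to zero.
    return max(0, b[2] - b[0] + 1) * max(0, b[3] - b[1] + 1)


def intersection_area(a: Box, b: Box) -> int:
    return box_area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))


def iou(a: Box, b: Box) -> float:
    inter = intersection_area(a, b)
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def correction_order(boxes: List[Box]) -> List[int]:
    """Rank instances for mask correction, most-overlapped first.

    Hypothetical criterion: each instance is scored by its maximum IoU
    with any other instance's box in the same frame.
    """
    scores = [
        max((iou(b, other) for j, other in enumerate(boxes) if j != i), default=0.0)
        for i, b in enumerate(boxes)
    ]
    # sorted() is stable, so ties keep the original annotation order.
    return sorted(range(len(boxes)), key=lambda i: -scores[i])


# Usage: three players in one frame; the last two overlap, the first is isolated.
players = [
    box_from_extreme_points(left=(200, 40), right=(230, 50), top=(215, 10), bottom=(220, 90)),
    box_from_extreme_points(left=(10, 50), right=(40, 60), top=(25, 20), bottom=(30, 100)),
    box_from_extreme_points(left=(30, 55), right=(60, 70), top=(45, 25), bottom=(50, 105)),
]
budget = 2  # hypothetical: number of masks the annotator can afford to correct
order = correction_order(players)
print(order)           # [1, 2, 0]
print(order[:budget])  # [1, 2]
```

The isolated player is ranked last, so a limited correction budget goes to the touching players, where an automatically predicted mask is presumably more likely to bleed into a neighbour. Ranking across a whole training set would simply pool these per-frame scores before sorting.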

updated: Mon May 23 2022 18:37:03 GMT+0000 (UTC)

published: Mon May 23 2022 18:37:03 GMT+0000 (UTC)

arXiv
