Pose Augmentation: Class-agnostic Object Pose Transformation for Object Recognition

Yunhao Ge; Jiaping Zhao; Laurent Itti

Object pose increases intraclass object variance, which makes object recognition from 2D images harder. To make a classifier robust to pose variations, most deep neural networks try to eliminate the influence of pose by using large datasets with many poses for each class. Here, we propose a different approach: a class-agnostic object pose transformation network (OPT-Net) that transforms an image along the 3D yaw and pitch axes to synthesize additional poses continuously. The synthesized images lead to better training of an object classifier. We design a novel eliminate-add structure to explicitly disentangle pose from object identity: it first eliminates the pose information of the input image and then adds target pose information (regularized as continuous variables) to synthesize any target pose. We trained OPT-Net on images of toy vehicles shot on a turntable from the iLab-20M dataset. After training on unbalanced discrete poses (5 classes with 6 poses per object instance, plus 5 classes with only 2 poses), we show that OPT-Net can synthesize balanced, continuous new poses along the yaw and pitch axes with high quality. Training a ResNet-18 classifier with original plus synthesized poses improves mAP by 9% over training on original poses only. Further, the pre-trained OPT-Net can generalize to new object classes, which we demonstrate on both iLab-20M and RGB-D. We also show that the learned features can generalize to ImageNet.
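As a rough illustration of the "add" half of the eliminate-add idea, the sketch below shows one common way to inject a continuous target pose into a feature map: broadcasting the yaw and pitch values into constant planes and concatenating them as extra channels. This is an assumption for illustration only; the abstract does not say how OPT-Net encodes or injects pose. The function name `add_pose_channels`, the (C, H, W) layout, and the normalization of yaw and pitch to [0, 1] are all hypothetical.

```python
import numpy as np


def add_pose_channels(features: np.ndarray, yaw: float, pitch: float) -> np.ndarray:
    """Append two constant channels carrying a continuous target pose.

    features: array of shape (C, H, W), assumed here to be pose-free
              (hypothetical output of an "eliminate" stage).
    yaw, pitch: target pose, each assumed to be normalized to [0, 1].
    """
    if features.ndim != 3:
        raise ValueError("features must have shape (C, H, W)")
    _, h, w = features.shape
    pose = np.array([yaw, pitch], dtype=features.dtype)
    # Broadcast each scalar pose value to a full H x W plane.
    pose_planes = np.broadcast_to(pose[:, None, None], (2, h, w))
    # Stack the pose planes after the original feature channels.
    return np.concatenate([features, pose_planes], axis=0)


features = np.zeros((8, 4, 4), dtype=np.float32)
conditioned = add_pose_channels(features, yaw=0.25, pitch=0.5)
print(conditioned.shape)  # (10, 4, 4)
```

Because yaw and pitch enter as real numbers rather than one-hot labels, a conditioning of this kind can in principle be queried at poses between the discrete ones seen in training, which is the property the abstract describes as continuous pose synthesis.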

updated: Thu Jan 14 2021 02:03:09 GMT+0000 (UTC)

published: Thu Mar 19 2020 00:39:37 GMT+0000 (UTC)

arXiv
