A survey of Object Classification and Detection based on 2D/3D data

Xiaoke Shen

2D / 3Dデータに基づくオブジェクトの分類と検出の調査

最近、ディープニューラルネットワークベースのアルゴリズムを使用することにより、オブジェクトの分類、検出、およびセマンティックセグメンテーションソリューションが大幅に改善されています。ただし、2D画像ベースのシステムの1つの課題は、正確な3D位置情報を提供できないことです。これは、自動運転やロボットナビゲーションなどの場所に敏感なアプリケーションにとって重要です。一方、RGB-DやRGB-LiDARベースのシステムなどの3Dメソッドは、RGBのみのアプローチを大幅に改善するソリューションを提供できます。そのため、これは産業界と学界の両方にとって興味深い研究分野です。 2D画像ベースのシステムと比較して、3Dベースのシステムは次の5つの理由により複雑です。1）データ表現自体がより複雑です。 3D画像は、点群、メッシュ、ボリュームで表すことができます。 2D画像にはピクセルグリッド表現があります。 2）余分な次元が追加されると、計算とメモリリソースの要件が高くなります。 3）オブジェクトの分布の違いや、屋内と屋外のシーン領域の違いにより、1つの統合されたフレームワークを実現するのが困難になります。 4）3Dデータは、特に屋外のシナリオでは、高密度の2D画像と比較してまばらであるため、検出タスクがより困難になります。最後に、教師ありベースのアルゴリズムにとって非常に重要な大きなサイズのラベル付きデータセットは、ImageNetなどの十分に構築された2Dデータセットと比較してまだ構築中です。上記の課題に基づいて、説明されているシステムは、アプリケーションシナリオ、データ表現方法、および対処される主なタスクごとに編成されています。同時に、3Dシステムに大きな影響を与える重要な2Dベースのシステムも導入され、それらの間の接続が示されます。

Recently, by using deep neural network based algorithms, object classification, detection and semantic segmentation solutions are significantly improved. However, one challenge for 2D image-based systems is that they cannot provide accurate 3D location information. This is critical for location sensitive applications such as autonomous driving and robot navigation. On the other hand, 3D methods, such as RGB-D and RGB-LiDAR based systems, can provide solutions that significantly improve the RGB only approaches. That is why this is an interesting research area for both industry and academia. Compared with 2D image-based systems, 3D-based systems are more complicated due to the following five reasons: 1) Data representation itself is more complicated. 3D images can be represented by point clouds, meshes, volumes. 2D images have pixel grid representations. 2) The computation and memory resource requirement is higher as an extra dimension is added. 3) Different distribution of the objects and difference in scene areas between indoor and outdoor make one unified framework hard to achieve. 4) 3D data, especially for the outdoor scenario, is sparse compared with the dense 2D images which makes the detection task more challenging. Finally, large size labelled datasets, which are extremely important for supervised based algorithms, are still under construction compared with well-built 2D datasets such as ImageNet. Based on challenges listed above, the described systems are organized by application scenarios, data representation methods and main tasks addressed. At the same time, critical 2D based systems which greatly influence the 3D ones are also introduced to show the connection between them.

updated: Sat Jan 22 2022 04:57:44 GMT+0000 (UTC)

published: Wed May 29 2019 19:15:01 GMT+0000 (UTC)

arXiv

参考文献 (このサイトで利用可能なもの) / References (only if available on this site)

被参照文献 (このサイトで利用可能なものを新しい順に) / Citations (only if available on this site, in order of most recent)

Amazon.co.jpアソシエイト