Instance Segmentation for Autonomous Log Grasping in Forestry Operations

Jean-Michel Fortin; Olivier Gamache; Vincent Grondin; François Pomerleau; Philippe Giguère

Picking wood logs is a challenging task to automate. Indeed, logs usually come in cluttered configurations, randomly oriented and overlapping. Recent work on log-picking automation usually assumes that the logs' poses are known, with little consideration given to the actual perception problem. In this paper, we squarely address the latter, using a data-driven approach. First, we introduce a novel dataset, named TimberSeg 1.0, that is densely annotated, i.e., it includes both bounding boxes and pixel-level mask annotations for logs. This dataset comprises 220 images with 2500 individually segmented logs. Using our dataset, we then compare three neural network architectures on the task of individual log detection and segmentation: two region-based methods and one attention-based method. Unsurprisingly, our results show that axis-aligned proposals, which fail to take into account the directional nature of logs, underperform with 19.03 mAP. A rotation-aware proposal method significantly improves results to 31.83 mAP. More interestingly, a Transformer-based approach, without any inductive bias on rotations, outperformed the other two, achieving a mAP of 57.53 on our dataset. Our use case demonstrates the limitations of region-based approaches for cluttered, elongated objects. It also highlights the potential of attention-based methods for this specific task, as they work directly at the pixel level. These encouraging results indicate that such a perception system could be used to assist operators in the short term, or to fully automate log-picking operations in the future.
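The abstract attributes the weak score of axis-aligned proposals to the elongated, randomly oriented shape of logs. The following geometric sketch is an illustration under stated assumptions, not the authors' code. It models a log as a hypothetical 10 x 1 rectangle (the helper names `aabb_of_rotated_rect`, `box_iou` and `fill_ratio` are made up for this example) and shows two effects. First, at 45 degrees the log covers only about 17% of its axis-aligned box. Second, two logs crossing at +45 and -45 degrees get identical axis-aligned boxes (box IoU of 1.0), even though their true shapes overlap in only a 1 x 1 square (IoU = 1/19, about 0.05). A box-level non-maximum suppression step with a typical threshold such as 0.5 would therefore discard one of them.

```python
import math

# Assumption: a log is modelled as a length x width rectangle rotated by theta.

def aabb_of_rotated_rect(cx, cy, length, width, theta):
    """Axis-aligned box (x0, y0, x1, y1) enclosing a length x width
    rectangle centred at (cx, cy) and rotated by theta radians."""
    half_w = 0.5 * (length * abs(math.cos(theta)) + width * abs(math.sin(theta)))
    half_h = 0.5 * (length * abs(math.sin(theta)) + width * abs(math.cos(theta)))
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)

def box_area(b):
    # Negative extents (no overlap) are clamped to zero area.
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])

def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    inter = box_area((max(a[0], b[0]), max(a[1], b[1]),
                      min(a[2], b[2]), min(a[3], b[3])))
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0

def fill_ratio(length, width, theta):
    """Fraction of the axis-aligned box actually covered by the log."""
    return (length * width) / box_area(aabb_of_rotated_rect(0, 0, length, width, theta))

if __name__ == "__main__":
    print(round(fill_ratio(10, 1, 0.0), 2))          # 1.0: aligned log fills its box
    print(round(fill_ratio(10, 1, math.pi / 4), 2))  # 0.17: diagonal log is mostly background
    # Two logs crossing at the same centre, oriented at +45 and -45 degrees.
    a = aabb_of_rotated_rect(0, 0, 10, 1, math.pi / 4)
    b = aabb_of_rotated_rect(0, 0, 10, 1, -math.pi / 4)
    print(round(box_iou(a, b), 2))                   # 1.0: the boxes coincide
```

Rotated proposals sidestep the first effect by fitting the box to the log's axis, while pixel-level mask prediction, as in the attention-based method the abstract describes, avoids relying on box overlap to separate crossing logs.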

updated: Tue Oct 18 2022 21:57:32 GMT+0000 (UTC)

published: Thu Mar 03 2022 18:29:25 GMT+0000 (UTC)

arXiv
