Dynamic Open Vocabulary Enhanced Safe-landing with Intelligence (DOVESEI)

Haechan Mark Bong; Rongge Zhang; Ricardo de Azambuja; Giovanni Beltrame


This work targets what we consider the foundational step for urban airborne robots: a safe landing. We focus on what we deem the most crucial aspect of the safe-landing perception stack: segmentation. We present a streamlined reactive UAV system that employs visual servoing by harnessing the capabilities of open vocabulary image segmentation. Thanks to its open vocabulary methodology, this approach can adapt to various scenarios with minimal adjustments, bypassing the need for extensive data accumulation to refine internal models. Given the limitations imposed by local authorities, our primary focus is on operations starting from altitudes of 100 meters. This choice is deliberate, as numerous preceding works have dealt with altitudes up to 30 meters, aligning with the capabilities of small stereo cameras. Consequently, we leave the remaining 20 meters to be navigated using conventional 3D path planning methods. Using monocular cameras and image segmentation, we demonstrate the system's capability to successfully execute landing maneuvers at altitudes as low as 20 meters. However, this approach is vulnerable to intermittent and occasionally abrupt fluctuations in the segmentation between frames in a video stream. To address this challenge, we enhance the image segmentation output by introducing what we call a dynamic focus: a masking mechanism that self-adjusts according to the current landing stage. This dynamic focus guides the control system to avoid regions beyond the drone's safety radius projected onto the ground, thus mitigating the problems with fluctuations. With this supplementary layer, our experiments improved the landing success rate almost tenfold compared to global segmentation. All the source code is open source and available online (github.com/MISTLab/DOVESEI).
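The dynamic focus idea can be illustrated with a minimal sketch. This is not the DOVESEI implementation: it assumes a downward-facing pinhole camera, a boolean per-pixel "safe to land" map, and a circular mask centered on the image. The function names and the `scale` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np


def projected_radius_px(safety_radius_m, altitude_m, focal_length_px):
    """Pixel length of a ground-plane radius seen from directly above.

    Assumes an ideal pinhole camera pointing straight down (an assumption
    of this sketch, not a detail taken from the abstract).
    """
    if altitude_m <= 0:
        raise ValueError("altitude_m must be positive")
    return focal_length_px * safety_radius_m / altitude_m


def circular_mask(shape, radius_px):
    """Boolean mask that is True inside a circle centered on the image."""
    height, width = shape
    rows, cols = np.ogrid[:height, :width]
    center_row, center_col = (height - 1) / 2.0, (width - 1) / 2.0
    dist_sq = (rows - center_row) ** 2 + (cols - center_col) ** 2
    return dist_sq <= radius_px ** 2


def apply_dynamic_focus(safe_map, safety_radius_m, altitude_m,
                        focal_length_px, scale=1.0):
    """Keep only 'safe' pixels that fall inside the current focus circle.

    `scale` is a hypothetical knob standing in for a landing-stage
    dependent adjustment of the focus size.
    """
    radius = scale * projected_radius_px(
        safety_radius_m, altitude_m, focal_length_px)
    return safe_map & circular_mask(safe_map.shape, radius)


if __name__ == "__main__":
    safe_map = np.ones((101, 101), dtype=bool)  # toy map: everything is safe
    high = apply_dynamic_focus(safe_map, 1.0, 10.0, 100.0)
    low = apply_dynamic_focus(safe_map, 1.0, 5.0, 100.0)
    # The focus covers more pixels at the lower altitude.
    print(int(high.sum()) < int(low.sum()))
```

Under these assumptions the projected radius grows in pixel terms as the altitude drops, so the mask admits a larger share of the image during the final descent while ignoring segmentation changes far from the drone at higher altitudes.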

updated: Fri May 03 2024 19:05:18 GMT+0000 (UTC)

published: Tue Aug 22 2023 14:36:59 GMT+0000 (UTC)

arXiv
