An Artificial Intelligence System for Combined Fruit Detection and Georeferencing, Using RTK-Based Perspective Projection in Drone Imagery

Angus Baird; Stefano Giani

This work presents an Artificial Intelligence (AI) system, based on the Faster Region-Based Convolutional Neural Network (Faster R-CNN) framework, which detects and counts apples in oblique aerial drone imagery of giant commercial orchards. To reduce computational cost, a novel precursory stage to the network preprocesses raw imagery into cropped images of individual trees. Each of these is allocated a unique geospatial identifier using the perspective projection model, which employs Real-Time Kinematic (RTK) data, Digital Terrain and Surface Models (DTM and DSM), and the internal and external camera parameters. The bulk of the experiments, however, focus on tuning hyperparameters in the detection network itself. Apples on trees and apples on the ground are treated as separate classes. A mean Average Precision (mAP) metric, calibrated by the size of the two classes, is devised to mitigate spurious results. Anchor box design is of key interest because of the scale of the apples. Accordingly, a k-means clustering approach, never before seen in the literature for Faster R-CNN, resulted in the most significant improvements to calibrated mAP. Other experiments showed that the maximum number of box proposals should be 225; that an initial learning rate of 0.001 is best applied with the adaptive RMSProp optimiser; and that ResNet-101 is the ideal base feature extractor when considering mAP and, to a lesser extent, inference time. Combining the optimal hyperparameters leads to a model with a calibrated mAP of 0.7627.
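As a rough illustration of the anchor box idea, the sketch below clusters the (width, height) pairs of labelled boxes with k-means to suggest anchor sizes. The abstract does not say which distance measure or features the authors used, so the 1 - IoU distance here (the variant popularised by YOLOv2) is an assumption, and the function names `iou_wh` and `kmeans_anchors` and the sample boxes are hypothetical.

```python
import random

def iou_wh(box, centroid):
    """IoU of two boxes given as (width, height), aligned at a shared corner."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (width, height) pairs with k-means, using 1 - IoU as the distance.

    Assumption: the paper's own clustering details are not given in the abstract.
    """
    rng = random.Random(seed)
    centroids = [tuple(map(float, b)) for b in rng.sample(boxes, k)]
    for _ in range(iters):
        # Assignment step: each box joins the centroid it overlaps most.
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda j: iou_wh(box, centroids[j]))
            clusters[best].append(box)
        # Update step: move each centroid to the mean width and height of its cluster.
        updated = []
        for j, members in enumerate(clusters):
            if members:
                updated.append((sum(w for w, _ in members) / len(members),
                                sum(h for _, h in members) / len(members)))
            else:
                updated.append(centroids[j])  # keep an empty cluster's centroid
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    # Return anchors ordered from smallest to largest area.
    return sorted(centroids, key=lambda c: c[0] * c[1])

# Made-up box sizes in pixels: three small boxes and three large ones.
boxes = [(10, 10), (12, 11), (11, 12), (40, 42), (41, 39), (39, 40)]
anchors = kmeans_anchors(boxes, k=2)
print(anchors)
```

Measuring distance by overlap rather than by Euclidean distance keeps large boxes from dominating the clustering, which matters when the objects of interest, such as apples, are small relative to the image.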

updated: Fri Jan 01 2021 23:39:55 GMT+0000 (UTC)

published: Fri Jan 01 2021 23:39:55 GMT+0000 (UTC)

arXiv