Automatic Test Suite Generation for Key-points Detection DNNs Using Many-Objective Search

Fitash Ul Haq; Donghwan Shin; Lionel C. Briand; Thomas Stifter; Jun Wang

Automatically detecting the positions of key-points (e.g., facial key-points or finger key-points) in an image is an essential problem in many applications, such as driver's gaze detection and drowsiness detection in automated driving systems. With the recent advances in Deep Neural Networks (DNNs), Key-Points detection DNNs (KP-DNNs) have been increasingly employed for that purpose. Nevertheless, KP-DNN testing and validation have remained a challenging problem because KP-DNNs predict many independent key-points at the same time -- where each individual key-point may be critical in the targeted application -- and images can vary a great deal according to many factors. In this paper, we present an approach to automatically generate test data for KP-DNNs using many-objective search. In our experiments, focused on facial key-points detection DNNs developed for an industrial automotive application, we show that our approach can generate test suites that cause severe mispredictions for, on average, more than 93% of all key-points. In comparison, random search-based test data generation causes severe mispredictions for only 41% of them. Many of these mispredictions, however, are unavoidable and should therefore not be considered failures. We also empirically compare state-of-the-art many-objective search algorithms and their variants, tailored for test suite generation. Furthermore, we investigate and demonstrate how to learn specific conditions, based on image characteristics (e.g., head posture and skin color), that lead to severe mispredictions. Such conditions serve as a basis for risk analysis or DNN retraining.

updated: Fri Dec 11 2020 17:28:03 GMT+0000 (UTC)

published: Fri Dec 11 2020 17:28:03 GMT+0000 (UTC)

arXiv
