An Unsupervised Domain Adaptation Scheme for Single-Stage Artwork Recognition in Cultural Sites

Giovanni Pasqualino; Antonino Furnari; Giovanni Signorello; Giovanni Maria Farinella

Recognizing artworks in a cultural site using images acquired from the user's point of view (First Person Vision) makes it possible to build interesting applications for both visitors and site managers. To achieve good performance, however, current object detection algorithms working in fully supervised settings need to be trained with large quantities of labeled data, whose collection is time-consuming and costly. Using synthetic data generated from the 3D model of the cultural site to train the algorithms can reduce these costs. On the other hand, when these models are tested with real images, a significant drop in performance is observed due to the differences between real and synthetic images. In this study, we consider the problem of Unsupervised Domain Adaptation for object detection in cultural sites. To address this problem, we created a new dataset containing both synthetic and real images of 16 different artworks. We then investigated different domain adaptation techniques based on one-stage and two-stage object detectors, image-to-image translation, and feature alignment. Based on the observation that single-stage detectors are more robust to the domain shift in the considered settings, we proposed a new method, which we call DA-RetinaNet, that builds on RetinaNet and feature alignment. The proposed approach achieves better results than the compared methods on the proposed dataset and on Cityscapes. To support research in this field, we release the dataset at https://iplab.dmi.unict.it/EGO-CH-OBJ-UDA/ and the code of the proposed architecture at https://github.com/fpv-iplab/DA-RetinaNet.

updated: Mon Dec 21 2020 20:37:19 GMT+0000 (UTC)

published: Tue Aug 04 2020 23:51:06 GMT+0000 (UTC)

arXiv
