AptSim2Real: Approximately-Paired Sim-to-Real Image Translation

Charles Y Zhang; Ashish Shrivastava

Advancements in graphics technology have increased the use of simulated data for training machine learning models. However, simulated data often differs from real-world data, creating a distribution gap that can reduce the effectiveness of models trained on simulation data when they are deployed in real-world applications. To mitigate this gap, sim-to-real domain transfer modifies simulated images to better match real-world data, enabling the effective use of simulation data in model training. Sim-to-real transfer relies on image translation methods, which fall into two main categories: paired and unpaired image-to-image translation. Paired image translation requires a perfect pixel match, which makes it difficult to apply in practice because simulation and real-world data lack pixel-wise correspondence. Unpaired image translation, while more suitable for sim-to-real transfer, is still challenging to learn for complex natural scenes. To address these challenges, we propose a third category: approximately-paired sim-to-real translation, where the source and target images do not need to be exactly paired. Our approximately-paired method, AptSim2Real, exploits the fact that simulators can generate scenes loosely resembling real-world scenes in terms of lighting, environment, and composition. Our novel training strategy results in significant qualitative and quantitative improvements, with up to a 24% improvement in FID score compared with state-of-the-art unpaired image-translation methods.
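For readers unfamiliar with the metric, the following is a minimal sketch of how a Fréchet distance of the kind underlying FID can be computed from two sets of feature vectors, and how a relative improvement between two scores can be expressed. It is not the authors' evaluation code. The function names `frechet_distance` and `relative_improvement` are hypothetical, the features are assumed to have been extracted beforehand (standard FID uses Inception-v3 features), and reading "24% improvement" as a relative reduction in FID is an assumption, since the abstract does not state how the figure is computed.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape (n_samples, n_features).
    Assumption: features were extracted elsewhere (e.g. by an Inception
    network); this function only compares their first two moments.
    """
    feats_a = np.asarray(feats_a, dtype=np.float64)
    feats_b = np.asarray(feats_b, dtype=np.float64)

    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(feats_b, rowvar=False))

    # Tr((cov_a cov_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative for
    # positive semi-definite inputs (clipping guards against round-off).
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)


def relative_improvement(baseline_score, new_score):
    """Fractional reduction of a lower-is-better score such as FID."""
    return (baseline_score - new_score) / baseline_score
```

Lower values indicate that the two feature distributions are closer. As an illustration with made-up numbers that do not come from the paper, a baseline score of 50.0 and a new score of 38.0 would give `relative_improvement(50.0, 38.0)` of about 0.24, that is, a 24% relative reduction.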

updated: Thu Mar 23 2023 04:32:57 GMT+0000 (UTC)

published: Thu Mar 09 2023 06:18:44 GMT+0000 (UTC)

arXiv
