Generating Realistic Training Images Based on Tonality-Alignment Generative Adversarial Networks for Hand Pose Estimation

Liangjian Chen; Shih-Yao Lin; Yusheng Xie; Hui Tang; Yufan Xue; Xiaohui Xie; Yen-Yu Lin; Wei Fan

Hand pose estimation from a monocular RGB image is an important but challenging task. The main factor limiting its performance is the lack of a sufficiently large training dataset with accurate hand-keypoint annotations. In this work, we circumvent this problem by proposing an effective method for generating realistic hand poses, and we show that state-of-the-art algorithms for hand pose estimation can be greatly improved by using the generated hand poses as training data. Specifically, we first adopt an augmented reality (AR) simulator to synthesize hand poses with accurate hand-keypoint labels. Although the synthetic hand poses come with precise joint labels, eliminating the need for manual annotation, they look unnatural and are not ideal training data. To produce more realistic hand poses, we propose to blend a synthetic hand pose with a real background, such as arms and sleeves. To this end, we develop tonality-alignment generative adversarial networks (TAGANs), which align the tonality and color distributions between synthetic hand poses and real backgrounds and can generate high-quality hand poses. We evaluate TAGAN on three benchmarks: the RHP, STB, and CMU-PS hand pose datasets. With the aid of the synthesized poses, our method performs favorably against state-of-the-art methods in both 2D and 3D hand pose estimation.
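To make the idea of aligning color distributions before blending more concrete, the following is a minimal sketch and not the TAGAN model itself. It swaps in a much simpler, non-learned technique: per-channel mean and standard deviation matching followed by alpha compositing. The function names `match_color_stats` and `blend`, the binary `mask` input, and the 8-bit RGB value range are all assumptions made for illustration; none of them come from the paper.

```python
import numpy as np


def match_color_stats(hand, background, mask):
    """Shift and scale each channel of the hand pixels so that their mean
    and standard deviation match those of the background pixels.

    Illustrative stand-in for tonality alignment (simple statistics
    matching, not a GAN). Assumes the mask has both hand and background
    pixels, and that images are H x W x 3 arrays with values in [0, 255].
    """
    hand = hand.astype(np.float64)
    background = background.astype(np.float64)
    fg = mask.astype(bool)  # True where the synthetic hand is
    out = hand.copy()
    for c in range(hand.shape[2]):
        src = hand[..., c][fg]          # hand pixels, channel c
        ref = background[..., c][~fg]   # background pixels, channel c
        scale = ref.std() / (src.std() + 1e-8)
        out[..., c][fg] = (src - src.mean()) * scale + ref.mean()
    return np.clip(out, 0.0, 255.0)


def blend(hand, background, mask):
    """Composite the color-aligned hand over the real background."""
    alpha = mask.astype(np.float64)[..., None]  # H x W x 1, values 0 or 1
    aligned = match_color_stats(hand, background, mask)
    composite = alpha * aligned + (1.0 - alpha) * background
    return composite.astype(np.uint8)
```

In this sketch the keypoint labels produced by the simulator would be carried over unchanged, because only pixel colors are modified and the hand geometry is left untouched. A learned model such as the one described above would be expected to handle lighting and texture far better than global statistics matching.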

updated: Sun Dec 13 2020 02:09:40 GMT+0000 (UTC)

published: Sun Nov 25 2018 01:18:13 GMT+0000 (UTC)

arXiv
