Auto-CARD: Efficient and Robust Codec Avatar Driving for Real-time Mobile Telepresence

Yonggan Fu; Yuecheng Li; Chenghui Li; Jason Saragih; Peizhao Zhang; Xiaoliang Dai; Yingyan Lin

Real-time, robust photorealistic avatars are highly desirable for immersive telepresence in AR/VR. One key bottleneck remains, however: the considerable computational expense of inferring facial expressions from headset-mounted camera images accurately enough to match the realism of the avatar's human appearance. To this end, we propose a framework called Auto-CARD, which for the first time enables real-time and robust driving of Codec Avatars using only on-device computing resources. This is achieved by minimizing two sources of redundancy. First, we develop a dedicated neural architecture search technique called AVE-NAS for avatar encoding in AR/VR, which explicitly boosts both the searched architectures' robustness in the presence of extreme facial expressions and their hardware friendliness on fast-evolving AR/VR headsets. Second, we leverage the temporal redundancy in consecutively captured images during continuous rendering and develop a mechanism dubbed LATEX to skip the computation of redundant frames. Specifically, we first identify an opportunity arising from the linearity of the latent space derived by the avatar decoder and then propose to perform adaptive latent extrapolation for redundant frames. For evaluation, we demonstrate the efficacy of our Auto-CARD framework in real-time Codec Avatar driving settings, where we achieve a 5.05x speed-up on Meta Quest 2 while maintaining animation quality comparable to or better than that of state-of-the-art avatar encoder designs.
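
The frame-skipping idea in the abstract can be illustrated with a minimal sketch. It is not the authors' LATEX implementation: the abstract does not say how redundant frames are detected or how the extrapolation is adapted, so the `is_redundant` callback, the `encode` callback, and the two-point linear rule below are all assumptions made for illustration. The sketch only shows why an approximately linear latent space makes skipping possible: when consecutive latent codes move along a line, the next code can be predicted from the previous two without running the encoder.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def extrapolate_latent(z_prev: Sequence[float], z_curr: Sequence[float]) -> Vector:
    """Two-point linear extrapolation: z_next = z_curr + (z_curr - z_prev)."""
    return [c + (c - p) for p, c in zip(z_prev, z_curr)]


def drive(
    frames: Sequence[float],
    encode: Callable[[float], Vector],            # hypothetical stand-in for the avatar encoder
    is_redundant: Callable[[int, float], bool],   # hypothetical redundancy test (assumption)
) -> List[Vector]:
    """Return one latent code per frame, running the encoder only when needed."""
    latents: List[Vector] = []
    for i, frame in enumerate(frames):
        # Extrapolation needs two earlier latent codes, so the first two
        # frames are always encoded.
        if i >= 2 and is_redundant(i, frame):
            latents.append(extrapolate_latent(latents[-2], latents[-1]))
        else:
            latents.append(encode(frame))
    return latents


if __name__ == "__main__":
    # Toy setup: each "frame" is a scalar and the encoder is exactly linear,
    # so extrapolated codes coincide with what the encoder would have produced.
    calls = []

    def toy_encode(frame: float) -> Vector:
        calls.append(frame)
        return [frame, 2.0 * frame]

    codes = drive([0.0, 1.0, 2.0, 3.0, 4.0], toy_encode, lambda i, f: i % 2 == 0)
    print(codes)
    print("encoder calls:", len(calls))
```

In this toy run the encoder is skipped on every second frame after the first two. A real system would need a redundancy test based on the captured images or the latent trajectory, and a way to limit error accumulation across consecutive skipped frames.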

updated: Mon Apr 24 2023 05:45:12 GMT+0000 (UTC)

published: Mon Apr 24 2023 05:45:12 GMT+0000 (UTC)

arXiv
