A Unified Compression Framework for Efficient Speech-Driven Talking-Face Generation

Bo-Kyeong Kim; Jaemin Kang; Daeun Seo; Hancheol Park; Shinkook Choi; Hyungshin Kim; Sungsu Lim

Virtual humans have gained considerable attention in numerous industries, such as entertainment and e-commerce. As a core technology, synthesizing photorealistic face frames from target speech and facial identity has been actively studied with generative adversarial networks. Despite the remarkable results of modern talking-face generation models, they often entail a high computational burden, which limits their efficient deployment. This study aims to develop a lightweight model for speech-driven talking-face synthesis. We build a compact generator by removing the residual blocks from Wav2Lip, a popular talking-face generator, and reducing its channel width. We also present a knowledge distillation scheme that trains the small-capacity generator stably yet effectively without adversarial learning. We reduce the number of parameters and MACs by 28× while retaining the performance of the original model. Moreover, to alleviate the severe performance drop that occurs when the whole generator is converted to INT8 precision, we adopt a selective quantization method that uses FP16 for the quantization-sensitive layers and INT8 for the other layers. Using this mixed precision, we achieve up to a 19× speedup on edge GPUs without noticeably compromising the generation quality.
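The selective quantization idea in the abstract (FP16 for quantization-sensitive layers, INT8 elsewhere) can be illustrated with a minimal sketch. The abstract does not say how sensitivity is measured, so the one-layer-at-a-time criterion below is an assumption, and the layer names, the `evaluate` callback, the `max_drop` threshold, and the toy penalty values are all hypothetical placeholders rather than anything taken from the paper.

```python
from typing import Callable, Dict, List

Plan = Dict[str, str]  # layer name -> "FP16" or "INT8"


def select_precisions(
    layer_names: List[str],
    evaluate: Callable[[Plan], float],
    max_drop: float,
) -> Plan:
    """Keep a layer in FP16 if quantizing it alone to INT8 costs too much quality.

    ASSUMPTION: sensitivity is the quality drop observed when only that
    layer is INT8 and every other layer stays FP16.
    """
    baseline_plan = {name: "FP16" for name in layer_names}
    baseline = evaluate(baseline_plan)

    plan: Plan = {}
    for name in layer_names:
        trial = dict(baseline_plan)
        trial[name] = "INT8"  # quantize this single layer only
        drop = baseline - evaluate(trial)
        plan[name] = "FP16" if drop > max_drop else "INT8"
    return plan


# Hypothetical per-layer quality penalties, standing in for a real
# evaluation of generated frames on a validation set.
TOY_PENALTY = {
    "audio_encoder.conv1": 0.01,
    "face_encoder.conv3": 0.02,
    "face_decoder.out": 0.40,
}


def toy_evaluate(plan: Plan) -> float:
    """Toy quality score: 1.0 minus the penalty of every INT8 layer."""
    return 1.0 - sum(TOY_PENALTY[n] for n, p in plan.items() if p == "INT8")


plan = select_precisions(list(TOY_PENALTY), toy_evaluate, max_drop=0.05)
for layer, precision in plan.items():
    print(f"{layer}: {precision}")
```

In this toy setup only the layer with the large penalty stays in FP16. In practice the resulting plan would be handed to a deployment toolchain that supports per-layer precision, which is outside the scope of this sketch.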

updated: Sun Apr 02 2023 06:56:44 GMT+0000 (UTC)

published: Sun Apr 02 2023 06:56:44 GMT+0000 (UTC)

arXiv
