Speed Is All You Need: On-Device Acceleration of Large Diffusion Models via GPU-Aware Optimizations

Yu-Hui Chen; Raman Sarokin; Juhyun Lee; Jiuqiang Tang; Chuo-Ling Chang; Andrei Kulik; Matthias Grundmann

The rapid development and application of foundation models have revolutionized the field of artificial intelligence. Large diffusion models have gained significant attention for their ability to generate photorealistic images and support various tasks. On-device deployment of these models provides benefits such as lower server costs, offline functionality, and improved user privacy. However, common large diffusion models have over 1 billion parameters and pose challenges due to restricted computational and memory resources on devices. We present a series of implementation optimizations for large diffusion models that achieve the fastest reported inference latency to date on GPU-equipped mobile devices (under 12 seconds for Stable Diffusion 1.4 without int8 quantization on Samsung S23 Ultra for a 512x512 image with 20 iterations). These enhancements broaden the applicability of generative AI and improve the overall user experience across a wide range of devices.

updated: Fri Jun 16 2023 17:04:09 GMT+0000 (UTC)

published: Fri Apr 21 2023 22:40:29 GMT+0000 (UTC)

arXiv
