CAGRA: Highly Parallel Graph Construction and Approximate Nearest Neighbor Search for GPUs

Hiroyuki Ootomo; Akira Naruse; Corey Nolet; Ray Wang; Tamas Feher; Yong Wang

Approximate Nearest Neighbor Search (ANNS) plays a critical role in disciplines spanning data mining and artificial intelligence, from information retrieval and computer vision to natural language processing and recommender systems. Data volumes have soared in recent years, and the computational cost of an exhaustive exact nearest neighbor search is often prohibitive, necessitating approximate techniques. Graph-based approaches have recently garnered significant attention among ANNS algorithms for their balance of performance and recall; however, only a few studies have explored harnessing the power of GPUs and multi-core processors, despite the widespread use of massively parallel and general-purpose computing. To bridge this gap, we introduce a novel proximity graph and search algorithm designed for parallel computing hardware. By leveraging the high-performance capabilities of modern hardware, our approach achieves substantial efficiency gains. In particular, our method builds the proximity graph faster than existing CPU- and GPU-based methods and delivers higher throughput in both large- and small-batch searches while maintaining comparable accuracy. In graph construction time, our method, CAGRA, is 2.2 to 27 times faster than HNSW, one of the state-of-the-art (SOTA) CPU implementations. In large-batch query throughput in the 90% to 95% recall range, our method is 33 to 77 times faster than HNSW and 3.8 to 8.8 times faster than the SOTA GPU implementations. For a single query, our method is 3.4 to 53 times faster than HNSW at 95% recall.
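The abstract states its results at fixed recall levels, so it may help to see how recall is typically measured against an exhaustive exact search. The sketch below is a minimal illustration, not the CAGRA algorithm: the helper names `exact_knn` and `recall_at_k` are hypothetical, the data is random, and the "approximate" result is a deliberately crude stand-in (an exact search over a random half of the dataset) used only to produce a result with imperfect recall.

```python
import random

def exact_knn(dataset, query, k):
    """Exhaustive search: squared L2 distance to every point, keep the k closest."""
    dists = [(sum((a - b) ** 2 for a, b in zip(p, query)), i)
             for i, p in enumerate(dataset)]
    return [i for _, i in sorted(dists)[:k]]

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true k nearest neighbors found by the approximate result."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

random.seed(0)
dataset = [[random.random() for _ in range(8)] for _ in range(200)]
query = [random.random() for _ in range(8)]

# Ground truth from the exhaustive search.
truth = exact_knn(dataset, query, k=10)

# Stand-in for an approximate index (assumption, NOT CAGRA):
# search only a random half of the dataset, then map back to original ids.
subset = random.sample(range(len(dataset)), 100)
local = exact_knn([dataset[j] for j in subset], query, k=10)
approx = [subset[i] for i in local]

print("recall@10:", recall_at_k(approx, truth))
```

The exhaustive search touches every point for every query, which is the cost the abstract describes as prohibitive at scale; an approximate index trades some recall for far fewer distance computations, which is why throughput figures are compared at a stated recall.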

updated: Tue Jul 09 2024 02:41:11 GMT+0000 (UTC)

published: Tue Aug 29 2023 09:10:53 GMT+0000 (UTC)

arXiv
