Scalable Neural Data Server: A Data Recommender for Transfer Learning

Tianshi Cao; Sasha Doubov; David Acuna; Sanja Fidler

The absence of large-scale labeled data in the practitioner's target domain can be a bottleneck to applying machine learning algorithms in practice. Transfer learning is a popular strategy for leveraging additional data to improve downstream performance, but finding the most relevant data to transfer from can be challenging. Neural Data Server (NDS), a search engine that recommends relevant data for a given downstream task, has been previously proposed to address this problem. NDS uses a mixture of experts trained on data sources to estimate similarity between each source and the downstream task. Thus, the computational cost to each user grows with the number of sources. To address this issue, we propose Scalable Neural Data Server (SNDS), a large-scale search engine that can theoretically index thousands of datasets to serve relevant ML data to end users. SNDS trains the mixture of experts on intermediary datasets during initialization, and represents both data sources and downstream tasks by their proximity to the intermediary datasets. As such, the computational cost incurred by SNDS users remains fixed as new datasets are added to the server. We validate SNDS on a plethora of real-world tasks and find that data recommended by SNDS improves downstream task performance over baselines. We also demonstrate the scalability of SNDS by showing its ability to select relevant data for transfer outside of the natural image setting.
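The idea of representing both data sources and the downstream task by their proximity to a fixed set of intermediary datasets can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not say how proximity is measured or how representations are compared, so the sketch uses hypothetical per-expert scores and cosine similarity, and the names `rank_sources`, `source_reprs`, and `target_repr` are invented here. The point it shows is that the user only computes one fixed-length vector (one entry per expert), regardless of how many sources the server indexes.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_sources(source_reprs, target_repr):
    """Rank sources by similarity of their representation to the target's.

    Each representation is a vector with one entry per expert, where each
    expert was trained on one intermediary dataset (hypothetical scores).
    """
    scores = {name: cosine(vec, target_repr) for name, vec in source_reprs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical server-side index: computed once per source, when it is added.
source_reprs = {
    "source_a": [0.9, 0.1, 0.2],
    "source_b": [0.2, 0.8, 0.3],
    "source_c": [0.5, 0.5, 0.5],
}

# Hypothetical user-side vector: K = 3 expert evaluations on the target data.
# Its length depends on the number of experts, not on the number of sources.
target_repr = [0.8, 0.2, 0.1]

ranking = rank_sources(source_reprs, target_repr)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy setup, adding a fourth source only adds one entry to the server-side index; the user's vector and the work needed to produce it are unchanged, which is the scaling property the abstract describes.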

updated: Sun Jun 19 2022 12:07:32 GMT+0000 (UTC)

published: Sun Jun 19 2022 12:07:32 GMT+0000 (UTC)

arXiv
