Using Language to Extend to Unseen Domains

Lisa Dunlap; Clara Mohri; Devin Guillory; Han Zhang; Trevor Darrell; Joseph E. Gonzalez; Aditi Raghunathan; Anja Rohrbach

It is expensive to collect training data for every possible domain that a vision model may encounter when deployed. We instead consider how simply verbalizing the training domain (e.g., "photos of birds") as well as domains we want to extend to but do not have data for (e.g., "paintings of birds") can improve robustness. Using a multimodal model with a joint image and language embedding space, our method LADS learns a transformation of the image embeddings from the training domain to each unseen test domain, while preserving task-relevant information. Without using any images from the unseen test domains, we show that LADS outperforms standard fine-tuning and ensemble approaches on the extended domain, which contains both the training and the unseen test domains, across a suite of four benchmarks targeting domain adaptation and dataset bias.
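The abstract describes the core idea only in words. As a rough illustration, here is a minimal sketch assuming numpy and hand-made unit vectors in place of real CLIP-style image and text embeddings. It fits a linear map on training-domain embeddings so that they move along the text direction from a "photo" prompt to a "painting" prompt, then checks that zero-shot class predictions are unchanged. The helper names (`domain_direction`, `fit_domain_map`, `zero_shot_predict`) are hypothetical, and the closed-form ridge fit toward the identity is a swapped-in simplification, not the LADS training objective.

```python
import numpy as np


def normalize(x: np.ndarray) -> np.ndarray:
    """L2-normalize along the last axis, as is common for CLIP-style embeddings."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def domain_direction(src_text: np.ndarray, tgt_text: np.ndarray) -> np.ndarray:
    """Hypothetical helper: unit vector from a training-domain prompt embedding
    (e.g. "a photo of a bird") to an unseen-domain prompt embedding
    (e.g. "a painting of a bird")."""
    return normalize(tgt_text - src_text)


def fit_domain_map(train_img: np.ndarray, direction: np.ndarray,
                   alpha: float = 1.0, lam: float = 0.1) -> np.ndarray:
    """Hypothetical helper: fit a linear map W so that x @ W approximates the
    text-shifted target normalize(x + alpha * direction).

    Solves ridge regression toward the identity,
        min_W ||X W - T||^2 + lam * ||W - I||^2,
    where the identity prior keeps W close to "change nothing" (a crude
    stand-in for preserving task-relevant information).
    """
    d = train_img.shape[1]
    targets = normalize(train_img + alpha * direction)
    eye = np.eye(d)
    # Normal equations: (X^T X + lam I) W = X^T T + lam I
    return np.linalg.solve(train_img.T @ train_img + lam * eye,
                           train_img.T @ targets + lam * eye)


def zero_shot_predict(img: np.ndarray, class_text: np.ndarray) -> np.ndarray:
    """Index of the most cosine-similar class prompt for each image embedding."""
    return np.argmax(normalize(img) @ normalize(class_text).T, axis=1)


if __name__ == "__main__":
    e = np.eye(8)  # toy orthonormal basis standing in for a CLIP embedding space
    photo_text = normalize(np.stack([e[0] + e[2], e[1] + e[2]]))     # "a photo of a {A, B}"
    painting_text = normalize(np.stack([e[0] + e[3], e[1] + e[3]]))  # "a painting of a {A, B}"
    train_img = photo_text.copy()  # pretend each training image sits on its photo prompt

    direction = domain_direction(photo_text[0], painting_text[0])
    W = fit_domain_map(train_img, direction)
    augmented = normalize(train_img @ W)  # "painting-like" versions of the training embeddings

    print(zero_shot_predict(augmented, painting_text))  # [0 1]: class labels preserved
    print(np.diag(train_img @ painting_text.T))          # similarity to painting prompts before
    print(np.diag(augmented @ painting_text.T))          # and after the fitted shift
```

On this toy data the fitted map moves each photo embedding almost onto its matching painting prompt (cosine similarity rises from 0.5 to above 0.99) while the predicted class stays the same. Real encoder outputs and a task-specific training objective would replace both the toy vectors and the ridge fit.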

updated: Sun Mar 26 2023 20:47:55 GMT+0000 (UTC)

published: Tue Oct 18 2022 01:14:02 GMT+0000 (UTC)

arXiv
