Learning useful representations for shifting tasks and distributions

Jianyu Zhang; Léon Bottou

Does the dominant approach to learning representations (as a side effect of optimizing an expected cost for a single training distribution) remain a good approach when we are dealing with multiple distributions? Our thesis is that such scenarios are better served by representations that are "richer" than those obtained with a single optimization episode. This is supported by a collection of empirical results obtained with an apparently naïve ensembling technique: concatenating the representations obtained with multiple training episodes using the same data, model, algorithm, and hyper-parameters, but different random seeds. These independently trained networks perform similarly. Yet, in a number of scenarios involving new distributions, the concatenated representation performs substantially better than an equivalently sized network trained from scratch. This proves that the representations constructed by multiple training episodes are in fact different. Although their concatenation carries little additional information about the training task under the training distribution, it becomes substantially more informative when tasks or distributions change. Meanwhile, a single training episode is unlikely to yield such a redundant representation because the optimization process has no reason to accumulate features that do not incrementally improve the training performance.
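The concatenation procedure described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the toy data set, the one-hidden-layer tanh network trained by SGD on logistic loss, and the names `train_episode` and `concatenated_representation` are all hypothetical stand-ins. Each call to `train_episode` is one training episode that shares data, model, algorithm, and hyper-parameters with the others and differs only in its random seed; the hidden layer is kept as the representation and the output head is discarded. The `baseline` plays the role of the "equivalently sized network trained from scratch".

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_episode(data, hidden, seed, epochs=50, lr=0.1):
    """One training episode (hypothetical stand-in): a one-hidden-layer tanh
    network fit by SGD on logistic loss. Episodes share data, model, algorithm
    and hyper-parameters; only `seed` (initialization and sample order) differs.
    Returns the learned hidden-layer feature map; the output head is discarded."""
    rng = random.Random(seed)
    dim = len(data[0][0])
    W = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(hidden)]
    b = [0.0] * hidden
    v = [rng.gauss(0.0, 1.0) for _ in range(hidden)]
    c = 0.0

    def features(x):
        # Hidden-layer activations; W and b are updated in place during training.
        return [math.tanh(sum(w * xi for w, xi in zip(W[j], x)) + b[j])
                for j in range(hidden)]

    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = data[i]
            h = features(x)
            g = sigmoid(sum(vj * hj for vj, hj in zip(v, h)) + c) - y  # dLoss/dlogit
            for j in range(hidden):
                gh = g * v[j] * (1.0 - h[j] ** 2)  # backprop through tanh, using the old v[j]
                v[j] -= lr * g * h[j]
                b[j] -= lr * gh
                for k in range(dim):
                    W[j][k] -= lr * gh * x[k]
            c -= lr * g
    return features


def concatenated_representation(extractors):
    """Concatenate the feature vectors of independently trained extractors."""
    def features(x):
        out = []
        for f in extractors:
            out.extend(f(x))
        return out
    return features


# Toy training distribution (hypothetical): label is 1 when x0 + x1 > 0.
data_rng = random.Random(123)
data = []
for _ in range(40):
    x = [data_rng.uniform(-1.0, 1.0), data_rng.uniform(-1.0, 1.0)]
    data.append((x, 1 if x[0] + x[1] > 0 else 0))

# Three episodes that differ only in their random seed.
extractors = [train_episode(data, hidden=4, seed=s) for s in (0, 1, 2)]
phi = concatenated_representation(extractors)

# Baseline of equivalent size: a single episode with 3 * 4 = 12 hidden units.
baseline = train_episode(data, hidden=12, seed=0)

print(len(phi([0.3, -0.2])), len(baseline([0.3, -0.2])))  # 12 12
```

How the concatenated features are then used on a new task or distribution (for example, by fitting a fresh classifier on top of `phi`) is left out; the sketch only reproduces the construction and makes no claim about the paper's empirical results.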

updated: Wed Dec 14 2022 17:17:10 GMT+0000 (UTC)

published: Wed Dec 14 2022 17:17:10 GMT+0000 (UTC)

arXiv