Learning useful representations for shifting tasks and distributions

Jianyu Zhang; Léon Bottou

Does the dominant approach to learning representations (as a side effect of optimizing an expected cost for a single training distribution) remain a good approach when we are dealing with multiple distributions? Our thesis is that such scenarios are better served by representations that are richer than those obtained with a single optimization episode. We support this thesis with simple theoretical arguments and with experiments using an apparently naive ensembling technique: concatenating the representations obtained from multiple training episodes that use the same data, model, algorithm, and hyper-parameters, but different random seeds. These independently trained networks perform similarly. Yet, in a number of scenarios involving new distributions, the concatenated representation performs substantially better than an equivalently sized network trained with a single training run. This proves that the representations constructed by multiple training episodes are in fact different. Although their concatenation carries little additional information about the training task under the training distribution, it becomes substantially more informative when tasks or distributions change. Meanwhile, a single training episode is unlikely to yield such a redundant representation because the optimization process has no reason to accumulate features that do not incrementally improve the training performance.
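The concatenation idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the helper names (`make_featurizer`, `concat_representation`, `fit_linear_probe`), the layer sizes, and the toy regression target are all assumptions made for illustration. The "networks" here are random feature maps that differ only in their seed, standing in for networks that would actually be trained in a real experiment.

```python
import numpy as np


def make_featurizer(seed, in_dim, feat_dim):
    """Hypothetical stand-in for a network trained with a given random seed.

    A real experiment would train each network on the same data; here the
    weights are only randomly initialised so the sketch stays self-contained.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(size=(in_dim, feat_dim)) / np.sqrt(in_dim)
    b = rng.normal(size=feat_dim)
    return lambda x: np.tanh(x @ w + b)


def concat_representation(featurizers, x):
    """Concatenate the features of every featurizer along the feature axis."""
    return np.concatenate([f(x) for f in featurizers], axis=1)


def fit_linear_probe(features, targets, ridge=1e-3):
    """Fit a ridge-regression probe on frozen features (closed form)."""
    d = features.shape[1]
    gram = features.T @ features + ridge * np.eye(d)
    return np.linalg.solve(gram, features.T @ targets)


# Four featurizers that share everything except the random seed.
featurizers = [make_featurizer(seed, in_dim=8, feat_dim=16) for seed in range(4)]

# Toy data for a "new" task (assumed, purely illustrative).
x_new = np.random.default_rng(100).normal(size=(32, 8))
y_new = np.sin(x_new[:, :1])

phi = concat_representation(featurizers, x_new)  # shape (32, 4 * 16)
probe = fit_linear_probe(phi, y_new)             # shape (64, 1)
pred = phi @ probe                               # shape (32, 1)
```

Only the linear probe is fitted on the new task, and the concatenated features stay frozen. This is one common way to compare representations, and it is assumed here rather than taken from the abstract.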

updated: Tue Jan 31 2023 16:00:44 GMT+0000 (UTC)

published: Wed Dec 14 2022 17:17:10 GMT+0000 (UTC)

arXiv
