Modeling Multiple Views via Implicitly Preserving Global Consistency and Local Complementarity

Jiangmeng Li; Wenwen Qiang; Changwen Zheng; Bing Su; Farid Razzak; Ji-Rong Wen; Hui Xiong

While self-supervised learning techniques are often used to mine implicit knowledge from unlabeled data by modeling multiple views, it is unclear how to perform effective representation learning in a complex and inconsistent context. To this end, we propose a methodology, namely the consistency and complementarity network (CoCoNet), which leverages strict global inter-view consistency and local cross-view complementarity preserving regularization to comprehensively learn representations from multiple views. At the global stage, we posit that crucial knowledge is implicitly shared among views, and that enhancing the encoder to capture such knowledge from data can improve the discriminability of the learned representations. Hence, preserving the global consistency of multiple views ensures the acquisition of common knowledge. CoCoNet aligns the probability distributions of the views by utilizing an efficient discrepancy metric based on the generalized sliced Wasserstein distance. Lastly, at the local stage, we propose a heuristic complementarity factor, which joins cross-view discriminative knowledge and guides the encoders to learn not only view-wise discriminability but also cross-view complementary information. Theoretically, we provide information-theoretic analyses of the proposed CoCoNet. Empirically, to investigate the improvement gains of our approach, we conduct thorough experimental validation, which demonstrates that CoCoNet outperforms state-of-the-art self-supervised methods by a significant margin and shows that such implicit consistency and complementarity preserving regularization can enhance the discriminability of latent representations.
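The abstract does not spell out CoCoNet's exact discrepancy formulation. The following Python sketch is a rough illustration of the kind of metric it names. It estimates a sliced Wasserstein distance between two sets of view embeddings by Monte Carlo sampling.

Each random direction reduces the comparison to a 1-D optimal-transport problem. For equal-size samples, that problem is solved by matching sorted projections.

Several details are illustrative assumptions and are not taken from the paper:

* the function names;
* the default of 50 projections;
* p = 2;
* the `project` hook.

The generalized variant replaces the linear projection with a nonlinear defining function. Which defining function CoCoNet uses is not stated here.

```python
import math
import random
from typing import Callable, List, Sequence

# Hypothetical helper names for illustration; this is not CoCoNet's published code.
Vector = Sequence[float]


def linear_projection(x: Vector, theta: Vector) -> float:
    # Standard sliced-Wasserstein slice: the inner product <x, theta>.
    return sum(a * t for a, t in zip(x, theta))


def random_direction(dim: int, rng: random.Random) -> List[float]:
    # A normalized Gaussian vector is uniformly distributed on the unit sphere.
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def sliced_wasserstein(
    xs: Sequence[Vector],
    ys: Sequence[Vector],
    num_projections: int = 50,
    p: float = 2.0,
    project: Callable[[Vector, Vector], float] = linear_projection,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of the sliced p-Wasserstein distance between two
    equally sized empirical samples. Passing a nonlinear `project` (assumed
    defining function) gives a generalized sliced Wasserstein estimate."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("this sketch assumes two non-empty samples of equal size")
    rng = random.Random(seed)
    dim = len(xs[0])
    total = 0.0
    for _ in range(num_projections):
        theta = random_direction(dim, rng)
        # In 1-D, optimal transport between equal-size samples pairs sorted values.
        px = sorted(project(x, theta) for x in xs)
        py = sorted(project(y, theta) for y in ys)
        total += sum(abs(a - b) ** p for a, b in zip(px, py)) / len(px)
    return (total / num_projections) ** (1.0 / p)


# Usage: two "views" of the same point cloud, the second one shifted.
rng = random.Random(1)
view_a = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
view_b = [[x + 3.0, y + 4.0] for x, y in view_a]
print(sliced_wasserstein(view_a, view_a))  # 0.0 for identical samples
print(sliced_wasserstein(view_a, view_b))  # positive; grows with the shift
```

In a training setup, a term of this kind would be added to the loss to pull the embedding distributions of the views together. This pure-Python version is not differentiable. It is meant only to show the computation.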

updated: Fri Sep 16 2022 09:24:00 GMT+0000 (UTC)

published: Fri Sep 16 2022 09:24:00 GMT+0000 (UTC)

arXiv