ParamCrop: Parametric Cubic Cropping for Video Contrastive Learning

Zhiwu Qing; Ziyuan Huang; Shiwei Zhang; Mingqian Tang; Changxin Gao; Marcelo H. Ang Jr; Rong Jin; Nong Sang

The central idea of contrastive learning is to discriminate between different instances while forcing different views of the same instance to share the same representation. To avoid trivial solutions, augmentation plays an important role in generating different views, and random cropping in particular has been shown to help the model learn a generalized and robust representation. The commonly used random crop operation keeps the distribution of the difference between two views unchanged throughout training. In this work, we show that adaptively controlling the disparity between two augmented views during training enhances the quality of the learned representation. Specifically, we present a parametric cubic cropping operation, ParamCrop, for video contrastive learning, which automatically crops a 3D cubic region using differentiable 3D affine transformations. ParamCrop is trained simultaneously with the video backbone using an adversarial objective and learns an optimal cropping strategy from the data. Visualizations show that ParamCrop adaptively controls the center distance and the IoU between two augmented views, and that the learned change in disparity over the course of training is beneficial to learning a strong representation. Extensive ablation studies demonstrate the effectiveness of the proposed ParamCrop on multiple contrastive learning frameworks and video backbones. Code and models will be made available.
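
To make the two quantities tracked in the abstract concrete (center distance and IoU between two cropped views), the following is a minimal sketch, not the authors' implementation. It assumes axis-aligned crops expressed in normalized (t, y, x) coordinates in [-1, 1], each described by a 3x4 affine matrix built from a center and a per-axis scale. All function names are hypothetical, and the differentiable sampling and adversarial training described in the abstract are omitted.

```python
import math


def crop_affine(center, scale):
    """Build a 3x4 affine matrix for an axis-aligned 3D crop (assumed form).

    center and scale are (t, y, x) triples in normalized coordinates;
    the matrix maps output coordinates in [-1, 1]^3 to input coordinates.
    Scales are assumed to be strictly positive.
    """
    return [
        [scale[0], 0.0, 0.0, center[0]],
        [0.0, scale[1], 0.0, center[1]],
        [0.0, 0.0, scale[2], center[2]],
    ]


def apply_affine(theta, point):
    """Apply a 3x4 affine matrix to a (t, y, x) point."""
    t, y, x = point
    return tuple(r[0] * t + r[1] * y + r[2] * x + r[3] for r in theta)


def crop_bounds(theta):
    """Return [(lo, hi)] per axis for the region covered by the crop."""
    a = apply_affine(theta, (-1.0, -1.0, -1.0))
    b = apply_affine(theta, (1.0, 1.0, 1.0))
    return [(min(p, q), max(p, q)) for p, q in zip(a, b)]


def center_distance(theta_a, theta_b):
    """Euclidean distance between the centers of two crops."""
    ca = apply_affine(theta_a, (0.0, 0.0, 0.0))
    cb = apply_affine(theta_b, (0.0, 0.0, 0.0))
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(ca, cb)))


def iou_3d(theta_a, theta_b):
    """Intersection over union of two axis-aligned 3D crops."""
    inter, vol_a, vol_b = 1.0, 1.0, 1.0
    for (a_lo, a_hi), (b_lo, b_hi) in zip(crop_bounds(theta_a), crop_bounds(theta_b)):
        inter *= max(0.0, min(a_hi, b_hi) - max(a_lo, b_lo))
        vol_a *= a_hi - a_lo
        vol_b *= b_hi - b_lo
    return inter / (vol_a + vol_b - inter)


# Two half-size crops; the second is shifted along the temporal axis.
view_a = crop_affine(center=(0.0, 0.0, 0.0), scale=(0.5, 0.5, 0.5))
view_b = crop_affine(center=(0.5, 0.0, 0.0), scale=(0.5, 0.5, 0.5))
print(center_distance(view_a, view_b), iou_3d(view_a, view_b))
```

In this sketch, a larger center distance or a smaller IoU corresponds to a larger disparity between the two views, which is the quantity the abstract says ParamCrop learns to control during training.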

updated: Tue Nov 23 2021 06:59:10 GMT+0000 (UTC)

published: Tue Aug 24 2021 03:18:12 GMT+0000 (UTC)

arXiv
