Reproducing BowNet: Learning Representations by Predicting Bags of Visual Words

Harry Nguyen; Stone Yun; Hisham Mohammad

This work aims to reproduce the results of the CVPR 2020 paper by Gidaris et al. Self-supervised learning (SSL) is used to learn image feature representations from an unlabeled dataset. The original paper proposes using bag-of-words (BoW) deep feature descriptors as a self-supervised learning target for learning robust, deep representations. BowNet is trained to reconstruct the histogram of visual words (i.e., the deep BoW descriptor) of a reference image when presented with a perturbed version of that image as input. The method therefore aims to learn perturbation-invariant, context-aware image features that can be useful for few-shot tasks or supervised downstream tasks. In the paper, the authors describe BowNet as a network consisting of a convolutional feature extractor Φ(∙) and a Dense-softmax layer Ω(∙) trained to predict BoW features from images. After BoW training, the features produced by Φ are used in downstream tasks. For this challenge, we tried to build and train a network that could reproduce the CIFAR-100 accuracy improvements reported in the original paper. However, we were unable to reproduce an accuracy improvement comparable to the one the authors reported. This could be due to a variety of factors, and we believe that time constraints were the primary bottleneck.
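
To make the BoW training target more concrete, here is a minimal pure-Python sketch of the objective described above. It assumes a fixed vocabulary of K visual words that is simply given (in practice it would be learned beforehand, for example by k-means clustering of local features), hard nearest-word assignment when building the reference histogram, global average pooling of Φ's output on the perturbed image, and a cross-entropy loss between the target histogram and Ω's softmax output. These choices and all function names (`bow_histogram`, `global_average_pool`, `dense_softmax`, `bow_loss`) are illustrative assumptions, not the original authors' implementation.

```python
import math
from typing import List

Vector = List[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bow_histogram(local_features: List[Vector], vocabulary: List[Vector]) -> List[float]:
    """BoW target for the reference image (assumption: hard assignment).

    Each local feature vector is assigned to its nearest visual word, and the
    counts are L1-normalized so that the histogram sums to 1.
    """
    counts = [0.0] * len(vocabulary)
    for feature in local_features:
        nearest = min(range(len(vocabulary)),
                      key=lambda k: squared_distance(feature, vocabulary[k]))
        counts[nearest] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def global_average_pool(local_features: List[Vector]) -> Vector:
    """Hypothetical stand-in for pooling Phi's output on the perturbed image."""
    n = len(local_features)
    return [sum(column) / n for column in zip(*local_features)]


def dense_softmax(features: Vector, weights: List[Vector], bias: Vector) -> List[float]:
    """Omega: one dense layer (one weight row per visual word) followed by softmax."""
    logits = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, bias)]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bow_loss(target: List[float], predicted: List[float], eps: float = 1e-12) -> float:
    """Cross-entropy between the target histogram and the predicted distribution (assumed loss)."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


# Toy example: 2-D local features and a vocabulary of K = 3 visual words.
vocab = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
reference_feats = [[0.1, 0.0], [0.9, 0.1], [1.1, -0.1], [0.0, 0.8]]
target = bow_histogram(reference_feats, vocab)
print(target)  # → [0.25, 0.5, 0.25]

# Pretend these are Phi's local features for the perturbed image.
perturbed_feats = [[0.2, 0.9], [0.4, 0.5]]
pooled = global_average_pool(perturbed_feats)
# An untrained Omega with all-zero weights predicts a uniform distribution.
predicted = dense_softmax(pooled, [[0.0, 0.0]] * 3, [0.0] * 3)
print(round(bow_loss(target, predicted), 4))  # → 1.0986
```

Training would then adjust the parameters of Φ and Ω to lower `bow_loss` averaged over many reference/perturbed image pairs; that optimization loop is omitted from this sketch.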

updated: Fri Jan 14 2022 19:55:43 GMT+0000 (UTC)

published: Mon Jan 10 2022 07:00:22 GMT+0000 (UTC)

arXiv