360-MLC: Multi-view Layout Consistency for Self-training and Hyper-parameter Tuning

Bolivar Solarte; Chin-Hsuan Wu; Yueh-Cheng Liu; Yi-Hsuan Tsai; Min Sun

We present 360-MLC, a self-training method based on multi-view layout consistency for fine-tuning monocular room-layout models using only unlabeled 360-images. This is valuable in practical scenarios where a pre-trained model must be adapted to a new data domain without any ground-truth annotations. Our simple yet effective assumption is that multiple layout estimations in the same scene must define a consistent geometry regardless of their camera positions. Based on this idea, we leverage a pre-trained model to project estimated layout boundaries from several camera views into 3D world coordinates. We then re-project them back to spherical coordinates and build a probability function, from which we sample the pseudo-labels for self-training. To handle unconfident pseudo-labels, we evaluate the variance of the re-projected boundaries as an uncertainty value that weights each pseudo-label in our loss function during training. In addition, since ground-truth annotations are available neither during training nor during testing, we leverage the entropy information in multiple layout estimations as a quantitative metric of the scene's geometric consistency, allowing us to evaluate any layout estimator for hyper-parameter tuning, including model selection, without ground-truth annotations. Experimental results show that our solution achieves favorable performance against state-of-the-art methods when self-training from three publicly available source datasets to a unique, newly labeled dataset consisting of multiple views of the same scenes.
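
The abstract's two ideas, uncertainty-weighted pseudo-labels and an entropy-based consistency score, can be illustrated with a minimal sketch. It is not the authors' implementation, and every name in it is hypothetical. It assumes the re-projection step has already produced, for each image column, a list of boundary angles from several views. It also assumes that the pseudo-label is the mean of those angles, that the uncertainty is their standard deviation, that the loss is an L1 term divided by the variance, and that the entropy is computed over boundary points binned on a 2D floor-plan grid. The abstract does not fix any of these details.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def pseudo_label_and_sigma(estimates):
    """Return (pseudo-label, uncertainty) for one image column.

    `estimates` holds boundary angles (radians) re-projected from several
    views. Using the mean and standard deviation is an assumption of this
    sketch, not something stated in the abstract.
    """
    return mean(estimates), pstdev(estimates)


def weighted_l1_loss(predictions, pseudo_labels, sigmas, eps=1e-6):
    """Mean L1 error where each column is divided by its variance.

    Columns whose multi-view estimates disagree (large sigma) contribute
    less. `eps` avoids division by zero when all views agree exactly.
    """
    terms = [
        abs(p - y) / (s * s + eps)
        for p, y, s in zip(predictions, pseudo_labels, sigmas)
    ]
    return sum(terms) / len(terms)


def layout_entropy(points_xy, cell=0.1):
    """Shannon entropy of boundary points binned on a 2D floor-plan grid.

    Lower entropy means the points from different views pile up in fewer
    cells, i.e. the estimated layouts are more consistent.
    """
    counts = Counter(
        (math.floor(x / cell), math.floor(y / cell)) for x, y in points_xy
    )
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())
```

Under these assumptions, a candidate model or hyper-parameter setting could be scored by calling `layout_entropy` on the boundary points it predicts across all views of a scene and preferring the setting with the lowest value, which requires no ground-truth labels.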

updated: Mon Oct 24 2022 03:31:48 GMT+0000 (UTC)

published: Mon Oct 24 2022 03:31:48 GMT+0000 (UTC)

arXiv

