Fractal Pyramid Networks

Zhiqiang Deng; Huimin Yu; Yangqi Long

We propose a new network architecture, Fractal Pyramid Networks (PFNs), for pixel-wise prediction tasks as an alternative to the widely used encoder-decoder structure. In the encoder-decoder structure, the input is processed by an encoding-decoding pipeline that tries to obtain a single semantic, large-channel feature. In contrast, the proposed PFNs maintain multiple information processing pathways and encode the information into multiple separate small-channel features. On the task of self-supervised monocular depth estimation, even without ImageNet pretraining, our models compete with or outperform state-of-the-art methods on the KITTI dataset with far fewer parameters. Moreover, the visual quality of the predictions is significantly improved. An experiment on semantic segmentation provides evidence that PFNs can be applied to other pixel-wise prediction tasks and demonstrates that our models can capture more global structure information.
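As a rough illustration of why several small-channel features can cost fewer parameters than one large-channel feature, the sketch below counts the weights of a single 3x3 convolution at 512 channels and of eight independent 3x3 convolutions at 64 channels each. The channel widths, the number of pathways, and the helper `conv_params` are assumptions chosen for the arithmetic only; they are not taken from the paper and do not describe the actual PFN layers.

```python
def conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Parameter count of a k x k 2D convolution: weights plus one bias per output channel."""
    return c_in * c_out * k * k + c_out


# One wide layer, in the spirit of a large-channel encoder-decoder feature (assumed width).
large = conv_params(512, 512)

# Eight separate narrow layers carrying the same total number of channels (assumed split).
small = 8 * conv_params(64, 64)

print(f"one 512-channel conv:    {large:,} parameters")
print(f"eight 64-channel convs:  {small:,} parameters")
print(f"ratio: {large / small:.2f}x")
```

Because the weight count of a convolution grows with the product of input and output channels, splitting a fixed channel budget across independent pathways reduces the total roughly in proportion to the number of pathways. Whether that saving carries over to a full network depends on how the pathways exchange information, which this sketch does not model.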

updated: Mon Jun 28 2021 13:15:30 GMT+0000 (UTC)

published: Mon Jun 28 2021 13:15:30 GMT+0000 (UTC)

arXiv
