Differentiable Sensor Layouts for End-to-End Learning of Task-Specific Camera Parameters

Hendrik Sommerhoff; Shashank Agnihotri; Mohamed Saleh; Michael Moeller; Margret Keuper; Andreas Kolb

The success of deep learning is frequently described as the ability to train all parameters of a network on a specific application in an end-to-end fashion. Yet several design choices at the camera level, including the pixel layout of the sensor, are treated as pre-defined and fixed. High-resolution, regular pixel layouts are considered the most generic choice in computer vision and graphics, treating all regions of an image as equally important. While several works have considered non-uniform (e.g., hexagonal or foveated) pixel layouts in hardware and image processing, the layout has so far not been integrated into the end-to-end learning paradigm. In this work, we present the first truly end-to-end trained imaging pipeline that optimizes the size and distribution of pixels on the imaging sensor jointly with the parameters of a given neural network on a specific task. We derive an analytic, differentiable approach for the sensor layout parameterization that allows for task-specific, locally varying pixel resolutions. We present two pixel layout parameterization functions: rectangular and curvilinear grid shapes that retain a regular topology. We provide a drop-in module that approximates sensor simulation given existing high-resolution images, so that our method connects directly to existing deep learning models. We show that network predictions benefit from learnable pixel layouts for two different downstream tasks, classification and semantic segmentation.
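To make the idea of a parameterized rectangular layout concrete, here is a minimal NumPy sketch. It is not the authors' implementation, and every name in it (`pixel_centers`, `bilinear_sample`, `simulate_sensor`) is hypothetical. It assumes that pixel widths along each axis come from a softmax over unconstrained logits, and that a sensor pixel can be approximated by bilinear point sampling of a high-resolution image at the pixel center. The abstract does not state how the module integrates over pixel area, so point sampling is a simplification. All operations used are differentiable almost everywhere, so the same construction could carry gradients in an autodiff framework.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()

def pixel_centers(logits):
    """Map unconstrained logits to pixel centers in (0, 1) along one axis.

    Assumption: widths are softmax(logits), so they are positive and sum
    to 1, which keeps the grid ordered (a regular topology).
    """
    widths = softmax(logits)
    edges = np.concatenate([[0.0], np.cumsum(widths)])
    return 0.5 * (edges[:-1] + edges[1:])

def bilinear_sample(image, ys, xs):
    """Sample a 2-D image at the separable grid of normalized coords ys x xs."""
    H, W = image.shape
    fy, fx = ys * (H - 1), xs * (W - 1)
    # Integer corner indices, clipped so that index + 1 stays in bounds.
    y0 = np.clip(np.floor(fy).astype(int), 0, H - 2)
    x0 = np.clip(np.floor(fx).astype(int), 0, W - 2)
    wy = (fy - y0)[:, None]   # vertical interpolation weights
    wx = (fx - x0)[None, :]   # horizontal interpolation weights
    top = image[y0][:, x0] * (1 - wx) + image[y0][:, x0 + 1] * wx
    bot = image[y0 + 1][:, x0] * (1 - wx) + image[y0 + 1][:, x0 + 1] * wx
    return top * (1 - wy) + bot * wy

def simulate_sensor(image, logits_y, logits_x):
    """Approximate a low-resolution sensor readout from a high-res image."""
    return bilinear_sample(image, pixel_centers(logits_y), pixel_centers(logits_x))

# Usage: a horizontal ramp image read out by an 8 x 8 layout.
image = np.tile(np.linspace(0.0, 1.0, 64), (64, 1))
uniform = simulate_sensor(image, np.zeros(8), np.zeros(8))
# Larger logits give wider pixels, i.e. coarser resolution in that region.
coarse_middle = pixel_centers(np.array([0.0, 2.0, 2.0, 0.0]))
```

With all-zero logits the layout reduces to a uniform grid. In a training setup, the logits would be optimized together with the network weights, which is the joint optimization the abstract describes.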

updated: Fri Apr 28 2023 10:28:09 GMT+0000 (UTC)

published: Fri Apr 28 2023 10:28:09 GMT+0000 (UTC)

arXiv
