Fast Autofocusing using Tiny Transformer Networks for Digital Holographic Microscopy

Stéphane Cuenat; Louis Andréoli; Antoine N. André; Patrick Sandoz; Guillaume J. Laurent; Raphaël Couturier; Maxime Jacquot

The numerical wavefront backpropagation principle of digital holography provides unique extended-focus capabilities without mechanical displacement along the z-axis. However, determining the correct focusing distance is a non-trivial and time-consuming problem. A deep learning (DL) solution is proposed that casts autofocusing as a regression problem, and it is tested on both experimental and simulated holograms. Single-wavelength digital holograms were recorded by a digital holographic microscope (DHM) with a 10x microscope objective from a patterned target moving in 3D over an axial range of 92 μm. Tiny DL models are proposed and compared: a tiny Vision Transformer (TViT), a tiny VGG16 (TVGG) and a tiny Swin-Transformer (TSwinT). The proposed tiny networks are compared with their original versions (ViT/B16, VGG16 and Swin-Transformer Tiny) and with the main neural networks used in digital holography, such as LeNet and AlexNet. The experiments show that the predicted focusing distance Z_R^Pred is inferred with an average accuracy of 1.2 μm, compared with the DHM depth of field of 15 μm. Numerical simulations show that all tiny models give Z_R^Pred with an error below 0.3 μm. Such a prospect would significantly improve the current capabilities of computer vision position sensing in applications such as 3D microscopy for life sciences or micro-robotics. Moreover, all models reach a CPU inference time below 25 ms per inference. With respect to occlusions, TViT, which is based on a Transformer architecture, is the most robust.
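To make the backpropagation step concrete, the following is a minimal sketch of numerical wavefront propagation using the angular spectrum method, a common choice in digital holography. The abstract does not say which propagation kernel the authors use, so the method, the function name `angular_spectrum_propagate`, and the wavelength and pixel size in the usage lines are all assumptions made for illustration.

```python
import numpy as np

def angular_spectrum_propagate(field, wavelength, pixel_size, z):
    """Propagate a complex field by a distance z (metres).

    Hypothetical helper, not taken from the paper. A negative z
    back-propagates the field, which is how a hologram can be
    refocused numerically without moving the sample.
    """
    ny, nx = field.shape
    # Spatial frequencies of the sampling grid, in cycles per metre.
    fx = np.fft.fftfreq(nx, d=pixel_size)
    fy = np.fft.fftfreq(ny, d=pixel_size)
    FX, FY = np.meshgrid(fx, fy)
    # Squared axial frequency; non-positive values are evanescent waves.
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    propagating = arg > 0
    kz = 2.0 * np.pi * np.sqrt(np.where(propagating, arg, 0.0))
    # Transfer function: pure phase for propagating components,
    # zero for evanescent ones.
    H = np.where(propagating, np.exp(1j * kz * z), 0.0)
    return np.fft.ifft2(np.fft.fft2(field) * H)

# Usage with assumed values: 0.5 um wavelength, 1 um pixels.
rng = np.random.default_rng(0)
hologram = rng.standard_normal((64, 64)) + 1j * rng.standard_normal((64, 64))
defocused = angular_spectrum_propagate(hologram, 0.5e-6, 1e-6, 46e-6)
refocused = angular_spectrum_propagate(defocused, 0.5e-6, 1e-6, -46e-6)
```

A conventional autofocus would evaluate a sharpness metric on many such reconstructions over the axial range. The regression approach described above instead predicts the focusing distance directly from the hologram, so that a single propagation suffices.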

updated: Fri May 20 2022 10:56:45 GMT+0000 (UTC)

published: Tue Mar 15 2022 10:52:58 GMT+0000 (UTC)

arXiv
