YONA: You Only Need One Adjacent Reference-frame for Accurate and Fast Video Polyp Detection

Yuncheng Jiang; Zixun Zhang; Ruimao Zhang; Guanbin Li; Shuguang Cui; Zhen Li

Accurate polyp detection is essential for assisting the clinical diagnosis of rectal cancer. Colonoscopy videos contain richer information than still images, making them a valuable resource for deep learning methods. Considerable effort has gone into video polyp detection through multi-frame temporal and spatial aggregation. However, unlike common fixed-camera video, colonoscopy video is captured by a moving camera, which can cause rapid jitter and make the training of existing video detection models unstable. Additionally, the concealed nature of some polyps and the complex background environment further hinder the performance of existing video detectors. In this paper, we propose YONA (You Only Need one Adjacent Reference-frame), an efficient end-to-end training framework for video polyp detection. YONA fully exploits the information in a single previous adjacent frame and detects polyps in the current frame without multi-frame collaboration. Specifically, for the foreground, YONA adaptively aligns the current frame's channel activation patterns with those of its adjacent reference frame according to their foreground similarity. For the background, YONA performs dynamic background alignment guided by the inter-frame difference to eliminate the invalid features produced by drastic spatial jitter. Moreover, YONA applies cross-frame contrastive learning during training, leveraging the ground-truth bounding boxes to improve the model's perception of polyps and background. Quantitative and qualitative experiments on three challenging public benchmarks demonstrate that YONA outperforms previous state-of-the-art competitors by a large margin in both accuracy and speed.
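The two alignment ideas in the abstract can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the function names (`channel_similarity`, `foreground_align`, `background_align`), the use of global-average channel descriptors, the sigmoid gate, and the fixed 0.5 blend factor are all assumptions chosen only to make the idea concrete. The abstract does not specify these details, and the contrastive-learning component is omitted.

```python
import numpy as np

def channel_similarity(cur, ref):
    """Cosine similarity between per-channel mean activations of two (C, H, W) maps."""
    a = cur.mean(axis=(1, 2))
    b = ref.mean(axis=(1, 2))
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def foreground_align(cur, ref):
    """Hypothetical foreground step: re-weight the current frame's channels
    with a gate derived from the reference frame, scaled by their similarity."""
    sim = max(channel_similarity(cur, ref), 0.0)          # clamp to [0, 1]
    ref_gate = 1.0 / (1.0 + np.exp(-ref.mean(axis=(1, 2))))  # sigmoid channel gate
    gate = sim * ref_gate + (1.0 - sim)                   # sim == 0 leaves cur unchanged
    return cur * gate[:, None, None]

def background_align(cur, ref):
    """Hypothetical background step: blend in reference features only where
    the two frames agree, using the inter-frame difference as a spatial mask."""
    diff = np.abs(cur - ref).mean(axis=0, keepdims=True)  # (1, H, W)
    agree = 1.0 - diff / (diff.max() + 1e-8)              # near 0 where frames differ most
    return (1.0 - 0.5 * agree) * cur + 0.5 * agree * ref

# Toy usage with random "feature maps" standing in for backbone outputs.
rng = np.random.default_rng(0)
cur_feat = rng.random((8, 16, 16))
ref_feat = rng.random((8, 16, 16))
fused = background_align(foreground_align(cur_feat, ref_feat), ref_feat)
```

In this sketch, locations where the two frames disagree strongly (for example, after a sudden camera jolt) receive almost no contribution from the reference frame, which mirrors the stated goal of discarding invalid features caused by jitter.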

updated: Tue Jun 06 2023 13:53:15 GMT+0000 (UTC)

published: Tue Jun 06 2023 13:53:15 GMT+0000 (UTC)

arXiv
