Quality-agnostic Image Captioning to Safely Assist People with Vision Impairment

Lu Yu; Malvina Nikandrou; Jiali Jin; Verena Rieser

Automated image captioning has the potential to be a useful tool for people with vision impairments. Images taken by this user group are often noisy, which leads to incorrect and even unsafe model predictions. In this paper, we propose a quality-agnostic framework to improve the performance and robustness of image captioning models for visually impaired people. We address this problem from three angles: data, model, and evaluation. First, we show how data augmentation techniques for generating synthetic noise can address data sparsity in this domain. Second, we enhance the robustness of the model by expanding a state-of-the-art model to a dual network architecture, using the augmented data and leveraging different consistency losses. Our results demonstrate increased performance, e.g., an absolute improvement of 2.15 on CIDEr compared to state-of-the-art image captioning networks, as well as increased robustness to noise, with up to 3 points of improvement on CIDEr in noisier settings. Finally, we evaluate prediction reliability using confidence calibration on images with different difficulty/noise levels, showing that our models perform more reliably in safety-critical situations. The improved model is part of an assisted living application, which we develop in partnership with the Royal National Institute of Blind People.
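The abstract does not say which calibration measure is used, so the following is only an illustrative sketch of one common choice, expected calibration error (ECE). It assumes each prediction comes with a confidence in [0, 1] and a binary correctness label; how those are derived from a caption is not specified here, and the function name and parameters are hypothetical.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Illustrative ECE: the weighted average, over equal-width confidence
    bins, of |mean confidence - accuracy| within each bin.

    confidences: floats in [0, 1], one per prediction (assumed input).
    correct: booleans or 0/1 flags, one per prediction (assumed input).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        raise ValueError("at least one prediction is required")

    # Group predictions into equal-width bins by confidence.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, 1.0 if ok else 0.0))

    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(a for _, a in members) / len(members)
        ece += (len(members) / total) * abs(mean_conf - accuracy)
    return ece
```

A lower value means the stated confidence tracks the observed accuracy more closely. Computing it separately for each difficulty or noise level would allow a comparison of reliability across image quality, in the spirit of the evaluation described above.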

updated: Fri Apr 28 2023 04:32:28 GMT+0000 (UTC)

published: Fri Apr 28 2023 04:32:28 GMT+0000 (UTC)

arXiv
