Egocentric Image Captioning for Privacy-Preserved Passive Dietary Intake Monitoring

Jianing Qiu; Frank P.-W. Lo; Xiao Gu; Modou L. Jobarteh; Wenyan Jia; Tom Baranowski; Matilda Steiner-Asiedu; Alex K. Anderson; Megan A. McCrory; Edward Sazonov; Mingui Sun; Gary Frost; Benny Lo

Camera-based passive dietary intake monitoring can continuously capture a subject's eating episodes. It records rich visual information such as the type and volume of food being consumed, as well as the subject's eating behaviours. However, there is currently no method that can incorporate these visual cues and provide a comprehensive context of dietary intake from passive recording (e.g., whether the subject is sharing food with others, what food the subject is eating, and how much food is left in the bowl). Moreover, privacy is a major concern when egocentric wearable cameras are used for capture. In this paper, we propose a privacy-preserving solution for dietary assessment with passive monitoring, namely egocentric image captioning, which unifies food recognition, volume estimation, and scene understanding. Images are converted into rich text descriptions, so nutritionists can assess individual dietary intake from the captions instead of the original images, reducing the risk of privacy leakage from images. To this end, an egocentric dietary image captioning dataset has been built, consisting of in-the-wild images captured by head-worn and chest-worn cameras in field studies in Ghana. A novel transformer-based architecture is designed to caption egocentric dietary images. Comprehensive experiments have been conducted to evaluate the effectiveness of the proposed architecture and to justify its design for egocentric dietary image captioning. To the best of our knowledge, this is the first work that applies image captioning to dietary intake assessment in real-life settings.
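The privacy argument rests on a simple data flow: each captured frame is turned into text, and only the text is kept for the nutritionist. The sketch below illustrates that flow under stated assumptions. The captioning model is represented by a plain function. `caption_and_discard` and `fake_captioner` are hypothetical names introduced here. The stub is a placeholder, not the paper's transformer-based architecture.

```python
from collections import deque
from typing import Callable, Deque, List


def caption_and_discard(buffer: Deque[bytes],
                        caption_image: Callable[[bytes], str]) -> List[str]:
    """Caption every buffered frame, removing each frame as it is processed.

    `caption_image` is an assumed stand-in for a trained egocentric
    captioning model; it is NOT the architecture proposed in the paper.
    After this returns, the buffer is empty and only captions remain.
    """
    captions: List[str] = []
    while buffer:
        frame = buffer.popleft()  # take the oldest frame out of the buffer
        captions.append(caption_image(frame))
    return captions


def fake_captioner(frame: bytes) -> str:
    """Hypothetical placeholder model: describes only the frame size."""
    return f"frame of {len(frame)} bytes"


if __name__ == "__main__":
    buf = deque([b"\x00" * 4, b"\x00" * 8])
    print(caption_and_discard(buf, fake_captioner))
    print(len(buf))  # the image buffer has been emptied
```

Consuming the buffer with `popleft` makes the "text out, images gone" property explicit in the code. A real device would also need to erase frames from persistent storage, which this sketch does not model.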

updated: Wed Mar 01 2023 08:20:17 GMT+0000 (UTC)

published: Thu Jul 01 2021 11:16:44 GMT+0000 (UTC)

arXiv