Efficient Pipelines for Vision-Based Context Sensing

Xiaochen Liu

Context awareness is an essential part of mobile and ubiquitous computing. Its goal is to unveil situational information about mobile users, such as their locations and activities. The sensed context can enable many services, such as navigation, AR, and smart shopping. Such context can be sensed in different ways, including with visual sensors. Vision sources are increasingly deployed worldwide: cameras can be installed on roadsides, indoors, and on mobile platforms. This trend provides a huge amount of vision data that could be used for context sensing. However, vision data collection and analytics are still highly manual today. It is hard to deploy cameras at large scale for data collection, and organizing and labeling context from the data is labor intensive. In recent years, advanced vision algorithms and deep neural networks have been used to help analyze vision data, but this approach is limited by data quality, labeling effort, and dependency on hardware resources. In summary, there are three major challenges for today's vision-based context sensing systems: collecting and labeling data at large scale, processing large data volumes efficiently with limited hardware resources, and extracting accurate context from vision data. The thesis explores a design space that consists of three dimensions: sensing task, sensor type, and task location. Our prior work explores several points in this design space. We make contributions by (1) developing efficient and scalable solutions for different points in the design space of vision-based sensing tasks; (2) achieving state-of-the-art accuracy in those applications; and (3) developing guidelines for designing such sensing systems.

updated: Sun Nov 01 2020 05:09:13 GMT+0000 (UTC)

published: Sun Nov 01 2020 05:09:13 GMT+0000 (UTC)

arXiv

