DanHAR: Dual Attention Network For Multimodal Human Activity Recognition Using Wearable Sensors

Wenbin Gao; Lei Zhang; Qi Teng; Hao Wu; Jun He

Human activity recognition (HAR) in ubiquitous computing has begun to incorporate attention into deep neural networks (DNNs), which infer human activities from the rich sensing data of multimodal sensors such as accelerometers and gyroscopes. Recently, two attention methods have been proposed that combine attention with Gated Recurrent Units (GRU) and Long Short-Term Memory (LSTM) networks; they can capture the dependencies of sensing signals in both the spatial and temporal domains simultaneously. However, recurrent networks often have weaker feature-representation power than convolutional neural networks (CNNs). On the other hand, two attention mechanisms, i.e., hard attention and soft attention, have been applied in the temporal domain in combination with CNNs, paying more attention to the target activity within a long sequence. However, they can only tell where to focus and miss channel information, which plays an important role in deciding what to focus on. As a result, unlike attention-based GRU or LSTM, they fail to address the spatial-temporal dependencies of multimodal sensing signals. In this paper, we propose a novel dual attention method called DanHAR, which introduces a framework that blends channel attention and temporal attention on a CNN, demonstrating superiority in improving comprehensibility for multimodal HAR. Extensive experiments on four public HAR datasets and a weakly labeled dataset show that DanHAR achieves state-of-the-art performance with negligible parameter overhead. Furthermore, visualization analysis shows that our attention can amplify the more important sensor modalities and timesteps during classification, which agrees well with common human intuition.
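The abstract describes DanHAR's dual attention only at a high level. As a rough illustration of what blending channel attention and temporal attention on a CNN feature map can look like, here is a minimal pure-Python sketch, assuming a CBAM-style design (a shared MLP over average- and max-pooled channel descriptors, then a 1-D convolution over time on channel-pooled descriptors, applied channel-first). The pooling choices, the ordering, the weight shapes, and the function names `channel_attention`, `temporal_attention`, and `dual_attention` are all hypothetical and not taken from the paper; in a real network the weights would be learned.

```python
import math
from typing import List

Matrix = List[List[float]]  # CNN feature map indexed [channel][timestep]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap: Matrix, w1: Matrix, w2: Matrix) -> List[float]:
    """One weight in (0, 1) per channel ("what" to focus on).

    Assumed CBAM-style: w1 has shape (r, C) and w2 has shape (C, r), forming a
    shared two-layer MLP applied to the average- and max-pooled descriptors.
    """
    avg = [sum(row) / len(row) for row in fmap]  # pool over time
    mx = [max(row) for row in fmap]

    def mlp(v: List[float]) -> List[float]:
        hidden = [max(0.0, sum(w * x for w, x in zip(wr, v))) for wr in w1]  # ReLU
        return [sum(w * h for w, h in zip(wr, hidden)) for wr in w2]

    return [sigmoid(a + m) for a, m in zip(mlp(avg), mlp(mx))]


def temporal_attention(fmap: Matrix, kernel_avg: List[float],
                       kernel_max: List[float]) -> List[float]:
    """One weight in (0, 1) per timestep ("where" to focus).

    Assumed: a zero-padded 'same' 1-D convolution (odd kernel length) over
    the channel-wise mean and max at each timestep.
    """
    n_ch, n_t = len(fmap), len(fmap[0])
    avg = [sum(fmap[c][t] for c in range(n_ch)) / n_ch for t in range(n_t)]
    mx = [max(fmap[c][t] for c in range(n_ch)) for t in range(n_t)]
    pad = len(kernel_avg) // 2
    weights = []
    for t in range(n_t):
        s = 0.0
        for j in range(len(kernel_avg)):
            idx = t + j - pad
            if 0 <= idx < n_t:  # positions outside the sequence count as zero
                s += kernel_avg[j] * avg[idx] + kernel_max[j] * mx[idx]
        weights.append(sigmoid(s))
    return weights


def dual_attention(fmap: Matrix, w1: Matrix, w2: Matrix,
                   kernel_avg: List[float], kernel_max: List[float]) -> Matrix:
    """Reweight channels first, then reweight timesteps of the refined map."""
    mc = channel_attention(fmap, w1, w2)
    refined = [[v * mc[c] for v in row] for c, row in enumerate(fmap)]
    mt = temporal_attention(refined, kernel_avg, kernel_max)
    return [[v * mt[t] for t, v in enumerate(row)] for row in refined]
```

Because both attention maps are sigmoid outputs in (0, 1), the sketch can only attenuate features, never amplify them in absolute terms. Relative emphasis comes from some channels and timesteps being suppressed less than others. Residual connections, which some attention designs add so that important features can be relatively amplified, are omitted here.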

updated: Tue Jul 20 2021 01:33:26 GMT+0000 (UTC)

published: Thu Jun 25 2020 14:17:33 GMT+0000 (UTC)

arXiv