To interact with humans in collaborative environments, machines need to be able to predict (i.e., anticipate) future events, and execute actions in a timely manner. However, the observation of the human limb movements may not be sufficient to anticipate their actions unambiguously. In this work, we consider two additional sources of information (i.e., context) over time, gaze, movement and object information, and study how these additional contextual cues improve the action anticipation performance. We address action anticipation as a classification task, where the model takes the available information as the input and predicts the most likely action. We propose to use the uncertainty about each prediction as an online decision-making criterion for action anticipation. Uncertainty is modeled as a stochastic process applied to a time-based neural network architecture, which improves the conventional class-likelihood (i.e., deterministic) criterion. The main contributions of this paper are four-fold: (i) We propose a novel and effective decision-making criterion that can be used to anticipate actions even in situations of high ambiguity; (ii) we propose a deep architecture that outperforms previous results in the action anticipation task when using the Acticipate collaborative dataset; (iii) we show that contextual information is important to disambiguate the interpretation of similar actions; and (iv) we also provide a formal description of three existing performance metrics that can be easily used to evaluate action anticipation models.Our results on the Acticipate dataset showed the importance of contextual information and the uncertainty criterion for action anticipation. We achieve an average accuracy of 98.75% in the anticipation task using only an average of 25% of observations.
updated: Thu Jun 18 2020 06:17:03 GMT+0000 (UTC)
published: Tue Oct 01 2019 23:30:08 GMT+0000 (UTC)