Activity Grammars for Temporal Action Segmentation

Pohang University of Science and Technology (POSTECH)
*Equal Contribution

Abstract

Sequence prediction on temporal data requires the ability to understand compositional structures of multi-level semantics beyond individual and contextual properties. The task of temporal action segmentation, which aims at translating an untrimmed activity video into a sequence of action segments, remains challenging for this reason.

This paper addresses the problem by introducing an effective activity grammar to guide neural predictions for temporal action segmentation. We propose a novel grammar induction algorithm that extracts a powerful context-free grammar from action sequence data. We also develop an efficient generalized parser that transforms frame-level probability distributions into a reliable sequence of actions according to the induced grammar with recursive rules. Our approach can be combined with any neural network for temporal action segmentation to enhance the sequence prediction and discover its compositional structure.
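To make the idea of a context-free grammar with recursive rules concrete, the following minimal sketch defines a toy activity grammar and checks whether an action sequence is derivable from it. The grammar, the action names, and the `accepts` helper are all hypothetical illustrations; this is neither the induced grammar nor the parser proposed in the paper.

```python
from functools import lru_cache

# Hypothetical toy grammar: keys are non-terminals, every other symbol is an action.
# POURS is recursive, so it derives one or more consecutive "pour" actions.
GRAMMAR = {
    "S": [("take_cup", "POURS", "stir")],
    "POURS": [("pour", "POURS"), ("pour",)],
}


def accepts(grammar, start, actions):
    """Return True if `actions` can be derived from `start`.

    Simple memoized top-down recognizer. It assumes the grammar has no
    left recursion and no empty rules; otherwise it would not terminate.
    """
    actions = tuple(actions)

    @lru_cache(maxsize=None)
    def ends(symbol, i):
        # All positions j such that `symbol` derives actions[i:j].
        if symbol not in grammar:  # terminal (an action label)
            if i < len(actions) and actions[i] == symbol:
                return frozenset({i + 1})
            return frozenset()
        result = set()
        for rhs in grammar[symbol]:
            positions = {i}
            for part in rhs:
                positions = {j for p in positions for j in ends(part, p)}
            result |= positions
        return frozenset(result)

    return len(actions) in ends(start, 0)


print(accepts(GRAMMAR, "S", ["take_cup", "pour", "pour", "stir"]))
print(accepts(GRAMMAR, "S", ["take_cup", "stir"]))
```

The recursive rule lets a finite grammar cover repetitions of any length, which a fixed list of training sequences cannot do.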

Experimental results demonstrate that our method significantly improves temporal action segmentation in terms of both performance and interpretability on two standard benchmarks, Breakfast and 50 Salads.

Overall Pipeline

Overview of the proposed method. (a) KARI induces an activity grammar from the action sequences in the training data. (b) Given a video, BEP parses the predictions of an off-the-shelf temporal action segmentation model using the KARI-induced grammar. (c) Segmentation optimization then produces the final output: the optimal action sequence and the length of each segment. Best viewed in color.
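As a rough illustration of step (c), the sketch below assigns segment lengths to a fixed action sequence by dynamic programming over frame-level log-probabilities. It is a standard monotone alignment used here as a stand-in, not the paper's segmentation optimization; the function name `align_actions` and the toy inputs are assumptions.

```python
import math

NEG_INF = float("-inf")


def align_actions(log_probs, action_ids):
    """Split T frames into len(action_ids) ordered, non-empty segments.

    log_probs[t][c] is the log-probability of class c at frame t.
    Returns (segment lengths, total log-probability) for the best split.
    """
    T, N = len(log_probs), len(action_ids)
    if N == 0 or N > T:
        raise ValueError("need between 1 and T actions")
    # score[n][t]: best score with frame t assigned to the n-th action.
    score = [[NEG_INF] * T for _ in range(N)]
    # stay[n][t]: True if frame t-1 belongs to the same segment as frame t.
    stay = [[True] * T for _ in range(N)]
    score[0][0] = log_probs[0][action_ids[0]]
    for t in range(1, T):
        for n in range(min(N, t + 1)):
            same = score[n][t - 1]
            prev = score[n - 1][t - 1] if n > 0 else NEG_INF
            stay[n][t] = same >= prev
            score[n][t] = max(same, prev) + log_probs[t][action_ids[n]]
    # Backtrack from the last frame of the last action.
    lengths = [0] * N
    n = N - 1
    for t in range(T - 1, -1, -1):
        lengths[n] += 1
        if t > 0 and not stay[n][t]:
            n -= 1
    return lengths, score[N - 1][T - 1]


# Toy input: 6 frames, 3 classes, with a confident frame-wise prediction.
frame_labels = [0, 0, 2, 2, 2, 1]
toy_log_probs = [
    [math.log(0.9 if c == label else 0.05) for c in range(3)]
    for label in frame_labels
]
lengths, total = align_actions(toy_log_probs, [0, 2, 1])
print(lengths)  # [2, 3, 1]
```

In a full pipeline the action sequence would come from the grammar-guided parser rather than being given by hand, and sequence and lengths could be refined jointly.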

Experiments

The performance comparison on 50 Salads

The performance comparison on Breakfast

Qualitative Results

Making Salads in 50 Salads

Frying eggs in Breakfast


BibTeX

@inproceedings{gong2023activity,
      title={Activity Grammars for Temporal Action Segmentation},
      author={Dayoung Gong and Joonseok Lee and Deunsol Jung and Suha Kwak and Minsu Cho},
      booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
      year={2023},
      url={https://openreview.net/forum?id=oOXZ5JEjPb}
}