Journal Article, Open Access

Learning Spatio-Temporal Representation With Local and Global Diffusion

Author Affiliations
University of Science and Technology of China, Jingdong (China), JDSU (United States), City University of Hong Kong, ...
Year: 2019
Citations: 217

Abstract

Convolutional Neural Networks (CNNs) have been regarded as a powerful class of models for visual recognition problems. Nevertheless, the convolutional filters in these networks are local operations that ignore long-range dependencies. This drawback becomes even worse for video recognition, since video is an information-intensive medium with complex temporal variations. In this paper, we present a novel framework that boosts spatio-temporal representation learning through Local and Global Diffusion (LGD). Specifically, we construct a novel neural network architecture that learns local and global representations in parallel. The architecture is composed of LGD blocks, where each block updates the local and global features by modeling the diffusions between these two representations. Diffusions effectively interact two aspects of information, i.e., localized…