Journal Article, Open Access

Learning Spatio-Temporal Representation With Local and Global Diffusion

Authors

Zhaofan Qiu, Ting Yao, Chong‐Wah Ngo, Xinmei Tian, …

Author Affiliations

University of Science and Technology of China, Jingdong (China), JDSU (United States), City University of Hong Kong, ...

Year: 2019

Citations: 217

DOI: 10.1109/cvpr.2019.01233

Abstract

Convolutional Neural Networks (CNNs) have been regarded as a powerful class of models for visual recognition problems. Nevertheless, the convolutional filters in these networks are local operations that ignore long-range dependencies. This drawback is even more severe for video recognition, since video is an information-intensive medium with complex temporal variations. In this paper, we present a novel framework to boost spatio-temporal representation learning by Local and Global Diffusion (LGD). Specifically, we construct a novel neural network architecture that learns the local and global representations in parallel. The architecture is composed of LGD blocks, where each block updates local and global features by modeling the diffusions between these two representations. Diffusions effectively interact two aspects of information, i.e., localized…
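The block-wise update described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `lgd_block`, the scalar weights, the broadcast of the global vector to every position, and the use of average pooling are all assumptions made for illustration. In a real network the scalar weights would be learned convolutional and linear layers.

```python
from typing import List, Tuple

Vector = List[float]


def relu(v: Vector) -> Vector:
    """Element-wise rectified linear unit."""
    return [max(0.0, a) for a in v]


def global_average_pool(local: List[Vector]) -> Vector:
    """Average the per-position feature vectors into one vector per channel."""
    n = len(local)
    channels = len(local[0])
    return [sum(pos[c] for pos in local) / n for c in range(channels)]


def lgd_block(
    local: List[Vector],
    global_: Vector,
    w_local: float = 1.0,   # hypothetical stand-in for the local (convolutional) transform
    w_g2l: float = 0.5,     # hypothetical weight on the global-to-local diffusion
    w_l2g: float = 0.5,     # hypothetical weight on the local-to-global diffusion
    w_global: float = 1.0,  # hypothetical weight carrying the global feature forward
) -> Tuple[List[Vector], Vector]:
    """One illustrative local/global update step (assumed form, not the paper's)."""
    # Global -> local: the global vector is broadcast to every position
    # and added to the transformed local feature.
    new_local = [
        relu([w_local * x + w_g2l * g for x, g in zip(pos, global_)])
        for pos in local
    ]
    # Local -> global: the updated local features are pooled and merged
    # with the previous global feature.
    pooled = global_average_pool(new_local)
    new_global = relu([w_l2g * p + w_global * g for p, g in zip(pooled, global_)])
    return new_local, new_global


# Two positions with two channels each, plus one global vector.
local_features = [[1.0, -2.0], [3.0, 0.0]]
global_feature = [2.0, 2.0]
updated_local, updated_global = lgd_block(local_features, global_feature)
```

Stacking several such calls, each feeding its outputs to the next, would mirror the idea of an architecture composed of blocks that keep both representations in parallel.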

Fields & Keywords

Physical Sciences; Computer Science; Computer Vision and Pattern Recognition; Human Pose and Action Recognition; Video Surveillance and Tracking Methods; Anomaly Detection Techniques and Applications; Artificial intelligence; Machine learning; Geometry; Law