Fuchen Long, Ting Yao, Zhaofan Qiu, Xinmei Tian et al.
Temporally localizing actions in a video is a fundamental challenge in video understanding. Most existing approaches have often drawn inspiration from image object detection and extended the advances, e.g., SSD and Faster R-CNN, to produce temporal locations of an action in a 1D sequence. Neverthele...
Zhaofan Qiu, Ting Yao, Chong‐Wah Ngo, Xinmei Tian et al.
Convolutional Neural Networks (CNN) have been regarded as a powerful class of models for visual recognition problems. Nevertheless, the convolutional filters in these networks are local operations while ignoring the large-range dependency. Such drawback becomes even worse particularly for video reco...
Fuchen Long, Ting Yao, Zhaofan Qiu, Xinmei Tian et al.
Temporally localizing actions in a video is a fundamental challenge in video understanding. Most existing approaches have often drawn inspiration from image object detection and extended the advances, e.g., SSD and Faster R-CNN, to produce temporal locations of an action in a 1D sequence. Neverthele...