
Auto-Parsing Network for Image Captioning and Visual Question Answering

Authors

Xu Yang, Chongyang Gao, Hanwang Zhang, Jianfei Cai

Author Affiliations

Southeast University, Dartmouth College, Nanyang Technological University, Monash University

Published In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV)

Year: 2021

Citations: 36

DOI: 10.1109/iccv48922.2021.00220

Abstract

We propose an Auto-Parsing Network (APN) to discover and exploit the input data’s hidden tree structures for improving the effectiveness of Transformer-based vision-language systems. Specifically, we impose a Probabilistic Graphical Model (PGM), parameterized by the attention operations, on each self-attention layer to incorporate a sparsity assumption. We use this PGM to softly segment an input sequence into a few clusters, where each cluster can be treated as the parent of the entities inside it. By stacking these PGM-constrained self-attention layers, the clusters in a lower layer compose into a new sequence, and the PGM in a higher layer further segments this sequence. Iteratively, a sparse tree can be implicitly parsed, and this tree’s hierarchical knowledge is incorporated into the…
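The abstract describes the mechanism only at a high level, and its full formulation is cut off here. As a rough illustration of how self-attention can be softly restricted to contiguous clusters, the sketch below uses the neighbour-link "constituent prior" construction popularised by Tree Transformer. This is a swapped-in technique, not APN's own PGM. All function names (`link_probs`, `cluster_prior`, `merge_links`, `constrained_attention`) are hypothetical. The sketch also assumes a single head with no learned projections, so Q, K and V are the raw token vectors.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def link_probs(q, k):
    """Soft probability that token i and token i+1 fall in the same cluster.

    Assumed Tree Transformer-style construction (not APN's PGM): each token
    splits its affinity between its left and right neighbour with a softmax,
    and the link strength is the geometric mean of the two directions.
    """
    n = len(q)
    if n < 2:
        return []
    scale = 1.0 / math.sqrt(len(q[0]))
    left, right = [0.0] * n, [0.0] * n
    for i in range(n):
        sides, scores = [], []
        if i > 0:
            sides.append("L")
            scores.append(dot(q[i], k[i - 1]) * scale)
        if i < n - 1:
            sides.append("R")
            scores.append(dot(q[i], k[i + 1]) * scale)
        for side, p in zip(sides, softmax(scores)):
            if side == "L":
                left[i] = p
            else:
                right[i] = p
    return [math.sqrt(right[i] * left[i + 1]) for i in range(n - 1)]


def cluster_prior(links):
    """C[i][j] = product of the links between i and j, a soft 'same cluster' score."""
    n = len(links) + 1
    c = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            c[i][j] = c[i][j - 1] * links[j - 1]
            c[j][i] = c[i][j]
    return c


def merge_links(lower, current):
    """Assumed stacking rule: links only strengthen with depth, so clusters found
    in a lower layer stay merged and a higher layer can group them further."""
    return [lo + (1.0 - lo) * cur for lo, cur in zip(lower, current)]


def constrained_attention(q, k, v, prior=None):
    """Scaled dot-product attention reweighted by the cluster prior, rows renormalised."""
    n, d = len(q), len(q[0])
    if prior is None:
        prior = cluster_prior(link_probs(q, k))
    outputs, weights = [], []
    for i in range(n):
        att = softmax([dot(q[i], k[j]) / math.sqrt(d) for j in range(n)])
        masked = [a * prior[i][j] for j, a in enumerate(att)]
        z = sum(masked)  # > 0 because prior[i][i] == 1 and softmax is positive
        row = [m / z for m in masked]
        weights.append(row)
        outputs.append([sum(row[j] * v[j][t] for j in range(n)) for t in range(len(v[0]))])
    return outputs, weights


if __name__ == "__main__":
    # Two pairs of similar 2-d tokens: {0, 1} and {2, 3}.
    x = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    links = link_probs(x, x)
    print("neighbour links:", [round(a, 3) for a in links])
    out, w = constrained_attention(x, x, x)
    print("attention row 0:", [round(a, 3) for a in w[0]])
```

With these toy inputs the middle link (between tokens 1 and 2) comes out weakest, so the prior softly splits the sequence into two clusters. Renormalising each row after applying the prior is a design choice made here to keep the weights a proper distribution. Feeding `merge_links` output back through `cluster_prior` and into the `prior` argument mimics, very loosely, the layer-by-layer composition the abstract describes.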


Fields & Keywords

Physical Sciences, Computer Science, Computer Vision and Pattern Recognition, Multimodal Machine Learning Applications, Domain Adaptation and Few-Shot Learning, Advanced Image and Video Retrieval Techniques, Artificial intelligence, Natural language processing, Programming language, Mathematical analysis