Journal ArticleUnknown
Auto-Parsing Network for Image Captioning and Visual Question Answering
Authors
Author Affiliations
Southeast University, Dartmouth College, Nanyang Technological University, Monash University
Published In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV)
Year: 2021
Citations: 36
Abstract
We propose an Auto-Parsing Network (APN) to discover and exploit the input data’s hidden tree structures for improving the effectiveness of Transformer-based vision-language systems. Specifically, we impose a Probabilistic Graphical Model (PGM) parameterized by the attention operations on each self-attention layer to incorporate a sparsity assumption. We use this PGM to softly segment an input sequence into a few clusters, where each cluster can be treated as the parent of the entities inside it. By stacking these PGM-constrained self-attention layers, the clusters in a lower layer compose into a new sequence, and the PGM in a higher layer further segments this sequence. Iteratively, a sparse tree can be implicitly parsed, and this tree’s hierarchical knowledge is incorporated into the…
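As a rough illustration of the soft segmentation idea in the abstract, the sketch below reweights a single layer's attention by a soft cluster mask. It is not the paper's implementation: the neighbour link probabilities `link_probs`, the product-form mask, and the function names `segment_mask` and `constrained_attention` are all assumptions made for illustration, and the stacking of layers into a tree is not covered.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def segment_mask(link_probs):
    """Build a soft cluster mask from neighbour link probabilities.

    link_probs[k] is the (assumed) probability that tokens k and k+1
    belong to the same cluster. mask[i][j] is the product of the link
    probabilities between i and j, so it is 1.0 on the diagonal and
    decays across likely cluster boundaries.
    """
    n = len(link_probs) + 1
    mask = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            mask[i][j] = mask[i][j - 1] * link_probs[j - 1]
            mask[j][i] = mask[i][j]
    return mask


def constrained_attention(scores, mask):
    """Softmax each row of scores, reweight by the mask, renormalize."""
    out = []
    for score_row, mask_row in zip(scores, mask):
        weights = [a * m for a, m in zip(softmax(score_row), mask_row)]
        total = sum(weights)  # > 0 because the diagonal of the mask is 1.0
        out.append([w / total for w in weights])
    return out


# Hypothetical example: four tokens with a likely boundary between tokens 1 and 2.
link_probs = [0.9, 0.1, 0.9]
scores = [[0.0] * 4 for _ in range(4)]  # uniform raw attention scores
mask = segment_mask(link_probs)
attn = constrained_attention(scores, mask)
```

With uniform raw scores, each token's attention concentrates on the tokens in its own soft cluster, which is the kind of sparsity the constraint is meant to encourage.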