Other · Open Access

Deconfounded Image Captioning: A Causal Retrospect

Author Affiliations
Nanyang Technological University, Southeast University, Monash University
Published In: arXiv (Cornell University)
Year: 2020
Citations: 38

Abstract

Dataset bias in vision-language tasks is becoming one of the main problems hindering the progress of our community. Existing solutions lack a principled analysis of why modern image captioners easily collapse into dataset bias. In this paper, we present a novel perspective, Deconfounded Image Captioning (DIC), to answer this question; we then revisit modern neural image captioners in retrospect and finally propose a DIC framework, DICv1.0, to alleviate the negative effects of dataset bias. DIC is based on causal inference, whose two principles, the backdoor and front-door adjustments, help us review previous studies and design new, effective models. In particular, we show that DICv1.0 can strengthen two prevailing captioning models and can achieve a single-model 131.1 CIDEr-D…