Other

Open Access

Deconfounded Image Captioning: A Causal Retrospect

Authors

Xu Yang, Hanwang Zhang, Jianfei Cai

Author Affiliations

Nanyang Technological University, Southeast University, Monash University

Published In

arXiv (Cornell University)

Year

2020

Citations

38

DOI

10.48550/arxiv.2003.03923

Abstract

Dataset bias in vision-language tasks is becoming one of the main problems hindering the progress of our community. Existing solutions lack a principled analysis of why modern image captioners easily collapse into dataset bias. In this paper, we present a novel perspective, Deconfounded Image Captioning (DIC), to answer this question, then revisit modern neural image captioners, and finally propose a DIC framework, DICv1.0, to alleviate the negative effects of dataset bias. DIC is based on causal inference, whose two principles, the backdoor and front-door adjustments, help us review previous studies and design new effective models. In particular, we showcase that DICv1.0 can strengthen two prevailing captioning models and can achieve a single-model 131.1 CIDEr-D…


Fields & Keywords

Physical Sciences, Computer Science, Computer Vision and Pattern Recognition, Multimodal Machine Learning Applications, Domain Adaptation and Few-Shot Learning, Human Pose and Action Recognition, Artificial intelligence, Natural language processing, Machine learning, Data science, Econometrics, Computer security