Reference Based LSTM for Image Captioning | Cool Papers

#1 Reference Based LSTM for Image Captioning

Authors: Minghai Chen, Guiguang Ding, Sicheng Zhao, Hui Chen, Qiang Liu, Jungong Han

Image captioning is an important problem in artificial intelligence, related to both computer vision and natural language processing. Existing methods have two main problems: in the training phase, it is difficult to determine which parts of a caption are most essential to the image; in the caption generation phase, objects or scenes are sometimes misrecognized. In this paper, we treat the training images as references and propose a Reference based Long Short Term Memory (R-LSTM) model that aims to solve both problems together. When training the model, we assign different weights to different words, which enables the network to better learn the key information in the captions. When generating a caption, a consensus score is used to exploit the reference information of neighbor images, which may correct misrecognitions and make the descriptions more natural-sounding. The proposed R-LSTM model outperforms state-of-the-art approaches on the benchmark dataset MS COCO and ranks in the top two on 11 of the 14 metrics on the online test server.
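The two ideas in the abstract, per-word weights during training and a consensus score during generation, can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not give the weighting scheme, the similarity measure, or the way the scores are combined, so the function names, the Jaccard word overlap, the length normalization, and the mixing parameter `alpha` below are all assumptions chosen for illustration.

```python
import math


def weighted_caption_loss(word_probs, word_weights):
    """Weighted negative log-likelihood of one caption.

    word_probs: model probability of each ground-truth word.
    word_weights: importance weight per word (hypothetical values; the
    abstract does not say how the weights are derived).
    """
    if len(word_probs) != len(word_weights):
        raise ValueError("need one weight per word")
    return -sum(w * math.log(p) for p, w in zip(word_probs, word_weights))


def word_overlap(candidate, reference):
    """Jaccard overlap of word sets, an assumed stand-in similarity."""
    a, b = set(candidate.split()), set(reference.split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def consensus_score(candidate, neighbor_captions):
    """Mean similarity of a candidate to captions of neighbor images."""
    total = sum(word_overlap(candidate, ref) for ref in neighbor_captions)
    return total / len(neighbor_captions)


def rerank(candidates, neighbor_captions, alpha=0.5):
    """Pick the candidate with the best combined score.

    candidates: list of (caption, log_prob) pairs from the decoder.
    alpha: assumed trade-off between likelihood and consensus.
    """
    def combined(item):
        caption, log_prob = item
        # Length-normalize so longer captions are not penalized outright.
        per_word = log_prob / max(len(caption.split()), 1)
        consensus = consensus_score(caption, neighbor_captions)
        return alpha * per_word + (1 - alpha) * consensus

    return max(candidates, key=combined)[0]
```

As a usage example under the same assumptions: given the candidates `("a cat on a sofa", -4.0)` and `("a dog on a sofa", -4.5)` and the neighbor captions `"a dog sitting on a sofa"` and `"a dog lies on a couch"`, `rerank` selects the second candidate even though its likelihood is lower, because it agrees better with the references. This mirrors how reference information might correct a misrecognized object.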

Subject: AAAI.2017 - Vision

11198@AAAI
