From image to language and back again
Published online by Cambridge University Press: 23 April 2018
Extract
Work in computer vision and natural language processing involving images and text has grown explosively over the past decade, with a particular boost from the neural network revolution. The present volume brings together five research articles from several different subareas of the field: multilingual multimodal image description (Frank et al.), multimodal machine translation (Madhyastha et al., Frank et al.), image caption generation (Madhyastha et al., Tanti et al.), visual scene understanding (Silberer et al.), and multimodal learning of high-level attributes (Sorodoc et al.). In this article, we touch upon all of these topics as we review work involving images and text under three main headings: image description (Section 2), visually grounded referring expression generation (REG) and comprehension (Section 3), and visual question answering (VQA) (Section 4).
- Type: Articles
- Information: Natural Language Engineering, Volume 24, Special Issue 3: Language for Images, May 2018, pp. 325–362
- Copyright: © Cambridge University Press 2018