Unifying visual-semantic embeddings with multimodal neural language models (2014)

by R Kiros, R Salakhutdinov, R S Zemel
Venue: arXiv:1411.2539