# Generating captions using deep neural networks

In this project, we implemented a caption generator, following the approach described in https://arxiv.org/pdf/1411.4555v1.pdf (Show and Tell: A Neural Image Caption Generator).

The images were first processed with AlexNet (using the implementation found at http://www.cs.toronto.edu/~guerzhoy/tf_alexnet/). Instead of the raw images, we used the activations of the network's 7th layer (fc7) as input to the caption generator; the code for this step is in the traitementImgs folder. For the captions, we used a vector representation of the words, obtained from a word2vec model.
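As a rough illustration of this feature-extraction step, here is a minimal sketch. Keras does not ship AlexNet, so pretrained VGG16 and its fc2 layer stand in here for the AlexNet fc7 features used in the project; the network choice, file path, and image size below are illustrative assumptions, not what the traitementImgs code actually does.

```python
import numpy as np
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing import image

# Stand-in for the project's AlexNet: keep the 4096-d activations of the
# last hidden fully-connected layer and use them as the image representation.
base = VGG16(weights="imagenet")
extractor = Model(inputs=base.input, outputs=base.get_layer("fc2").output)

# "example.jpg" is a placeholder path, not a file from this repository.
img = image.load_img("example.jpg", target_size=(224, 224))
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
features = extractor.predict(x)
print(features.shape)  # (1, 4096)
```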

We then used a simple LSTM, which we trained on the Flickr8k dataset.
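As a rough sketch of what such a model can look like in Keras (the layer sizes, vocabulary size, and maximum caption length below are assumptions, not the values used in this project; the caption words are assumed to have already been mapped to word2vec vectors outside the model):

```python
from tensorflow.keras import layers, Model

IMG_DIM = 4096     # size of the image feature vector (AlexNet fc7)
EMBED_DIM = 100    # size of the word2vec vectors (illustrative value)
VOCAB_SIZE = 8000  # vocabulary size (illustrative value)
MAX_LEN = 20       # maximum caption length (illustrative value)

# Image branch: project the image feature into the word-embedding space
# and treat it as the first token of the sequence.
img_in = layers.Input(shape=(IMG_DIM,), name="image_feature")
img_tok = layers.RepeatVector(1)(layers.Dense(EMBED_DIM, activation="relu")(img_in))

# Text branch: the previous words of the caption, already mapped to
# word2vec vectors.
words_in = layers.Input(shape=(MAX_LEN, EMBED_DIM), name="word_vectors")

# Run a single LSTM over [image token, word tokens] and predict the next word.
seq = layers.Concatenate(axis=1)([img_tok, words_in])
hidden = layers.LSTM(256)(seq)
next_word = layers.Dense(VOCAB_SIZE, activation="softmax")(hidden)

model = Model(inputs=[img_in, words_in], outputs=next_word)
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.summary()
```

At test time, such a model is applied repeatedly: the image feature is fed in once, and each predicted word is appended to the input sequence until an end-of-sentence token is produced.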

Some examples of the results can be found in the results.pdf file. Some statistics on them: on 60 images selected from the test set (simply the first 60), the generated captions:

- made no sense for 32 of them
- were somehow related to elements in the picture for 22 of them
- actually provided a rather good description of the picture for 6 of them

(Looking back on it, a more integrated pipeline would definitely have made choosing and testing hyperparameters easier, and would probably have yielded better results.)