Generating captions using deep neural networks

In this project, we implemented a caption generator, following the approach described in "Show and Tell: A Neural Image Caption Generator" (https://arxiv.org/pdf/1411.4555v1.pdf).

The images were first processed through AlexNet (using the implementation from http://www.cs.toronto.edu/~guerzhoy/tf_alexnet/). Instead of the raw images, we used the activations of the network's 7th layer (fc7) as input to the caption generator; the code for this step is in the traitementImgs folder. For the captions, we used a vector representation of words, obtained from a word2vec model.
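As an illustration of the feature-extraction step, here is a minimal sketch. The repository uses the TensorFlow AlexNet port linked above; the torchvision model below is an assumed stand-in for it, not the project's actual code.

```python
# Sketch: extract 4096-d fc7 features from a pretrained AlexNet.
# NOTE: the project used a TensorFlow AlexNet port; this torchvision
# version only illustrates the same idea.
import torch
import torch.nn as nn
from torchvision import models, transforms
from PIL import Image

model = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
# Keep everything up to (and including) the ReLU after fc7,
# dropping the final 1000-way classification layer (fc8).
model.classifier = nn.Sequential(*list(model.classifier.children())[:6])
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def fc7_features(path):
    """Return a (4096,) fc7 feature vector for one image file."""
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return model(img).squeeze(0)
```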

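The word-embedding step could look like the following gensim sketch; the tiny corpus and the hyperparameters are placeholder assumptions, not the values used in the project.

```python
# Sketch: train a word2vec model on the tokenized training captions
# (gensim shown here; the project's exact word2vec setup may differ).
from gensim.models import Word2Vec

# In practice this would be every tokenized caption in the training set.
captions = [
    ["a", "dog", "runs", "on", "the", "grass"],
    ["a", "child", "plays", "in", "the", "water"],
]

w2v = Word2Vec(sentences=captions, vector_size=100,
               window=5, min_count=1, workers=4)
print(w2v.wv["dog"])  # the 100-d vector representing "dog"
```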
We then trained a simple LSTM on the Flickr8k dataset.
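The sketch below shows the general shape of such a decoder in PyTorch, in the spirit of the Show and Tell paper: the fc7 feature is projected into the embedding space and fed to the LSTM as the first timestep, followed by the caption's word vectors. All layer sizes and names are illustrative assumptions, not values from this repository.

```python
# Sketch of a Show-and-Tell style LSTM caption decoder.
import torch
import torch.nn as nn

class CaptionLSTM(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=512, feat_dim=4096):
        super().__init__()
        self.img_proj = nn.Linear(feat_dim, embed_dim)    # fc7 -> embedding space
        self.embed = nn.Embedding(vocab_size, embed_dim)  # could be initialised from word2vec
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, feats, captions):
        # feats: (B, 4096) fc7 features; captions: (B, T) word indices
        img = self.img_proj(feats).unsqueeze(1)  # (B, 1, E)
        words = self.embed(captions)             # (B, T, E)
        seq = torch.cat([img, words], dim=1)     # image acts as the first "word"
        hidden, _ = self.lstm(seq)
        return self.out(hidden)                  # (B, T+1, vocab) next-word logits
```

Training would minimise cross-entropy between these logits and the caption shifted by one word; at test time the image feature goes in first, and each predicted word is fed back until an end-of-sentence token is produced.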

Some examples of the results can be found in the results.pdf file. Some statistics on them: on 60 images selected from the test set (the first 60, so not truly random), the generated captions:

  • made no sense for 32 of them
  • were somehow related to elements in the picture for 22 of them
  • actually provided a rather good description of the picture for 6 of them

(Looking back on it, a more integrated pipeline would definitely have made choosing and testing hyperparameters easier, and would probably have yielded better results.)
