- Udacity Computer Vision Nanodegree Image Captioning Project
- This repository contains a neural network that automatically generates captions for images.
- Setup instructions
First, set up pycocotools, which is used to load the COCO dataset images and their corresponding annotations.
- Create a new environment:
  `conda create -n <envName>`
- Activate the environment:
  `conda activate <envName>`
- Install Cython:
  `pip install cython`
- Install Git:
  `conda install -c anaconda git`
- Install pycocotools from this GitHub repository (Windows `cmd` syntax, where `^&` escapes the ampersand):
  `pip install git+https://github.com/philferriere/cocoapi.git#egg=pycocotools^&subdirectory=PythonAPI`
- Download the following data from http://cocodataset.org/#download:
  - Under Annotations, download:
    - 2014 Train/Val annotations [241MB]: extract `captions_train2014.json` and `captions_val2014.json` and place them at `cocoapi/annotations/captions_train2014.json` and `cocoapi/annotations/captions_val2014.json`, respectively.
    - 2014 Testing Image info [1MB]: extract `image_info_test2014.json` and place it at `cocoapi/annotations/image_info_test2014.json`.
  - Under Images, download:
    - 2014 Train images [83K/13GB]: extract the `train2014` folder and place it at `cocoapi/images/train2014/`.
    - 2014 Val images [41K/6GB]: extract the `val2014` folder and place it at `cocoapi/images/val2014/`.
    - 2014 Test images [41K/6GB]: extract the `test2014` folder and place it at `cocoapi/images/test2014/`.
- Place the `cocoapi` folder inside an `opt` folder created in the project's parent directory. For more detailed steps, follow the instructions described here; just don't forget to place `cocoapi` under `opt`!
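Before training, it can help to confirm that every downloaded file ended up where the steps above expect it. The following is a minimal sketch using only the standard library; `missing_coco_paths` and `EXPECTED_COCO_PATHS` are hypothetical names, not part of this repository, and the directory you pass in is assumed to be the one that contains your `opt` folder.

```python
from pathlib import Path

# Paths the setup steps ask for, relative to opt/cocoapi (hypothetical checklist).
EXPECTED_COCO_PATHS = [
    "annotations/captions_train2014.json",
    "annotations/captions_val2014.json",
    "annotations/image_info_test2014.json",
    "images/train2014",
    "images/val2014",
    "images/test2014",
]


def missing_coco_paths(opt_parent):
    """Return the expected paths that do not exist under <opt_parent>/opt/cocoapi.

    Hypothetical helper for illustration only.
    """
    cocoapi_root = Path(opt_parent) / "opt" / "cocoapi"
    return [rel for rel in EXPECTED_COCO_PATHS if not (cocoapi_root / rel).exists()]
```

For example, `missing_coco_paths("path/to/dir-containing-opt")` returns an empty list once every annotation file and image folder is in place; any entry it returns is a file or folder that still needs to be extracted or moved.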
The solution architecture consists of:
- A CNN encoder, which encodes each image into an embedded feature vector.
- A decoder, a recurrent neural network built from LSTM units, which translates the feature vector into a sequence of tokens.
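Because the decoder works with token ids rather than words, captions have to be mapped to ids for training and predicted ids mapped back to words at inference time. The sketch below shows one common way to do this with the standard library. It assumes a simple lowercase regex tokenizer, special tokens named `<start>`, `<end>` and `<unk>`, and a minimum word-count `threshold`; these names and choices are illustrative assumptions and may differ from what this repository actually uses.

```python
import re
from collections import Counter

# Special tokens (assumed names, for illustration only).
START, END, UNK = "<start>", "<end>", "<unk>"


def tokenize(caption):
    """Lower-case a caption and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", caption.lower())


def build_vocab(captions, threshold=1):
    """Give every word seen at least `threshold` times an integer id.

    Returns (stoi, itos): a word->id dict and an id->word list.
    """
    counts = Counter(word for caption in captions for word in tokenize(caption))
    words = sorted(w for w, n in counts.items() if n >= threshold)
    itos = [START, END, UNK] + words
    return {w: i for i, w in enumerate(itos)}, itos


def encode(caption, stoi):
    """Caption -> ids for [<start>, w1, ..., wn, <end>]; unseen words become <unk>."""
    ids = [stoi.get(w, stoi[UNK]) for w in tokenize(caption)]
    return [stoi[START]] + ids + [stoi[END]]


def decode(ids, itos):
    """Ids -> caption string, skipping <start> and stopping at the first <end>."""
    words = []
    for i in ids:
        token = itos[i]
        if token == END:
            break
        if token != START:
            words.append(token)
    return " ".join(words)
```

With `stoi, itos = build_vocab(["A dog runs on the grass.", "A cat sits on a mat."])`, calling `encode("A dog sits on the sofa", stoi)` gives `[0, 3, 5, 10, 8, 11, 2, 1]`, and `decode` turns that back into `"a dog sits on the <unk>"`. Stopping at the first `<end>` matters at inference, because a decoder may keep emitting tokens after it has finished the caption.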
Below are some outputs produced by the network on the COCO dataset:
- Validation dataset predictions, with the BLEU score averaged over 1000 validation samples
- Test dataset predictions
- Snippet of the training log
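The validation BLEU above is an average over per-sample scores. The following sketch shows what that computation involves: clipped n-gram precision, a geometric mean over 1- to 4-grams, and a brevity penalty, averaged over (references, candidate) pairs. It assumes captions are already tokenized, uses uniform weights, and applies no smoothing. These are assumptions; the repository may have used a library implementation such as NLTK's `sentence_bleu` with different settings, so its numbers need not match this sketch exactly.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(references, candidate, max_n=4):
    """Unsmoothed BLEU of one tokenized candidate against tokenized references."""
    if not candidate:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its maximum count in any one reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0  # a zero precision makes the geometric mean zero
        log_precision += math.log(clipped / total) / max_n
    # Brevity penalty against the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision)


def average_bleu(samples, max_n=4):
    """Mean sentence BLEU over a list of (references, candidate) pairs."""
    return sum(sentence_bleu(refs, cand, max_n) for refs, cand in samples) / len(samples)
```

Without smoothing, any caption that shares no 4-gram with its references scores 0, which pulls the average down for short captions; library implementations usually offer smoothing options to soften this.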