Factory
- VQA models, convnets, and VQA datasets can now be created via factories (see the sketch below)
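A rough sketch of what creation through these factories could look like; the module paths and `factory(...)` signatures below are assumptions for illustration, not the repo's documented API:

```python
# Hypothetical sketch only: module paths and factory signatures are assumptions.
import yaml

from vqa.datasets import factory as dataset_factory  # assumed module path
from vqa.models import factory as model_factory      # assumed module path

with open('options/default.yaml') as f:               # options file named in the README
    opt = yaml.safe_load(f)

trainset = dataset_factory.factory('train', opt)                       # assumed signature
model = model_factory.factory(opt,
                              trainset.vocab_words(),                  # assumed helpers
                              trainset.vocab_answers())
```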
VQA 2.0
- `VQA2(AbstractVQA)` added
VisualGenome
- `VisualGenome(AbstractVQADataset)` added for merging with VQA datasets
- `VisualGenomeImages(AbstractImagesDataset)` added to extract features
- `extract.py` can now extract VisualGenome features (see the sketch below)
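To give an idea of the extraction flow, here is a hypothetical loop in the spirit of `extract.py`; the `VisualGenomeImages` and `convnets.factory` names come from these notes, but their signatures and the `'visual'` batch key are assumptions:

```python
# Hypothetical extraction loop (PyTorch 0.x era, matching the repo's vintage).
import torch
from torch.autograd import Variable
from torch.utils.data import DataLoader

from vqa.datasets import VisualGenomeImages  # assumed import path
import convnets                              # the repo's convnets.py (assumed import path)

dataset = VisualGenomeImages('train', opt={'size': 448})        # assumed signature
loader = DataLoader(dataset, batch_size=8, num_workers=4)

cnn = convnets.factory({'arch': 'fbresnet152'}, cuda=True)      # assumed signature
cnn.eval()

features = []
for batch in loader:
    visual = Variable(batch['visual'].cuda(), volatile=True)    # 0.x-style no-grad inference
    features.append(cnn(visual).data.cpu())
features = torch.cat(features, 0)
```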
Variable features size
- `extract.py` can now extract features from images of a size other than 448 via the CLI argument `--size`
- `FeaturesDataset` now has an optional `opt['size']` parameter (illustrated below)
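For illustration, a minimal sketch of how a non-default size could feed the image preprocessing; the transform chain is an assumption, only the `size` option key comes from these notes:

```python
# Illustration only: e.g. `extract.py --size 224` instead of the default 448.
import torchvision.transforms as transforms

opt = {'size': 224}

transform = transforms.Compose([
    transforms.Scale(opt['size']),       # torchvision 0.1.x name (Resize in later versions)
    transforms.CenterCrop(opt['size']),
    transforms.ToTensor(),
])
```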
FBResNet152
- `convnets.py` now supports external pretrained models as well as the ResNets from torchvision
- in particular, `FBResNet152` is a port of the Torch7 `fbresnet152torch` model used until now (see the sketch below)
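A hedged sketch of the two code paths `convnets.py` now covers; the factory call and its option keys are assumptions, only `torchvision.models.resnet152` is the standard API:

```python
import torchvision.models as models
import convnets  # the repo's convnets.py (assumed import path)

# a torchvision ResNet, unchanged
resnet = models.resnet152(pretrained=True)

# the ported FBResNet152 (fbresnet152torch from Torch7), through the same factory
fbresnet = convnets.factory({'arch': 'fbresnet152'}, cuda=False)  # assumed signature
```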
README.md (+55 −20)
```diff
@@ -1,13 +1,20 @@
 # Visual Question Answering in pytorch
 
-This repo was made by [Remi Cadene](http://remicadene.com) (LIP6) and [Hedi Ben-Younes](https://twitter.com/labegne) (LIP6-Heuritech), two PhD students working on VQA at [UPMC-LIP6](http://lip6.fr) and their professors [Matthieu Cord](http://webia.lip6.fr/~cord) (LIP6) and [Nicolas Thome](http://webia.lip6.fr/~thomen) (LIP6-CNAM). We developed this code as part of a research paper called [MUTAN: Multimodal Tucker Fusion for VQA](https://arxiv.org/abs/1705.06676) which is (as far as we know) the current state-of-the-art on the [VQA-1 dataset](http://visualqa.org).
+This repo was made by [Remi Cadene](http://remicadene.com) (LIP6) and [Hedi Ben-Younes](https://twitter.com/labegne) (LIP6-Heuritech), two PhD students working on VQA at [UPMC-LIP6](http://lip6.fr) and their professors [Matthieu Cord](http://webia.lip6.fr/~cord) (LIP6) and [Nicolas Thome](http://webia.lip6.fr/~thomen) (LIP6-CNAM). We developed this code as part of a research paper called [MUTAN: Multimodal Tucker Fusion for VQA](https://arxiv.org/abs/1705.06676) which is (as far as we know) the current state-of-the-art on the [VQA 1.0 dataset](http://visualqa.org).
 
 The goal of this repo is twofold:
 - to make it easier to reproduce our results,
 - to provide an efficient and modular code base to the community for further research on other VQA datasets.
 
 If you have any questions about our code or model, don't hesitate to contact us or to submit any issues. Pull requests are welcome!
 
+#### News:
+
+- coming soon: pretrained models on VQA2, features of FBResnet152, web app demo
+- 18th July 2017: VQA2, VisualGenome, FBResnet152 (for pytorch) added
+- 16th July 2017: paper accepted at ICCV2017
+- 30th May 2017: poster accepted at CVPR2017 (VQA Workshop)
+
 #### Summary:
 
 * [Introduction](#introduction)
```
```diff
@@ -27,7 +34,10 @@ If you have any questions about our code or model, don't hesitate to contact us
 * [Models](#models)
 * [Quick examples](#quick-examples)
 * [Extract features from COCO](#extract-features-from-coco)
-* [Train models on VQA](#train-models-on-vqa)
+* [Extract features from VisualGenome](#extract-features-from-visualgenome)
+* [Train models on VQA 1.0](#train-models-on-vqa-1-0)
+* [Train models on VQA 2.0](#train-models-on-vqa-2-0)
+* [Train models on VQA + VisualGenome](#train-models-on-vqa-2-0)
 * [Monitor training](#monitor-training)
 * [Restart training](#restart-training)
 * [Evaluate models on VQA](#evaluate-models-on-vqa)
```
```diff
@@ -108,7 +118,7 @@ Our code has two external dependencies:
 Data will be automatically downloaded and preprocessed when needed. Links to data are stored in `vqa/datasets/vqa.py` and `vqa/datasets/coco.py`.
 
 
-## Reproducing results
+## Reproducing results on VQA 1.0
 
 ### Features
 
```
````diff
@@ -173,7 +183,7 @@ To obtain test and testdev results, you will need to zip your result json file (
 |
 ├── train.py # train & eval models
 ├── eval_res.py # eval results files with OpenEnded metric
-├── extract.pt # extract features from coco with CNNs
+├── extract.py # extract features from coco with CNNs
 └── visu.py # visualize logs and monitor training
 ```
 
````
```diff
@@ -189,16 +199,15 @@ You can easly add new options in your custom yaml file if needed. Also, if you w
 
 ### Datasets
 
-We currently provide three datasets:
+We currently provide four datasets:
 
 - [COCOImages](http://mscoco.org/) currently used to extract features, it comes with three datasets: trainset, valset and testset
-  - COCOFeatures used by any VQA datasets
-- [VQA](http://www.visualqa.org/vqa_v1_download.html) comes with four datasets: trainset, valset, testset (including test-std and test-dev) and "trainvalset" (concatenation of trainset and valset)
+- [VisualGenomeImages]() currently used to extract features, it comes with one split: trainset
+- [VQA 1.0](http://www.visualqa.org/vqa_v1_download.html) comes with four datasets: trainset, valset, testset (including test-std and test-dev) and "trainvalset" (concatenation of trainset and valset)
+- [VQA 2.0](http://www.visualqa.org) same splits but twice as big (with the same images as VQA 1.0)
```
Display the help message and selected options, and run the default configuration; the needed data will be automatically downloaded and processed using the options in `options/default.yaml`.
Run a MutanAtt model on the trainset and valset (by default) and run through the testset after each epoch (this produces a results file that you can submit to the evaluation server).
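For reference, a minimal sketch of such a results file, assuming the standard VQA submission format (a JSON list of question_id/answer records); the entries and file name are illustrative only:

```python
# Illustrative only: the standard VQA submission format is assumed here.
import json

results = [
    {"question_id": 1, "answer": "yes"},
    {"question_id": 2, "answer": "2"},
]
with open("results.json", "w") as f:   # file name illustrative
    json.dump(results, f)
```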