Vision-Language-Navigation Agent Research
This repository generates semantically paired instructions for Room-to-Room (R2R). Generating semantically paired instructions is treated as a paraphrasing task: paraphrased sequences are generated with a fine-tuned Pegasus model. Results were tested and compared on the Discrete-Continuous-VLN model.
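As a rough illustration of the paraphrasing step, here is a minimal sketch using the HuggingFace `transformers` API. The checkpoint name `tuner007/pegasus_paraphrase`, the `paraphrase` helper, and the generation parameters are illustrative assumptions, not the repo's actual fine-tuned model or settings (the real pipeline lives in `data_generation/synthetic_data_generation_r2r.py`):

```python
# Minimal sketch of Pegasus paraphrasing (illustrative only; the repo's
# actual pipeline is in data_generation/synthetic_data_generation_r2r.py).
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

# Assumed public checkpoint for illustration; the repo uses its own fine-tuned model.
MODEL_NAME = "tuner007/pegasus_paraphrase"
tokenizer = PegasusTokenizer.from_pretrained(MODEL_NAME)
model = PegasusForConditionalGeneration.from_pretrained(MODEL_NAME)

def paraphrase(instruction: str, n: int = 3) -> list[str]:
    """Return n candidate paraphrases of a navigation instruction."""
    batch = tokenizer([instruction], truncation=True, padding="longest",
                      return_tensors="pt")
    outputs = model.generate(**batch, max_length=60, num_beams=max(n, 5),
                             num_return_sequences=n)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)

print(paraphrase("Walk past the couch and stop at the kitchen door."))
```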
It's recommended to use Conda to manage packages. Create two separate conda environments: one for data generation, and another for running the agent.
- Create a conda environment:
conda create -n pegasus-paraphrase python=3.9
- Use your preferred method and CUDA version (if applicable) to install PyTorch 1.10.1, following the instructions on the PyTorch website.
- After cloning this repo, initialize the submodules with the following:
cd VLN-CZ-KG
git submodule update --init --recursive
git submodule foreach git pull origin main
cd data_generation
pip install -r requirements.txt
- Run `cd Discrete-Continuous-VLN` and follow the instructions in their README.md to install dependencies, including their instructions to install habitat-sim and habitat-lab. It is not necessary to download the connectivity graphs, but follow their instructions for downloading the Matterport3D scene data.
- Make sure you're in the correct Conda environment:
conda activate pegasus-paraphrase
- Check that the constants at the top of `data_generation/synthetic_data_generation_r2r.py` point to the correct file paths. (Tamper with the other constants at your own risk.)
- The `LOAD_DIR` string should contain the path to the datasets folder that will be modified/paraphrased. In other words, the `LOAD_DIR` path should contain the original dataset. The `LOAD_DIR` folder should have the following structure:
```
LOAD_DIR/
|--- test/
|    |--- test.json.gz
|--- train/
|    |--- train.json.gz
|    |--- train_gt.json.gz
|--- val_seen/
|    |--- val_seen.json.gz
|    |--- val_seen_gt.json.gz
|--- val_unseen/
|    |--- val_unseen.json.gz
|    |--- val_unseen_gt.json.gz
```
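If you want to inspect what these files contain, each split is a gzipped JSON file. A minimal sketch for peeking at one split, assuming the standard VLN-CE episode layout (`episodes` list with nested `instruction_text`) and a placeholder path:

```python
# Peek at one split of the dataset (path is a placeholder; field names
# assume the standard VLN-CE episode format).
import gzip
import json

with gzip.open("LOAD_DIR/train/train.json.gz", "rt") as f:
    data = json.load(f)

episode = data["episodes"][0]
print(episode["instruction"]["instruction_text"])
```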
- The `OUT_DIR` string should contain the path to the output datasets folder. Make sure that the specified folder already has the following structure, because the script does not create new directories:
```
OUT_DIR/
|--- test/
|--- train/
|--- val_seen/
|--- val_unseen/
```
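Since the script will not create these directories for you, you can make the layout by hand or with a short helper like this sketch (the `OUT_DIR` value is a placeholder; substitute your own path):

```python
# Pre-create the OUT_DIR layout expected by the generation script,
# which does not create directories itself.
from pathlib import Path

OUT_DIR = Path("data/datasets/paraphrased")  # placeholder; use your own path
for split in ("test", "train", "val_seen", "val_unseen"):
    (OUT_DIR / split).mkdir(parents=True, exist_ok=True)
```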
- Run the following:
python synthetic_data_generation_r2r.py
- The script took approximately 2-3 hours to run on an RTX 3080 GPU.
- Navigate to your `OUT_DIR` path. (A scripted version of the following steps is sketched after this list.)
  - For any newly generated .json files, run `gzip [generated_file_name].json`
  - For any other files that existed in the original `LOAD_DIR`, copy them over to the corresponding place in `OUT_DIR`.
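The two post-processing steps above can also be scripted. A sketch, assuming `LOAD_DIR` and `OUT_DIR` are the same paths configured earlier (both shown here as placeholders):

```python
# Post-processing sketch: gzip newly generated .json files, then copy any
# original files that were not regenerated from LOAD_DIR into OUT_DIR.
import gzip
import shutil
from pathlib import Path

LOAD_DIR = Path("data/datasets/R2R_VLNCE_v1-2_preprocessed")  # placeholder
OUT_DIR = Path("data/datasets/paraphrased")                   # placeholder

# Equivalent of `gzip file.json` for every generated .json under OUT_DIR.
for json_file in OUT_DIR.rglob("*.json"):
    with open(json_file, "rb") as src, gzip.open(f"{json_file}.gz", "wb") as dst:
        shutil.copyfileobj(src, dst)
    json_file.unlink()

# Copy over original files (e.g. *_gt.json.gz) missing from OUT_DIR.
for original in LOAD_DIR.rglob("*.json.gz"):
    target = OUT_DIR / original.relative_to(LOAD_DIR)
    if not target.exists():
        shutil.copy2(original, target)
```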
After all that work, you can run the agent :)
- Ensure that you're in the correct Conda environment:
conda activate dcvln
- Navigate to the Discrete-Continuous-VLN directory:
cd Discrete-Continuous-VLN
- Edit `run_CMA.bash` to uncomment the training task. Edit the `exp_name` flag to set the folder that checkpoints are saved to.
- Edit `habitat_extensions/config/vlnce_task.yaml`. Modify any `data/datasets...` paths to point to the `OUT_DIR` specified in the previous step. These paths should be relative to the Discrete-Continuous-VLN directory.
  - For example, if `OUT_DIR = "Discrete-Continuous-VLN/data/datasets/paraphrased"`, then the line `GT_PATH: data/datasets/R2R_VLNCE_v1-2_preprocessed/{split}/{split}_gt.json.gz` would be modified to `GT_PATH: data/datasets/paraphrased/{split}/{split}_gt.json.gz`.
- Refer to the Discrete-Continuous-VLN instructions to train and run.
- After training, refer to the Discrete-Continuous-VLN instructions to evaluate. Make sure to modify the eval flag in `run_CMA.bash` to point to the correct checkpoint from the latest training.