Recurrent-VLN-BERT
Code of the Recurrent-VLN-BERT paper:
A Recurrent Vision-and-Language BERT for Navigation
Yicong Hong, Qi Wu, Yuankai Qi, Cristian Rodriguez-Opazo, Stephen Gould
Prerequisites
Installation
Install the Matterport3D Simulator. The package versions used in our environment are listed here.
Install Pytorch-Transformers. Our experiments use this version (the same one as OSCAR).
Data Preparation
Please follow the instructions below to prepare the data in directories:
- MP3D navigability graphs: connectivity
  - Download the connectivity maps [23.8MB].
- R2R data: data
  - Download the R2R data [5.8MB].
- Augmented data: data/prevalent
  - Download the collected triplets in PREVALENT [1.5GB] (pre-processed for easy use).
- MP3D image features: img_features
  - Download the Scene features [4.2GB] (ResNet-152-Places365).
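Before running anything, it can help to confirm that each download landed in the right place. The following is a minimal sketch, not part of the repository: it assumes the script is run from the repository root, and the helper name `missing_dirs` is hypothetical. The directory names are the ones listed above.

```python
from pathlib import Path

# Directory names taken from the data preparation list; running from the
# repository root is an assumption of this sketch.
EXPECTED_DIRS = ["connectivity", "data", "data/prevalent", "img_features"]


def missing_dirs(root, expected=EXPECTED_DIRS):
    """Return the expected sub-directories that are absent or empty under root."""
    root = Path(root)
    missing = []
    for name in expected:
        path = root / name
        # A directory that exists but holds no files is treated as not prepared.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in missing_dirs("."):
        print(f"missing or empty: {name}")
```

The check only looks at whether a directory exists and is non-empty; it does not verify file names or sizes.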
Trained Network Weights
- Recurrent-VLN-BERT: snap
  - Download the trained network weights [2.5GB] for our OSCAR-based and PREVALENT-based models.
R2R Navigation
Please read Peter Anderson's VLN paper for the R2R Navigation task.
Our code is based on the code structure of EnvDrop.
Reproduce Testing Results
To replicate the performance reported in our paper, load the trained network weights and run validation:
bash run/agent.bash
Training
Navigator
To train the network from scratch, first train a Navigator on the R2R training split:
In run/agent.bash, remove the --load argument and set --train listener. Then run:
bash run/agent.bash
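The flag change described above can also be expressed programmatically. This is a rough sketch under stated assumptions: it treats the script's arguments as a single space-separated string in `--flag value` form, and the helper name `to_training_flags` and the example input are hypothetical, not taken from run/agent.bash. Editing the file by hand is equally valid.

```python
import shlex


def to_training_flags(flag_string):
    """Drop `--load <path>` and force `--train listener` in an argument string."""
    tokens = shlex.split(flag_string)
    kept = []
    i = 0
    while i < len(tokens):
        if tokens[i] in ("--load", "--train"):
            i += 2  # skip the flag together with its value
            continue
        kept.append(tokens[i])
        i += 1
    kept += ["--train", "listener"]
    return " ".join(shlex.quote(t) for t in kept)


# Hypothetical input; the real contents of run/agent.bash may differ.
print(to_training_flags("--train validlistener --load snap/x --feedback sample"))
```

The sketch does not handle the `--load=path` spelling; only the two flags named in the instructions are touched, and all other arguments pass through unchanged.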
The trained Navigator will be saved under snap/.
Speaker
You also need to train a Speaker for augmented training:
bash run/speak.bash
The trained Speaker will be saved under snap/.
Augmented Navigator
Finally, continue training the Navigator on a mixture of the original and augmented data:
bash run/bt_envdrop.bash
We apply a one-step learning rate decay to 1e-5 when training saturates.
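One way to picture a one-step decay is a schedule that lowers the learning rate a single time and then holds it. The sketch below is illustrative only: the saturation test (a `patience` count of evaluations without improvement), the initial learning rate, and the class name `OneStepDecay` are all assumptions, since the text only states the target value of 1e-5.

```python
class OneStepDecay:
    """Drop the learning rate once, to `final_lr`, when validation stops improving."""

    def __init__(self, initial_lr, final_lr=1e-5, patience=3):
        self.lr = initial_lr
        self.final_lr = final_lr      # 1e-5, the value stated in the text
        self.patience = patience      # hypothetical saturation criterion
        self.best = float("-inf")
        self.stale = 0
        self.decayed = False

    def step(self, val_metric):
        """Record a validation score (higher is better) and return the current lr."""
        if val_metric > self.best:
            self.best = val_metric
            self.stale = 0
        else:
            self.stale += 1
        # Decay exactly once; later improvements do not restore the old rate.
        if not self.decayed and self.stale >= self.patience:
            self.lr = self.final_lr
            self.decayed = True
        return self.lr


# Hypothetical initial rate and validation scores, for illustration only.
sched = OneStepDecay(initial_lr=1e-4, patience=2)
lrs = [sched.step(score) for score in [0.50, 0.55, 0.55, 0.54, 0.56]]
```

In practice the decay may simply be applied by hand after inspecting the training curves; the automatic trigger here is just one possible reading of "when training saturates".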
Citation
If you use or discuss our Entity Relationship Graph, please cite our paper:
@article{hong2020language,
title={Language and Visual Entity Relationship Graph for Agent Navigation},
author={Hong, Yicong and Rodriguez, Cristian and Qi, Yuankai and Wu, Qi and Gould, Stephen},
journal={Advances in Neural Information Processing Systems},
volume={33},
year={2020}
}