Adapt to higher Langchain version

commit 62cc22fd38

.gitignore (Normal file, 13 lines added)
@ -0,0 +1,13 @@
.ftpignore
.ftpconfig
.vscode

# Byte-compiled / optimized / DLL files
.ipynb_checkpoints/
__pycache__/
*.py[cod]
*$py.class
*.swp

datasets/*
!datasets/.gitkeep

.gitmodules (Normal file, 3 lines added)
@ -0,0 +1,3 @@
[submodule "nav_src/LLMs/llama"]
	path = nav_src/LLMs/llama
	url = https://github.com/facebookresearch/llama.git

LICENSE (Normal file, 21 lines added)
@ -0,0 +1,21 @@
The MIT License (MIT)

Copyright (c) 2023 Gengze Zhou, Yicong Hong, Qi Wu

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md (Normal file, 127 lines added)
@ -0,0 +1,127 @@
<div align="center">

<h1>🎇NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models</h1>

<div>
    <a href='https://github.com/GengzeZhou' target='_blank'>Gengze Zhou<sup>🍕</sup><sup>🍔</sup></a>;
    <a href='http://www.yiconghong.me' target='_blank'>Yicong Hong<sup>🌭</sup></a>;
    <a href='http://www.qi-wu.me' target='_blank'>Qi Wu<sup>🍕</sup><sup>🍔</sup></a>
</div>
<sup>🍕</sup>The University of Adelaide <sup>🍔</sup>Australian Institute for Machine Learning <sup>🌭</sup>The Australian National University

<br>

<div>
    <a href='https://github.com/GengzeZhou/NavGPT' target='_blank'><img alt="Static Badge" src="https://img.shields.io/badge/NavGPT-v0.1-blue"></a>
    <a href='https://arxiv.org/abs/2305.16986' target='_blank'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a>
    <a href="https://opensource.org/licenses/MIT"><img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="License: MIT"></a>
    <a href="https://github.com/langchain-ai/langchain"><img alt="Static Badge" src="https://img.shields.io/badge/🦜️🔗-Langchain-green"></a>
</div>

</div>

## 🍹 Abstract
Trained with an unprecedented scale of data, large language models (LLMs) like ChatGPT and GPT-4 exhibit the emergence of significant reasoning abilities from model scaling. This trend underscores the potential of training LLMs with unlimited language data, advancing the development of a universal embodied agent.
In this work, we introduce NavGPT, a purely LLM-based instruction-following navigation agent, to reveal the reasoning capability of GPT models in complex embodied scenes by performing zero-shot sequential action prediction for vision-and-language navigation (VLN).
At each step, NavGPT takes the textual descriptions of visual observations, navigation history, and future explorable directions as inputs to reason about the agent's current status, and decides how to approach the target.
Through comprehensive experiments, we demonstrate that NavGPT can explicitly perform high-level planning for navigation, including decomposing instructions into sub-goals, integrating commonsense knowledge relevant to navigation task resolution, identifying landmarks from observed scenes, tracking navigation progress, and adapting to exceptions with plan adjustment.
Furthermore, we show that LLMs are capable of generating high-quality navigational instructions from observations and actions along a path, as well as drawing accurate top-down metric trajectories given the agent's navigation history. Although NavGPT's zero-shot performance on R2R still falls short of trained models, we suggest adapting multi-modal inputs for LLMs so they can serve as visual navigation agents, and applying the explicit reasoning of LLMs to benefit learning-based models.

## 🍸 Method


## 🍻 TODOs

- [x] Release 🎇NavGPT code.
- [x] Data preprocessing code.
- [x] Customized LLM inference guidance.

## 🧋 Prerequisites

### 🍭 Installation

Create a conda environment and install all dependencies:

```bash
conda create --name NavGPT python=3.9
conda activate NavGPT
pip install -r requirements.txt
```

### 🍬 Data Preparation

Download R2R data from [Dropbox](https://www.dropbox.com/sh/i8ng3iq5kpa68nu/AAB53bvCFY_ihYx1mkLlOB-ea?dl=1). Put the data in the `datasets` directory.

The related data preprocessing code can be found in `nav_src/scripts`.

### 🍫 OpenAI API

Get an [OpenAI API Key](https://platform.openai.com/account/api-keys) and add it to your environment variables:

```bash
# prepare your private OpenAI key (for Linux)
export OPENAI_API_KEY={Your_Private_Openai_Key}

# prepare your private OpenAI key (for Windows)
set OPENAI_API_KEY={Your_Private_Openai_Key}
```

Alternatively, you can set the key in your code:
```python
import os
# Replace the placeholder with your key; the value must be a string.
os.environ["OPENAI_API_KEY"] = "{Your_Private_Openai_Key}"
```
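
LangChain's `OpenAI` wrapper, which `nav_src/agent.py` uses for `gpt-*` models, reads `OPENAI_API_KEY` from the environment when no key is passed explicitly. A minimal sketch of failing early with a clear message when the key is missing; `require_openai_key` is a hypothetical helper, not part of NavGPT:

```python
import os
from typing import Mapping, Optional


def require_openai_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the OpenAI key from the environment, or raise a helpful error.

    Hypothetical helper: NavGPT itself does not define this function.
    `env` defaults to os.environ and can be swapped for a plain dict.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running NavGPT.py"
        )
    return key
```

Calling `require_openai_key()` before constructing the agent surfaces a missing key immediately instead of at the first LLM request.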

## 🍷 R2R Navigation

### 🍴 Reproduce Validation Results

To replicate the performance reported in our paper, use GPT-4 and run validation with the following configuration:
```bash
cd nav_src
python NavGPT.py --llm_model_name gpt-4 \
    --output_dir ../datasets/R2R/exprs/gpt-4-val-unseen \
    --val_env_name R2R_val_unseen_instr
```

Results will be saved in the `datasets/R2R/exprs/gpt-4-val-unseen` directory.

The default `--llm_model_name` is `gpt-3.5-turbo`.

An economical way to try 🎇NavGPT is to use GPT-3.5 and run validation on the first 10 samples with the following configuration:
```bash
cd nav_src
python NavGPT.py --llm_model_name gpt-3.5-turbo \
    --output_dir ../datasets/R2R/exprs/gpt-3.5-turbo-test \
    --val_env_name R2R_val_unseen_instr \
    --iters 10
```
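
`valid()` in `nav_src/NavGPT.py` appends one line per evaluated split to `valid.txt` in `args.log_dir`, of the form `Env name: <split>, <metric>: <value>, ...` with values printed to two decimals; the file starts with the printed argument namespace. A small sketch for reading those scores back, assuming metric names contain no commas or colons (the helper names are hypothetical):

```python
from typing import Dict, Tuple

PREFIX = "Env name:"


def parse_record_line(line: str) -> Tuple[str, Dict[str, float]]:
    """Parse one score line written by NavGPT.py's valid().

    Assumes metric names contain neither ',' nor ':'.
    """
    fields = [f.strip() for f in line.strip().split(",")]
    head = fields[0]
    if not head.startswith(PREFIX):
        raise ValueError(f"not a NavGPT score line: {line!r}")
    env_name = head[len(PREFIX):].strip()
    metrics: Dict[str, float] = {}
    for field in fields[1:]:
        name, _, value = field.partition(":")
        metrics[name.strip()] = float(value)
    return env_name, metrics


def read_scores(text: str) -> Dict[str, Dict[str, float]]:
    """Collect {split: {metric: value}} from the contents of valid.txt,
    skipping the argument dump and any other non-score lines."""
    scores: Dict[str, Dict[str, float]] = {}
    for line in text.splitlines():
        if line.startswith(PREFIX):
            env_name, metrics = parse_record_line(line)
            scores[env_name] = metrics
    return scores
```

If the same split is evaluated more than once, the last line wins, since `valid()` appends rather than overwrites.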

### 🥢 Set up Custom LLMs for 🎇NavGPT
Add your own model repo as a submodule under `nav_src/LLMs/`:
```bash
cd nav_src/LLMs
git submodule add {Your_Model_Repo}
```
or simply copy your local inference code into `nav_src/LLMs/`.

Follow the [instructions](nav_src/LLMs/Add_Custom_Models.md) to set up your own LLMs for 🎇NavGPT.

Run 🎇NavGPT with your custom LLM:
```bash
cd nav_src
python NavGPT.py --llm_model_name your_custom_llm \
    --output_dir ../datasets/R2R/exprs/your_custom_llm-test
```

## 🧃 Citation
If 🎇`NavGPT` has been beneficial to your research, please cite our work using the following BibTeX entry:
```
@article{zhou2023navgpt,
  title={NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models},
  author={Zhou, Gengze and Hong, Yicong and Wu, Qi},
  journal={arXiv preprint arXiv:2305.16986},
  year={2023}
}
```

assets/NavGPT.png (Normal file, binary, 1.0 MiB; not shown)

assets/obs.png (Normal file, binary, 9.3 MiB; not shown)

datasets/.gitkeep (Normal file, empty)

nav_src/LLMs/Add_Custom_Models.md (Normal file, 148 lines added)
@ -0,0 +1,148 @@
## Add Custom LLMs for NavGPT

## Contents

- [Set up built-in integrations with LLM providers](#set-up-built-in-integrations-with-llm-providers)
- [Set up local model inference](#set-up-local-model-inference)
  - [Step 1: Set up the model environment](#step-1-set-up-the-model-environment)
  - [Step 2: Set up the inference pipeline](#step-2-set-up-the-inference-pipeline)
  - [Step 3: Register the custom LLM](#step-3-register-the-custom-llm)
  - [Step 4: Run NavGPT with the custom LLM](#step-4-run-navgpt-with-the-custom-llm)

## Set up built-in integrations with LLM providers

The `Langchain` package integrates various cloud services that provide LLM inference APIs ([OpenAI](https://openai.com/), [Cohere](https://cohere.ai/), [Hugging Face](https://huggingface.co/), etc.). You can use these services directly by setting the corresponding API keys.

You can also check out the [Langchain Integrations Documentation](https://python.langchain.com/docs/integrations/llms/) for more information.

## Set up local model inference

One possible way to set up local inference is through [Hugging Face Local Pipelines](https://python.langchain.com/docs/integrations/llms/huggingface_pipelines) in Langchain.

However, to get the most freedom in running local inference or setting up your custom LLMs, we recommend building your own inference pipeline. We provide `nav_src/LLMs/Langchain_llama.py` as an example of a local inference pipeline.

You can check out the [Langchain Custom LLM](https://python.langchain.com/docs/modules/model_io/models/llms/custom_llm) documentation for more information.

We will use Llama-2 as an example to show how to set up a local inference pipeline.

### Step 1: Set up the model environment
Add the Llama-2 repo as a submodule under `nav_src/LLMs/`:
```bash
cd nav_src/LLMs
git submodule add https://github.com/facebookresearch/llama.git
```
Because `nav_src/LLMs/llama` is already registered as a submodule in this repository, you can skip the previous step and instead initialize and clone it with:
```bash
git submodule update --init --recursive
```

Download the [Llama-2 weights](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) according to the [instructions](https://github.com/facebookresearch/llama) and set up the Llama-2 environment:
```bash
cd llama
pip install -e .
```

### Step 2: Set up the inference pipeline
Create your own LLM class `Custom_model` in `nav_src/LLMs/Langchain_model.py`.

A custom LLM must implement a `_call` method, which maps a prompt string to a completion string, and an `_llm_type` property, which returns a string used for logging. For example:
```python
from typing import Any, List, Optional

from langchain.callbacks.manager import CallbackManagerForLLMRun
from langchain.llms.utils import enforce_stop_tokens

@property
def _llm_type(self) -> str:
    return "custom_model"

def _call(
    self,
    prompt: str,
    stop: Optional[List[str]] = None,
    run_manager: Optional[CallbackManagerForLLMRun] = None,
    **kwargs: Any,
) -> str:
    # Assumes self.model.generate returns the decoded completion as a str;
    # adapt this to your model's generation API.
    result = self.model.generate(
        prompt,
        max_length=self.max_length,
        num_beams=self.num_beams,
        temperature=self.temperature,
        top_k=self.top_k,
        top_p=self.top_p,
        repetition_penalty=self.repetition_penalty,
        do_sample=self.do_sample,
        num_return_sequences=self.num_return_sequences,
        **kwargs,
    )
    # The NavGPT agent always passes stop sequences, so truncate the
    # completion at the first one instead of rejecting them.
    if stop:
        result = enforce_stop_tokens(result, stop)
    return result
```
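
`VLNAgent.get_full_inputs` in `nav_src/agent.py` passes `self._stop` (the stop sequences LangChain's `ZeroShotAgent` derives from its observation prefix, such as `\nObservation:`) along with the scratchpad, so a custom `_call` should cut the generation at the first stop sequence rather than reject it. A dependency-free sketch of that truncation; `truncate_at_stop` is a hypothetical helper, and LangChain ships a comparable `enforce_stop_tokens` utility whose module path differs across versions:

```python
import re
from typing import List, Optional


def truncate_at_stop(text: str, stop: Optional[List[str]]) -> str:
    """Return `text` up to (not including) the earliest stop sequence.

    Hypothetical helper. Stop strings are passed through re.escape so that
    characters such as '(' or '?' are matched literally, not as regex syntax.
    """
    if not stop:
        return text
    pattern = "|".join(re.escape(s) for s in stop)
    # maxsplit=1: everything before the first match is the completion.
    return re.split(pattern, text, maxsplit=1)[0]
```

Inside `_call`, apply it to the decoded text just before returning, e.g. `return truncate_at_stop(result, stop)`, so hallucinated observations never reach the output parser.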

You can optionally override the `_identifying_params` property to help with printing the class; it should return a dictionary:
```python
from typing import Any, Mapping

@property
def _identifying_params(self) -> Mapping[str, Any]:
    """Get the identifying parameters."""
    return {
        "model_name": self.model_name,
        "max_length": self.max_length,
        "num_beams": self.num_beams,
        "temperature": self.temperature,
        "top_k": self.top_k,
        "top_p": self.top_p,
        "repetition_penalty": self.repetition_penalty,
        "do_sample": self.do_sample,
        "num_return_sequences": self.num_return_sequences,
    }
```

If your custom LLM needs to be initialized with some parameters, you can write your own `from_config` or `from_model_id` classmethod. Check out the example in `nav_src/LLMs/Langchain_llama.py` for more information.
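
Step 3 constructs the model with `Custom_model.from_config(config=config)`, where `config` is the `argparse.Namespace` that NavGPT hands to `NavAgent`. A framework-free sketch of such a classmethod; the class and its fields (`temperature`, `max_gen_len`) are hypothetical, and in a real LangChain subclass you would load the model and return `cls(model=..., ...)` the way `Custom_Llama.from_model_id` does:

```python
from argparse import Namespace
from dataclasses import dataclass


@dataclass
class CustomModelSettings:
    """Hypothetical container for the settings a custom LLM might need."""

    model_name: str
    temperature: float = 0.6
    max_gen_len: int = 64

    @classmethod
    def from_config(cls, config: Namespace) -> "CustomModelSettings":
        # Read each field from the NavGPT config, falling back to the
        # dataclass default when the option is not defined on the Namespace.
        return cls(
            model_name=config.llm_model_name,
            temperature=getattr(config, "temperature", cls.temperature),
            max_gen_len=getattr(config, "max_gen_len", cls.max_gen_len),
        )
```

Using `getattr` with a default keeps the class usable even when an option has not been added to the command-line parser.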

Here is an example of running our custom Llama-2 locally as an `LLMChain` in Langchain (run it from the `nav_src` directory, since both the checkpoint paths and the `LLMs` import are relative to it):
```python
>>> from langchain.chains import LLMChain
>>> from langchain.prompts import PromptTemplate
>>> from LLMs.Langchain_llama import Custom_Llama

>>> ckpt_dir = "LLMs/llama/llama-2-13b"
>>> tokenizer_path = "LLMs/llama/tokenizer.model"

>>> llm = Custom_Llama.from_model_id(
...     temperature=0.75,
...     ckpt_dir=ckpt_dir,
...     tokenizer_path=tokenizer_path,
...     max_seq_len=4000,
...     max_gen_len=800,
...     max_batch_size=4,
... )

>>> template = """Question: {question}\nAnswer: Let's think step by step."""
>>> prompt = PromptTemplate(template=template, input_variables=["question"])

>>> llm_chain = LLMChain(prompt=prompt, llm=llm)

>>> question = "What is electroencephalography?"
>>> print(llm_chain.run(question))

"Sure, I'd be happy to help! Here's a step-by-step explanation of what electroencephalography (EEG) is:
1. Electroencephalography (EEG) is a non-invasive neuroimaging technique that measures the electrical activity of the brain.
2. The brain is made up of billions of neurons, which communicate with each other through electrical signals. EEG recordings measure these electrical signals, allowing researchers and clinicians to study the brain's activity.
3. To record EEG data, electrodes are placed on the scalp, usually in a specific pattern such as the International 10-20 system. These electrodes detect the electrical activity of the brain and transmit it to a computer for analysis.
4. The EEG signal is composed of different frequency bands, including alpha, beta, gamma, and theta waves. Each frequency band is associated with different cognitive processes, such as attention, relaxation, or memory.
5. EEG can be used to diagnose and monitor a variety of neurological conditions, such as epilepsy, sleep disorders, and stroke. It can also be used to assess brain function in patients with traumatic brain injury, coma, or vegetative state.
6. In addition to diagnostic applications, EEG is also used in research studies to investigate the neural mechanisms underlying various cognitive processes, such as language processing, memory formation, and decision-making.
7. EEG has several advantages over other neuroimaging techniques, such as functional magnetic resonance imaging (fMRI) or positron emission tomography (PET). For example, EEG is relatively inexpensive, portable, and can be performed in a clinical setting or at home. Additionally, EEG provides high temporal resolution, allowing researchers to study the dynamics of brain activity in real-time.
8. Overall, EEG is a valuable tool for understanding the workings of the human brain, diagnosing neurological conditions, and monitoring brain health. Its non-invasive nature and high temporal resolution make it an important technique in neuroscience research and clinical practice."
```

### Step 3: Register the custom LLM
In `nav_src/agent.py`, register the custom LLM by adding the following branch after `line 176` (the end of the `llama-2-13b` branch in `NavAgent.__init__`):
```python
        elif config.llm_model_name == 'your_custom_llm':
            from LLMs.Langchain_model import Custom_model
            self.llm = Custom_model.from_config(
                config=config,
            )
```
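
The `if`/`elif` chain in `NavAgent.__init__` grows with every model you add. One alternative, not how `agent.py` is written today, is a lookup table keyed on `--llm_model_name`. A minimal sketch; `LLM_REGISTRY`, `register_llm`, `build_llm`, and the placeholder factory are all hypothetical, and the factory returns a string instead of a real LLM object:

```python
from argparse import Namespace
from typing import Any, Callable, Dict

# Maps an --llm_model_name value to a factory that builds the LLM from the config.
LLM_REGISTRY: Dict[str, Callable[[Namespace], Any]] = {}


def register_llm(name: str):
    """Decorator that records a factory under the given model name."""
    def decorator(factory: Callable[[Namespace], Any]):
        if name in LLM_REGISTRY:
            raise KeyError(f"LLM {name!r} is already registered")
        LLM_REGISTRY[name] = factory
        return factory
    return decorator


def build_llm(config: Namespace) -> Any:
    """Look up the factory for config.llm_model_name and call it."""
    try:
        factory = LLM_REGISTRY[config.llm_model_name]
    except KeyError:
        known = ", ".join(sorted(LLM_REGISTRY))
        raise ValueError(
            f"Unknown llm_model_name {config.llm_model_name!r}; known: {known}"
        ) from None
    return factory(config)


# Hypothetical placeholder; a real factory would import and return
# Custom_model.from_config(config=config).
@register_llm("your_custom_llm")
def _build_custom(config: Namespace) -> Any:
    return f"custom model at temperature {config.temperature}"
```

The table matches exact names only, so a prefix rule such as the existing `gpt-*` check would still need its own branch or a fallback.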

### Step 4: Run NavGPT with the custom LLM
Now you can run NavGPT with your custom LLM:
```bash
cd nav_src
python NavGPT.py --llm_model_name your_custom_llm \
    --output_dir ../datasets/R2R/exprs/your_custom_llm-test
```

nav_src/LLMs/Langchain_llama.py (Normal file, 85 lines added)
@ -0,0 +1,85 @@
from typing import Any, List, Mapping, Optional

from langchain.callbacks.manager import CallbackManagerForLLMRun
from langchain.llms.base import LLM
from LLMs.llama.llama import Llama

class Custom_Llama(LLM):
    model: Any  #: :meta private:

    """Keyword arguments passed to the model."""
    ckpt_dir: str
    tokenizer_path: str
    temperature: float = 0.6
    top_p: float = 0.9
    max_seq_len: int = 128
    max_gen_len: int = 64
    max_batch_size: int = 4

    @property
    def _llm_type(self) -> str:
        return "custom_llama"

    @classmethod
    def from_model_id(
        cls,
        ckpt_dir: str,
        tokenizer_path: str,
        temperature: float = 0.6,
        top_p: float = 0.9,
        max_seq_len: int = 128,
        max_gen_len: int = 64,
        max_batch_size: int = 4,
        **kwargs: Any,
    ) -> LLM:
        """Build a Llama-2 generator from a checkpoint directory and tokenizer."""

        model = Llama.build(
            ckpt_dir=ckpt_dir,
            tokenizer_path=tokenizer_path,
            max_seq_len=max_seq_len,
            max_batch_size=max_batch_size,
        )

        return cls(
            model=model,
            ckpt_dir=ckpt_dir,
            tokenizer_path=tokenizer_path,
            temperature=temperature,
            top_p=top_p,
            max_seq_len=max_seq_len,
            max_gen_len=max_gen_len,
            max_batch_size=max_batch_size,
            **kwargs,
        )

    def _call(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> str:
        # if stop is not None:
        #     raise ValueError("stop kwargs are not permitted.")

        result = self.model.text_completion(
            [prompt],
            max_gen_len=self.max_gen_len,
            temperature=self.temperature,
            top_p=self.top_p,
        )
        return result[0]["generation"]

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        """Get the identifying parameters."""
        return {
            "ckpt_dir": self.ckpt_dir,
            "tokenizer_path": self.tokenizer_path,
            "temperature": self.temperature,
            "top_p": self.top_p,
            "max_seq_len": self.max_seq_len,
            "max_gen_len": self.max_gen_len,
            "max_batch_size": self.max_batch_size,
        }

nav_src/LLMs/llama (Submodule)
@ -0,0 +1 @@
Subproject commit 06faf3aab2971e7931e3d5b41e53c4a614d5bad7

nav_src/NavGPT.py (Normal file, 107 lines added)
@ -0,0 +1,107 @@
import os
import json
import time

from data_utils import construct_instrs
from utils.logger import write_to_record_file

from utils.data import ImageObservationsDB
from parser import parse_args
from env import R2RNavBatch
from agent import NavAgent

def build_dataset(args):

    feat_db = ImageObservationsDB(args.obs_dir, args.obs_summary_dir, args.obj_dir)

    dataset_class = R2RNavBatch

    val_env_names = [args.val_env_name]

    val_envs = {}
    for split in val_env_names:
        val_instr_data = construct_instrs(
            args.anno_dir, args.dataset, [split]
        )
        val_env = dataset_class(
            feat_db, val_instr_data, args.connectivity_dir, args.navigable_dir,
            batch_size=args.batch_size, seed=args.seed, name=split,
        )   # evaluation using all objects
        val_envs[split] = val_env

    return val_envs


def valid(args, val_envs):

    agent = NavAgent(next(iter(val_envs.values())), args)

    with open(os.path.join(args.log_dir, 'validation_args.json'), 'w') as outf:
        json.dump(vars(args), outf, indent=4)
    record_file = os.path.join(args.log_dir, 'valid.txt')
    write_to_record_file(str(args) + '\n\n', record_file)

    for env_name, env in val_envs.items():
        prefix = 'submit'
        if os.path.exists(os.path.join(args.pred_dir, "%s_%s.json" % (prefix, env_name))):
            continue
        agent.env = env

        start_time = time.time()
        agent.test(iters=args.iters)
        print(env_name, 'cost time: %.2fs' % (time.time() - start_time))
        # Get the results
        preds = agent.get_results(detailed_output=False)
        # Record llm output details
        if args.detailed_output:
            preds_detail = agent.get_results(detailed_output=True)

            json.dump(
                preds_detail,
                open(os.path.join(args.log_dir, "detail_%s.json" % (env_name)), 'w'),
                sort_keys=True, indent=4, separators=(',', ': ')
            )

        if 'test' not in env_name:
            score_summary, _ = env.eval_metrics(preds)
            loss_str = "Env name: %s" % env_name
            for metric, val in score_summary.items():
                loss_str += ', %s: %.2f' % (metric, val)
            write_to_record_file(loss_str+'\n', record_file)

        json.dump(
            preds,
            open(os.path.join(args.pred_dir, "%s_%s.json" % (prefix, env_name)), 'w'),
            sort_keys=True, indent=4, separators=(',', ': ')
        )


def valid_from_file(args, val_envs):

    agent = NavAgent(next(iter(val_envs.values())), args)
    with open(args.valid_file, 'r') as f:
        preds = json.load(f)

    for env_name, env in val_envs.items():
        agent.env = env
        valid_list = [preds]
        for valid_pred in valid_list:
            score_summary, _ = env.eval_metrics(valid_pred)
            loss_str = "Env name: %s, length %d" % (env_name, len(valid_pred))
            for metric, val in score_summary.items():
                loss_str += ', %s: %.2f' % (metric, val)
            print(loss_str)

def main():
    args = parse_args()

    val_envs = build_dataset(args)

    if args.valid_file is not None:
        valid_from_file(args, val_envs)
    else:
        valid(args, val_envs)


if __name__ == '__main__':
    main()

nav_src/agent.py (Normal file, 728 lines added)
@ -0,0 +1,728 @@
"""Agent that interacts with Matterport3D simulator via a hierarchical planning approach."""
import json
import yaml
import re
import warnings
import numpy as np
from typing import Any, Callable, List, NamedTuple, Optional, Sequence, Tuple, Dict, Union

from env import R2RNavBatch
from argparse import Namespace
from agent_base import BaseAgent

from langchain import HuggingFacePipeline
from langchain.agents.agent import AgentExecutor, AgentAction, AgentOutputParser
from langchain.agents.mrkl.base import ZeroShotAgent
from langchain.agents.tools import Tool
from langchain.chains import LLMChain
from langchain.llms.openai import OpenAI
from langchain.prompts import PromptTemplate
from langchain.schema import (
    AgentAction,
    AgentFinish,
    BaseMessage,
    BaseOutputParser,
    OutputParserException
)
from langchain.base_language import BaseLanguageModel

from langchain.agents.mrkl.prompt import FORMAT_INSTRUCTIONS
from prompt.planner_prompt import (
    ACTION_PROMPT,
    HISTORY_PROMPT,
    PLANNER_PROMPT,
    BACK_TRACE_PROMPT,
    MAKE_ACTION_TOOL_NAME,
    MAKE_ACTION_TOOL_DESCRIPTION,
    BACK_TRACE_TOOL_NAME,
    BACK_TRACE_TOOL_DESCRIPTION,
    VLN_ORCHESTRATOR_PROMPT,
    VLN_GPT4_PROMPT,
    VLN_GPT35_PROMPT,
)

FINAL_ANSWER_ACTION = "Final Answer:"
EXCEPTION_TOOL_NAME = "_Exception"
MAX_SCRATCHPAD_LENGTH = 7000

MISSING_ACTION_AFTER_THOUGHT_ERROR_MESSAGE = (
    "Invalid Format: Missing 'Action:' after 'Thought:'"
)
MISSING_ACTION_INPUT_AFTER_ACTION_ERROR_MESSAGE = (
    "Invalid Format: Missing 'Action Input:' after 'Action:'"
)
FINAL_ANSWER_AND_PARSABLE_ACTION_ERROR_MESSAGE = (
    "Parsing LLM output produced both a final answer and a parse-able action:"
)


class NavGPTOutputParser(AgentOutputParser):
    """MRKL Output parser for the chat agent."""

    def get_format_instructions(self) -> str:
        return FORMAT_INSTRUCTIONS

    def parse(self, text: str) -> Union[AgentAction, AgentFinish]:
        includes_answer = FINAL_ANSWER_ACTION in text
        regex = (
            r"Action\s*\d*\s*:[\s]*(.*?)[\s]*Action\s*\d*\s*Input\s*\d*\s*:[\s]*\"?([a-fA-F0-9]{32})\"?"
        )
        action_match = re.search(regex, text, re.DOTALL)
        if action_match:
            if includes_answer:
                raise OutputParserException(
                    f"{FINAL_ANSWER_AND_PARSABLE_ACTION_ERROR_MESSAGE}: {text}"
                )
            action = action_match.group(1).strip()
            action_input = action_match.group(2)
            tool_input = action_input.strip(" ")
            # ensure if it's a well formed SQL query we don't remove any trailing " chars
            if tool_input.startswith("SELECT ") is False:
                tool_input = tool_input.strip('"')

            return AgentAction(action, tool_input, text)

        elif includes_answer:
            return AgentFinish(
                {"output": text.split(FINAL_ANSWER_ACTION)[-1].strip()}, text
            )

        if not re.search(r"Action\s*\d*\s*:[\s]*(.*?)", text, re.DOTALL):
            raise OutputParserException(
                f"Could not parse LLM output: `{text}`",
                observation=MISSING_ACTION_AFTER_THOUGHT_ERROR_MESSAGE,
                llm_output=text,
                send_to_llm=True,
            )
        elif not re.search(
            r"[\s]*Action\s*\d*\s*Input\s*\d*\s*:[\s]*(.*)", text, re.DOTALL
        ):
            raise OutputParserException(
                f"Could not parse LLM output: `{text}`",
                observation=MISSING_ACTION_INPUT_AFTER_ACTION_ERROR_MESSAGE,
                llm_output=text,
                send_to_llm=True,
            )
        else:
            raise OutputParserException(f"Could not parse LLM output: `{text}`")

    @property
    def _type(self) -> str:
        return "mrkl-NavGPT"

class VLNAgent(ZeroShotAgent):

    history: Optional[List[str]] = None

    def _construct_scratchpad(
        self, intermediate_steps: List[Tuple[AgentAction, str]]
    ) -> Union[str, List[BaseMessage]]:
        """Construct the scratchpad that lets the agent continue its thought process."""
        thoughts = ""
        nav_step = 1
        for i, (action, observation) in enumerate(intermediate_steps):
            thoughts += action.log
            if (i == len(intermediate_steps) - 1) or (action.tool != MAKE_ACTION_TOOL_NAME):
                thoughts += f"\n{self.observation_prefix}{observation}\n{self.llm_prefix}"
            else:
                thoughts += f"\n{self.observation_prefix}{self.history[nav_step]}\n{self.llm_prefix}"
                nav_step += 1
        return thoughts

    def get_full_inputs(
        self, intermediate_steps: List[Tuple[AgentAction, str]], **kwargs: Any
    ) -> Dict[str, Any]:
        """Create the full inputs for the LLMChain from intermediate steps."""
        thoughts = self._construct_scratchpad(intermediate_steps)[-MAX_SCRATCHPAD_LENGTH:]
        new_inputs = {"agent_scratchpad": thoughts, "stop": self._stop}
        if len(intermediate_steps) == 0:
            full_inputs = {**kwargs, **new_inputs}
        else:
            kwargs["init_observation"] = self.history[0]
            full_inputs = {**kwargs, **new_inputs}
        return full_inputs
||||
class NavAgent(BaseAgent):
|
||||
def __init__(
|
||||
self,
|
||||
env: R2RNavBatch,
|
||||
config: Namespace):
|
||||
"""
|
||||
Initialize the LLM Navigation Agent.
|
||||
|
||||
Args:
|
||||
env: The Matterport3D environment.
|
||||
config: The configuration.
|
||||
"""
|
||||
super().__init__(env)
|
||||
self.config = config
|
||||
|
||||
if config.llm_model_name.split('-')[0] == 'gpt':
|
||||
self.llm = OpenAI(
|
||||
temperature=config.temperature,
|
||||
model_name=config.llm_model_name,
|
||||
)
|
||||
elif config.llm_model_name == 'llama-2-13b':
|
||||
from LLMs.Langchain_llama import Custom_Llama
|
||||
ckpt_dir = "LLMs/llama/llama-2-13b"
|
||||
tokenizer_path = "LLMs/llama/tokenizer.model"
|
||||
self.llm = Custom_Llama.from_model_id(
|
||||
temperature=config.temperature,
|
||||
ckpt_dir = ckpt_dir,
|
||||
tokenizer_path = tokenizer_path,
|
||||
max_seq_len = 8000,
|
||||
max_gen_len = 500,
|
||||
max_batch_size = 1,
|
||||
)
|
||||
# elif config.llm_model_name == 'Vicuna-v1.5-13b':
|
||||
# from LLMs.Langchain_Vicuna import Custom_Vicuna
|
||||
# self.llm = Custom_Vicuna.from_config(
|
||||
# config = config,
|
||||
# )
|
||||
# elif config.llm_model_name == 'FlanT5XXL':
|
||||
# from LLMs.Langchain_FlanT5 import Custom_FlanT5
|
||||
# self.llm = Custom_FlanT5.from_config(
|
||||
# config = config,
|
||||
# )
|
||||
# elif config.llm_model_name == 'Emu-14B':
|
||||
# from LLMs.Langchain_Emu import Custom_Emu
|
||||
# self.llm = Custom_Emu.from_config(
|
||||
# config = config,
|
||||
# )
|
||||
# else:
|
||||
# from LLMs.Langchain_InstructBLIP import Custom_NavGPT_InstructBLIP
|
||||
# self.llm = Custom_NavGPT.from_config(
|
||||
# config = config,
|
||||
# )
|
||||
|
||||
self.output_parser = NavGPTOutputParser()
|
||||
self.agent_executor = self.create_vln_agent()
|
||||
|
||||
plan_prompt = PromptTemplate(
|
||||
template=PLANNER_PROMPT,
|
||||
input_variables=["instruction"],
|
||||
)
|
||||
self.plan_chain = LLMChain(llm=self.llm, prompt=plan_prompt)
|
||||
|
||||
def parse_action(self, llm_output: str) -> Tuple[str, str]:
|
||||
regex = r"(.*?)Final Answer:[\s]*(.*)"
|
||||
match = re.search(regex, llm_output, re.DOTALL)
|
||||
if not match:
|
||||
raise ValueError(f"Could not parse LLM output: `{llm_output}`")
|
||||
|
||||
thought = match.group(1).strip()
|
||||
action = match.group(2).strip(" ").strip('"').strip("'")
|
||||
|
||||
return thought, action
|
||||
|
||||
def get_his_viewpoints(self) -> str:
|
||||
'''Return the history of visited viewpoints for back tracing.'''
|
||||
his_viewpoints = ''
|
||||
# The last vp is not included in the history
|
||||
for i, detail in enumerate(self.traj[0]['details'][:-1]):
|
||||
viewpointID = detail['viewpointID']
|
||||
viewpoint_ob = detail['feature']
|
||||
his_viewpoints += f"Step {i+1}. Viewpoint ID '{viewpointID}':\n {viewpoint_ob}\n\n"
|
||||
return his_viewpoints
|
||||
|
||||
def get_history(self, obs: dict, angle: str) -> str:
|
||||
'''Return the history of actions taken.'''
|
||||
history = f'{angle}\nCurrent viewpoint "{obs["viewpoint"]}": Scene from the viewpoint is a {obs["obs_summary"]}'
|
||||
return history
|
||||
|
||||
    def get_navigable_str(self, cur_heading: float, cur_elevation: float, navigable: dict) -> str:
        '''Return the navigable viewpoints as a string.'''
        navigable_str = ''

        for vp, items in navigable.items():
            heading = np.rad2deg(items['heading'])
            elevation = np.rad2deg(items['elevation'])
            distance = items['distance']
            rel_heading = heading - cur_heading
            rel_elevation = elevation - cur_elevation

            if self.config.use_relative_angle:
                navigable_str += f"'{vp}':\nheading: {rel_heading:.2f}, elevation: {rel_elevation:.2f}, distance: {distance:.2f}\n"
            else:
                navigable_str += f"'{vp}':\nheading: {heading:.2f}, elevation: {elevation:.2f}, distance: {distance:.2f}\n"

        return navigable_str

    def modify_heading_angles(self, heading_angle, observation_list, candidate_dict, object_list):
        # Function to normalize an angle to the range of -180 to 180
        def normalize_angle(angle):
            while angle > 180:
                angle -= 360
            while angle <= -180:
                angle += 360
            return angle

        def angle_to_left_right(angle):
            return f"left {-angle:.2f}" if angle < 0 else f"right {angle:.2f}"

        # Define the directions
        directions = ['Front', 'Front Right', 'Right', 'Rear Right', 'Rear', 'Rear Left', 'Left', 'Front Left']

        # Calculate the range of heading angles belonging to each direction
        range_idx = int((heading_angle - 22.5) // 45) + 1
        obs_idx = [(i + range_idx) % 8 for i in range(8)]

        # Initialize a dictionary to store the candidate viewpoints for each direction
        candidate_range = {}
        if not self.config.use_navigable:
            for viewpoint_id, viewpoint_data in candidate_dict.items():
                viewpoint_heading = np.rad2deg(viewpoint_data['heading'])
                # Wrap with % 8 so headings in [337.5, 360) fall into sector 0, like obs_idx
                vp_range_idx = (int((viewpoint_heading - 22.5) // 45) + 1) % 8
                rel_viewpoint_heading = viewpoint_heading - heading_angle
                rel_viewpoint_heading = normalize_angle(rel_viewpoint_heading)
                rel_viewpoint_heading = angle_to_left_right(rel_viewpoint_heading)
                vp_description = rel_viewpoint_heading + f', {viewpoint_data["distance"]:.2f}m'
                # rel_range_idx = (vp_range_idx - range_idx) % 8
                candidate_range.setdefault(vp_range_idx, {}).update({viewpoint_id: vp_description})

        # Calculate the relative angle ranges based on the heading angle
        angle_ranges = [(angle - 22.5 - heading_angle, angle + 22.5 - heading_angle) for angle in range(0, 360, 45)]

        # Initialize an empty list to store the formatted strings
        formatted_strings = []

        # Iterate through the directions, angle ranges, and observation strings
        for direction, idx in zip(directions, obs_idx):
            # Calculate the relative angles and normalize them
            rel_angle1 = normalize_angle(angle_ranges[idx][0])
            rel_angle2 = normalize_angle(angle_ranges[idx][1])

            # Convert the angles to "left n" or "right n"
            left_right1 = angle_to_left_right(rel_angle1)
            left_right2 = angle_to_left_right(rel_angle2)

            # Create the formatted string
            formatted_string = f"{direction}, range ({left_right1} to {left_right2}): \n'{observation_list[idx]}'"

            # Add the objects to the formatted string
            object_dict = {}
            if len(object_list[idx]) > 0:
                dir_objects = object_list[idx]
                for obj, obj_data in dir_objects.items():
                    rel_obj_heading = obj_data['heading'] - heading_angle
                    rel_obj_heading = normalize_angle(rel_obj_heading)
                    rel_obj_heading = angle_to_left_right(rel_obj_heading)
                    object_dict[obj] = f'{rel_obj_heading}, {obj_data["distance"]:.2f}m'
                formatted_string += f'\n{direction} Objects in 3m: {object_dict}'
            else:
                formatted_string += f'\n{direction} Objects in 3m: None'

            # Add the candidate viewpoints to the formatted string
            if candidate_range.get(idx):
                formatted_string += f'\n{direction} Navigable Viewpoints:{candidate_range[idx]}'
            else:
                formatted_string += f'\n{direction} Navigable Viewpoints: None'

            # Add the formatted string to the list
            formatted_strings.append(formatted_string)

        # Join the formatted strings into a single output string
        output_string = '\n'.join(formatted_strings)

        return output_string

    def init_trajecotry(self, obs: List[dict]):
        """Initialize the trajectory with the given observation."""
        # Record the navigation path
        self.traj = [{
            'instr_id': ob['instr_id'],
            'path': [[ob['viewpoint']]],
            'details': [],
        } for ob in obs]
        # Record the history of actions taken
        self.agent_executor.agent.history = [f'Navigation start, no actions taken yet.\nCurrent viewpoint "{obs[0]["viewpoint"]}": Scene from the viewpoint is a {obs[0]["obs_summary"]}']

    def _create_make_action_tool(
        self,
        llm: BaseLanguageModel,
    ) -> Tool:
        """Create a tool that makes a single-step action prediction in MP3D.

        The tool is invoked with the simulation environment and records the
        action taken by the agent.
        The tool interacts with the environment to obtain the current observation,
        uses the LLM to predict the next action, and summarizes the previous trajectory
        into history.
        """

        action_prompt = PromptTemplate(
            template=ACTION_PROMPT,
            input_variables=["action_plan", "observation", "history", "navigable_viewpoints"],
        )
        history_prompt = PromptTemplate(
            template=HISTORY_PROMPT,
            input_variables=["history", "previous_action", "observation"],
        )
        self.action_chain = LLMChain(llm=llm, prompt=action_prompt)
        self.history_chain = LLMChain(llm=llm, prompt=history_prompt)

        def _make_action(*args, **kwargs) -> str:
            '''Make single step action in MatterSim.'''
            # Get current observation
            cur_obs = self.env._get_obs()[0]

            # Get current feature
            feature = cur_obs['obs']
            heading = np.rad2deg(cur_obs['heading'])
            elevation = np.rad2deg(cur_obs['elevation'])
            objects = cur_obs['objects']
            orientation = f'\nheading: {heading:.2f}, elevation: {elevation:.2f}'
            navigable = cur_obs['candidate']
            if self.config.use_relative_angle:
                feature = self.modify_heading_angles(heading, feature, navigable, objects)
            if self.config.use_navigable:
                navigable = self.get_navigable_str(heading, elevation, navigable)

            if self.config.use_tool_chain:
                # Get current action plan
                action_plan = self.cur_action_plan
                # Single step action
                LLM_action_output = self.action_chain.run(
                    action_plan = action_plan,
                    observation = feature,
                    history = self.agent_executor.agent.history[-1],
                    navigable_viewpoints = navigable
                )
                # Parse LLM output, action is the next viewpoint ID
                thought, action = self.parse_action(LLM_action_output)
            else:
                action = args[0].strip(" ").strip('"').strip("'")

            # Make the action in Simulator
            if action not in self.env.env.sims[0].navigable_dict.keys():
                # Update history
                history = f'ViewpointID "{action}" is not valid, no action taken for the agent.'
                self.agent_executor.agent.history.append(history)
                # With use_navigable, `navigable` is already a formatted string,
                # so list the valid IDs from the raw candidate dict instead
                valid_ids = list(cur_obs['candidate'].keys())
                if self.config.use_navigable:
                    return f"\nViewpointID '{action}' is not valid, agent not moved. DO NOT fabricate nonexistent IDs. The navigable viewpoints you can choose from current viewpoints are: {valid_ids}.\n\tCurrent Viewpoint:\n{feature}\n\tNavigable Viewpoints:\n{navigable}"
                else:
                    return f"\nViewpointID '{action}' is not valid, agent not moved. DO NOT fabricate nonexistent IDs. The navigable viewpoints you can choose from current viewpoints are: {valid_ids}.\n\tCurrent Viewpoint:\n{feature}"
            else:
                turned_angle, new_obs = self.make_equiv_action([action])

                # Update the current feature
                new_feature = new_obs['obs']
                new_feature_sum = new_obs['obs_summary']
                new_navigable = new_obs['candidate']
                new_objects = new_obs['objects']
                new_heading = np.rad2deg(new_obs['heading'])
                new_elevation = np.rad2deg(new_obs['elevation'])
                if self.config.use_relative_angle:
                    new_feature = self.modify_heading_angles(new_heading, new_feature, new_navigable, new_objects)
                new_orientation = f'\nheading: {new_heading:.2f}, elevation: {new_elevation:.2f}'
                if self.config.use_navigable:
                    new_navigable = self.get_navigable_str(new_heading, new_elevation, new_navigable)

                # Update history
                if self.config.use_history_chain:
                    history = self.history_chain.run(
                        observation = new_feature_sum,
                        history = self.agent_executor.agent.history[-1],
                        previous_action = turned_angle
                    )
                else:
                    history = self.get_history(new_obs, turned_angle)

                self.agent_executor.agent.history.append(history)
                # Record single step detail
                if self.config.use_tool_chain:
                    detail = {
                        "viewpointID": action,
                        "turned_angle": turned_angle,
                        "acion_maker_thought": thought,
                        "feature": new_feature,
                        "history": self.agent_executor.agent.history[-1],
                    }
                else:
                    detail = {
                        "viewpointID": action,
                        "turned_angle": turned_angle,
                        "feature": new_feature,
                        "history": self.agent_executor.agent.history[-1],
                    }
                self.traj[0]['details'].append(detail)
                # Return LLM chain output as the observation of tool
                if self.config.use_tool_chain:
                    return f"\n\tAction_maker Thought:\n{thought}\n\tAction_maker Action:\n{turned_angle}\n\tCurrent Viewpoint:\n{new_feature}\n\tNavigable Viewpoints:\n{new_navigable}"
                elif self.config.use_relative_angle:
                    if self.config.use_navigable:
                        return f"\n\tCurrent Viewpoint:\n{new_feature}\n\tNavigable Viewpoints:\n{new_navigable}"
                    else:
                        return f'\nCurrent Viewpoint "{action}":\n{new_feature}'
                else:
                    if self.config.use_navigable:
                        return f"\n\tCurrent Orientation:\n{new_orientation}\n\tCurrent Viewpoint:\n{new_feature}\n\tNavigable Viewpoints:\n{new_navigable}"
                    else:
                        return f"\n\tCurrent Orientation:\n{new_orientation}\n\tCurrent Viewpoint:\n{new_feature}"

        return Tool(
            name=MAKE_ACTION_TOOL_NAME,
            func=_make_action,
            description=MAKE_ACTION_TOOL_DESCRIPTION,
        )

    def _create_back_trace_tool(
        self,
        llm: BaseLanguageModel,
    ) -> Tool:
        """Create a tool to back trace during navigation.

        The tool is invoked with the history of the navigation trajectory and
        uses the LLM to find a viewpoint on the trajectory to back trace to.
        """
        prompt = PromptTemplate(
            template=BACK_TRACE_PROMPT,
            input_variables=["action_plan", "history", "observation"],
        )

        chain = LLMChain(llm=llm, prompt=prompt)

        def _back_trace(*args, **kwargs) -> str:
            '''Move the agent back to a previously visited viewpoint.'''
            cur_obs = self.env._get_obs()[0]

            # Get current feature
            feature = cur_obs['obs']
            navigable = cur_obs['candidate']
            objects = cur_obs['objects']
            heading = np.rad2deg(cur_obs['heading'])
            elevation = np.rad2deg(cur_obs['elevation'])
            orientation = f'\nheading: {heading:.2f}, elevation: {elevation:.2f}'
            if self.config.use_relative_angle:
                feature = self.modify_heading_angles(heading, feature, navigable, objects)
            if self.config.use_navigable:
                navigable = self.get_navigable_str(heading, elevation, navigable)

            if self.config.use_tool_chain:
                # Get current action plan
                action_plan = self.cur_action_plan
                # Get all previous viewpoints observation
                previous_vp = self.get_his_viewpoints()
                # Back trace
                LLM_output = chain.run(action_plan = action_plan, observation = previous_vp, history = self.agent_executor.agent.history[-1])
                # Parse LLM output, action is the next viewpoint ID
                thought, action = self.parse_action(LLM_output)
            else:
                action = args[0].strip(" ").strip('"').strip("'")

            # Make the action in Simulator
            if action not in self.env.env.sims[0].navigable_dict.keys():
                if self.config.use_navigable:
                    return f"\nViewpointID '{action}' is not valid. DO NOT fabricate nonexistent IDs.\n\tCurrent Orientation:\n{orientation}\n\tCurrent Viewpoint:\n{feature}\n\tNavigable Viewpoints:\n{navigable}"
                else:
                    return f"\nViewpointID '{action}' is not valid. DO NOT fabricate nonexistent IDs.\n\tCurrent Orientation:\n{orientation}\n\tCurrent Viewpoint:\n{feature}"
            else:
                _, new_obs = self.make_equiv_action([action])

                # Update the current feature
                new_feature = new_obs['obs']
                new_navigable = new_obs['candidate']
                new_objects = new_obs['objects']
                new_heading = np.rad2deg(new_obs['heading'])
                new_elevation = np.rad2deg(new_obs['elevation'])
                new_orientation = f'\nheading: {new_heading:.2f}, elevation: {new_elevation:.2f}'
                if self.config.use_relative_angle:
                    new_feature = self.modify_heading_angles(new_heading, new_feature, new_navigable, new_objects)
                if self.config.use_navigable:
                    new_navigable = self.get_navigable_str(new_heading, new_elevation, new_navigable)

                # Update history
                history = self.get_history(new_obs, 'Seems going in a wrong way, back trace to a previous point.')
                self.agent_executor.agent.history.append(history)
                # Return the new observation as the tool output
                if self.config.use_tool_chain:
                    return f"\tBack_tracer Thought:\n{thought}\n\tCurrent Viewpoint:\n{new_feature}\n\tNavigable Viewpoints:\n{new_navigable}"
                elif self.config.use_relative_angle:
                    if self.config.use_navigable:
                        return f"\n\tCurrent Viewpoint:\n{new_feature}\n\tNavigable Viewpoints:\n{new_navigable}"
                    else:
                        return f"\nCurrent Viewpoint:{action}\n{new_feature}"
                else:
                    if self.config.use_navigable:
                        return f"\n\tCurrent Orientation:\n{new_orientation}\n\tCurrent Viewpoint:\n{new_feature}\n\tNavigable Viewpoints:\n{new_navigable}"
                    else:
                        return f"\n\tCurrent Orientation:\n{new_orientation}\n\tCurrent Viewpoint:\n{new_feature}"

        return Tool(
            name=BACK_TRACE_TOOL_NAME,
            func=_back_trace,
            description=BACK_TRACE_TOOL_DESCRIPTION,
        )

    def create_vln_agent(
        self,
    ) -> AgentExecutor:
        """Instantiate API planner and controller for a given trajectory.

        We use a top-level "orchestrator" agent to invoke the planner and controller,
        rather than a top-level planner
        that invokes a controller with its plan. This is to keep the planner simple.
        """

        self.action_maker = self._create_make_action_tool(self.llm)
        self.back_tracer = self._create_back_trace_tool(self.llm)

        tools = [
            self.action_maker,
            self.back_tracer
        ]

        if self.config.use_tool_chain:
            prompt = PromptTemplate(
                template=VLN_ORCHESTRATOR_PROMPT,
                input_variables=["action_plan", "init_observation", "observation", "agent_scratchpad"],
                partial_variables={
                    "tool_names": ", ".join([tool.name for tool in tools]),
                    "tool_descriptions": "\n".join(
                        [f"{tool.name}: {tool.description}" for tool in tools]
                    ),
                },
            )
        elif self.config.use_single_action:
            tools = [self.action_maker]
            prompt = PromptTemplate(
                template=VLN_GPT4_PROMPT if self.config.llm_model_name == 'gpt-4' else VLN_GPT35_PROMPT,
                input_variables=["action_plan", "init_observation", "agent_scratchpad"],
                partial_variables={
                    "tool_names": ", ".join([tool.name for tool in tools]),
                    "tool_descriptions": "\n".join(
                        [f"{tool.name}: {tool.description}" for tool in tools]
                    ),
                },
            )
        else:
            prompt = PromptTemplate(
                template=VLN_ORCHESTRATOR_PROMPT,
                input_variables=["action_plan", "init_observation", "agent_scratchpad"],
                partial_variables={
                    "tool_names": ", ".join([tool.name for tool in tools]),
                    "tool_descriptions": "\n".join(
                        [f"{tool.name}: {tool.description}" for tool in tools]
                    ),
                },
            )
        agent = VLNAgent(
            llm_chain=LLMChain(llm=self.llm, prompt=prompt),
            allowed_tools=[tool.name for tool in tools],
            output_parser = self.output_parser
        )
        return AgentExecutor.from_agent_and_tools(
            agent=agent,
            tools=tools,
            verbose=True,
            handle_parsing_errors = True,
            return_intermediate_steps=True,
            max_iterations=self.config.max_iterations,
        )

    def make_equiv_action(self, actions: List[str]) -> Tuple[str, dict]:
        """
        Interface between the panoramic view and the egocentric view.
        Take in the next viewpoint ID, move the agent to that viewpoint, and
        return a description of the turned angle together with the new observation.
        """
        def normalize_angle(angle):
            while angle > 180:
                angle -= 360
            while angle <= -180:
                angle += 360
            return angle

        def angle_to_left_right(angle):
            return f"left {-angle:.2f}" if angle < 0 else f"right {angle:.2f}"

        # Get current agent facing angle
        cur_obs = self.env._get_obs()[0]
        cur_heading = np.rad2deg(cur_obs['heading'])
        # Make the action
        new_obs = self.env.step(actions)[0]
        new_heading = np.rad2deg(new_obs['heading'])
        # Record the trajectory
        self.traj[0]['path'].append(self.env.env.sims[0].gmap.bfs_shortest_path(cur_obs['viewpoint'], actions[0])[1:])
        # Calculate the turned angle
        turned_angle = new_heading - cur_heading
        # Generate action description
        cur_heading = angle_to_left_right(normalize_angle(cur_heading))
        new_heading = angle_to_left_right(normalize_angle(new_heading))
        action_description = f'Turn heading direction {turned_angle:.2f} degrees from {cur_heading} to {new_heading}.'
        return action_description, new_obs

    def rollout(self, reset=True):
        if reset:  # Reset env
            obs = self.env.reset()
        else:
            obs = self.env._get_obs()

        # Initialize the trajectory
        self.init_trajecotry(obs)

        # Load the instruction
        instructions = [ob['instruction'] for ob in obs]
        if self.config.load_instruction:
            action_plans = instructions
        elif self.config.load_action_plan:
            action_plans = [ob['action_plan'] for ob in obs]
        else:
            action_plans = []
            for instruction in instructions:
                action_plan = self.plan_chain.run(instruction = instruction)
                action_plans.append(action_plan)

        for i, init_ob in enumerate(obs):
            self.cur_action_plan = action_plans[i]
            # Take the first action
            if self.config.use_tool_chain:
                first_obs = self.action_maker('')
                input = {
                    'action_plan': self.cur_action_plan,
                    'init_observation': init_ob['obs_summary'],
                    'observation': first_obs,
                }
            else:
                # Get current feature
                feature = init_ob['obs']
                navigable = init_ob['candidate']
                objects = init_ob['objects']
                heading = np.rad2deg(init_ob['heading'])
                elevation = np.rad2deg(init_ob['elevation'])
                orientation = f'\nheading: {heading:.2f}, elevation: {elevation:.2f}'
                if self.config.use_relative_angle:
                    feature = self.modify_heading_angles(heading, feature, navigable, objects)
                if self.config.use_navigable:
                    navigable = self.get_navigable_str(heading, elevation, navigable)

                if self.config.use_relative_angle:
                    if self.config.use_navigable:
                        init_observation = f"\n\tCurrent Viewpoint:\n{feature}\n\tNavigable Viewpoints:\n{navigable}"
                    else:
                        init_observation = f"\n\tCurrent Viewpoint:\n{feature}"
                else:
                    if self.config.use_navigable:
                        init_observation = f"\n\tCurrent Orientation:\n{orientation}\n\tCurrent Viewpoint:\n{feature}\n\tNavigable Viewpoints:\n{navigable}"
                    else:
                        init_observation = f"\n\tCurrent Orientation:\n{orientation}\n\tCurrent Viewpoint:\n{feature}"

                input = {
                    'action_plan': self.cur_action_plan,
                    'init_observation': init_observation,
                }
            output = self.agent_executor(input)

            self.traj[i]['llm_output'] = output['output']
            self.traj[i]['action_plan'] = output['action_plan']
            # extract agent's thought from llm output
            intermediate_steps = output['intermediate_steps']
            self.traj[i]['llm_thought'] = []
            self.traj[i]['llm_observation'] = []
            for action, observation in intermediate_steps:
                thought = action.log
                self.traj[i]['llm_thought'].append(thought)
                self.traj[i]['llm_observation'].append(observation)

        return self.traj
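The eight-way direction binning used by `modify_heading_angles` can be checked in isolation. The following is a minimal standalone sketch, not code from this commit. `absolute_sector` and `relative_direction` are hypothetical helper names. Headings are assumed to be absolute angles in degrees, in the same convention as `np.rad2deg(ob['heading'])`. The sketch wraps the sector index with `% 8` so that a heading just below 360° lands in sector 0, the same sector as a heading just above 0°.

```python
# Same order as the `directions` list in modify_heading_angles:
# index 0 is the agent's facing sector, then clockwise in 45-degree steps.
DIRECTIONS = ['Front', 'Front Right', 'Right', 'Rear Right',
              'Rear', 'Rear Left', 'Left', 'Front Left']


def absolute_sector(heading_deg: float) -> int:
    """Map an absolute heading (degrees) to one of 8 sectors.

    Sector k covers [45k - 22.5, 45k + 22.5); the final % 8 folds
    headings in [337.5, 360) (and small negative ones) into range.
    """
    return (int((heading_deg - 22.5) // 45) + 1) % 8


def normalize_angle(angle: float) -> float:
    """Wrap an angle into (-180, 180]."""
    while angle > 180:
        angle -= 360
    while angle <= -180:
        angle += 360
    return angle


def angle_to_left_right(angle: float) -> str:
    """Describe a relative angle as 'left n' or 'right n'."""
    return f"left {-angle:.2f}" if angle < 0 else f"right {angle:.2f}"


def relative_direction(agent_heading_deg: float, target_heading_deg: float) -> str:
    """Name the sector of a target relative to the agent's facing sector."""
    offset = (absolute_sector(target_heading_deg) - absolute_sector(agent_heading_deg)) % 8
    return DIRECTIONS[offset]
```

For an agent facing 90°, a viewpoint at 350° falls in the `Left` sector, and `angle_to_left_right(normalize_angle(350 - 90))` describes it as `left 100.00`. Without the `% 8`, that viewpoint would get sector index 8 and match none of the eight directions.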
65
nav_src/agent_base.py
Normal file
@ -0,0 +1,65 @@
|
||||
import json
|
||||
import os
|
||||
|
||||
class BaseAgent(object):
|
||||
''' Base class for an REVERIE agent to generate and save trajectories. '''
|
||||
|
||||
def __init__(self, env):
|
||||
self.env = env
|
||||
self.results = {}
|
||||
|
||||
def get_results(self, detailed_output=False):
|
||||
output = []
|
||||
for k, v in self.results.items():
|
||||
output.append({'instr_id': k, 'trajectory': v['path']})
|
||||
if detailed_output:
|
||||
output[-1]['details'] = v['details']
|
||||
output[-1]['action_plan'] = v['action_plan']
|
||||
output[-1]['llm_output'] = v['llm_output']
|
||||
output[-1]['llm_thought'] = v['llm_thought']
|
||||
output[-1]['llm_observation'] = v['llm_observation']
|
||||
return output
|
||||
|
||||
def rollout(self, **args):
|
||||
''' Return a list of dicts containing instr_id:'xx', path:[(viewpointId, heading_rad, elevation_rad)] '''
|
||||
raise NotImplementedError
|
||||
|
||||
@staticmethod
|
||||
def get_agent(name):
|
||||
return globals()[name+"Agent"]
|
||||
|
||||
def test(self, iters=None, **kwargs):
|
||||
# self.env.reset_epoch(shuffle=(iters is not None)) # If iters is not none, shuffle the env batch
|
||||
self.losses = []
|
||||
self.results = {}
|
||||
# We rely on env showing the entire batch before repeating anything
|
||||
looped = False
|
||||
self.loss = 0
|
||||
if iters is not None:
|
||||
# For each time, it will run the first 'iters' iterations. (It was shuffled before)
|
||||
for i in range(iters):
|
||||
for traj in self.rollout(**kwargs):
|
||||
self.loss = 0
|
||||
self.results[traj['instr_id']] = traj
|
||||
preds_detail = self.get_results(detailed_output=True)
|
||||
json.dump(
|
||||
preds_detail,
|
||||
open(os.path.join(self.config.log_dir, 'runtime.json'), 'w'),
|
||||
sort_keys=True, indent=4, separators=(',', ': ')
|
||||
)
|
||||
else: # Do a full round
|
||||
while True:
|
||||
for traj in self.rollout(**kwargs):
|
||||
if traj['instr_id'] in self.results:
|
||||
looped = True
|
||||
else:
|
||||
self.loss = 0
|
||||
self.results[traj['instr_id']] = traj
|
||||
preds_detail = self.get_results(detailed_output=True)
|
||||
json.dump(
|
||||
preds_detail,
|
||||
open(os.path.join(self.config.log_dir, 'runtime.json'), 'w'),
|
||||
sort_keys=True, indent=4, separators=(',', ': ')
|
||||
)
|
||||
if looped:
|
||||
break
|
||||
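The "full round" branch of `BaseAgent.test` keeps calling `rollout()` until an `instr_id` it has already stored comes back, relying on the environment to serve every instruction once before repeating any. A minimal sketch of that termination logic follows. `collect_one_epoch` and `make_cycling_rollout` are hypothetical names, and the cycling rollout is an assumed stand-in for the environment-driven rollout.

```python
from typing import Callable, Dict, Iterable, List


def collect_one_epoch(rollout: Callable[[], Iterable[dict]]) -> Dict[str, dict]:
    """Call `rollout` until some instr_id is seen a second time.

    Mirrors the 'full round' loop of BaseAgent.test: new IDs are stored,
    and the loop exits after the first batch that contains a repeat.
    """
    results: Dict[str, dict] = {}
    looped = False
    while not looped:
        for traj in rollout():
            if traj['instr_id'] in results:
                looped = True
            else:
                results[traj['instr_id']] = traj
    return results


def make_cycling_rollout(instr_ids: List[str], batch_size: int) -> Callable[[], List[dict]]:
    """Hypothetical rollout that serves instr_ids in order and wraps around."""
    state = {'ix': 0}

    def rollout() -> List[dict]:
        batch = []
        for _ in range(batch_size):
            batch.append({'instr_id': instr_ids[state['ix'] % len(instr_ids)], 'path': []})
            state['ix'] += 1
        return batch

    return rollout
```

With `make_cycling_rollout(['a_0', 'a_1', 'b_0'], batch_size=2)`, the second batch is `b_0, a_0`. `b_0` is stored and the repeated `a_0` ends the loop. In `R2RNavBatch._next_minibatch`, a short final batch is topped up from the reshuffled dataset, so those top-up items are already-seen IDs and are skipped in the same way.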
30
nav_src/data_utils.py
Normal file
@ -0,0 +1,30 @@
|
||||
import os
|
||||
import json
|
||||
import numpy as np
|
||||
|
||||
def load_instr_datasets(anno_dir, dataset, splits):
|
||||
data = []
|
||||
for split in splits:
|
||||
filepath = os.path.join(anno_dir, f'{split}.json')
|
||||
with open(filepath) as f:
|
||||
new_data = json.load(f)
|
||||
|
||||
data += new_data
|
||||
|
||||
return data
|
||||
|
||||
def construct_instrs(anno_dir, dataset, splits):
|
||||
data = []
|
||||
if "instr" in splits[0]:
|
||||
return load_instr_datasets(anno_dir, dataset, splits)
|
||||
|
||||
for i, item in enumerate(load_instr_datasets(anno_dir, dataset, splits)):
|
||||
# Split multiple instructions into separate entries
|
||||
for j, instr in enumerate(item['instructions']):
|
||||
new_item = dict(item)
|
||||
new_item['instr_id'] = '%s_%d' % (item['path_id'], j)
|
||||
new_item['instruction'] = instr
|
||||
del new_item['instructions']
|
||||
del new_item['instr_encodings']
|
||||
data.append(new_item)
|
||||
return data
|
||||
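`construct_instrs` turns each R2R-style path entry, which carries several instructions, into one entry per instruction with an `instr_id` of the form `<path_id>_<j>`. The sketch below applies the same splitting to an in-memory list instead of JSON files. `split_instructions` is a hypothetical name, and the sample record is made up for illustration. It contains only the fields the splitting touches.

```python
from typing import List


def split_instructions(items: List[dict]) -> List[dict]:
    """One output entry per instruction, mirroring the loop in construct_instrs."""
    data = []
    for item in items:
        for j, instr in enumerate(item['instructions']):
            new_item = dict(item)  # shallow copy: nested lists such as 'path' are shared
            new_item['instr_id'] = '%s_%d' % (item['path_id'], j)
            new_item['instruction'] = instr
            del new_item['instructions']
            new_item.pop('instr_encodings', None)  # tolerate files without encodings
            data.append(new_item)
    return data


# Illustrative record only; real annotation files carry more fields.
sample = [{
    'path_id': 42,
    'scan': 'scan_a',
    'path': ['vp0', 'vp1', 'vp2'],
    'heading': 0.0,
    'instructions': ['Walk past the sofa.', 'Go to the kitchen door.'],
}]
entries = split_instructions(sample)
```

Here `entries` has the IDs `42_0` and `42_1`. Because `dict(item)` is a shallow copy, entries split from the same path share a single `path` list, so mutating that list through one entry also changes it for the others.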
323
nav_src/env.py
Normal file
@ -0,0 +1,323 @@
|
||||
''' Batched REVERIE navigation environment '''
|
||||
|
||||
import json
|
||||
import os
|
||||
import numpy as np
|
||||
import random
|
||||
import networkx as nx
|
||||
from collections import defaultdict
|
||||
|
||||
from utils.data import load_nav_graphs
|
||||
from eval_utils import cal_dtw, cal_cls
|
||||
from utils.graph_utils import NavGraph
|
||||
|
||||
ERROR_MARGIN = 3.0
|
||||
|
||||
class Simulator(object):
    ''' A simple simulator in Matterport3D environment '''

    def __init__(
        self,
        navigable_dir: str,):
        self.heading = 0
        self.elevation = 0
        self.scan_ID = ''
        self.viewpoint_ID = ''
        self.navigable_dir = navigable_dir
        self.navigable_dict = {}
        self.candidate = {}
        self.gmap = NavGraph()

    def newEpisode(
        self,
        scan_ID: str,
        viewpoint_ID: str,
        heading: float,
        elevation: float,):
        self.heading = heading
        self.elevation = elevation
        self.scan_ID = scan_ID
        self.viewpoint_ID = viewpoint_ID
        # Load navigable dict
        navigable_path = os.path.join(self.navigable_dir, self.scan_ID + '_navigable.json')
        with open(navigable_path, 'r') as f:
            self.navigable_dict = json.load(f)
        # Get candidate
        self.getCandidate()

    def updateGraph(self):
        # build graph
        for candidate in self.candidate.keys():
            self.gmap.update_connection(self.viewpoint_ID, candidate)

    def getState(self) -> dict:
        self.state = {
            'scanID': self.scan_ID,
            'viewpointID': self.viewpoint_ID,
            'heading': self.heading,
            'elevation': self.elevation,
            'candidate': self.candidate,
        }
        return self.state

    def getCandidate(self):
        """
        Get the agent's candidate list from pre-stored navigable dict.
        """
        self.candidate = self.navigable_dict[self.viewpoint_ID]
        self.updateGraph()

    def makeAction(self, next_viewpoint_ID):
        """
        Make action and update the agent's state.
        """
        if next_viewpoint_ID == self.viewpoint_ID:
            return
        elif next_viewpoint_ID in self.candidate.keys():
            self.heading = self.candidate[next_viewpoint_ID]['heading']
            self.elevation = self.candidate[next_viewpoint_ID]['elevation']
        self.viewpoint_ID = next_viewpoint_ID
        self.getCandidate()


class EnvBatch(object):
    ''' A simple wrapper for a batch of Simulator instances,
        using discretized viewpoints and precomputed observations '''

    def __init__(self, navigable_dir, feat_db=None, batch_size=100):
        """
        1. Store the database of precomputed observations.
        2. Initialize the simulators.
        :param feat_db: Observation database, queried with get_image_observation(scanID, viewpointID).
        :param batch_size: Number of simulators to create.
        """
        self.feat_db = feat_db

        self.sims = []
        for i in range(batch_size):
            sim = Simulator(navigable_dir)
            self.sims.append(sim)

    def _make_id(self, scanId, viewpointId):
        return scanId + '_' + viewpointId

    def newEpisodes(self, scanIds, viewpointIds, headings):
        for i, (scanId, viewpointId, heading) in enumerate(zip(scanIds, viewpointIds, headings)):
            self.sims[i].newEpisode(scanId, viewpointId, heading, 0)

    def getStates(self):
        """
        Get the list of simulator states paired with their precomputed observations.
        :return: [ (observation, sim_state) ] * batch_size, where observation is the
                 value returned by feat_db.get_image_observation for the current viewpoint
        """
        feature_states = []
        for i, sim in enumerate(self.sims):
            state = sim.getState()

            feature = self.feat_db.get_image_observation(state["scanID"], state["viewpointID"])
            feature_states.append((feature, state))
        return feature_states

    def makeActions(self, next_viewpoint_IDs):
        ''' Take an action using the full state dependent action interface (with batched input)'''
        for i, next_viewpoint_ID in enumerate(next_viewpoint_IDs):
            self.sims[i].makeAction(next_viewpoint_ID)


class R2RNavBatch(object):
    ''' Implements the R2R navigation task, using discretized viewpoints and precomputed observations '''

    def __init__(
        self, view_db, instr_data, connectivity_dir, navigable_dir,
        batch_size=1, seed=0, name=None,
    ):
        self.env = EnvBatch(navigable_dir, feat_db=view_db, batch_size=batch_size)
        self.data = instr_data
        self.scans = set([x['scan'] for x in self.data])
        self.connectivity_dir = connectivity_dir
        self.batch_size = batch_size
        self.name = name

        self.gt_trajs = self._get_gt_trajs(self.data)  # for evaluation

        # use different seeds in different processes to shuffle data
        self.seed = seed
        random.seed(self.seed)
        random.shuffle(self.data)

        self.ix = 0
        self._load_nav_graphs()

        self.buffered_state_dict = {}
        print('%s loaded with %d instructions, using splits: %s' % (
            self.__class__.__name__, len(self.data), self.name))

    def _get_gt_trajs(self, data):
        gt_trajs = {
            x['instr_id']: (x['scan'], x['path'])
            for x in data if len(x['path']) > 1
        }
        return gt_trajs

    def size(self):
        return len(self.data)

    def _load_nav_graphs(self):
        """
        Load the connectivity graph for each scan in self.scans, which is useful
        for reasoning about shortest paths.
        Store the graphs {scan_id: graph} in self.graphs.
        Store the shortest paths {scan_id: {view_id_x: {view_id_y: [path]}}} in self.shortest_paths.
        Store the shortest distances (same structure, float values) in self.shortest_distances.
        :return: None
        """
        print('Loading navigation graphs for %d scans' % len(self.scans))
        self.graphs = load_nav_graphs(self.connectivity_dir, self.scans)
        self.shortest_paths = {}
        for scan, G in self.graphs.items():  # compute all shortest paths
            self.shortest_paths[scan] = dict(nx.all_pairs_dijkstra_path(G))
        self.shortest_distances = {}
        for scan, G in self.graphs.items():  # compute all shortest distances
            self.shortest_distances[scan] = dict(nx.all_pairs_dijkstra_path_length(G))

    def _next_minibatch(self, batch_size=None, **kwargs):
        """
        Store the minibatch in 'self.batch'
        """
        if batch_size is None:
            batch_size = self.batch_size

        batch = self.data[self.ix: self.ix+batch_size]
        if len(batch) < batch_size:
            random.shuffle(self.data)
            self.ix = batch_size - len(batch)
            batch += self.data[:self.ix]
        else:
            self.ix += batch_size
        self.batch = batch

def reset_epoch(self, shuffle=False):
|
||||
''' Reset the data index to beginning of epoch. Primarily for testing.
|
||||
You must still call reset() for a new episode. '''
|
||||
if shuffle:
|
||||
random.shuffle(self.data)
|
||||
self.ix = 0
|
||||
|
||||
    def _get_obs(self):
        obs = []
        for i, (feature, state) in enumerate(self.env.getStates()):
            item = self.batch[i]

            ob = {
                'obs' : feature["detail"],
                'obs_summary' : feature["summary"],
                'objects' : feature["objects"],
                'instr_id' : item['instr_id'],
                # 'action_plan' : item['action_plan'],
                'scan' : state['scanID'],
                'viewpoint' : state['viewpointID'],
                'heading' : state['heading'],
                'elevation' : state['elevation'],
                'candidate': state['candidate'],
                'instruction' : item['instruction'],
                'gt_path' : item['path'],
                'path_id' : item['path_id']
            }
            # Shortest-path distance from the current viewpoint to the goal (the last
            # viewpoint of the ground-truth path); 0 if the instruction has no usable
            # ground-truth trajectory.
            if ob['instr_id'] in self.gt_trajs:
                ob['distance'] = self.shortest_distances[ob['scan']][ob['viewpoint']][item['path'][-1]]
            else:
                ob['distance'] = 0

            obs.append(ob)
        return obs

    def reset(self, **kwargs):
        ''' Load a new minibatch / episodes. '''
        self._next_minibatch(**kwargs)

        scanIds = [item['scan'] for item in self.batch]
        viewpointIds = [item['path'][0] for item in self.batch]
        headings = [item['heading'] for item in self.batch]
        self.env.newEpisodes(scanIds, viewpointIds, headings)
        return self._get_obs()

    def step(self, next_viewpoint_IDs):
        ''' Take action (same interface as makeActions) '''
        self.env.makeActions(next_viewpoint_IDs)
        return self._get_obs()

    ############### Nav Evaluation ###############
    def _get_nearest(self, shortest_distances, goal_id, path):
        near_id = path[0]
        near_d = shortest_distances[near_id][goal_id]
        for item in path:
            d = shortest_distances[item][goal_id]
            if d < near_d:
                near_id = item
                near_d = d
        return near_id

    def _eval_item(self, scan, pred_path, gt_path):
        scores = {}

        shortest_distances = self.shortest_distances[scan]

        path = sum(pred_path, [])
        assert gt_path[0] == path[0], 'Result trajectories should include the start position'

        nearest_position = self._get_nearest(shortest_distances, gt_path[-1], path)

        scores['nav_error'] = shortest_distances[path[-1]][gt_path[-1]]
        scores['oracle_error'] = shortest_distances[nearest_position][gt_path[-1]]

        scores['action_steps'] = len(pred_path) - 1
        scores['trajectory_steps'] = len(path) - 1
        scores['trajectory_lengths'] = np.sum([shortest_distances[a][b] for a, b in zip(path[:-1], path[1:])])

        gt_lengths = np.sum([shortest_distances[a][b] for a, b in zip(gt_path[:-1], gt_path[1:])])

        scores['success'] = float(scores['nav_error'] < ERROR_MARGIN)
        scores['spl'] = scores['success'] * gt_lengths / max(scores['trajectory_lengths'], gt_lengths, 0.01)
        scores['oracle_success'] = float(scores['oracle_error'] < ERROR_MARGIN)

        scores.update(
            cal_dtw(shortest_distances, path, gt_path, scores['success'], ERROR_MARGIN)
        )
        scores['CLS'] = cal_cls(shortest_distances, path, gt_path, ERROR_MARGIN)

        return scores

    def eval_metrics(self, preds):
        ''' Evaluate each agent trajectory based on how close it got to the goal location.
        Each prediction's 'trajectory' is a list of viewpoint-ID lists (one per action),
        which _eval_item flattens into a single path. '''
        print('eval %d predictions' % (len(preds)))

        metrics = defaultdict(list)
        for item in preds:
            instr_id = item['instr_id']
            traj = item['trajectory']
            scan, gt_traj = self.gt_trajs[instr_id]
            traj_scores = self._eval_item(scan, traj, gt_traj)
            for k, v in traj_scores.items():
                metrics[k].append(v)
            metrics['instr_id'].append(instr_id)

        avg_metrics = {
            'action_steps': np.mean(metrics['action_steps']),
            'steps': np.mean(metrics['trajectory_steps']),
            'lengths': np.mean(metrics['trajectory_lengths']),
            'nav_error': np.mean(metrics['nav_error']),
            'oracle_error': np.mean(metrics['oracle_error']),
            'sr': np.mean(metrics['success']) * 100,
            'oracle_sr': np.mean(metrics['oracle_success']) * 100,
            'spl': np.mean(metrics['spl']) * 100,
            'nDTW': np.mean(metrics['nDTW']) * 100,
            'SDTW': np.mean(metrics['SDTW']) * 100,
            'CLS': np.mean(metrics['CLS']) * 100,
        }
        return avg_metrics, metrics
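The metrics in `_eval_item` are easy to misread, so here is a minimal pure-Python sketch of navigation error, oracle error, success and SPL on a hypothetical four-viewpoint line graph. It assumes an all-pairs distance table like the one `_load_nav_graphs` builds, plus a 3.0 success margin. That value is the default `threshold` in `eval_utils.py`; the actual `ERROR_MARGIN` constant is defined elsewhere. `nav_scores` and `path_length` are illustrative names and are not part of the repository.

```python
from typing import Dict, List

Dist = Dict[str, Dict[str, float]]


def path_length(dist: Dist, path: List[str]) -> float:
    # Sum of shortest-path distances between consecutive viewpoints.
    return sum(dist[a][b] for a, b in zip(path[:-1], path[1:]))


def nav_scores(dist: Dist, path: List[str], gt_path: List[str],
               error_margin: float = 3.0) -> Dict[str, float]:
    goal = gt_path[-1]
    nav_error = dist[path[-1]][goal]
    # Oracle error: the closest the agent ever got to the goal along its path.
    oracle_error = min(dist[v][goal] for v in path)
    success = float(nav_error < error_margin)
    traj_len = path_length(dist, path)
    gt_len = path_length(dist, gt_path)
    # SPL weights success by reference length over actual length (floor 0.01).
    spl = success * gt_len / max(traj_len, gt_len, 0.01)
    return {
        "nav_error": nav_error,
        "oracle_error": oracle_error,
        "success": success,
        "oracle_success": float(oracle_error < error_margin),
        "spl": spl,
    }


# Hypothetical toy graph: viewpoints on a line, 2.0 apart.
pos = {"A": 0.0, "B": 2.0, "C": 4.0, "D": 6.0}
dist = {a: {b: abs(pa - pb) for b, pb in pos.items()} for a, pa in pos.items()}
# The agent overshoots: A -> B -> C -> D while the goal is C.
print(nav_scores(dist, ["A", "B", "C", "D"], ["A", "B", "C"]))
```

In this example the agent overshoots the goal by one 2.0-unit hop. It still succeeds because 2.0 < 3.0, but SPL drops to 4.0 / 6.0 ≈ 0.67: its path is 6.0 long against a 4.0 reference.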
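`_next_minibatch` walks through `self.data` with a cursor. When fewer than `batch_size` items remain, it reshuffles the data and tops the batch up from the start. The following is a minimal standalone sketch of the same cursor logic. It uses a hypothetical `Batcher` class and an injectable `random.Random` so the example is reproducible.

```python
import random
from typing import List, Optional


class Batcher:
    """Hypothetical stand-alone version of the env's wrap-around minibatching."""

    def __init__(self, data: List[int], batch_size: int, seed: int = 0):
        self.data = list(data)
        self.batch_size = batch_size
        self.ix = 0
        self.rng = random.Random(seed)

    def next_minibatch(self, batch_size: Optional[int] = None) -> List[int]:
        if batch_size is None:
            batch_size = self.batch_size
        batch = self.data[self.ix:self.ix + batch_size]
        if len(batch) < batch_size:
            # End of data: reshuffle, then fill the remainder from the front.
            self.rng.shuffle(self.data)
            self.ix = batch_size - len(batch)
            batch += self.data[:self.ix]
        else:
            self.ix += batch_size
        return batch


b = Batcher(list(range(5)), batch_size=2)
for _ in range(4):
    print(b.next_minibatch())
```

The top-up slice is taken from the freshly shuffled data. As a result, an item from the tail of the previous pass can appear twice in the wrapped batch, which mirrors the original logic.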
43
nav_src/eval_utils.py
Normal file
@@ -0,0 +1,43 @@
''' Utils for evaluation '''

import numpy as np


def cal_dtw(shortest_distances, prediction, reference, success=None, threshold=3.0):
    dtw_matrix = np.inf * np.ones((len(prediction) + 1, len(reference) + 1))
    dtw_matrix[0][0] = 0
    for i in range(1, len(prediction)+1):
        for j in range(1, len(reference)+1):
            best_previous_cost = min(
                dtw_matrix[i-1][j], dtw_matrix[i][j-1], dtw_matrix[i-1][j-1])
            cost = shortest_distances[prediction[i-1]][reference[j-1]]
            dtw_matrix[i][j] = cost + best_previous_cost

    dtw = dtw_matrix[len(prediction)][len(reference)]
    ndtw = np.exp(-dtw/(threshold * len(reference)))
    if success is None:
        success = float(shortest_distances[prediction[-1]][reference[-1]] < threshold)
    sdtw = success * ndtw

    return {
        'DTW': dtw,
        'nDTW': ndtw,
        'SDTW': sdtw
    }

def cal_cls(shortest_distances, prediction, reference, threshold=3.0):
    def length(nodes):
        return np.sum([
            shortest_distances[a][b]
            for a, b in zip(nodes[:-1], nodes[1:])
        ])

    coverage = np.mean([
        np.exp(-np.min([  # pylint: disable=g-complex-comprehension
            shortest_distances[u][v] for v in prediction
        ]) / threshold) for u in reference
    ])
    expected = coverage * length(reference)
    score = expected / (expected + np.abs(expected - length(prediction)))
    return coverage * score

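To sanity-check `cal_dtw` and `cal_cls`, here is a NumPy-free re-implementation of the same formulas on a hypothetical line graph (viewpoints `A` to `D`, spaced 2.0 apart). Unlike `cal_dtw`, `dtw_scores` always derives success from the final distance instead of accepting it as an argument. The function names and the toy graph are illustrative only.

```python
import math
from typing import Dict, List

Dist = Dict[str, Dict[str, float]]


def dtw_scores(dist: Dist, pred: List[str], ref: List[str], threshold: float = 3.0):
    n, m = len(pred), len(ref)
    inf = float("inf")
    dtw = [[inf] * (m + 1) for _ in range(n + 1)]
    dtw[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(dtw[i - 1][j], dtw[i][j - 1], dtw[i - 1][j - 1])
            dtw[i][j] = dist[pred[i - 1]][ref[j - 1]] + best_prev
    # nDTW normalises the alignment cost by threshold * |reference|.
    ndtw = math.exp(-dtw[n][m] / (threshold * m))
    success = float(dist[pred[-1]][ref[-1]] < threshold)
    return {"DTW": dtw[n][m], "nDTW": ndtw, "SDTW": success * ndtw}


def cls_score(dist: Dist, pred: List[str], ref: List[str], threshold: float = 3.0) -> float:
    def length(nodes: List[str]) -> float:
        return sum(dist[a][b] for a, b in zip(nodes[:-1], nodes[1:]))

    # Path coverage: how closely the prediction passes each reference node.
    coverage = sum(math.exp(-min(dist[u][v] for v in pred) / threshold) for u in ref) / len(ref)
    expected = coverage * length(ref)
    # Length score penalises predictions longer or shorter than expected.
    score = expected / (expected + abs(expected - length(pred)))
    return coverage * score


pos = {"A": 0.0, "B": 2.0, "C": 4.0, "D": 6.0}
toy_dist = {a: {b: abs(pa - pb) for b, pb in pos.items()} for a, pa in pos.items()}
print(dtw_scores(toy_dist, ["A", "B", "C", "D"], ["A", "B", "C"]))
print(cls_score(toy_dist, ["A", "B", "C", "D"], ["A", "B", "C"]))
```

For the overshooting prediction, the extra `D` aligns to `C` at a cost of 2.0, so nDTW is exp(-2/9) ≈ 0.80. CLS is 4/6 ≈ 0.67: coverage is perfect, but the path is 2.0 longer than the reference.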
79
nav_src/parser.py
Normal file
@@ -0,0 +1,79 @@
import argparse
import os


def parse_args():
    parser = argparse.ArgumentParser(description="")

    # datasets
    parser.add_argument('--root_dir', type=str, default='../datasets')
    parser.add_argument('--dataset', type=str, default='r2r', choices=['r2r', 'r4r'])
    parser.add_argument('--output_dir', type=str, default='../datasets/R2R/exprs/gpt-3.5-turbo', help='experiment id')
    # parser.add_argument('--output_dir', type=str, default='../datasets/R2R/exprs/LlaMA-2-13b-test', help='experiment id')
    parser.add_argument('--seed', type=int, default=0)

    # Agent
    parser.add_argument('--temperature', type=float, default=0.0, help='temperature for llm')
    parser.add_argument('--llm_model_name', type=str, default='gpt-3.5-turbo', help='llm model name')
    # parser.add_argument('--llm_model_name', type=str, default='gpt-4', help='llm model name')
    # parser.add_argument('--llm_model_name', type=str, default='LlaMA-2-13b', help='llm model name')
    parser.add_argument('--batch_size', type=int, default=1)
    parser.add_argument('--max_iterations', type=int, default=10)

    # General config
    parser.add_argument('--iters', type=int, default=10, help='number of iterations to run')
    # parser.add_argument('--iters', type=int, default=None, help='number of iterations to run')
    parser.add_argument('--max_scratchpad_length', type=int, default=1000, help='max number of steps in an episode')
    parser.add_argument('--test', action='store_true', default=False)
    # parser.add_argument('--val_env_name', type=str, default='R2R_val_unseen_instr_0')
    # parser.add_argument('--val_env_name', type=str, default='R2R_val_unseen_instr_1')
    # parser.add_argument('--val_env_name', type=str, default='R2R_val_unseen_instr_2')
    # parser.add_argument('--val_env_name', type=str, default='R2R_val_unseen_instr_3')
    # parser.add_argument('--val_env_name', type=str, default='R2R_val_unseen_instr_4')
    parser.add_argument('--val_env_name', type=str, default='R2R_val_unseen_instr')

    parser.add_argument('--load_instruction', action='store_true', default=True)
    parser.add_argument('--load_action_plan', action='store_true', default=True)

    parser.add_argument('--use_relative_angle', action='store_true', default=True)
    parser.add_argument('--use_history_chain', action='store_true', default=False)
    parser.add_argument('--use_tool_chain', action='store_true', default=False)
    parser.add_argument('--use_navigable', action='store_true', default=False)
    parser.add_argument('--use_single_action', action='store_true', default=True)

    parser.add_argument('--detailed_output', action='store_true', default=True)

    # parser.add_argument('--valid_file', type=str, default='../datasets/R2R/exprs/4-R2R_val_unseen_instr/4-R2R_val_unseen_instr.json', help='valid file name')
    parser.add_argument('--valid_file', type=str, default=None, help='valid file name')

    args, _ = parser.parse_known_args()

    args = postprocess_args(args)

    return args


def postprocess_args(args):
    ROOTDIR = args.root_dir

    # Setup input paths
    args.obs_dir = os.path.join(ROOTDIR, 'R2R', 'observations_list_summarized')
    args.obs_summary_dir = os.path.join(ROOTDIR, 'R2R', 'observations_summarized')
    args.obj_dir = os.path.join(ROOTDIR, 'R2R', 'objects_list')

    args.connectivity_dir = os.path.join(ROOTDIR, 'R2R', 'connectivity')
    args.scan_data_dir = os.path.join(ROOTDIR, 'Matterport3D', 'v1_unzip_scans')

    args.anno_dir = os.path.join(ROOTDIR, 'R2R', 'annotations')
    args.navigable_dir = os.path.join(ROOTDIR, 'R2R', 'navigable')

    # Build paths
    args.log_dir = os.path.join(args.output_dir, 'logs')
    args.pred_dir = os.path.join(args.output_dir, 'preds')

    os.makedirs(args.output_dir, exist_ok=True)
    os.makedirs(args.log_dir, exist_ok=True)
    os.makedirs(args.pred_dir, exist_ok=True)

    return args

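The options declared with `action='store_true', default=True` are always `True`. This applies to `--load_instruction`, `--load_action_plan`, `--use_relative_angle`, `--use_single_action` and `--detailed_output`: passing the flag sets `True` and omitting it keeps the default, so none of them can be switched off from the command line. If that is ever needed, one option on Python 3.9+ is `argparse.BooleanOptionalAction`, which also generates a `--no-...` form. The following is a minimal sketch with a hypothetical `build_parser`, not the repository's parser:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="sketch of toggleable boolean flags")
    # BooleanOptionalAction (Python 3.9+) registers both --flag and --no-flag.
    parser.add_argument('--use_relative_angle', action=argparse.BooleanOptionalAction, default=True)
    parser.add_argument('--detailed_output', action=argparse.BooleanOptionalAction, default=True)
    return parser


args = build_parser().parse_args(['--no-use_relative_angle'])
print(args.use_relative_angle, args.detailed_output)  # False True
```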
0
nav_src/prompt/__init__.py
Normal file
280
nav_src/prompt/planner_prompt.py
Normal file
@@ -0,0 +1,280 @@
# flake8: noqa

from langchain.prompts.prompt import PromptTemplate

PLANNER_PROMPT = """Given the long instruction: {instruction}

Divide the long instruction into action steps with detailed descriptions in the following format:
Action plan:
1. action_step_1
2. action_step_2
...

Action plan:"""

ACTION_PROMPT = """You are an agent following an action plan to navigate in an indoor environment.

Action plan: {action_plan}

You are currently at one of the steps in the plan. You will be given the history of previous steps you have taken, the current observation of the environment, and the navigable viewpoints for the next step.

You should:
1) evaluate the history and observation to decide which step of the action plan you are at.
2) choose one viewpoint from the navigable viewpoints.

Each navigable viewpoint has a unique ID; you should answer only the ID in the Final Answer.

----
Starting below, you should strictly follow this format:

History: the history of previous steps you have taken
Observation: the current observation of the environment
Navigable viewpoints: the navigable viewpoints for the next step
Thought: your thought on the next step
Final Answer: 'viewpointID'
----

Begin!

History: {history}
Observation: {observation}
Navigable viewpoints: {navigable_viewpoints}
Thought:"""

HISTORY_PROMPT = """You are an agent navigating in an indoor environment.

You have reached a new viewpoint after taking the previous action. You will be given the navigation history, the current observation of the environment, and the previous action you took.

You should:
1) evaluate the new observation and history.
2) update the history with the previous action and the new observation.

History: {history}
Previous action: {previous_action}
Observation: {observation}
Update history with the new observation:"""

MAKE_ACTION_TOOL_NAME = "action_maker"
MAKE_ACTION_TOOL_DESCRIPTION = f'Can be used to move to the next adjacent viewpoint.\nThe input to this tool should be a viewpoint ID string of the next viewpoint you wish to visit. For example:\nAction: action_maker\nAction Input: "4a153b13a3f6424784cb8e5dabbb3a2c".'

BACK_TRACE_PROMPT = """You are an agent following an action plan to navigate in an indoor environment.

You are currently at an intermediate step of the trajectory but seem to be going off track. You will be given the action plan describing the whole trajectory, the history of previous steps you have taken, and the observations of the viewpoints along the trajectory.

You should evaluate the history, the action plan and the observations along the way to decide which viewpoint to go back to.

Each navigable viewpoint has a unique ID; you should answer only the ID in the Final Answer.
You must choose one of the navigable viewpoints; DO NOT answer None of the above.

----
Starting below, you should follow this format:

Action plan: the action plan describing the whole trajectory
History: the history of previous steps you have taken
Observation: the observations of each viewpoint along the trajectory
Thought: your thought about the next step
Final Answer: 'viewpointID'
----

Begin!

Action plan: {action_plan}
History: {history}
Observation: {observation}
Thought:"""

BACK_TRACE_TOOL_NAME = "back_tracer"
BACK_TRACE_TOOL_DESCRIPTION = f"Can be used to move to any previous viewpoint on the trajectory even if the viewpoint is not adjacent.\nCan be called like {BACK_TRACE_TOOL_NAME}('viewpointID'), where 'viewpointID' is the ID of any previous viewpoint.\nThe input to this tool should be a string of viewpoint ID ONLY."


VLN_ORCHESTRATOR_TOOL_PROMPT = """You are an agent that follows an instruction to navigate in an indoor environment. You are required to make sequential decisions according to the observation of the environment to follow the given instruction.
At the beginning of the navigation, you will be given the instruction describing the whole trajectory.
During navigation, you will receive the history of previous steps you have taken and the current observation of the environment at each step.

Navigating in an unseen environment is hard; it is possible to go off the track described by the instruction.
You should act as a high-level controller: at each step, you should consider whether you are on the right track or not.
If yes, use the action_maker tool to continue.
If not, use the back_tracer tool to move to a previous viewpoint on the trajectory.

Here are the descriptions of these tools: {tool_descriptions}

----
Starting below, you should follow this format:

Instruction: the instruction describing the whole trajectory
Initial Observation: the initial observation of the environment
Thought: I should start navigation according to the instruction
Action: action_maker
Action Input: ""
Observation: the result of the action
Thought: you should always think about what to do next
Action: the action to take, should be one of the tools [{tool_names}]
Action Input: ""
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I am finished executing the instruction.
Final Answer: Finished!

Begin!

Instruction: {action_plan}
Initial Observation: {init_observation}
Thought: I should start navigation according to the instruction
Action: action_maker
Action Input: ""
Observation: {observation}
Thought:{agent_scratchpad}"""

VLN_ORCHESTRATOR_ABS_PROMPT = """You are an agent that follows an instruction to navigate in an indoor environment. You are required to make sequential decisions according to the observation of the environment to follow the given instruction.
At the beginning of the navigation, you will be given the instruction describing the whole trajectory.
During navigation, you will receive the history of previous steps you have taken, your current orientation, the current observation of the environment at each step, and the navigable viewpoints' orientations from the current viewpoint.
All orientations are normalized to world coordinates in degrees; you should always consider the relative angle between the observation and the navigable viewpoints, i.e. relative angles 0 and 360 are the front, 90 and -270 are the right, 180 and -180 are the back, 270 and -90 are the left.

Navigating in an unseen environment is hard; it is possible to go off the track described by the instruction. You are allowed to back trace, but you are encouraged to explore the environment as much as possible. The ultimate goal is to reach the destination in the instruction.
At each step, you should consider:
(1) According to Current Viewpoint observation and History, have you reached the destination?
If yes, you should stop and output 'Final Answer: Finished!'.
If not, you should continue:
(2) Consider whether you are on the right track or not.
If yes, use the action_maker tool to move to an adjacent viewpoint shown in Navigable Viewpoints.
If not, use the back_tracer tool to move to any previous viewpoint shown in History.
You should always use the action_maker at the beginning of navigation. If the instruction tells you to wait, you should output 'Final Answer: Finished!' to stop.

Here are the descriptions of these tools: {tool_descriptions}

The viewpoint ID is a string of 32 characters, for example '4a153b13a3f6424784cb8e5dabbb3a2c'. You are very strict about the viewpoint ID and will never fabricate nonexistent IDs.

----
Starting below, you should follow this format:

Instruction: the instruction describing the whole trajectory
Initial Observation: the initial observation of the environment
Thought: you should always think about what to do next
Action: the action to take, must be one of the tools [{tool_names}]
Action Input: "Viewpoint ID"
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I have reached the destination, I can stop.
Final Answer: Finished!
----

Begin!

Instruction: {action_plan}
Initial Observation: {init_observation}
Thought: I should start navigation according to the instruction, {agent_scratchpad}"""

VLN_ORCHESTRATOR_PROMPT = """You are an agent that follows an instruction to navigate in an indoor environment. You are required to make sequential decisions according to the observation of the environment to follow the given instruction.
At the beginning of the navigation, you will be given the instruction describing the whole trajectory.
During navigation, you will receive the history of previous steps you have taken, the current observation of the environment, and the navigable viewpoints' orientations from the current viewpoint.
All orientations are in degrees from -180 to 180, i.e. angle 0 is the front, right 90 is 90 degrees to the right, right 180 and left 180 are the back, left 90 is 90 degrees to the left.

Navigating in an unseen environment is hard; it is possible to go off the track described by the instruction. You are allowed to back trace, but you are encouraged to explore the environment as much as possible. The ultimate goal is to reach the destination in the instruction.
At each step, you should consider:
(1) According to Current Viewpoint observation and History, have you reached the destination?
If yes, you should stop and output 'Final Answer: Finished!'.
If not, you should continue:
(2) Consider whether you are on the right track or not.
If yes, use the action_maker tool to move to an adjacent viewpoint shown in Navigable Viewpoints.
If not, use the back_tracer tool to move to any previous viewpoint shown in History.
You should always use the action_maker at the beginning of navigation. Show your reasoning in the Thought section.

Here are the descriptions of these tools: {tool_descriptions}

The viewpoint ID is a string of 32 characters, for example '4a153b13a3f6424784cb8e5dabbb3a2c'. You are very strict about the viewpoint ID and will never fabricate nonexistent IDs.

----
Starting below, you should follow this format:

Instruction: the instruction describing the whole trajectory
Initial Observation: the initial observation of the environment
Thought: you should always think about what to do next and why
Action: the action to take, must be one of the tools [{tool_names}]
Action Input: "Viewpoint ID"
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I have reached the destination, I can stop.
Final Answer: Finished!
----

Begin!

Instruction: {action_plan}
Initial Observation: {init_observation}
Thought: I should start navigation according to the instruction, {agent_scratchpad}"""

VLN_GPT4_PROMPT = """You are an intelligent embodied agent that follows an instruction to navigate in an indoor environment. Your task is to move among the static viewpoints (positions) of a pre-defined graph of the environment, and try to reach the target viewpoint described by the given instruction in as few steps as possible.

At the beginning of the navigation, you will be given an instruction of a trajectory which describes all observations and the action you should take at each step.
During navigation, at each step, you will be at a specific viewpoint and receive the history of previous steps you have taken (containing your "Thought", "Action", "Action Input" and "Observation" after the "Begin!" sign) and the observation of the current viewpoint (including scene descriptions, objects, and navigable directions/distances within 3 meters).
Orientations range from -180 to 180 degrees: "0" signifies forward, "right 90" rightward, "right (or left) 180" backward, and "left 90" leftward.

You make actions by selecting navigable viewpoints to reach the destination. You are encouraged to explore the environment while avoiding revisiting viewpoints by comparing the current navigable viewpoint IDs with the previously visited IDs in earlier "Action Input" entries. The ultimate goal is to stop within 3 meters of the destination in the instruction. If the destination is visible but the target object is not detected within 3 meters, move closer.
At each step, you should consider:
(1) According to Current Viewpoint observation and History, have you reached the destination?
If yes, you should stop and output 'Final Answer: Finished!'.
If not, you should continue:
(2) Consider where you are on the trajectory and what should be the next viewpoint to navigate to according to the instruction.
Use the action_maker tool and input the next navigable viewpoint ID to move to that location.

Show your reasoning in the Thought section.

Here are the descriptions of these tools:
{tool_descriptions}

Every viewpoint has a unique viewpoint ID. You are very strict about the viewpoint ID and will never fabricate nonexistent IDs.

----
Starting below, you should follow this format:

Instruction: an instruction of a trajectory which describes all observations and the actions that should be taken
Initial Observation: the initial observation of the environment
Thought: you should always think about what to do next and why
Action: the action to take, must be one of the tools [{tool_names}]
Action Input: "Viewpoint ID"
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I have reached the destination, I can stop.
Final Answer: Finished!
----

Begin!

Instruction: {action_plan}
Initial Observation: {init_observation}
Thought: I should start navigation according to the instruction, {agent_scratchpad}"""

VLN_GPT35_PROMPT = """As an intelligent embodied agent, you will navigate an indoor environment to reach a target viewpoint based on a given instruction, performing the Vision and Language Navigation (VLN) task. You'll move among static positions within a pre-defined graph, aiming for minimal steps.

You will receive a trajectory instruction at the start and will have access to step history (your Thought, Action, Action Input and Observation after the Begin! sign) and current viewpoint observation (including scene descriptions, objects, and navigable directions/distances within 3 meters) during navigation. Orientations range from -180 to 180 degrees, with 0 being forward, right 90 rightward, right/left 180 backward, and left 90 leftward.

Explore the environment while avoiding revisiting viewpoints by comparing current and previously visited IDs. Reach within 3 meters of the instructed destination, and if it's visible but no objects are detected, move closer.

At each step, determine if you've reached the destination.
If yes, stop and output 'Final Answer: Finished!'.
If not, continue by considering your location and the next viewpoint based on the instruction, using the action_maker tool.
Show your reasoning in the Thought section.

Follow the given format and use the provided tools.
{tool_descriptions}
Do not fabricate nonexistent viewpoint IDs.

----
Starting below, you should follow this format:

Instruction: the instruction describing the whole trajectory
Initial Observation: the initial observation of the environment
Thought: you should always think about what to do next and why
Action: the action to take, must be one of the tools [{tool_names}]
Action Input: "Viewpoint ID"
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I have reached the destination, I can stop.
Final Answer: Finished!
----

Begin!

Instruction: {action_plan}
Initial Observation: {init_observation}
Thought: I should start navigation according to the instruction, {agent_scratchpad}"""
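The orchestrator prompts ask the model to reply with `Action:` / `Action Input:` lines and to finish with `Final Answer: Finished!`. This file does not show how the repository parses those replies. The hypothetical `parse_step` below only sketches what that contract implies, assuming one action per reply and an optionally quoted viewpoint ID.

```python
import re
from typing import Optional, Tuple

# "Action: <tool>" on one line, then "Action Input: <input>" on a following line.
ACTION_RE = re.compile(r"Action\s*:\s*([^\n]*?)\s*\n\s*Action\s*Input\s*:\s*(.*)", re.DOTALL)


def parse_step(text: str) -> Tuple[str, Optional[str]]:
    """Hypothetical parser: returns ("finish", None) or (tool_name, tool_input)."""
    if "Final Answer:" in text:
        return "finish", None
    match = ACTION_RE.search(text)
    if match is None:
        raise ValueError(f"Could not parse reply: {text!r}")
    tool = match.group(1).strip()
    lines = match.group(2).strip().splitlines()
    # Keep only the first line of the input and drop surrounding quotes.
    tool_input = lines[0].strip().strip('"').strip("'") if lines else ""
    return tool, tool_input


reply = 'The kitchen is ahead.\nAction: action_maker\nAction Input: "4a153b13a3f6424784cb8e5dabbb3a2c"'
print(parse_step(reply))  # ('action_maker', '4a153b13a3f6424784cb8e5dabbb3a2c')
```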
37
nav_src/scripts/action_planner.py
Normal file
@@ -0,0 +1,37 @@
import json

from langchain.chains.llm import LLMChain
from langchain.llms.openai import OpenAI
from langchain.prompts import PromptTemplate

from prompt.planner_prompt import (
    PLANNER_PROMPT,
)

from data_utils import construct_instrs

# Using OpenAI text-davinci-003
llm = OpenAI(temperature=0.0)

plan_prompt = PromptTemplate(
    template=PLANNER_PROMPT,
    input_variables=["instruction"],
)

plan_chain = LLMChain(llm=llm, prompt=plan_prompt)


splits = ['val_72']
anno_dir = '../datasets/R2R/annotations'
dataset = 'R2R'
data = construct_instrs(anno_dir, dataset, splits)

for i, sample in enumerate(data):
    print(f"Sample {i}:")
    print(sample['instruction'])
    action_plan = plan_chain.run(sample['instruction'])
    print(action_plan)
    data[i]['action_plan'] = action_plan

with open('../datasets/R2R/annotations/R2R_val_72_action_plan.json', 'w') as f:
    json.dump(data, f, indent=2)
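The script stores the planner's raw completion string in `action_plan`. If a list of steps is needed downstream, a small helper such as the hypothetical `split_action_plan` below could split the numbered format that `PLANNER_PROMPT` requests. It assumes the model actually follows the `1. ...` numbering and silently ignores any unnumbered lines.

```python
import re
from typing import List

# One numbered step per line: "1. text" or "1) text".
STEP_RE = re.compile(r"^\s*(\d+)[.)]\s+(.*\S)\s*$")


def split_action_plan(plan: str) -> List[str]:
    """Return the text of each numbered step, ignoring unnumbered lines."""
    steps = []
    for line in plan.splitlines():
        match = STEP_RE.match(line)
        if match:
            steps.append(match.group(2))
    return steps


plan = "\n1. Walk out of the bedroom.\n2. Turn left into the hallway.\n3. Stop at the kitchen door."
print(split_action_plan(plan))
```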
34
nav_src/scripts/merge_preds.py
Normal file
@@ -0,0 +1,34 @@
import os
import glob
import json

def merge_json_files(base_dir, exp_name):
    merged_data = []

    # Iterate through subdirectories
    for subdir in os.listdir(base_dir):
        subdir_path = os.path.join(base_dir, subdir)

        # Check if the path is a directory
        if os.path.isdir(subdir_path):
            # Find all JSON files in the 'preds' subdirectory
            json_files = glob.glob(os.path.join(subdir_path, "preds", "*.json"))

            # Merge JSON data
            for file_path in json_files:
                with open(file_path, 'r') as json_file:
                    data = json.load(json_file)

                # Append the samples from this file to the merged_data list
                merged_data.extend(data)


    # Save the merged JSON data to <base_dir>/<exp_name>.json
    with open(os.path.join(base_dir, f"{exp_name}.json"), "w") as output_file:
        json.dump(merged_data, output_file, indent=4)

base_dir = "../datasets/R2R/exprs/"
exp_name = "4-R2R_val_unseen_instr"
path = os.path.join(base_dir, exp_name)
merge_json_files(path, exp_name)
75
nav_src/scripts/obs_summarizer.py
Normal file
@@ -0,0 +1,75 @@
'''
Use LLM chain to summarize the observations
'''
import os
import json
import asyncio
import argparse

from langchain.chains.llm import LLMChain
from langchain.llms.openai import OpenAI
from langchain.prompts import PromptTemplate

async def async_generate(chain, viewpointID, ob_list):
    print(f"Summarizing {viewpointID} ...")
    tasks = [chain.arun(description=ob) for ob in ob_list]
    resp_list = await asyncio.gather(*tasks)
    print(f"Summarized {viewpointID}'s observations: {resp_list}\n")
    return resp_list


async def generate_concurrently(chain, obs):
    tasks = [async_generate(chain, viewpointID, ob) for viewpointID, ob in obs.items()]
    results = await asyncio.gather(*tasks)
    return results


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--batch_size", type=int, default=5)
    parser.add_argument("--obs_dir", type=str, default="../datasets/R2R/observations_list/")
    parser.add_argument("--output_dir", type=str, default="../datasets/R2R/observations_list_summarized/")
    parser.add_argument("--sum_type", type=str, default="list", choices=["list", "single"])
    args = parser.parse_args()

    obs_dir = args.obs_dir
    obs_files = os.listdir(obs_dir)
    output_dir = args.output_dir
    # make sure the output directory exists
    os.makedirs(output_dir, exist_ok=True)

    llm = OpenAI(
        temperature=0.0,
        model_name="gpt-3.5-turbo",
    )

    if args.sum_type == "single":
        summarize_prompt = PromptTemplate(
            template='Given the description of a viewpoint. Summarize the scene from the viewpoint in one concise sentence.\n\nDescription:\n{description}\n\nSummarization: The scene from the viewpoint is a',
            input_variables=["description"],
        )
    elif args.sum_type == "list":
        summarize_prompt = PromptTemplate(
            template='Here is a single scene view from top, down and middle:\n{description}\nSummarize the scene in one sentence:',
            input_variables=["description"],
        )

    summarize_chain = LLMChain(llm=llm, prompt=summarize_prompt)

    for obs_file in obs_files:
        scan = os.path.splitext(obs_file)[0]
        obs_path = os.path.join(obs_dir, obs_file)
        with open(obs_path) as f:
            obs = json.load(f)
        summary = {}
        viewpointIDs = list(obs.keys())
        num_batches = (len(viewpointIDs) + args.batch_size - 1) // args.batch_size
        # Get the viewpointIDs in batches
        for i in range(0, len(viewpointIDs), args.batch_size):
            batch = viewpointIDs[i:i+args.batch_size]
            print(f"Summarizing scan {scan} batch [{i//args.batch_size + 1}/{num_batches}]")
            batch_obs = {viewpointID: obs[viewpointID] for viewpointID in batch}
            summarized_obs = asyncio.run(generate_concurrently(summarize_chain, batch_obs))
            # asyncio.gather() keeps input order, so results line up with `batch`
            summarized_obs = {viewpointID: summarized_obs[j] for j, viewpointID in enumerate(batch)}
            summary.update(summarized_obs)
        # {scan}_summarized.json is the file name ImageObservationsDB (utils/data.py) reads
        output_path = os.path.join(output_dir, f'{scan}_summarized.json')
        with open(output_path, 'w') as f:
            json.dump(summary, f, indent=2)
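`generate_concurrently` relies on `asyncio.gather` returning results in the same order as the awaitables it was given, which is what allows the script to map results back onto `batch` by index. A minimal sketch of that two-level fan-out, with a stand-in coroutine in place of `chain.arun` (`fake_summarize`, `summarize_batch` and the random delay are illustrative assumptions, not the script's API):

```python
import asyncio
import random


async def fake_summarize(description):
    # Stand-in for chain.arun(description=...): finishes after a random delay,
    # so completion order generally differs from submission order.
    await asyncio.sleep(random.random() / 100)
    return description.upper()


async def summarize_batch(batch_obs):
    # Inner fan-out: all observations of one viewpoint run concurrently.
    async def per_viewpoint(obs_list):
        return await asyncio.gather(*(fake_summarize(o) for o in obs_list))

    # Outer fan-out: all viewpoints of the batch run concurrently.
    results = await asyncio.gather(*(per_viewpoint(v) for v in batch_obs.values()))
    # gather() preserves argument order and dicts preserve insertion order,
    # so zipping keys with results pairs each viewpoint with its own summaries.
    return dict(zip(batch_obs.keys(), results))
```

For example, `asyncio.run(summarize_batch({"vp1": ["a chair", "a table"], "vp2": ["a door"]}))` yields `{"vp1": ["A CHAIR", "A TABLE"], "vp2": ["A DOOR"]}` regardless of which sleep finishes first.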
36
nav_src/scripts/test_parse.py
Normal file
@ -0,0 +1,36 @@
import re
import unittest

def extract_action_and_tool_input(text):
    regex = r"Action\s*\d*\s*:[\s]*(.*?)[\s]*Action\s*\d*\s*Input\s*\d*\s*:[\s]*\"?([a-fA-F0-9]{32})\"?"

    action_match = re.search(regex, text, re.DOTALL)
    if action_match:
        action = action_match.group(1).strip()
        tool_input = action_match.group(2).strip()
        return action, tool_input
    else:
        return None, None

class TestActionAndToolInputExtraction(unittest.TestCase):

    def test_extraction(self):
        samples = [
            ("which tells me ... Action: action_maker\nAction Input: \"f237319a500640d8ac172db225a3ce9c\" (Left viewpoint ID)", "action_maker", "f237319a500640d8ac172db225a3ce9c"),
            ("which is to turn right ... Action: action_maker\nAction Input: \"06bd0a2d004b454b9e93ddcf08344732\"", "action_maker", "06bd0a2d004b454b9e93ddcf08344732"),
            ("which is to exit out ... Action: action_maker\nAction Input: \"424bcb744623413f830ece5c68319d70\"\n", "action_maker", "424bcb744623413f830ece5c68319d70")
        ]

        for idx, (sample, expected_action, expected_tool_input) in enumerate(samples, 1):
            action, tool_input = extract_action_and_tool_input(sample)

            # Print statements
            print(f"Testing Sample {idx} ...")
            print(f"Expected Action: {expected_action}, Output: {action}")
            print(f"Expected Tool Input: {expected_tool_input}, Output: {tool_input}\n")

            self.assertEqual(action, expected_action)
            self.assertEqual(tool_input, expected_tool_input)

if __name__ == '__main__':
    unittest.main()
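The unit test only covers well-formed outputs. The same pattern also accepts numbered ReAct steps such as `Action 2:` / `Action 2 Input:`, and it rejects any tool input that is not exactly a 32-character hexadecimal viewpoint ID. The sketch below reuses the regex verbatim; the wrapper name `parse` is hypothetical:

```python
import re

# Same pattern as extract_action_and_tool_input, compiled once.
ACTION_RE = re.compile(
    r"Action\s*\d*\s*:[\s]*(.*?)[\s]*Action\s*\d*\s*Input\s*\d*\s*:[\s]*\"?([a-fA-F0-9]{32})\"?",
    re.DOTALL,
)


def parse(text):
    """Return (action, viewpoint_id), or (None, None) if nothing matches."""
    m = ACTION_RE.search(text)
    return (m.group(1).strip(), m.group(2).strip()) if m else (None, None)
```

Because the keyword `Action` is matched case-sensitively and the ID must be 32 hex digits, a malformed LLM reply such as `Action Input: "abc123"` falls through to `(None, None)`, which the caller can treat as a parsing failure.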
128
nav_src/utils/data.py
Normal file
@ -0,0 +1,128 @@
import os
import json
import networkx as nx
import math
import numpy as np

# class ImageFeaturesDB(object):
#     def __init__(self, img_ft_file, image_feat_size):
#         self.image_feat_size = image_feat_size
#         self.img_ft_file = img_ft_file
#         self._feature_store = {}

#     def get_image_feature(self, scan, viewpoint):
#         key = '%s_%s' % (scan, viewpoint)
#         if key in self._feature_store:
#             ft = self._feature_store[key]
#         else:
#             with h5py.File(self.img_ft_file, 'r') as f:
#                 ft = f[key][...][:, :self.image_feat_size].astype(np.float32)
#                 self._feature_store[key] = ft
#         return ft

class ImageObservationsDB(object):
    def __init__(self, img_obs_dir, img_obs_sum_dir, img_obj_dir):
        self.img_obs_dir = img_obs_dir
        self.img_obs_sum_dir = img_obs_sum_dir
        self.img_obj_dir = img_obj_dir
        self._obs_store = {}

    def get_image_observation(self, scan, viewpoint):
        key = '%s_%s' % (scan, viewpoint)
        if key in self._obs_store:
            obs = self._obs_store[key]
        else:
            # Load image observation
            with open(os.path.join(self.img_obs_dir, f'{scan}.json'), 'r') as f:
                obs = json.load(f)[viewpoint]
            self._obs_store[key] = {}
            self._obs_store[key]['detail'] = obs
            # Load image observation summary for history
            with open(os.path.join(self.img_obs_sum_dir, f'{scan}_summarized.json'), 'r') as f:
                obs_sum = json.load(f)[viewpoint]
            self._obs_store[key]['summary'] = obs_sum
            # Load image objects
            with open(os.path.join(self.img_obj_dir, f'{scan}.json'), 'r') as f:
                obj = json.load(f)[viewpoint]
            self._obs_store[key]['objects'] = obj
            obs = self._obs_store[key]
        return obs

def load_nav_graphs(connectivity_dir, scans):
    ''' Load connectivity graph for each scan '''

    def distance(pose1, pose2):
        ''' Euclidean distance between two graph poses '''
        return ((pose1['pose'][3]-pose2['pose'][3])**2
                + (pose1['pose'][7]-pose2['pose'][7])**2
                + (pose1['pose'][11]-pose2['pose'][11])**2)**0.5

    graphs = {}
    for scan in scans:
        with open(os.path.join(connectivity_dir, '%s_connectivity.json' % scan)) as f:
            G = nx.Graph()
            positions = {}
            data = json.load(f)
            for i, item in enumerate(data):
                if item['included']:
                    for j, conn in enumerate(item['unobstructed']):
                        if conn and data[j]['included']:
                            positions[item['image_id']] = np.array([item['pose'][3],
                                    item['pose'][7], item['pose'][11]])
                            assert data[j]['unobstructed'][i], 'Graph should be undirected'
                            G.add_edge(item['image_id'], data[j]['image_id'], weight=distance(item, data[j]))
            nx.set_node_attributes(G, values=positions, name='position')
            graphs[scan] = G
    return graphs

def new_simulator(connectivity_dir, scan_data_dir=None):
    import MatterSim

    # Simulator image parameters
    WIDTH = 640
    HEIGHT = 480
    VFOV = 60

    sim = MatterSim.Simulator()
    if scan_data_dir:
        sim.setDatasetPath(scan_data_dir)
    sim.setNavGraphPath(connectivity_dir)
    sim.setRenderingEnabled(False)
    sim.setCameraResolution(WIDTH, HEIGHT)
    sim.setCameraVFOV(math.radians(VFOV))
    sim.setDiscretizedViewingAngles(True)
    sim.setBatchSize(1)
    sim.initialize()

    return sim

def angle_feature(heading, elevation, angle_feat_size):
    return np.array(
        [math.sin(heading), math.cos(heading), math.sin(elevation), math.cos(elevation)] * (angle_feat_size // 4),
        dtype=np.float32)

def get_point_angle_feature(sim, angle_feat_size, baseViewId=0):
    feature = np.empty((36, angle_feat_size), np.float32)
    base_heading = (baseViewId % 12) * math.radians(30)
    base_elevation = (baseViewId // 12 - 1) * math.radians(30)

    for ix in range(36):
        if ix == 0:
            sim.newEpisode(['ZMojNkEp431'], ['2f4d90acd4024c269fb0efe49a8ac540'], [0], [math.radians(-30)])
        elif ix % 12 == 0:
            sim.makeAction([0], [1.0], [1.0])
        else:
            sim.makeAction([0], [1.0], [0])

        state = sim.getState()[0]
        assert state.viewIndex == ix

        heading = state.heading - base_heading
        elevation = state.elevation - base_elevation

        feature[ix, :] = angle_feature(heading, elevation, angle_feat_size)
    return feature

def get_all_point_angle_feature(sim, angle_feat_size):
    return [get_point_angle_feature(sim, angle_feat_size, baseViewId) for baseViewId in range(36)]

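`load_nav_graphs` takes node positions from indices 3, 7 and 11 of each `pose`, which (assuming, as that index choice implies, a flattened row-major 4x4 camera-to-world matrix) are the x, y and z translation. It then adds a weighted edge for every unobstructed pair of included viewpoints. A dependency-free sketch of that logic, returning a plain adjacency dict instead of a `networkx.Graph` (`build_adjacency` is a hypothetical name):

```python
import math


def build_adjacency(connectivity):
    """Return {image_id: {neighbor_id: distance}} from a parsed connectivity list.

    Sketch of load_nav_graphs without networkx; assumes each entry has
    'image_id', 'included', 'unobstructed' and a 16-element row-major 'pose'.
    """
    def position(item):
        pose = item["pose"]
        return (pose[3], pose[7], pose[11])  # translation part of the 4x4 matrix

    adj = {}
    for item in connectivity:
        if not item["included"]:
            continue
        for j, conn in enumerate(item["unobstructed"]):
            other = connectivity[j]
            if conn and other["included"]:
                # Edge weight = Euclidean distance, as in distance() above.
                d = math.dist(position(item), position(other))
                adj.setdefault(item["image_id"], {})[other["image_id"]] = d
    return adj
```

Excluded viewpoints never appear, even if another node lists them as unobstructed, matching the `data[j]['included']` check in the original.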
164
nav_src/utils/distributed.py
Normal file
@ -0,0 +1,164 @@
"""
Distributed tools
"""
import os
from pathlib import Path
from pprint import pformat
import pickle

import torch
import torch.distributed as dist


def load_init_param(opts):
    """
    Load parameters for the rendezvous distributed procedure
    """
    # sync file
    if opts.output_dir != "":
        sync_dir = Path(opts.output_dir).resolve()
        sync_dir.mkdir(parents=True, exist_ok=True)
        sync_file = f"{sync_dir}/.torch_distributed_sync"
    else:
        raise RuntimeError("Can't find any sync dir")

    # world size
    if opts.world_size != -1:
        world_size = opts.world_size
    elif os.environ.get("WORLD_SIZE", "") != "":
        world_size = int(os.environ["WORLD_SIZE"])
    else:
        raise RuntimeError("Can't find any world size")

    # rank
    if os.environ.get("RANK", "") != "":
        # torch.distributed.launch provides this variable no matter what
        rank = int(os.environ["RANK"])
    else:
        if opts.node_rank != -1:
            node_rank = opts.node_rank
        elif os.environ.get("NODE_RANK", "") != "":
            node_rank = int(os.environ["NODE_RANK"])
        else:
            raise RuntimeError("Can't find any rank or node rank")

        if opts.local_rank != -1:
            local_rank = opts.local_rank
        elif os.environ.get("LOCAL_RANK", "") != "":
            local_rank = int(os.environ["LOCAL_RANK"])
        else:
            raise RuntimeError("Can't find any rank or local rank")

        # WARNING: this assumes that each node has the same number of GPUs
        n_gpus = torch.cuda.device_count()
        rank = local_rank + node_rank * n_gpus

    return {
        "backend": "nccl",
        "init_method": f"file://{sync_file}",
        "rank": rank,
        "world_size": world_size,
    }


def init_distributed(opts):
    init_param = load_init_param(opts)
    rank = init_param["rank"]

    print(f"Init distributed {init_param['rank']} - {init_param['world_size']}")

    dist.init_process_group(**init_param)
    return rank


def is_default_gpu(opts) -> bool:
    return opts.local_rank == -1 or dist.get_rank() == 0


def is_dist_avail_and_initialized():
    if not dist.is_available():
        return False
    if not dist.is_initialized():
        return False
    return True

def get_world_size():
    if not is_dist_avail_and_initialized():
        return 1
    return dist.get_world_size()

def all_gather(data):
    """
    Run all_gather on arbitrary picklable data (not necessarily tensors)
    Args:
        data: any picklable object
    Returns:
        list[data]: list of data gathered from each rank
    """
    world_size = get_world_size()
    if world_size == 1:
        return [data]

    # serialize to a Tensor
    buffer = pickle.dumps(data)
    storage = torch.ByteStorage.from_buffer(buffer)
    tensor = torch.ByteTensor(storage).to("cuda")

    # obtain Tensor size of each rank
    local_size = torch.tensor([tensor.numel()], device="cuda")
    size_list = [torch.tensor([0], device="cuda") for _ in range(world_size)]
    dist.all_gather(size_list, local_size)
    size_list = [int(size.item()) for size in size_list]
    max_size = max(size_list)

    # receiving Tensor from all ranks
    # we pad the tensor because torch all_gather does not support
    # gathering tensors of different shapes
    tensor_list = []
    for _ in size_list:
        tensor_list.append(torch.empty((max_size,), dtype=torch.uint8, device="cuda"))
    if local_size != max_size:
        padding = torch.empty(size=(max_size - local_size,), dtype=torch.uint8, device="cuda")
        tensor = torch.cat((tensor, padding), dim=0)
    dist.all_gather(tensor_list, tensor)

    data_list = []
    for size, tensor in zip(size_list, tensor_list):
        buffer = tensor.cpu().numpy().tobytes()[:size]
        data_list.append(pickle.loads(buffer))

    return data_list


def reduce_dict(input_dict, average=True):
    """
    Args:
        input_dict (dict): all the values will be reduced
        average (bool): whether to do average or sum
    Reduce the values in the dictionary from all processes so that all processes
    have the averaged results. Returns a dict with the same fields as
    input_dict, after reduction.
    """
    world_size = get_world_size()
    if world_size < 2:
        return input_dict
    with torch.no_grad():
        names = []
        values = []
        # sort the keys so that they are consistent across processes
        for k in sorted(input_dict.keys()):
            names.append(k)
            values.append(input_dict[k])
        values = torch.stack(values, dim=0)
        dist.all_reduce(values)
        if average:
            values /= world_size
        reduced_dict = {k: v for k, v in zip(names, values)}
    return reduced_dict


def merge_dist_results(results):
    outs = []
    for res in results:
        outs.extend(res)
    return outs
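`all_gather` pads every rank's pickled byte buffer up to the largest size, because `torch.distributed.all_gather` expects equally shaped tensors, then trims each received buffer back to its recorded length before unpickling. The round trip can be illustrated in one process without PyTorch; here the list of per-rank payloads simulates the ranks, and `simulate_all_gather` is a hypothetical name:

```python
import pickle


def simulate_all_gather(per_rank_data):
    """Single-process mimic of the pad-then-trim protocol used by all_gather."""
    buffers = [pickle.dumps(d) for d in per_rank_data]
    # Step 1: every rank shares its byte length (the size_list exchange).
    size_list = [len(b) for b in buffers]
    max_size = max(size_list)
    # Step 2: pad each buffer with zero bytes so all "tensors" share one shape.
    padded = [b + bytes(max_size - len(b)) for b in buffers]
    assert all(len(p) == max_size for p in padded)
    # Step 3: each receiver trims by the recorded size and unpickles.
    return [pickle.loads(p[:size]) for p, size in zip(padded, size_list)]
```

The size exchange is what makes the trim safe: without `size_list`, a receiver could not tell padding bytes from payload.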
210
nav_src/utils/graph_utils.py
Normal file
@ -0,0 +1,210 @@
from collections import defaultdict, deque
import numpy as np

MAX_DIST = 30
MAX_STEP = 10

def calc_position_distance(a, b):
    # a, b: (x, y, z)
    dx = b[0] - a[0]
    dy = b[1] - a[1]
    dz = b[2] - a[2]
    dist = np.sqrt(dx**2 + dy**2 + dz**2)
    return dist

def calculate_vp_rel_pos_fts(a, b, base_heading=0, base_elevation=0):
    # a, b: (x, y, z)
    dx = b[0] - a[0]
    dy = b[1] - a[1]
    dz = b[2] - a[2]
    xy_dist = max(np.sqrt(dx**2 + dy**2), 1e-8)
    xyz_dist = max(np.sqrt(dx**2 + dy**2 + dz**2), 1e-8)

    # the simulator's API is weird (x-y axis is transposed)
    heading = np.arcsin(dx/xy_dist) # [-pi/2, pi/2]
    if b[1] < a[1]:
        heading = np.pi - heading
    heading -= base_heading

    elevation = np.arcsin(dz/xyz_dist) # [-pi/2, pi/2]
    elevation -= base_elevation

    return heading, elevation, xyz_dist

def get_angle_fts(headings, elevations, angle_feat_size):
    ang_fts = [np.sin(headings), np.cos(headings), np.sin(elevations), np.cos(elevations)]
    ang_fts = np.vstack(ang_fts).transpose().astype(np.float32)
    num_repeats = angle_feat_size // 4
    if num_repeats > 1:
        ang_fts = np.concatenate([ang_fts] * num_repeats, 1)
    return ang_fts


class FloydGraph(object):
    def __init__(self):
        self._dis = defaultdict(lambda: defaultdict(lambda: 95959595))
        self._point = defaultdict(lambda: defaultdict(lambda: ""))
        self._visited = set()

    def distance(self, x, y):
        if x == y:
            return 0
        else:
            return self._dis[x][y]

    def add_edge(self, x, y, dis):
        if dis < self._dis[x][y]:
            self._dis[x][y] = dis
            self._dis[y][x] = dis
            self._point[x][y] = ""
            self._point[y][x] = ""

    def update(self, k):
        for x in self._dis:
            for y in self._dis:
                if x != y:
                    if self._dis[x][k] + self._dis[k][y] < self._dis[x][y]:
                        self._dis[x][y] = self._dis[x][k] + self._dis[k][y]
                        self._dis[y][x] = self._dis[x][y]
                        self._point[x][y] = k
                        self._point[y][x] = k
        self._visited.add(k)

    def visited(self, k):
        return (k in self._visited)

    def path(self, x, y):
        """
        :param x: start
        :param y: end
        :return: the path from x to y [v1, v2, ..., v_n, y]
        """
        if x == y:
            return []
        if self._point[x][y] == "":     # Direct edge
            return [y]
        else:
            k = self._point[x][y]
            # print(x, y, k)
            # for x1 in (x, k, y):
            #     for x2 in (x, k, y):
            #         print(x1, x2, "%.4f" % self._dis[x1][x2])
            return self.path(x, k) + self.path(k, y)


class GraphMap(object):
    def __init__(self, start_vp):
        self.start_vp = start_vp        # start viewpoint

        self.node_positions = {}        # viewpoint to position (x, y, z)
        self.graph = FloydGraph()       # shortest path graph
        self.node_embeds = {}           # {viewpoint: feature (sum feature, count)}
        self.node_stop_scores = {}      # {viewpoint: prob}
        self.node_nav_scores = {}       # {viewpoint: {t: prob}}
        self.node_step_ids = {}

    def update_graph(self, ob):
        self.node_positions[ob['viewpoint']] = ob['position']
        for cc in ob['candidate']:
            self.node_positions[cc['viewpointId']] = cc['position']
            dist = calc_position_distance(ob['position'], cc['position'])
            self.graph.add_edge(ob['viewpoint'], cc['viewpointId'], dist)
        self.graph.update(ob['viewpoint'])

    def update_node_embed(self, vp, embed, rewrite=False):
        if rewrite:
            self.node_embeds[vp] = [embed, 1]
        else:
            if vp in self.node_embeds:
                self.node_embeds[vp][0] += embed
                self.node_embeds[vp][1] += 1
            else:
                self.node_embeds[vp] = [embed, 1]

    def get_node_embed(self, vp):
        return self.node_embeds[vp][0] / self.node_embeds[vp][1]

    def get_pos_fts(self, cur_vp, gmap_vpids, cur_heading, cur_elevation, angle_feat_size=4):
        # dim=7 (sin(heading), cos(heading), sin(elevation), cos(elevation),
        #   line_dist, shortest_dist, shortest_step)
        rel_angles, rel_dists = [], []
        for vp in gmap_vpids:
            if vp is None:
                rel_angles.append([0, 0])
                rel_dists.append([0, 0, 0])
            else:
                rel_heading, rel_elevation, rel_dist = calculate_vp_rel_pos_fts(
                    self.node_positions[cur_vp], self.node_positions[vp],
                    base_heading=cur_heading, base_elevation=cur_elevation,
                )
                rel_angles.append([rel_heading, rel_elevation])
                rel_dists.append(
                    [rel_dist / MAX_DIST, self.graph.distance(cur_vp, vp) / MAX_DIST,
                     len(self.graph.path(cur_vp, vp)) / MAX_STEP]
                )
        rel_angles = np.array(rel_angles).astype(np.float32)
        rel_dists = np.array(rel_dists).astype(np.float32)
        rel_ang_fts = get_angle_fts(rel_angles[:, 0], rel_angles[:, 1], angle_feat_size)
        return np.concatenate([rel_ang_fts, rel_dists], 1)

    def save_to_json(self):
        nodes = {}
        for vp, pos in self.node_positions.items():
            nodes[vp] = {
                'location': pos,        # (x, y, z)
                'visited': self.graph.visited(vp),
            }
            if nodes[vp]['visited']:
                nodes[vp]['stop_prob'] = self.node_stop_scores[vp]['stop']
                nodes[vp]['og_objid'] = self.node_stop_scores[vp]['og']
            else:
                nodes[vp]['nav_prob'] = self.node_nav_scores[vp]

        edges = []
        for k, v in self.graph._dis.items():
            for kk in v.keys():
                edges.append((k, kk))

        return {'nodes': nodes, 'edges': edges}


class NavGraph:
    def __init__(self):
        self.graph = defaultdict(list)

    def add_node(self, node):
        if node not in self.graph:
            self.graph[node] = []

    def update_connection(self, node1, node2):
        self.add_node(node1)
        self.add_node(node2)
        if node2 in self.graph[node1]:
            return None
        self.graph[node1].append(node2)
        self.graph[node2].append(node1)

    def bfs_shortest_path(self, start, end):
        if start not in self.graph or end not in self.graph:
            return None

        visited = {start: None}
        queue = deque([start])

        while queue:
            current_node = queue.popleft()

            if current_node == end:
                path = []
                while current_node is not None:
                    path.append(current_node)
                    current_node = visited[current_node]
                return path[::-1]

            for neighbor in self.graph[current_node]:
                if neighbor not in visited:
                    visited[neighbor] = current_node
                    queue.append(neighbor)

        return None

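The "x-y axis is transposed" comment in `calculate_vp_rel_pos_fts` means the heading is measured from the +y axis toward +x, not from +x as in the usual `atan2(dy, dx)` convention. A stdlib sketch of the same computation with `base_heading=0` (the name `relative_heading` is hypothetical) makes the resulting angles concrete:

```python
import math


def relative_heading(a, b):
    """Heading from position a to position b, as in calculate_vp_rel_pos_fts.

    0 points along +y and pi/2 along +x; the result lies in [-pi/2, 3*pi/2].
    """
    dx, dy = b[0] - a[0], b[1] - a[1]
    xy_dist = max(math.hypot(dx, dy), 1e-8)  # avoid division by zero
    heading = math.asin(dx / xy_dist)        # in [-pi/2, pi/2]
    if b[1] < a[1]:
        # Target is behind (negative y): mirror into the back half-plane.
        heading = math.pi - heading
    return heading
```

So a target straight ahead along +y gives 0, one along +x gives pi/2, one along -y gives pi, and one along -x gives -pi/2. The range is not wrapped to [-pi, pi], which is harmless because `get_angle_fts` only consumes the sine and cosine.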
80
nav_src/utils/logger.py
Normal file
@ -0,0 +1,80 @@
import os
import sys
import math
import time
from collections import OrderedDict


def write_to_record_file(data, file_path, verbose=True):
    if verbose:
        print(data)
    with open(file_path, 'a') as record_file:
        record_file.write(data + '\n')


def asMinutes(s):
    m = math.floor(s / 60)
    s -= m * 60
    return '%dm %ds' % (m, s)

def timeSince(since, percent):
    now = time.time()
    s = now - since
    es = s / (percent)
    rs = es - s
    return '%s (- %s)' % (asMinutes(s), asMinutes(rs))

class Timer:
    def __init__(self):
        self.cul = OrderedDict()
        self.start = {}
        self.iter = 0

    def reset(self):
        self.cul = OrderedDict()
        self.start = {}
        self.iter = 0

    def tic(self, key):
        self.start[key] = time.time()

    def toc(self, key):
        delta = time.time() - self.start[key]
        if key not in self.cul:
            self.cul[key] = delta
        else:
            self.cul[key] += delta

    def step(self):
        self.iter += 1

    def show(self):
        total = sum(self.cul.values())
        for key in self.cul:
            print("%s, total time %0.2f, avg time %0.2f, part of %0.2f" %
                  (key, self.cul[key], self.cul[key]*1./self.iter, self.cul[key]*1./total))
        print(total / self.iter)


def print_progress(iteration, total, prefix='', suffix='', decimals=1, bar_length=100):
    """
    Call in a loop to create terminal progress bar
    @params:
        iteration   - Required  : current iteration (Int)
        total       - Required  : total iterations (Int)
        prefix      - Optional  : prefix string (Str)
        suffix      - Optional  : suffix string (Str)
        decimals    - Optional  : positive number of decimals in percent complete (Int)
        bar_length  - Optional  : character length of bar (Int)
    """
    str_format = "{0:." + str(decimals) + "f}"
    percents = str_format.format(100 * (iteration / float(total)))
    filled_length = int(round(bar_length * iteration / float(total)))
    bar = '█' * filled_length + '-' * (bar_length - filled_length)

    sys.stdout.write('\r%s |%s| %s%s %s' % (prefix, bar, percents, '%', suffix))

    if iteration == total:
        sys.stdout.write('\n')
    sys.stdout.flush()
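`timeSince` estimates the remaining time by linear extrapolation: if a fraction `percent` of the work took `s` seconds, the whole job should take `s / percent`, so `s / percent - s` seconds remain. A small sketch of that arithmetic, separated from the wall clock so it can be checked directly (`as_minutes` and `eta` are hypothetical names mirroring `asMinutes` and the body of `timeSince`):

```python
import math


def as_minutes(s):
    # Same formatting as asMinutes: whole minutes, then leftover seconds (truncated by %d).
    m = math.floor(s / 60)
    return '%dm %ds' % (m, s - m * 60)


def eta(elapsed, percent):
    # Linear extrapolation used by timeSince; percent must be > 0,
    # otherwise the division raises ZeroDivisionError.
    total = elapsed / percent
    return total - elapsed
```

For instance, 60 seconds spent on the first quarter of an evaluation gives `eta(60, 0.25) == 180.0`, which `as_minutes` renders as `'3m 0s'`.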
17
nav_src/utils/misc.py
Normal file
@ -0,0 +1,17 @@
import random
import numpy as np
import torch

def set_random_seed(seed):
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    random.seed(seed)
    np.random.seed(seed)

def length2mask(length, size=None):
    batch_size = len(length)
    size = int(max(length)) if size is None else size
    mask = (torch.arange(size, dtype=torch.int64).unsqueeze(0).repeat(batch_size, 1)
            > (torch.LongTensor(length) - 1).unsqueeze(1)).cuda()
    return mask
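Note the polarity of `length2mask`: a position is `True` when its index exceeds `length - 1`, so the mask marks *padding*. This is the opposite of `gen_seq_masks` in `utils/ops.py`, where `True` marks valid positions. A list-based sketch of `length2mask` without PyTorch or CUDA (`length_to_padding_mask` is a hypothetical name):

```python
def length_to_padding_mask(lengths, size=None):
    """List-based sketch of length2mask: True marks padding positions."""
    size = max(lengths) if size is None else size
    # Position pos is padding when pos > length - 1, i.e. pos >= length.
    return [[pos > length - 1 for pos in range(size)] for length in lengths]
```

For lengths `[2, 3]` this gives `[[False, False, True], [False, False, False]]`; mixing the two helpers without inverting one of them would silently attend to padding instead of tokens.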
38
nav_src/utils/ops.py
Normal file
@ -0,0 +1,38 @@
import numpy as np
import torch

def pad_tensors(tensors, lens=None, pad=0):
    """B x [T, ...]"""
    if lens is None:
        lens = [t.size(0) for t in tensors]
    max_len = max(lens)
    bs = len(tensors)
    hid = list(tensors[0].size()[1:])
    size = [bs, max_len] + hid

    dtype = tensors[0].dtype
    device = tensors[0].device
    output = torch.zeros(*size, dtype=dtype).to(device)
    if pad:
        output.data.fill_(pad)
    for i, (t, l) in enumerate(zip(tensors, lens)):
        output.data[i, :l, ...] = t.data
    return output

def gen_seq_masks(seq_lens, max_len=None):
    if max_len is None:
        max_len = max(seq_lens)

    if isinstance(seq_lens, torch.Tensor):
        device = seq_lens.device
        masks = torch.arange(max_len).to(device).repeat(len(seq_lens), 1) < seq_lens.unsqueeze(1)
        return masks

    if max_len == 0:
        # np.bool was removed in NumPy 1.24; the builtin bool is the equivalent dtype
        return np.zeros((len(seq_lens), 0), dtype=bool)

    seq_lens = np.array(seq_lens)
    batch_size = len(seq_lens)
    masks = np.arange(max_len).reshape(-1, max_len).repeat(batch_size, 0)
    masks = masks < seq_lens.reshape(-1, 1)
    return masks
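Taken together, `pad_tensors` right-pads a batch of variable-length sequences to the longest one (filled with `pad`, zero by default) and `gen_seq_masks` returns a mask that is `True` on real positions. A list-based sketch of both for 1-D sequences, without PyTorch or NumPy (`pad_sequences` is a hypothetical name):

```python
def pad_sequences(seqs, pad=0):
    """List-based sketch of pad_tensors + gen_seq_masks for 1-D sequences."""
    lens = [len(s) for s in seqs]
    max_len = max(lens)
    # Right-pad every sequence to max_len, like output.data[i, :l] = t.
    padded = [list(s) + [pad] * (max_len - len(s)) for s in seqs]
    # True = real element, False = padding (same polarity as gen_seq_masks).
    masks = [[pos < n for pos in range(max_len)] for n in lens]
    return padded, masks
```

For example, `pad_sequences([[1, 2, 3], [4]])` returns `([[1, 2, 3], [4, 0, 0]], [[True, True, True], [True, False, False]])`.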
5
requirements.txt
Normal file
@ -0,0 +1,5 @@
langchain==0.0.246
numpy
openai
transformers
networkx