Adapt to higher Langchain version

Gengze-aiml 2023-10-20 03:41:33 +10:30
commit 62cc22fd38
30 changed files with 2877 additions and 0 deletions

13
.gitignore vendored Normal file

@ -0,0 +1,13 @@
.ftpignore
.ftpconfig
.vscode
# Byte-compiled / optimized / DLL files
.ipynb_checkpoints/
__pycache__/
*.py[cod]
*$py.class
*.swp
datasets/*
!datasets/.gitkeep

3
.gitmodules vendored Normal file

@ -0,0 +1,3 @@
[submodule "nav_src/LLMs/llama"]
path = nav_src/LLMs/llama
url = https://github.com/facebookresearch/llama.git

21
LICENSE Normal file

@ -0,0 +1,21 @@
The MIT License (MIT)
Copyright (c) 2023 Gengze Zhou, Yicong Hong, Qi Wu
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

127
README.md Normal file

@ -0,0 +1,127 @@
<div align="center">
<h1>🎇NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models</h1>
<div>
<a href='https://github.com/GengzeZhou' target='_blank'>Gengze Zhou<sup>🍕</sup><sup>🍔</sup></a>;
<a href='http://www.yiconghong.me' target='_blank'>Yicong Hong<sup>🌭</sup></a>;
<a href='http://www.qi-wu.me' target='_blank'>Qi Wu<sup>🍕</sup><sup>🍔</sup></a>
</div>
<sup>🍕</sup>The University of Adelaide <sup>🍔</sup>Australian Institute for Machine Learning <sup>🌭</sup>The Australian National University
<br>
<div>
<a href='https://github.com/GengzeZhou/NavGPT' target='_blank'><img alt="Static Badge" src="https://img.shields.io/badge/NavGPT-v0.1-blue"></a>
<a href='https://arxiv.org/abs/2305.16986' target='_blank'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a>
<a href="https://opensource.org/licenses/MIT"><img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="License: MIT"></a>
<a href="https://github.com/langchain-ai/langchain"><img alt="Static Badge" src="https://img.shields.io/badge/🦜️🔗-Langchain-green"></a>
</div>
</div>
## 🍹 Abstract
Trained on an unprecedented scale of data, large language models (LLMs) such as ChatGPT and GPT-4 exhibit significant reasoning abilities that emerge with model scaling. This trend underscores the potential of training LLMs on unlimited language data, advancing the development of a universal embodied agent.
In this work, we introduce NavGPT, a purely LLM-based instruction-following navigation agent, to reveal the reasoning capability of GPT models in complex embodied scenes by performing zero-shot sequential action prediction for vision-and-language navigation (VLN).
At each step, NavGPT takes textual descriptions of the visual observations, the navigation history, and the future explorable directions as inputs, reasons about the agent's current status, and decides how to approach the target.
Through comprehensive experiments, we demonstrate that NavGPT can explicitly perform high-level planning for navigation, including decomposing instructions into sub-goals, integrating commonsense knowledge relevant to the navigation task, identifying landmarks in observed scenes, tracking navigation progress, and adapting to exceptions by adjusting the plan.
Furthermore, we show that LLMs are capable of generating high-quality navigational instructions from observations and actions along a path, as well as drawing accurate top-down metric trajectories given the agent's navigation history. Although NavGPT's zero-shot performance on R2R still falls short of trained models, we suggest adapting multi-modality inputs for LLMs to serve as visual navigation agents and applying the explicit reasoning of LLMs to benefit learning-based models.
## 🍸 Method
![](assets/NavGPT.png)
## 🍻 TODOs
- [x] Release 🎇NavGPT code.
- [x] Data preprocessing code.
- [x] Customized LLM inference guidance.
## 🧋 Prerequisites
### 🍭 Installation
Create a conda environment and install all dependencies:
```bash
conda create --name NavGPT python=3.9
conda activate NavGPT
pip install -r requirements.txt
```
### 🍬 Data Preparation
Download the R2R data from [Dropbox](https://www.dropbox.com/sh/i8ng3iq5kpa68nu/AAB53bvCFY_ihYx1mkLlOB-ea?dl=1) and put it in the `datasets` directory.
Related data preprocessing code can be found in `nav_src/scripts`.
### 🍫 OpenAI API
Get an [OpenAI API Key](https://platform.openai.com/account/api-keys) and add it to your environment variables:
```bash
# prepare your private OpenAI key (for Linux)
export OPENAI_API_KEY={Your_Private_Openai_Key}
# prepare your private OpenAI key (for Windows)
set OPENAI_API_KEY={Your_Private_Openai_Key}
```
Alternatively, you can set the key in your code:
```python
import os
os.environ["OPENAI_API_KEY"] = "{Your_Private_Openai_Key}"
```
## 🍷 R2R Navigation
### 🍴 Reproduce Validation Results
To replicate the performance reported in our paper, use GPT-4 and run validation with the following configuration:
```bash
cd nav_src
python NavGPT.py --llm_model_name gpt-4 \
--output_dir ../datasets/R2R/exprs/gpt-4-val-unseen \
--val_env_name R2R_val_unseen_instr
```
Results will be saved in `datasets/R2R/exprs/gpt-4-val-unseen` directory.
The default `--llm_model_name` is `gpt-3.5-turbo`.
An economical way to try 🎇NavGPT is to use GPT-3.5 and run validation on the first 10 samples with the following configuration:
```bash
cd nav_src
python NavGPT.py --llm_model_name gpt-3.5-turbo \
--output_dir ../datasets/R2R/exprs/gpt-3.5-turbo-test \
--val_env_name R2R_val_unseen_instr \
--iters 10
```
### 🥢 Set up Custom LLMs for 🎇NavGPT
Add your own model repo as a submodule under `nav_src/LLMs/`:
```bash
cd nav_src/LLMs
git submodule add {Your_Model_Repo}
```
or just copy your local inference code under `nav_src/LLMs/`.
Follow the [instructions](nav_src/LLMs/Add_Custom_Models.md) to set up your own LLMs for 🎇NavGPT.
Run 🎇NavGPT with your custom LLM:
```bash
cd nav_src
python NavGPT.py --llm_model_name your_custom_llm \
--output_dir ../datasets/R2R/exprs/your_custom_llm-test
```
## 🧃 Citation
If 🎇`NavGPT` has been beneficial to your research and work, please cite our work using the following format:
```
@article{zhou2023navgpt,
title={NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models},
author={Zhou, Gengze and Hong, Yicong and Wu, Qi},
journal={arXiv preprint arXiv:2305.16986},
year={2023}
}
```

BIN
assets/NavGPT.png Normal file

Binary file not shown.

BIN
assets/obs.png Normal file

Binary file not shown.

0
datasets/.gitkeep Normal file

148
nav_src/LLMs/Add_Custom_Models.md Normal file

@ -0,0 +1,148 @@
## Add Custom LLMs for NavGPT
## Contents
- [Set up built-in integrations with LLM providers](#set-up-built-in-integrations-with-llm-providers)
- [Set up local model inference](#set-up-local-model-inference)
- [Step 1: Set up the model environment](#step-1-set-up-the-model-environment)
- [Step 2: Set up the inference pipeline](#step-2-set-up-the-inference-pipeline)
- [Step 3: Register the custom LLM](#step-3-register-the-custom-llm)
- [Step 4: Run NavGPT with the custom LLM](#step-4-run-navgpt-with-the-custom-llm)
## Set up built-in integrations with LLM providers
The `Langchain` package integrates various cloud services that provide LLM inference APIs ([OpenAI](https://openai.com/), [Cohere](https://cohere.ai/), [Hugging Face](https://huggingface.co/), etc.). You can use these services directly by setting up the corresponding API keys.
You can also check out the [Langchain Integrations Documentation](https://python.langchain.com/docs/integrations/llms/) for more information.
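As a minimal sketch, a hosted provider can be plugged in through its Langchain wrapper; the snippet below mirrors the OpenAI setup already used in `nav_src/agent.py` and assumes `OPENAI_API_KEY` is set in your environment:
```python
# Minimal sketch: use a built-in Langchain integration as the agent's LLM.
# Mirrors the OpenAI wrapper used in nav_src/agent.py; assumes OPENAI_API_KEY is set.
from langchain.llms.openai import OpenAI

llm = OpenAI(
    temperature=0.0,                 # deterministic decoding for navigation
    model_name="gpt-3.5-turbo",
)
print(llm("Summarize the next navigation step in one sentence."))
```
Other providers (Cohere, Hugging Face Hub, etc.) follow the same pattern with their own wrapper class and API key.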
## Set up local model inference
One possible way to set up local inference is through [Hugging Face Local Pipelines](https://python.langchain.com/docs/integrations/llms/huggingface_pipelines) in Langchain.
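For reference, here is a minimal, hypothetical sketch of that route (not part of 🎇NavGPT; the model id `gpt2` and the generation settings are placeholders, and `transformers` must be installed locally):
```python
# Minimal sketch of the Hugging Face Local Pipelines route.
# "gpt2" and max_new_tokens are placeholders; substitute your own local model.
from langchain import HuggingFacePipeline

local_llm = HuggingFacePipeline.from_model_id(
    model_id="gpt2",
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 64},
)
print(local_llm("The agent is standing in a hallway and should"))
```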
However, to maximize the flexibility of running local inference or setting up your custom LLMs, we recommend building your own inference pipeline. We provide `nav_src/LLMs/Langchain_llama.py` as an example of how to set one up.
You can also check out the [Langchain Custom LLM](https://python.langchain.com/docs/modules/model_io/models/llms/custom_llm) documentation for more information.
We will use Llama-2 as an example to show how to set up a local inference pipeline.
### Step 1: Set up the model environment
Add the Llama-2 repo as a submodule under `nav_src/LLMs/`:
```bash
cd nav_src/LLMs
git submodule add https://github.com/facebookresearch/llama.git
```
Since `nav_src/LLMs/llama` is already registered as a submodule, you can skip the previous step and simply initialize and clone the submodule with:
```bash
git submodule update --init --recursive
```
Download the [Llama-2 weights](https://ai.meta.com/resources/models-and-libraries/llama-downloads/) according to the [instructions](https://github.com/facebookresearch/llama) and set up the Llama-2 environment:
```bash
cd llama
pip install -e .
```
### Step 2: Set up the inference pipeline
Create your own LLM class `Custom_model` in `nav_src/LLMs/Langchain_model.py`.
The only method a custom LLM is required to implement is `_call`, for example:
```python
def _call(
self,
prompt: str,
stop: Optional[List[str]] = None,
run_manager: Optional[CallbackManagerForLLMRun] = None,
**kwargs: Any,
) -> str:
if stop is not None:
raise ValueError("stop kwargs are not permitted.")
result = self.model.generate(
prompt,
max_length=self.max_length,
num_beams=self.num_beams,
temperature=self.temperature,
top_k=self.top_k,
top_p=self.top_p,
repetition_penalty=self.repetition_penalty,
do_sample=self.do_sample,
num_return_sequences=self.num_return_sequences,
**kwargs,
)
return result
```
An optional `_identifying_params` property can be overridden to help with printing this class. It should return a dictionary.
```python
@property
def _identifying_params(self) -> Mapping[str, Any]:
"""Get the identifying parameters."""
return {
"model_name": self.model_name,
"max_length": self.max_length,
"num_beams": self.num_beams,
"temperature": self.temperature,
"top_k": self.top_k,
"top_p": self.top_p,
"repetition_penalty": self.repetition_penalty,
"do_sample": self.do_sample,
"num_return_sequences": self.num_return_sequences,
}
```
If your custom LLM needs to be initialized with some parameters, you can write your own `from_config` or `from_model_id` classmethod. Check out the example in `nav_src/LLMs/Langchain_llama.py` for more information.
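For instance, a hypothetical `from_config` could pull everything it needs from the parsed command-line arguments; the class below and its `model_path` field are placeholders, not part of the released code:
```python
# Hypothetical sketch of a from_config classmethod; Custom_model and the
# model_path field are placeholders for your own wrapper.
from argparse import Namespace
from typing import Any, List, Mapping, Optional
from langchain.llms.base import LLM

class Custom_model(LLM):
    model_path: str = ""
    temperature: float = 0.0
    max_gen_len: int = 500

    @property
    def _llm_type(self) -> str:
        return "custom_model"

    @classmethod
    def from_config(cls, config: Namespace, **kwargs: Any) -> LLM:
        """Construct the wrapper from the parsed NavGPT config."""
        return cls(
            model_path=getattr(config, "model_path", ""),   # placeholder field
            temperature=config.temperature,
            max_gen_len=getattr(config, "max_gen_len", 500),
            **kwargs,
        )

    def _call(
        self,
        prompt: str,
        stop: Optional[List[str]] = None,
        run_manager: Optional[Any] = None,
        **kwargs: Any,
    ) -> str:
        # Replace with your model's inference call.
        raise NotImplementedError("plug in your local inference here")

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        return {"model_path": self.model_path, "temperature": self.temperature}
```
The agent can then instantiate it with `Custom_model.from_config(config=config)`, as shown in Step 3.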
Here is an example of running our custom Llama-2 locally as an LLMChain in Langchain:
```python
>>> from langchain import PromptTemplate, LLMChain
>>> from nav_src.LLMs.Langchain_llama import Custom_Llama
>>> ckpt_dir = "LLMs/llama/llama-2-13b"
>>> tokenizer_path = "LLMs/llama/tokenizer.model"
>>> llm = Custom_Llama.from_model_id(
temperature=0.75,
ckpt_dir = ckpt_dir,
tokenizer_path = tokenizer_path,
max_seq_len = 4000,
max_gen_len = 800,
max_batch_size = 4,
)
>>> template = """Question: {question}\nAnswer: Let's think step by step."""
>>> prompt = PromptTemplate(template=template, input_variables=["question"])
>>> llm_chain = LLMChain(prompt=prompt, llm=llm)
>>> question = "What is electroencephalography?"
>>> print(llm_chain.run(question))
"Sure, I'd be happy to help! Here's a step-by-step explanation of what electroencephalography (EEG) is:
1. Electroencephalography (EEG) is a non-invasive neuroimaging technique that measures the electrical activity of the brain.
2. The brain is made up of billions of neurons, which communicate with each other through electrical signals. EEG recordings measure these electrical signals, allowing researchers and clinicians to study the brain's activity.
3. To record EEG data, electrodes are placed on the scalp, usually in a specific pattern such as the International 10-20 system. These electrodes detect the electrical activity of the brain and transmit it to a computer for analysis.
4. The EEG signal is composed of different frequency bands, including alpha, beta, gamma, and theta waves. Each frequency band is associated with different cognitive processes, such as attention, relaxation, or memory.
5. EEG can be used to diagnose and monitor a variety of neurological conditions, such as epilepsy, sleep disorders, and stroke. It can also be used to assess brain function in patients with traumatic brain injury, coma, or vegetative state.
6. In addition to diagnostic applications, EEG is also used in research studies to investigate the neural mechanisms underlying various cognitive processes, such as language processing, memory formation, and decision-making.
7. EEG has several advantages over other neuroimaging techniques, such as functional magnetic resonance imaging (fMRI) or positron emission tomography (PET). For example, EEG is relatively inexpensive, portable, and can be performed in a clinical setting or at home. Additionally, EEG provides high temporal resolution, allowing researchers to study the dynamics of brain activity in real-time.
8. Overall, EEG is a valuable tool for understanding the workings of the human brain, diagnosing neurological conditions, and monitoring brain health. Its non-invasive nature and high temporal resolution make it an important technique in neuroscience research and clinical practice."
```
### Step 3: Register the custom LLM
In `nav_src/agent.py`, register the custom LLM by adding the following code after `line 176`:
```python
elif config.llm_model_name == 'your_custom_llm':
from LLMs.Langchain_model import Custom_model
self.llm = Custom_model.from_config(
config = config,
)
```
### Step 4: Run NavGPT with the custom LLM
Now you can run NavGPT with your custom LLM:
```bash
cd nav_src
python NavGPT.py --llm_model_name your_custom_llm \
--output_dir ../datasets/R2R/exprs/your_custom_llm-test
```

85
nav_src/LLMs/Langchain_llama.py Normal file

@ -0,0 +1,85 @@
from typing import Any, List, Mapping, Optional
from langchain.callbacks.manager import CallbackManagerForLLMRun
from langchain.llms.base import LLM
from LLMs.llama.llama import Llama
class Custom_Llama(LLM):
model: Any #: :meta private:
"""Key word arguments passed to the model."""
ckpt_dir: str
tokenizer_path: str
temperature: float = 0.6
top_p: float = 0.9
max_seq_len: int = 128
max_gen_len: int = 64
max_batch_size: int = 4
@property
def _llm_type(self) -> str:
return "custom_llama"
@classmethod
def from_model_id(
cls,
ckpt_dir: str,
tokenizer_path: str,
temperature: float = 0.6,
top_p: float = 0.9,
max_seq_len: int = 128,
max_gen_len: int = 64,
max_batch_size: int = 4,
**kwargs: Any,
) -> LLM:
"""Construct the pipeline object from model_id and task."""
model = Llama.build(
ckpt_dir=ckpt_dir,
tokenizer_path=tokenizer_path,
max_seq_len=max_seq_len,
max_batch_size=max_batch_size,
)
return cls(
model = model,
ckpt_dir = ckpt_dir,
tokenizer_path = tokenizer_path,
# set as default
temperature = 0.6,
top_p = top_p,
max_seq_len = max_seq_len,
max_gen_len = max_gen_len,
max_batch_size = max_batch_size,
**kwargs,
)
def _call(
self,
prompt: str,
stop: Optional[List[str]] = None,
run_manager: Optional[CallbackManagerForLLMRun] = None,
) -> str:
# if stop is not None:
# raise ValueError("stop kwargs are not permitted.")
result = self.model.text_completion(
[prompt],
max_gen_len=self.max_gen_len,
temperature=self.temperature,
top_p=self.top_p,
)
return result[0]["generation"]
@property
def _identifying_params(self) -> Mapping[str, Any]:
"""Get the identifying parameters."""
return {
"ckpt_dir": self.ckpt_dir,
"tokenizer_path": self.tokenizer_path,
"temperature": self.temperature,
"top_p": self.top_p,
"max_seq_len": self.max_seq_len,
"max_gen_len": self.max_gen_len,
"max_batch_size": self.max_batch_size,
}

1
nav_src/LLMs/llama Submodule

@ -0,0 +1 @@
Subproject commit 06faf3aab2971e7931e3d5b41e53c4a614d5bad7

107
nav_src/NavGPT.py Normal file

@ -0,0 +1,107 @@
import os
import json
import time
from data_utils import construct_instrs
from utils.logger import write_to_record_file
from utils.data import ImageObservationsDB
from parser import parse_args
from env import R2RNavBatch
from agent import NavAgent
def build_dataset(args):
feat_db = ImageObservationsDB(args.obs_dir, args.obs_summary_dir, args.obj_dir)
dataset_class = R2RNavBatch
val_env_names = [args.val_env_name]
val_envs = {}
for split in val_env_names:
val_instr_data = construct_instrs(
args.anno_dir, args.dataset, [split]
)
val_env = dataset_class(
feat_db, val_instr_data, args.connectivity_dir, args.navigable_dir,
batch_size=args.batch_size, seed=args.seed, name=split,
) # evaluation using all objects
val_envs[split] = val_env
return val_envs
def valid(args, val_envs):
agent = NavAgent(next(iter(val_envs.values())), args)
with open(os.path.join(args.log_dir, 'validation_args.json'), 'w') as outf:
json.dump(vars(args), outf, indent=4)
record_file = os.path.join(args.log_dir, 'valid.txt')
write_to_record_file(str(args) + '\n\n', record_file)
for env_name, env in val_envs.items():
prefix = 'submit'
if os.path.exists(os.path.join(args.pred_dir, "%s_%s.json" % (prefix, env_name))):
continue
agent.env = env
start_time = time.time()
agent.test(iters=args.iters)
print(env_name, 'cost time: %.2fs' % (time.time() - start_time))
# Get the results
preds = agent.get_results(detailed_output=False)
# Record llm output details
if args.detailed_output:
preds_detail = agent.get_results(detailed_output=True)
json.dump(
preds_detail,
open(os.path.join(args.log_dir, "detail_%s.json" % (env_name)), 'w'),
sort_keys=True, indent=4, separators=(',', ': ')
)
if 'test' not in env_name:
score_summary, _ = env.eval_metrics(preds)
loss_str = "Env name: %s" % env_name
for metric, val in score_summary.items():
loss_str += ', %s: %.2f' % (metric, val)
write_to_record_file(loss_str+'\n', record_file)
json.dump(
preds,
open(os.path.join(args.pred_dir, "%s_%s.json" % (prefix, env_name)), 'w'),
sort_keys=True, indent=4, separators=(',', ': ')
)
def valid_from_file(args, val_envs):
agent = NavAgent(next(iter(val_envs.values())), args)
with open(args.valid_file, 'r') as f:
preds = json.load(f)
for env_name, env in val_envs.items():
agent.env = env
valid_list = [preds]
for valid_pred in valid_list:
score_summary, _ = env.eval_metrics(valid_pred)
loss_str = "Env name: %s, length %d" % (env_name, len(valid_pred))
for metric, val in score_summary.items():
loss_str += ', %s: %.2f' % (metric, val)
print(loss_str)
def main():
args = parse_args()
val_envs = build_dataset(args)
if args.valid_file is not None:
valid_from_file(args, val_envs)
else:
valid(args, val_envs)
if __name__ == '__main__':
main()

728
nav_src/agent.py Normal file

@ -0,0 +1,728 @@
"""Agent that interacts with Matterport3D simulator via a hierarchical planning approach."""
import json
import yaml
import re
import warnings
import numpy as np
from typing import Any, Callable, List, NamedTuple, Optional, Sequence, Tuple, Dict, Union
from env import R2RNavBatch
from argparse import Namespace
from agent_base import BaseAgent
from langchain import HuggingFacePipeline
from langchain.agents.agent import AgentExecutor, AgentAction, AgentOutputParser
from langchain.agents.mrkl.base import ZeroShotAgent
from langchain.agents.tools import Tool
from langchain.chains import LLMChain
from langchain.llms.openai import OpenAI
from langchain.prompts import PromptTemplate
from langchain.schema import (
AgentAction,
AgentFinish,
BaseMessage,
BaseOutputParser,
OutputParserException
)
from langchain.base_language import BaseLanguageModel
from langchain.agents.mrkl.prompt import FORMAT_INSTRUCTIONS
from prompt.planner_prompt import (
ACTION_PROMPT,
HISTORY_PROMPT,
PLANNER_PROMPT,
BACK_TRACE_PROMPT,
MAKE_ACTION_TOOL_NAME,
MAKE_ACTION_TOOL_DESCRIPTION,
BACK_TRACE_TOOL_NAME,
BACK_TRACE_TOOL_DESCRIPTION,
VLN_ORCHESTRATOR_PROMPT,
VLN_GPT4_PROMPT,
VLN_GPT35_PROMPT,
)
FINAL_ANSWER_ACTION = "Final Answer:"
EXCEPTION_TOOL_NAME = "_Exception"
MAX_SCRATCHPAD_LENGTH = 7000
MISSING_ACTION_AFTER_THOUGHT_ERROR_MESSAGE = (
"Invalid Format: Missing 'Action:' after 'Thought:"
)
MISSING_ACTION_INPUT_AFTER_ACTION_ERROR_MESSAGE = (
"Invalid Format: Missing 'Action Input:' after 'Action:'"
)
FINAL_ANSWER_AND_PARSABLE_ACTION_ERROR_MESSAGE = (
"Parsing LLM output produced both a final answer and a parse-able action:"
)
class NavGPTOutputParser(AgentOutputParser):
"""MRKL Output parser for the chat agent."""
def get_format_instructions(self) -> str:
return FORMAT_INSTRUCTIONS
def parse(self, text: str) -> Union[AgentAction, AgentFinish]:
includes_answer = FINAL_ANSWER_ACTION in text
regex = (
r"Action\s*\d*\s*:[\s]*(.*?)[\s]*Action\s*\d*\s*Input\s*\d*\s*:[\s]*\"?([a-fA-F0-9]{32})\"?"
)
action_match = re.search(regex, text, re.DOTALL)
if action_match:
if includes_answer:
raise OutputParserException(
f"{FINAL_ANSWER_AND_PARSABLE_ACTION_ERROR_MESSAGE}: {text}"
)
action = action_match.group(1).strip()
action_input = action_match.group(2)
tool_input = action_input.strip(" ")
# ensure if its a well formed SQL query we don't remove any trailing " chars
if tool_input.startswith("SELECT ") is False:
tool_input = tool_input.strip('"')
return AgentAction(action, tool_input, text)
elif includes_answer:
return AgentFinish(
{"output": text.split(FINAL_ANSWER_ACTION)[-1].strip()}, text
)
if not re.search(r"Action\s*\d*\s*:[\s]*(.*?)", text, re.DOTALL):
raise OutputParserException(
f"Could not parse LLM output: `{text}`",
observation=MISSING_ACTION_AFTER_THOUGHT_ERROR_MESSAGE,
llm_output=text,
send_to_llm=True,
)
elif not re.search(
r"[\s]*Action\s*\d*\s*Input\s*\d*\s*:[\s]*(.*)", text, re.DOTALL
):
raise OutputParserException(
f"Could not parse LLM output: `{text}`",
observation=MISSING_ACTION_INPUT_AFTER_ACTION_ERROR_MESSAGE,
llm_output=text,
send_to_llm=True,
)
else:
raise OutputParserException(f"Could not parse LLM output: `{text}`")
@property
def _type(self) -> str:
return "mrkl-NavGPT"
class VLNAgent(ZeroShotAgent):
history: Optional[List[str]] = None
def _construct_scratchpad(
self, intermediate_steps: List[Tuple[AgentAction, str]]
) -> Union[str, List[BaseMessage]]:
"""Construct the scratchpad that lets the agent continue its thought process."""
thoughts = ""
nav_step = 1
for i, (action, observation) in enumerate(intermediate_steps):
thoughts += action.log
if (i == len(intermediate_steps) - 1) or (action.tool != MAKE_ACTION_TOOL_NAME):
thoughts += f"\n{self.observation_prefix}{observation}\n{self.llm_prefix}"
else:
thoughts += f"\n{self.observation_prefix}{self.history[nav_step]}\n{self.llm_prefix}"
nav_step += 1
return thoughts
def get_full_inputs(
self, intermediate_steps: List[Tuple[AgentAction, str]], **kwargs: Any
) -> Dict[str, Any]:
"""Create the full inputs for the LLMChain from intermediate steps."""
thoughts = self._construct_scratchpad(intermediate_steps)[-MAX_SCRATCHPAD_LENGTH:]
new_inputs = {"agent_scratchpad": thoughts, "stop": self._stop}
if len(intermediate_steps) == 0:
full_inputs = {**kwargs, **new_inputs}
else:
kwargs["init_observation"] = self.history[0]
full_inputs = {**kwargs, **new_inputs}
return full_inputs
class NavAgent(BaseAgent):
def __init__(
self,
env: R2RNavBatch,
config: Namespace):
"""
Initialize the LLM Navigation Agent.
Args:
env: The Matterport3D environment.
config: The configuration.
"""
super().__init__(env)
self.config = config
if config.llm_model_name.split('-')[0] == 'gpt':
self.llm = OpenAI(
temperature=config.temperature,
model_name=config.llm_model_name,
)
elif config.llm_model_name == 'llama-2-13b':
from LLMs.Langchain_llama import Custom_Llama
ckpt_dir = "LLMs/llama/llama-2-13b"
tokenizer_path = "LLMs/llama/tokenizer.model"
self.llm = Custom_Llama.from_model_id(
temperature=config.temperature,
ckpt_dir = ckpt_dir,
tokenizer_path = tokenizer_path,
max_seq_len = 8000,
max_gen_len = 500,
max_batch_size = 1,
)
# elif config.llm_model_name == 'Vicuna-v1.5-13b':
# from LLMs.Langchain_Vicuna import Custom_Vicuna
# self.llm = Custom_Vicuna.from_config(
# config = config,
# )
# elif config.llm_model_name == 'FlanT5XXL':
# from LLMs.Langchain_FlanT5 import Custom_FlanT5
# self.llm = Custom_FlanT5.from_config(
# config = config,
# )
# elif config.llm_model_name == 'Emu-14B':
# from LLMs.Langchain_Emu import Custom_Emu
# self.llm = Custom_Emu.from_config(
# config = config,
# )
# else:
# from LLMs.Langchain_InstructBLIP import Custom_NavGPT_InstructBLIP
# self.llm = Custom_NavGPT.from_config(
# config = config,
# )
self.output_parser = NavGPTOutputParser()
self.agent_executor = self.create_vln_agent()
plan_prompt = PromptTemplate(
template=PLANNER_PROMPT,
input_variables=["instruction"],
)
self.plan_chain = LLMChain(llm=self.llm, prompt=plan_prompt)
def parse_action(self, llm_output: str) -> Tuple[str, str]:
regex = r"(.*?)Final Answer:[\s]*(.*)"
match = re.search(regex, llm_output, re.DOTALL)
if not match:
raise ValueError(f"Could not parse LLM output: `{llm_output}`")
thought = match.group(1).strip()
action = match.group(2).strip(" ").strip('"').strip("'")
return thought, action
def get_his_viewpoints(self) -> str:
'''Return the history of visited viewpoints for back tracing.'''
his_viewpoints = ''
# The last vp is not included in the history
for i, detail in enumerate(self.traj[0]['details'][:-1]):
viewpointID = detail['viewpointID']
viewpoint_ob = detail['feature']
his_viewpoints += f"Step {i+1}. Viewpoint ID '{viewpointID}':\n {viewpoint_ob}\n\n"
return his_viewpoints
def get_history(self, obs: dict, angle: str) -> str:
'''Return the history of actions taken.'''
history = f'{angle}\nCurrent viewpoint "{obs["viewpoint"]}": Scene from the viewpoint is a {obs["obs_summary"]}'
return history
def get_navigable_str(self, cur_heading: float, cur_elevation: float, navigable: dict) -> str:
'''Return the navigable viewpoints as a string.'''
navigable_str = ''
for vp, items in navigable.items():
heading = np.rad2deg(items['heading'])
elevation = np.rad2deg(items['elevation'])
distance = items['distance']
rel_heading = heading - cur_heading
rel_elevation = elevation - cur_elevation
if self.config.use_relative_angle:
navigable_str += f"'{vp}':\nheading: {rel_heading:.2f}, elevation: {rel_elevation:.2f}, distance: {distance:.2f}\n"
else:
navigable_str += f"'{vp}':\nheading: {heading:.2f}, elevation: {elevation:.2f}, distance: {distance:.2f}\n"
return navigable_str
def modify_heading_angles(self, heading_angle, observation_list, candidate_dict, object_list):
# Function to normalize an angle to the range of -180 to 180
def normalize_angle(angle):
while angle > 180:
angle -= 360
while angle <= -180:
angle += 360
return angle
def angle_to_left_right(angle):
return f"left {-angle:.2f}" if angle < 0 else f"right {angle:.2f}"
# Define the directions
directions = ['Front', 'Front Right', 'Right', 'Rear Right', 'Rear', 'Rear Left', 'Left', 'Front Left']
# Calculate the range of heading angles belonging to each direction
range_idx = int((heading_angle - 22.5) // 45) + 1
obs_idx = [(i + range_idx) % 8 for i in range(8)]
# Initialize a dictionary to store the candidate viewpoints for each direction
candidate_range = {}
if not self.config.use_navigable:
for viewpoint_id, viewpoint_data in candidate_dict.items():
viewpoint_heading = np.rad2deg(viewpoint_data['heading'])
vp_range_idx = int((viewpoint_heading - 22.5) // 45) + 1
rel_viewpoint_heading = viewpoint_heading - heading_angle
rel_viewpoint_heading = normalize_angle(rel_viewpoint_heading)
rel_viewpoint_heading = angle_to_left_right(rel_viewpoint_heading)
vp_description = rel_viewpoint_heading + f', {viewpoint_data["distance"]:.2f}m'
# rel_range_idx = (vp_range_idx - range_idx) % 8
candidate_range.setdefault(vp_range_idx, {}).update({viewpoint_id: vp_description})
# Calculate the relative angle ranges based on the heading angle
angle_ranges = [(angle - 22.5 - heading_angle, angle + 22.5 - heading_angle) for angle in range(0, 360, 45)]
# Initialize an empty list to store the formatted strings
formatted_strings = []
# Iterate through the directions, angle ranges, and observation strings
for direction, idx in zip(directions, obs_idx):
# Calculate the relative angles and normalize them
rel_angle1 = normalize_angle(angle_ranges[idx][0])
rel_angle2 = normalize_angle(angle_ranges[idx][1])
# Convert the angles to "left n" or "right n"
left_right1 = angle_to_left_right(rel_angle1)
left_right2 = angle_to_left_right(rel_angle2)
# Create the formatted string
formatted_string = f"{direction}, range ({left_right1} to {left_right2}): \n'{observation_list[idx]}'"
# Add the objects to the formatted string
object_dict = {}
if len(object_list[idx]) > 0:
object = object_list[idx]
for obj, obj_data in object.items():
rel_obj_heading = obj_data['heading'] - heading_angle
rel_obj_heading = normalize_angle(rel_obj_heading)
rel_obj_heading = angle_to_left_right(rel_obj_heading)
object_dict[obj] = f'{rel_obj_heading}, {obj_data["distance"]:.2f}m'
formatted_string += f'\n{direction} Objects in 3m: {object_dict}'
else:
formatted_string += f'\n{direction} Objects in 3m: None'
# Add the candidate viewpoints to the formatted string
if candidate_range.get(idx):
formatted_string += f'\n{direction} Navigable Viewpoints:{candidate_range[idx]}'
else:
formatted_string += f'\n{direction} Navigable Viewpoints: None'
# Add the formatted string to the list
formatted_strings.append(formatted_string)
# Join the formatted strings into a single output string
output_string = '\n'.join(formatted_strings)
return output_string
def init_trajecotry(self, obs: List[dict]):
"""Initialize the trajectory with the given observation."""
# Record the navigation path
self.traj = [{
'instr_id': ob['instr_id'],
'path': [[ob['viewpoint']]],
'details': [],
} for ob in obs]
# Record the history of actions taken
self.agent_executor.agent.history = [f'Navigation start, no actions taken yet.\nCurrent viewpoint "{obs[0]["viewpoint"]}": Scene from the viewpoint is a {obs[0]["obs_summary"]}']
def _create_make_action_tool(
self,
llm: BaseLanguageModel,
) -> Tool:
"""Create a tool to make single action prediction in MP3D.
The tool is invoked with the simulation environment and records the
action taken by the agent.
The tool interacts with the environment to obtain the current observation,
uses the LLM to predict the next action, and to summarize the previous trajectory
into history.
"""
action_prompt = PromptTemplate(
template=ACTION_PROMPT,
input_variables=["action_plan", "observation", "history", "navigable_viewpoints"],
)
history_prompt = PromptTemplate(
template=HISTORY_PROMPT,
input_variables=["history", "previous_action", "observation"],
)
self.action_chain = LLMChain(llm=llm, prompt=action_prompt)
self.history_chain = LLMChain(llm=llm, prompt=history_prompt)
def _make_action(*args, **kwargs) -> str:
'''Make single step action in MatterSim.'''
# Get current observation
cur_obs = self.env._get_obs()[0]
# Get current feature
feature = cur_obs['obs']
heading = np.rad2deg(cur_obs['heading'])
elevation = np.rad2deg(cur_obs['elevation'])
objects = cur_obs['objects']
orientation = f'\nheading: {heading:.2f}, elevation: {elevation:.2f}'
navigable = cur_obs['candidate']
if self.config.use_relative_angle:
feature = self.modify_heading_angles(heading, feature, navigable, objects)
if self.config.use_navigable:
navigable = self.get_navigable_str(heading, elevation, navigable)
if self.config.use_tool_chain:
# Get current action plan
action_plan = self.cur_action_plan
# Single step action
LLM_action_output = self.action_chain.run(
action_plan = action_plan,
observation = feature,
history = self.agent_executor.agent.history[-1],
navigable_viewpoints = navigable
)
# Parse LLM output, action is the next viewpoint ID
thought, action = self.parse_action(LLM_action_output)
else:
action = args[0].strip(" ").strip('"').strip("'")
# Make the action in Simulator
if action not in self.env.env.sims[0].navigable_dict.keys():
# Update history
history = f'ViewpointID "{action}" is not valid, no action taken for the agent.'
self.agent_executor.agent.history.append(history)
if self.config.use_navigable:
return f"\nViewpointID '{action}' is not valid, agent not moved. DO NOT fabricate nonexistent IDs. The navigable viewpoints you can choose from current viewpoints are: {[key for key in navigable.keys()]}.\n\tCurrent Viewpoint:\n{feature}\n\tNavigable Viewpoints:\n{navigable}"
else:
return f"\nViewpointID '{action}' is not valid, agent not moved. DO NOT fabricate nonexistent IDs. The navigable viewpoints you can choose from current viewpoints are: {[key for key in navigable.keys()]}.\n\tCurrent Viewpoint:\n{feature}"
else:
turned_angle, new_obs = self.make_equiv_action([action])
# Update the current feature
new_feature = new_obs['obs']
new_feature_sum = new_obs['obs_summary']
new_navigable = new_obs['candidate']
new_objects = new_obs['objects']
new_heading = np.rad2deg(new_obs['heading'])
new_elevation = np.rad2deg(new_obs['elevation'])
if self.config.use_relative_angle:
new_feature = self.modify_heading_angles(new_heading, new_feature, new_navigable, new_objects)
new_orientation = f'\nheading: {new_heading:.2f}, elevation: {new_elevation:.2f}'
if self.config.use_navigable:
new_navigable = self.get_navigable_str(new_heading, new_elevation, new_navigable)
# Update history
if self.config.use_history_chain:
history = self.history_chain.run(
observation = new_feature_sum,
history = self.agent_executor.agent.history[-1],
previous_action = turned_angle
)
else:
history = self.get_history(new_obs, turned_angle)
self.agent_executor.agent.history.append(history)
# Record single step detail
if self.config.use_tool_chain:
detail = {
"viewpointID": action,
"turned_angle": turned_angle,
"acion_maker_thought": thought,
"feature": new_feature,
"history": self.agent_executor.agent.history[-1],
}
else:
detail = {
"viewpointID": action,
"turned_angle": turned_angle,
"feature": new_feature,
"history": self.agent_executor.agent.history[-1],
}
self.traj[0]['details'].append(detail)
# Return LLM chain output as the observation of tool
if self.config.use_tool_chain:
return f"\n\tAction_maker Thought:\n{thought}\n\tAction_maker Action:\n{turned_angle}\n\tCurrent Viewpoint:\n{new_feature}\n\tNavigable Viewpoints:\n{new_navigable}"
elif self.config.use_relative_angle:
if self.config.use_navigable:
return f"\n\tCurrent Viewpoint:\n{new_feature}\n\tNavigable Viewpoints:\n{new_navigable}"
else:
return f'\nCurrent Viewpoint "{action}":\n{new_feature}'
else:
if self.config.use_navigable:
return f"\n\tCurrent Orientation:\n{new_orientation}\n\tCurrent Viewpoint:\n{new_feature}\n\tNavigable Viewpoints:\n{new_navigable}"
else:
return f"\n\tCurrent Orientation:\n{new_orientation}\n\tCurrent Viewpoint:\n{new_feature}"
return Tool(
name=MAKE_ACTION_TOOL_NAME,
func=_make_action,
description=MAKE_ACTION_TOOL_DESCRIPTION,
)
def _create_back_trace_tool(
self,
llm: BaseLanguageModel,
) -> Tool:
"""Create a tool to back trace during navigation.
The tool is invoked with the history of navigation trajectory.
Using the LLM to find a viewpoint on the trajectory to back trace to.
"""
prompt = PromptTemplate(
template=BACK_TRACE_PROMPT,
input_variables=["action_plan", "history", "observation"],
)
chain = LLMChain(llm=llm, prompt=prompt)
def _back_trace(*args, **kwargs) -> str:
'''Back trace the action plan.'''
cur_obs = self.env._get_obs()[0]
# Get current feature
feature = cur_obs['obs']
navigable = cur_obs['candidate']
objects = cur_obs['objects']
heading = np.rad2deg(cur_obs['heading'])
elevation = np.rad2deg(cur_obs['elevation'])
orientation = f'\nheading: {heading:.2f}, elevation: {elevation:.2f}'
if self.config.use_relative_angle:
feature = self.modify_heading_angles(heading, feature, navigable, objects)
if self.config.use_navigable:
navigable = self.get_navigable_str(heading, elevation, navigable)
if self.config.use_tool_chain:
# Get current action plan
action_plan = self.cur_action_plan
# Get all previous viewpoints observation
previous_vp = self.get_his_viewpoints()
# Back trace
LLM_output = chain.run(action_plan = action_plan, observation = previous_vp, history = self.agent_executor.agent.history[-1])
# Parse LLM output, action is the next viewpoint ID
thought, action = self.parse_action(LLM_output)
else:
action = args[0].strip(" ").strip('"').strip("'")
# Make the action in Simulator
if action not in self.env.env.sims[0].navigable_dict.keys():
if self.config.use_navigable:
return f"\nViewpointID '{action}' is not valid. DO NOT fabricate nonexistent IDs.\n\tCurrent Orientation:\n{orientation}\n\tCurrent Viewpoint:\n{feature}\n\tNavigable Viewpoints:\n{navigable}"
else:
return f"\nViewpointID '{action}' is not valid. DO NOT fabricate nonexistent IDs.\n\tCurrent Orientation:\n{orientation}\n\tCurrent Viewpoint:\n{feature}"
else:
_, new_obs = self.make_equiv_action([action])
# Update the current feature
new_feature = new_obs['obs']
new_navigable = new_obs['candidate']
new_objects = new_obs['objects']
new_heading = np.rad2deg(new_obs['heading'])
new_elevation = np.rad2deg(new_obs['elevation'])
new_orientation = f'\nheading: {new_heading:.2f}, elevation: {new_elevation:.2f}'
if self.config.use_relative_angle:
new_feature = self.modify_heading_angles(new_heading, new_feature, new_navigable, new_objects)
if self.config.use_navigable:
new_navigable = self.get_navigable_str(new_heading, new_elevation, new_navigable)
# Update history
history = self.get_history(new_obs, 'Seems going in a wrong way, back trace to a previous point.')
self.agent_executor.agent.history.append(history)
# Record single step detail
if self.config.use_tool_chain:
return f"\tBack_tracer Thought:\n{thought}\n\tCurrent Viewpoint:\n{new_feature}\n\tNavigable Viewpoints:\n{new_navigable}"
elif self.config.use_relative_angle:
if self.config.use_navigable:
return f"\n\tCurrent Viewpoint:\n{new_feature}\n\tNavigable Viewpoints:\n{new_navigable}"
else:
return f"\nCurrent Viewpoint:{action}\n{new_feature}"
else:
if self.config.use_navigable:
return f"\n\tCurrent Orientation:\n{new_orientation}\n\tCurrent Viewpoint:\n{new_feature}\n\tNavigable Viewpoints:\n{new_navigable}"
else:
return f"\n\tCurrent Orientation:\n{new_orientation}\n\tCurrent Viewpoint:\n{new_feature}"
return Tool(
name=BACK_TRACE_TOOL_NAME,
func=_back_trace,
description=BACK_TRACE_TOOL_DESCRIPTION,
)
def create_vln_agent(
self,
) -> AgentExecutor:
"""Instantiate API planner and controller for a given trajectory.
We use a top-level "orchestrator" agent to invoke the planner and controller,
rather than a top-level planner
that invokes a controller with its plan. This is to keep the planner simple.
"""
self.action_maker = self._create_make_action_tool(self.llm)
self.back_tracer = self._create_back_trace_tool(self.llm)
tools = [
self.action_maker,
self.back_tracer
]
if self.config.use_tool_chain:
prompt = PromptTemplate(
template=VLN_ORCHESTRATOR_PROMPT,
input_variables=["action_plan", "init_observation", "observation", "agent_scratchpad"],
partial_variables={
"tool_names": ", ".join([tool.name for tool in tools]),
"tool_descriptions": "\n".join(
[f"{tool.name}: {tool.description}" for tool in tools]
),
},
)
elif self.config.use_single_action:
tools = [self.action_maker]
prompt = PromptTemplate(
template=VLN_GPT4_PROMPT if self.config.llm_model_name == 'gpt-4' else VLN_GPT35_PROMPT,
input_variables=["action_plan", "init_observation", "agent_scratchpad"],
partial_variables={
"tool_names": ", ".join([tool.name for tool in tools]),
"tool_descriptions": "\n".join(
[f"{tool.name}: {tool.description}" for tool in tools]
),
},
)
else:
prompt = PromptTemplate(
template=VLN_ORCHESTRATOR_PROMPT,
input_variables=["action_plan", "init_observation", "agent_scratchpad"],
partial_variables={
"tool_names": ", ".join([tool.name for tool in tools]),
"tool_descriptions": "\n".join(
[f"{tool.name}: {tool.description}" for tool in tools]
),
},
)
agent = VLNAgent(
llm_chain=LLMChain(llm=self.llm, prompt=prompt),
allowed_tools=[tool.name for tool in tools],
output_parser = self.output_parser
)
return AgentExecutor.from_agent_and_tools(
agent=agent,
tools=tools,
verbose=True,
handle_parsing_errors = True,
return_intermediate_steps=True,
max_iterations=self.config.max_iterations,
)
def make_equiv_action(self, actions: List[str]) -> str:
"""
Interface between Panoramic view and Egocentric view
Take in the next viewpoint ID and move the agent to that viewpoint
return the turned angle and new observation
"""
def normalize_angle(angle):
while angle > 180:
angle -= 360
while angle <= -180:
angle += 360
return angle
def angle_to_left_right(angle):
return f"left {-angle:.2f}" if angle < 0 else f"right {angle:.2f}"
# Get current agent facing angle
cur_obs = self.env._get_obs()[0]
cur_heading = np.rad2deg(cur_obs['heading'])
# Make the action
new_obs = self.env.step(actions)[0]
new_heading = np.rad2deg(new_obs['heading'])
# Record the trajectory
self.traj[0]['path'].append(self.env.env.sims[0].gmap.bfs_shortest_path(cur_obs['viewpoint'], actions[0])[1:])
# Calculate the turned angle
turned_angle = new_heading - cur_heading
# Generate action description
cur_heading = angle_to_left_right(normalize_angle(cur_heading))
new_heading = angle_to_left_right(normalize_angle(new_heading))
action_description = f'Turn heading direction {turned_angle:.2f} degrees from {cur_heading} to {new_heading}.'
return action_description, new_obs
def rollout(self, reset=True):
if reset: # Reset env
obs = self.env.reset()
else:
obs = self.env._get_obs()
# Initialize the trajectory
self.init_trajecotry(obs)
# Load the instruction
instructions = [ob['instruction'] for ob in obs]
if self.config.load_instruction:
action_plans = instructions
elif self.config.load_action_plan:
action_plans = [ob['action_plan'] for ob in obs]
else:
action_plans = []
for instruction in instructions:
action_plan = self.plan_chain.run(instruction = instruction)
action_plans.append(action_plan)
for i, init_ob in enumerate(obs):
self.cur_action_plan = action_plans[i]
# Take the first action
if self.config.use_tool_chain:
first_obs = self.action_maker('')
input = {
'action_plan': self.cur_action_plan,
'init_observation': init_ob['obs_summary'],
'observation': first_obs,
}
else:
# Get current feature
feature = init_ob['obs']
navigable = init_ob['candidate']
objects = init_ob['objects']
heading = np.rad2deg(init_ob['heading'])
elevation = np.rad2deg(init_ob['elevation'])
orientation = f'\nheading: {heading:.2f}, elevation: {elevation:.2f}'
if self.config.use_relative_angle:
feature = self.modify_heading_angles(heading, feature, navigable, objects)
if self.config.use_navigable:
navigable = self.get_navigable_str(heading, elevation, navigable)
if self.config.use_relative_angle:
if self.config.use_navigable:
init_observation = f"\n\tCurrent Viewpoint:\n{feature}\n\tNavigable Viewpoints:\n{navigable}"
else:
init_observation = f"\n\tCurrent Viewpoint:\n{feature}"
else:
if self.config.use_navigable:
init_observation = f"\n\tCurrent Orientation:\n{orientation}\n\tCurrent Viewpoint:\n{feature}\n\tNavigable Viewpoints:\n{navigable}"
else:
init_observation = f"\n\tCurrent Orientation:\n{orientation}\n\tCurrent Viewpoint:\n{feature}"
input = {
'action_plan': self.cur_action_plan,
'init_observation': init_observation,
}
output = self.agent_executor(input)
self.traj[i]['llm_output'] = output['output']
self.traj[i]['action_plan'] = output['action_plan']
# extract agent's thought from llm output
intermediate_steps = output['intermediate_steps']
self.traj[i]['llm_thought'] = []
self.traj[i]['llm_observation'] = []
for action, observation in intermediate_steps:
thought = action.log
self.traj[i]['llm_thought'].append(thought)
self.traj[i]['llm_observation'].append(observation)
return self.traj

65
nav_src/agent_base.py Normal file

@ -0,0 +1,65 @@
import json
import os
class BaseAgent(object):
''' Base class for a REVERIE agent to generate and save trajectories. '''
def __init__(self, env):
self.env = env
self.results = {}
def get_results(self, detailed_output=False):
output = []
for k, v in self.results.items():
output.append({'instr_id': k, 'trajectory': v['path']})
if detailed_output:
output[-1]['details'] = v['details']
output[-1]['action_plan'] = v['action_plan']
output[-1]['llm_output'] = v['llm_output']
output[-1]['llm_thought'] = v['llm_thought']
output[-1]['llm_observation'] = v['llm_observation']
return output
def rollout(self, **args):
''' Return a list of dicts containing instr_id:'xx', path:[(viewpointId, heading_rad, elevation_rad)] '''
raise NotImplementedError
@staticmethod
def get_agent(name):
return globals()[name+"Agent"]
def test(self, iters=None, **kwargs):
# self.env.reset_epoch(shuffle=(iters is not None)) # If iters is not none, shuffle the env batch
self.losses = []
self.results = {}
# We rely on env showing the entire batch before repeating anything
looped = False
self.loss = 0
if iters is not None:
# For each time, it will run the first 'iters' iterations. (It was shuffled before)
for i in range(iters):
for traj in self.rollout(**kwargs):
self.loss = 0
self.results[traj['instr_id']] = traj
preds_detail = self.get_results(detailed_output=True)
json.dump(
preds_detail,
open(os.path.join(self.config.log_dir, 'runtime.json'), 'w'),
sort_keys=True, indent=4, separators=(',', ': ')
)
else: # Do a full round
while True:
for traj in self.rollout(**kwargs):
if traj['instr_id'] in self.results:
looped = True
else:
self.loss = 0
self.results[traj['instr_id']] = traj
preds_detail = self.get_results(detailed_output=True)
json.dump(
preds_detail,
open(os.path.join(self.config.log_dir, 'runtime.json'), 'w'),
sort_keys=True, indent=4, separators=(',', ': ')
)
if looped:
break

30
nav_src/data_utils.py Normal file

@ -0,0 +1,30 @@
import os
import json
import numpy as np
def load_instr_datasets(anno_dir, dataset, splits):
data = []
for split in splits:
filepath = os.path.join(anno_dir, f'{split}.json')
with open(filepath) as f:
new_data = json.load(f)
data += new_data
return data
def construct_instrs(anno_dir, dataset, splits):
data = []
if "instr" in splits[0]:
return load_instr_datasets(anno_dir, dataset, splits)
for i, item in enumerate(load_instr_datasets(anno_dir, dataset, splits)):
# Split multiple instructions into separate entries
for j, instr in enumerate(item['instructions']):
new_item = dict(item)
new_item['instr_id'] = '%s_%d' % (item['path_id'], j)
new_item['instruction'] = instr
del new_item['instructions']
del new_item['instr_encodings']
data.append(new_item)
return data

323
nav_src/env.py Normal file

@ -0,0 +1,323 @@
''' Batched REVERIE navigation environment '''
import json
import os
import numpy as np
import random
import networkx as nx
from collections import defaultdict
from utils.data import load_nav_graphs
from eval_utils import cal_dtw, cal_cls
from utils.graph_utils import NavGraph
ERROR_MARGIN = 3.0
class Simulator(object):
''' A simple simulator in Matterport3D environment '''
def __init__(
self,
navigable_dir: str,):
self.heading = 0
self.elevation = 0
self.scan_ID = ''
self.viewpoint_ID = ''
self.navigable_dir = navigable_dir
self.navigable_dict = {}
self.candidate = {}
self.gmap = NavGraph()
def newEpisode(
self,
scan_ID: str,
viewpoint_ID: str,
heading: int,
elevation: int,):
self.heading = heading
self.elevation = elevation
self.scan_ID = scan_ID
self.viewpoint_ID = viewpoint_ID
# Load navigable dict
navigable_path = os.path.join(self.navigable_dir, self.scan_ID + '_navigable.json')
with open(navigable_path, 'r') as f:
self.navigable_dict = json.load(f)
# Get candidate
self.getCandidate()
def updateGraph(self):
# build graph
for candidate in self.candidate.keys():
self.gmap.update_connection(self.viewpoint_ID, candidate)
def getState(self) -> dict:
self.state = {
'scanID': self.scan_ID,
'viewpointID': self.viewpoint_ID,
'heading': self.heading,
'elevation': self.elevation,
'candidate': self.candidate,
}
return self.state
def getCandidate(self):
"""
Get the agent's candidate list from pre-stored navigable dict.
"""
self.candidate = self.navigable_dict[self.viewpoint_ID]
self.updateGraph()
def makeAction(self, next_viewpoint_ID):
"""
Make action and update the agent's state.
"""
if next_viewpoint_ID == self.viewpoint_ID:
return
elif next_viewpoint_ID in self.candidate.keys():
self.heading = self.candidate[next_viewpoint_ID]['heading']
self.elevation = self.candidate[next_viewpoint_ID]['elevation']
self.viewpoint_ID = next_viewpoint_ID
self.getCandidate()
class EnvBatch(object):
''' A simple wrapper for a batch of MatterSim environments,
using discretized viewpoints and pretrained features '''
def __init__(self, navigable_dir, feat_db=None, batch_size=100):
"""
1. Load pretrained image feature
2. Init the Simulator.
:param feat_db: The name of file stored the feature.
:param batch_size: Used to create the simulator list.
"""
self.feat_db = feat_db
self.sims = []
for i in range(batch_size):
sim = Simulator(navigable_dir)
self.sims.append(sim)
def _make_id(self, scanId, viewpointId):
return scanId + '_' + viewpointId
def newEpisodes(self, scanIds, viewpointIds, headings):
for i, (scanId, viewpointId, heading) in enumerate(zip(scanIds, viewpointIds, headings)):
self.sims[i].newEpisode(scanId, viewpointId, heading, 0)
def getStates(self):
"""
Get list of states augmented with precomputed image features. rgb field will be empty.
Agent's current view [0-35] (set only when viewing angles are discretized)
[0-11] looking down, [12-23] looking at horizon, [24-35] looking up
:return: [ ((36, 2048), sim_state) ] * batch_size
"""
feature_states = []
for i, sim in enumerate(self.sims):
state = sim.getState()
feature = self.feat_db.get_image_observation(state["scanID"], state["viewpointID"])
feature_states.append((feature, state))
return feature_states
def makeActions(self, next_viewpoint_IDs):
''' Take an action using the full state dependent action interface (with batched input)'''
for i, next_viewpoint_ID in enumerate(next_viewpoint_IDs):
self.sims[i].makeAction(next_viewpoint_ID)
class R2RNavBatch(object):
''' Implements the REVERIE navigation task, using discretized viewpoints and pretrained features '''
def __init__(
self, view_db, instr_data, connectivity_dir, navigable_dir,
batch_size=1, seed=0, name=None,
):
self.env = EnvBatch(navigable_dir, feat_db=view_db, batch_size=batch_size)
self.data = instr_data
self.scans = set([x['scan'] for x in self.data])
self.connectivity_dir = connectivity_dir
self.batch_size = batch_size
self.name = name
self.gt_trajs = self._get_gt_trajs(self.data) # for evaluation
# use different seeds in different processes to shuffle data
self.seed = seed
random.seed(self.seed)
random.shuffle(self.data)
self.ix = 0
self._load_nav_graphs()
self.buffered_state_dict = {}
print('%s loaded with %d instructions, using splits: %s' % (
self.__class__.__name__, len(self.data), self.name))
def _get_gt_trajs(self, data):
gt_trajs = {
x['instr_id']: (x['scan'], x['path']) \
for x in data if len(x['path']) > 1
}
return gt_trajs
def size(self):
return len(self.data)
def _load_nav_graphs(self):
"""
load graph from self.scan,
Store the graph {scan_id: graph} in self.graphs
Store the shortest path {scan_id: {view_id_x: {view_id_y: [path]} } } in self.paths
Store the distances in self.distances. (Structure see above)
Load connectivity graph for each scan, useful for reasoning about shortest paths
:return: None
"""
print('Loading navigation graphs for %d scans' % len(self.scans))
self.graphs = load_nav_graphs(self.connectivity_dir, self.scans)
self.shortest_paths = {}
for scan, G in self.graphs.items(): # compute all shortest paths
self.shortest_paths[scan] = dict(nx.all_pairs_dijkstra_path(G))
self.shortest_distances = {}
for scan, G in self.graphs.items(): # compute all shortest paths
self.shortest_distances[scan] = dict(nx.all_pairs_dijkstra_path_length(G))
def _next_minibatch(self, batch_size=None, **kwargs):
"""
Store the minibatch in 'self.batch'
"""
if batch_size is None:
batch_size = self.batch_size
batch = self.data[self.ix: self.ix+batch_size]
if len(batch) < batch_size:
random.shuffle(self.data)
self.ix = batch_size - len(batch)
batch += self.data[:self.ix]
else:
self.ix += batch_size
self.batch = batch
def reset_epoch(self, shuffle=False):
''' Reset the data index to beginning of epoch. Primarily for testing.
You must still call reset() for a new episode. '''
if shuffle:
random.shuffle(self.data)
self.ix = 0
def _get_obs(self):
obs = []
for i, (feature, state) in enumerate(self.env.getStates()):
item = self.batch[i]
ob = {
'obs' : feature["detail"],
'obs_summary' : feature["summary"],
'objects' : feature["objects"],
'instr_id' : item['instr_id'],
# 'action_plan' : item['action_plan'],
'scan' : state['scanID'],
'viewpoint' : state['viewpointID'],
'heading' : state['heading'],
'elevation' : state['elevation'],
'candidate': state['candidate'],
'instruction' : item['instruction'],
'gt_path' : item['path'],
'path_id' : item['path_id']
}
# RL reward. The negative distance between the state and the final state
# There are multiple gt end viewpoints on REVERIE.
if ob['instr_id'] in self.gt_trajs:
ob['distance'] = self.shortest_distances[ob['scan']][ob['viewpoint']][item['path'][-1]]
else:
ob['distance'] = 0
obs.append(ob)
return obs
def reset(self, **kwargs):
''' Load a new minibatch / episodes. '''
self._next_minibatch(**kwargs)
scanIds = [item['scan'] for item in self.batch]
viewpointIds = [item['path'][0] for item in self.batch]
headings = [item['heading'] for item in self.batch]
self.env.newEpisodes(scanIds, viewpointIds, headings)
return self._get_obs()
def step(self, next_viewpoint_IDs):
''' Take action (same interface as makeActions) '''
self.env.makeActions(next_viewpoint_IDs)
return self._get_obs()
############### Nav Evaluation ###############
def _get_nearest(self, shortest_distances, goal_id, path):
near_id = path[0]
near_d = shortest_distances[near_id][goal_id]
for item in path:
d = shortest_distances[item][goal_id]
if d < near_d:
near_id = item
near_d = d
return near_id
def _eval_item(self, scan, pred_path, gt_path):
scores = {}
shortest_distances = self.shortest_distances[scan]
path = sum(pred_path, [])
assert gt_path[0] == path[0], 'Result trajectories should include the start position'
nearest_position = self._get_nearest(shortest_distances, gt_path[-1], path)
scores['nav_error'] = shortest_distances[path[-1]][gt_path[-1]]
scores['oracle_error'] = shortest_distances[nearest_position][gt_path[-1]]
scores['action_steps'] = len(pred_path) - 1
scores['trajectory_steps'] = len(path) - 1
scores['trajectory_lengths'] = np.sum([shortest_distances[a][b] for a, b in zip(path[:-1], path[1:])])
gt_lengths = np.sum([shortest_distances[a][b] for a, b in zip(gt_path[:-1], gt_path[1:])])
scores['success'] = float(scores['nav_error'] < ERROR_MARGIN)
scores['spl'] = scores['success'] * gt_lengths / max(scores['trajectory_lengths'], gt_lengths, 0.01)
scores['oracle_success'] = float(scores['oracle_error'] < ERROR_MARGIN)
scores.update(
cal_dtw(shortest_distances, path, gt_path, scores['success'], ERROR_MARGIN)
)
scores['CLS'] = cal_cls(shortest_distances, path, gt_path, ERROR_MARGIN)
return scores
def eval_metrics(self, preds):
''' Evaluate each agent trajectory based on how close it got to the goal location
the path contains [view_id, angle, vofv]'''
print('eval %d predictions' % (len(preds)))
metrics = defaultdict(list)
for item in preds:
instr_id = item['instr_id']
traj = item['trajectory']
scan, gt_traj = self.gt_trajs[instr_id]
traj_scores = self._eval_item(scan, traj, gt_traj)
for k, v in traj_scores.items():
metrics[k].append(v)
metrics['instr_id'].append(instr_id)
avg_metrics = {
'action_steps': np.mean(metrics['action_steps']),
'steps': np.mean(metrics['trajectory_steps']),
'lengths': np.mean(metrics['trajectory_lengths']),
'nav_error': np.mean(metrics['nav_error']),
'oracle_error': np.mean(metrics['oracle_error']),
'sr': np.mean(metrics['success']) * 100,
'oracle_sr': np.mean(metrics['oracle_success']) * 100,
'spl': np.mean(metrics['spl']) * 100,
'nDTW': np.mean(metrics['nDTW']) * 100,
'SDTW': np.mean(metrics['SDTW']) * 100,
'CLS': np.mean(metrics['CLS']) * 100,
}
return avg_metrics, metrics

43
nav_src/eval_utils.py Normal file

@ -0,0 +1,43 @@
''' Utils for evaluation '''
import numpy as np
def cal_dtw(shortest_distances, prediction, reference, success=None, threshold=3.0):
dtw_matrix = np.inf * np.ones((len(prediction) + 1, len(reference) + 1))
dtw_matrix[0][0] = 0
for i in range(1, len(prediction)+1):
for j in range(1, len(reference)+1):
best_previous_cost = min(
dtw_matrix[i-1][j], dtw_matrix[i][j-1], dtw_matrix[i-1][j-1])
cost = shortest_distances[prediction[i-1]][reference[j-1]]
dtw_matrix[i][j] = cost + best_previous_cost
dtw = dtw_matrix[len(prediction)][len(reference)]
ndtw = np.exp(-dtw/(threshold * len(reference)))
if success is None:
success = float(shortest_distances[prediction[-1]][reference[-1]] < threshold)
sdtw = success * ndtw
return {
'DTW': dtw,
'nDTW': ndtw,
'SDTW': sdtw
}
def cal_cls(shortest_distances, prediction, reference, threshold=3.0):
def length(nodes):
return np.sum([
shortest_distances[a][b]
for a, b in zip(nodes[:-1], nodes[1:])
])
coverage = np.mean([
np.exp(-np.min([ # pylint: disable=g-complex-comprehension
shortest_distances[u][v] for v in prediction
]) / threshold) for u in reference
])
expected = coverage * length(reference)
score = expected / (expected + np.abs(expected - length(prediction)))
return coverage * score
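As a quick sanity check of the two helpers above, here is a minimal sketch on a hypothetical three-node graph; the nested toy_dist dict stands in for the precomputed all-pairs shortest-distance lookup that the environment normally supplies (all names below are made up for illustration, and nav_src/ is assumed to be on the Python path).

from eval_utils import cal_dtw, cal_cls

# symmetric all-pairs shortest distances for three toy viewpoints
toy_dist = {
    'a': {'a': 0.0, 'b': 2.0, 'c': 4.0},
    'b': {'a': 2.0, 'b': 0.0, 'c': 2.0},
    'c': {'a': 4.0, 'b': 2.0, 'c': 0.0},
}
pred = ['a', 'b', 'c']   # predicted trajectory (list of viewpoint IDs)
ref = ['a', 'c']         # reference trajectory
print(cal_dtw(toy_dist, pred, ref))   # {'DTW': ..., 'nDTW': ..., 'SDTW': ...}
print(cal_cls(toy_dist, pred, ref))   # scalar CLS score in (0, 1]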

79
nav_src/parser.py Normal file
View File

@ -0,0 +1,79 @@
import argparse
import os
def parse_args():
parser = argparse.ArgumentParser(description="")
# datasets
parser.add_argument('--root_dir', type=str, default='../datasets')
parser.add_argument('--dataset', type=str, default='r2r', choices=['r2r', 'r4r'])
parser.add_argument('--output_dir', type=str, default='../datasets/R2R/exprs/gpt-3.5-turbo', help='experiment id')
# parser.add_argument('--output_dir', type=str, default='../datasets/R2R/exprs/LlaMA-2-13b-test', help='experiment id')
parser.add_argument('--seed', type=int, default=0)
# Agent
parser.add_argument('--temperature', type=float, default=0.0, help='temperature for llm')
parser.add_argument('--llm_model_name', type=str, default='gpt-3.5-turbo', help='llm model name')
# parser.add_argument('--llm_model_name', type=str, default='gpt-4', help='llm model name')
# parser.add_argument('--llm_model_name', type=str, default='LlaMA-2-13b', help='llm model name')
parser.add_argument('--batch_size', type=int, default=1)
parser.add_argument('--max_iterations', type=int, default=10)
# General config
parser.add_argument('--iters', type=int, default=10, help='number of iterations to run')
# parser.add_argument('--iters', type=int, default=None, help='number of iterations to run')
    parser.add_argument('--max_scratchpad_length', type=int, default=1000, help='max length of the agent scratchpad')
parser.add_argument('--test', action='store_true', default=False)
# parser.add_argument('--val_env_name', type=str, default='R2R_val_unseen_instr_0')
# parser.add_argument('--val_env_name', type=str, default='R2R_val_unseen_instr_1')
# parser.add_argument('--val_env_name', type=str, default='R2R_val_unseen_instr_2')
# parser.add_argument('--val_env_name', type=str, default='R2R_val_unseen_instr_3')
# parser.add_argument('--val_env_name', type=str, default='R2R_val_unseen_instr_4')
parser.add_argument('--val_env_name', type=str, default='R2R_val_unseen_instr')
parser.add_argument('--load_instruction', action='store_true', default=True)
parser.add_argument('--load_action_plan', action='store_true', default=True)
parser.add_argument('--use_relative_angle', action='store_true', default=True)
parser.add_argument('--use_history_chain', action='store_true', default=False)
parser.add_argument('--use_tool_chain', action='store_true', default=False)
parser.add_argument('--use_navigable', action='store_true', default=False)
parser.add_argument('--use_single_action', action='store_true', default=True)
parser.add_argument('--detailed_output', action='store_true', default=True)
# parser.add_argument('--valid_file', type=str, default='../datasets/R2R/exprs/4-R2R_val_unseen_instr/4-R2R_val_unseen_instr.json', help='valid file name')
parser.add_argument('--valid_file', type=str, default=None, help='valid file name')
args, _ = parser.parse_known_args()
args = postprocess_args(args)
return args
def postprocess_args(args):
ROOTDIR = args.root_dir
# Setup input paths
args.obs_dir = os.path.join(ROOTDIR, 'R2R', 'observations_list_summarized')
args.obs_summary_dir = os.path.join(ROOTDIR, 'R2R', 'observations_summarized')
args.obj_dir = os.path.join(ROOTDIR, 'R2R', 'objects_list')
args.connectivity_dir = os.path.join(ROOTDIR, 'R2R', 'connectivity')
args.scan_data_dir = os.path.join(ROOTDIR, 'Matterport3D', 'v1_unzip_scans')
args.anno_dir = os.path.join(ROOTDIR, 'R2R', 'annotations')
args.navigable_dir = os.path.join(ROOTDIR, 'R2R', 'navigable')
# Build paths
args.log_dir = os.path.join(args.output_dir, 'logs')
args.pred_dir = os.path.join(args.output_dir, 'preds')
os.makedirs(args.output_dir, exist_ok=True)
os.makedirs(args.log_dir, exist_ok=True)
os.makedirs(args.pred_dir, exist_ok=True)
return args
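A minimal usage sketch (assuming the script is run from nav_src/, so this module rather than the deprecated standard-library parser is imported): parse_args() both parses the flags and creates the output, log and prediction directories as a side effect, so calling it once at start-up is enough.

from parser import parse_args

args = parse_args()
print(args.llm_model_name)   # e.g. 'gpt-3.5-turbo'
print(args.obs_dir)          # ../datasets/R2R/observations_list_summarized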

View File

@ -0,0 +1,280 @@
# flake8: noqa
from langchain.prompts.prompt import PromptTemplate
PLANNER_PROMPT = """Given the long instruction: {instruction}
Divide the long instruction into action steps with detailed descriptions in the following format:
Action plan:
1. action_step_1
2. action_step_2
...
Action plan:"""
ACTION_PROMPT = """You are an agent following an action plan to navigate in an indoor environment.
Action plan: {action_plan}
You are currently at one of the steps in the plan. You will be given the history of previous steps you have taken, the current observation of the environment, and the navigable viewpoints for the next step.
You should:
1) evaluate the history and observation to decide which step of the action plan you are at.
2) choose one viewpoint from the navigable viewpoints.
Each navigable viewpoint has a unique ID, you should only answer the ID in the Final Answer.
----
Starting below, you should strictly follow this format:
History: the history of previous steps you have taken
Observation: the current observation of the environment
Navigable viewpoints: the navigable viewpoints for the next step
Thought: your thought on the next step
Final Answer: 'viewpointID'
----
Begin!
History: {history}
Observation: {observation}
Navigable viewpoints: {navigable_viewpoints}
Thought:"""
HISTORY_PROMPT = """You are an agent navigating in an indoor environment.
You have reached a new viewpoint after taking the previous action. You will be given the navigation history, the current observation of the environment, and the previous action you took.
You should:
1) evaluate the new observation and history.
2) update the history with the previous action and the new observation.
History: {history}
Previous action: {previous_action}
Observation: {observation}
Update history with the new observation:"""
MAKE_ACTION_TOOL_NAME = "action_maker"
MAKE_ACTION_TOOL_DESCRIPTION = f'Can be used to move to next adjacent viewpoint.\nThe input to this tool should be a viewpoint ID string of the next viewpoint you wish to visit. For example:\nAction: action_maker\nAction Input: "4a153b13a3f6424784cb8e5dabbb3a2c".'
BACK_TRACE_PROMPT = """You are an agent following an action plan to navigate in an indoor environment.
You are currently at an intermediate step of the trajectory but seem to be going off the track. You will be given the action plan describing the whole trajectory, the history of previous steps you have taken, and the observations of the viewpoints along the trajectory.
You should evaluate the history, the action plan and the observations along the way to decide which viewpoint to go back to.
Each navigable viewpoint has a unique ID, you should only answer the ID in the Final Answer.
You must choose one from the navigable viewpoints, DO NOT answer None of the above.
----
Starting below, you should follow this format:
Action plan: the action plan describing the whole trajectory
History: the history of previous steps you have taken
Observation: the observations of each viewpoint along the trajectory
Thought: your thought about the next step
Final Answer: 'viewpointID'
----
Begin!
Action plan: {action_plan}
History: {history}
Observation: {observation}
Thought:"""
BACK_TRACE_TOOL_NAME = "back_tracer"
BACK_TRACE_TOOL_DESCRIPTION = f"Can be used to move to any previous viewpoint on the trajectory even if the viewpoint is not adjacent.\nCan be called like {BACK_TRACE_TOOL_NAME}('viewpointID'), where 'viewpointID' is the ID of any previous viewpoint.\nThe input to this tool should be a viewpoint ID string ONLY."
VLN_ORCHESTRATOR_TOOL_PROMPT = """You are an agent that follows an instruction to navigate in an indoor environment. You are required to make sequential decisions according to the observation of the environment to follow the given instruction.
At the beginning of the navigation, you will be given the instruction describing the whole trajectory.
During navigation, you will receive the history of previous steps you have taken, the current observation of the environment at each step.
Navigating in an unseen environment is hard; it is possible to go off the track described by the instruction.
You should act as a high-level controller: at each step, consider whether you are on the right track or not.
If yes, use the action_maker tool to continue.
If not, use the back_tracer tool to move to a previous viewpoint on the trajectory.
Here are the descriptions of these tools: {tool_descriptions}
----
Starting below, you should follow this format:
Instruction: the instruction describing the whole trajectory
Initial Observation: the initial observation of the environment
Thought: I should start navigation according to the instruction
Action: action_maker
Action Input: ""
Observation: the result of the action
Thought: you should always think about what to do next
Action: the action to take, should be one of the tools [{tool_names}]
Action Input: ""
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I am finished executing the instruction.
Final Answer: Finished!
Begin!
Instruction: {action_plan}
Initial Observation: {init_observation}
Thought: I should start navigation according to the instruction
Action: action_maker
Action Input: ""
Observation: {observation}
Thought:{agent_scratchpad}"""
VLN_ORCHESTRATOR_ABS_PROMPT = """You are an agent that follows an instruction to navigate in an indoor environment. You are required to make sequential decisions according to the observation of the environment to follow the given instruction.
At the beginning of the navigation, you will be given the instruction describing the whole trajectory.
During navigation, you will receive the history of previous steps you have taken, your current orientation, the current observation of the environment at each step, and the navigable viewpoints' orientations from current viewpoint.
All orientations are normalized to world coordinates in degrees; you should always consider the relative angle between the observation and navigable viewpoints, i.e. relative angles 0 and 360 are the front, 90 and -270 are the right, 180 and -180 are the back, 270 and -90 are the left.
Navigating in an unseen environment is hard; it is possible to go off the track described by the instruction. You are allowed to backtrack, but you are encouraged to explore the environment as much as possible. The ultimate goal is to reach the destination in the instruction.
At each step, you should consider:
(1) According to the Current Viewpoint observation and History, have you reached the destination?
If yes, you should stop and output 'Final Answer: Finished!'.
If not, you should continue:
(2) Consider whether you are on the right track or not.
If yes, use the action_maker tool to move to an adjacent viewpoint shown in Navigable Viewpoints.
If not, use the back_tracer tool to move to any previous viewpoint shown in History.
You should always use the action_maker tool at the beginning of navigation. If you are told to wait in the instruction, you should output 'Final Answer: Finished!' to stop.
Here are the descriptions of these tools: {tool_descriptions}
The viewpoint ID is a string of 32 characters, for example '4a153b13a3f6424784cb8e5dabbb3a2c'. You are very strict about the viewpoint ID and will never fabricate nonexistent IDs.
----
Starting below, you should follow this format:
Instruction: the instruction describing the whole trajectory
Initial Observation: the initial observation of the environment
Thought: you should always think about what to do next
Action: the action to take, must be one of the tools [{tool_names}]
Action Input: "Viewpoint ID"
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I have reached the destination, I can stop.
Final Answer: Finished!
----
Begin!
Instruction: {action_plan}
Initial Observation: {init_observation}
Thought: I should start navigation according to the instruction, {agent_scratchpad}"""
VLN_ORCHESTRATOR_PROMPT = """You are an agent that follows an instruction to navigate in an indoor environment. You are required to make sequential decisions according to the observation of the environment to follow the given instruction.
At the beginning of the navigation, you will be given the instruction describing the whole trajectory.
During navigation, you will receive the history of previous steps you have taken, the current observation of the environment, and the navigable viewpoints' orientations from current viewpoint.
All orientations are in degrees from -180 to 180, i.e. angle 0 is the front, right 90 is 90 degrees to the right, right 180 and left 180 are the back, left 90 is 90 degrees to the left.
Navigating in an unseen environment is hard; it is possible to go off the track described by the instruction. You are allowed to backtrack, but you are encouraged to explore the environment as much as possible. The ultimate goal is to reach the destination in the instruction.
At each step, you should consider:
(1) According to the Current Viewpoint observation and History, have you reached the destination?
If yes, you should stop and output 'Final Answer: Finished!'.
If not, you should continue:
(2) Consider whether you are on the right track or not.
If yes, use the action_maker tool to move to an adjacent viewpoint shown in Navigable Viewpoints.
If not, use the back_tracer tool to move to any previous viewpoint shown in History.
You should always use the action_maker tool at the beginning of navigation. Show your reasoning in the Thought section.
Here are the descriptions of these tools: {tool_descriptions}
The viewpoint ID is a string of 32 characters, for example '4a153b13a3f6424784cb8e5dabbb3a2c'. You are very strict about the viewpoint ID and will never fabricate nonexistent IDs.
----
Starting below, you should follow this format:
Instruction: the instruction describing the whole trajectory
Initial Observation: the initial observation of the environment
Thought: you should always think about what to do next and why
Action: the action to take, must be one of the tools [{tool_names}]
Action Input: "Viewpoint ID"
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I have reached the destination, I can stop.
Final Answer: Finished!
----
Begin!
Instruction: {action_plan}
Initial Observation: {init_observation}
Thought: I should start navigation according to the instruction, {agent_scratchpad}"""
VLN_GPT4_PROMPT = """You are an intelligent embodied agent that follows an instruction to navigate in an indoor environment. Your task is to move among the static viewpoints (positions) of a pre-defined graph of the environment, and try to reach the target viewpoint as described by the given instruction with the least steps.
At the beginning of the navigation, you will be given an instruction of a trajectory which describes all observations and the action you should take at each step.
During navigation, at each step, you will be at a specific viewpoint and receive the history of previous steps you have taken (containing your "Thought", "Action", "Action Input" and "Observation" after the "Begin!" sign) and the observation of current viewpoint (including scene descriptions, objects, and navigable directions/distances within 3 meters).
Orientations range from -180 to 180 degrees: "0" signifies forward, "right 90" rightward, "right (or left) 180" backward, and "left 90" leftward.
You make actions by selecting navigable viewpoints to reach the destination. You are encouraged to explore the environment while avoiding revisiting viewpoints by comparing current navigable and previously visited IDs in previous "Action Input". The ultimate goal is to stop within 3 meters of the destination in the instruction. If the destination is visible but the target object is not detected within 3 meters, move closer.
At each step, you should consider:
(1) According to the Current Viewpoint observation and History, have you reached the destination?
If yes, you should stop and output 'Final Answer: Finished!'.
If not, you should continue:
(2) Consider where you are on the trajectory and what the next viewpoint to navigate to should be according to the instruction.
Use the action_maker tool and input the next navigable viewpoint ID to move to that location.
Show your reasoning in the Thought section.
Here are the descriptions of these tools:
{tool_descriptions}
Every viewpoint has a unique viewpoint ID. You are very strict about the viewpoint ID and will never fabricate nonexistent IDs.
----
Starting below, you should follow this format:
Instruction: an instruction of a trajectory which describes all observations and the actions should be taken
Initial Observation: the initial observation of the environment
Thought: you should always think about what to do next and why
Action: the action to take, must be one of the tools [{tool_names}]
Action Input: "Viewpoint ID"
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I have reached the destination, I can stop.
Final Answer: Finished!
----
Begin!
Instruction: {action_plan}
Initial Observation: {init_observation}
Thought: I should start navigation according to the instruction, {agent_scratchpad}"""
VLN_GPT35_PROMPT = """As an intelligent embodied agent, you will navigate an indoor environment to reach a target viewpoint based on a given instruction, performing the Vision and Language Navigation (VLN) task. You'll move among static positions within a pre-defined graph, aiming for minimal steps.
You will receive a trajectory instruction at the start and will have access to step history (your Thought, Action, Action Input and Observation after the Begin! sign) and current viewpoint observation (including scene descriptions, objects, and navigable directions/distances within 3 meters) during navigation. Orientations range from -180 to 180 degrees, with 0 being forward, right 90 rightward, right/left 180 backward, and left 90 leftward.
Explore the environment while avoiding revisiting viewpoints by comparing current and previously visited IDs. Reach within 3 meters of the instructed destination, and if it's visible but no objects are detected, move closer.
At each step, determine if you've reached the destination.
If yes, stop and output 'Final Answer: Finished!'.
If not, continue by considering your location and the next viewpoint based on the instruction, using the action_maker tool.
Show your reasoning in the Thought section.
Follow the given format and use provided tools.
{tool_descriptions}
Do not fabricate nonexistent viewpoint IDs.
----
Starting below, you should follow this format:
Instruction: the instruction describing the whole trajectory
Initial Observation: the initial observation of the environment
Thought: you should always think about what to do next and why
Action: the action to take, must be one of the tools [{tool_names}]
Action Input: "Viewpoint ID"
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I have reached the destination, I can stop.
Final Answer: Finished!
----
Begin!
Instruction: {action_plan}
Initial Observation: {init_observation}
Thought: I should start navigation according to the instruction, {agent_scratchpad}"""
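To illustrate how these constants might be consumed, here is a minimal sketch that wraps the simplest one, PLANNER_PROMPT, in a LangChain PromptTemplate and renders it (the instruction string is a made-up example, and nav_src/ is assumed to be on the Python path):

from langchain.prompts.prompt import PromptTemplate
from prompt.planner_prompt import PLANNER_PROMPT

planner_template = PromptTemplate(
    template=PLANNER_PROMPT,
    input_variables=["instruction"],
)
rendered = planner_template.format(
    instruction="Walk past the couch, turn right into the kitchen, and stop next to the fridge."
)
print(rendered)   # the exact text that would be sent to the LLM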

View File

@ -0,0 +1,37 @@
import json
from langchain.chains.llm import LLMChain
from langchain.llms.openai import OpenAI
from langchain.prompts import PromptTemplate
from prompt.planner_prompt import (
PLANNER_PROMPT,
)
from data_utils import construct_instrs
# Using OpenAI text-davinci-003
llm = OpenAI(temperature=0.0)
plan_prompt = PromptTemplate(
template=PLANNER_PROMPT,
input_variables=["instruction"],
)
plan_chain = LLMChain(llm=llm, prompt=plan_prompt)
splits = ['val_72']
anno_dir = '../datasets/R2R/annotations'
dataset = 'R2R'
data = construct_instrs(anno_dir, dataset, splits)
for i, sample in enumerate(data):
print(f"Sample {i}:")
print(sample['instruction'])
action_plan = plan_chain.run(sample['instruction'])
print(action_plan)
data[i]['action_plan'] = action_plan
with open('../datasets/R2R/annotations/R2R_val_72_action_plan.json', 'w') as f:
json.dump(data, f, indent=2)

View File

@ -0,0 +1,34 @@
import os
import glob
import json
def merge_json_files(base_dir):
merged_data = []
# Iterate through subdirectories
for subdir in os.listdir(base_dir):
subdir_path = os.path.join(base_dir, subdir)
# Check if the path is a directory
if os.path.isdir(subdir_path):
# Find all JSON files in the 'preds' subdirectory
json_files = glob.glob(os.path.join(subdir_path, "preds", "*.json"))
# Merge JSON data
for file_path in json_files:
with open(file_path, 'r') as json_file:
data = json.load(json_file)
# Merge the data from this file into the merged_data dictionary
for sample in data:
merged_data.append(sample)
# Save the merged JSON data to a file
with open(os.path.join(base_dir, f"{exp_name}.json"), "w") as output_file:
json.dump(merged_data, output_file, indent=4)
base_dir = "../datasets/R2R/exprs/"
exp_name = "4-R2R_val_unseen_instr"
path = os.path.join(base_dir, exp_name)
merge_json_files(path)

View File

@ -0,0 +1,75 @@
'''
Use LLM chain to summarize the observations
'''
import os
import json
import asyncio
import argparse
from langchain.chains.llm import LLMChain
from langchain.llms.openai import OpenAI
from langchain.prompts import PromptTemplate
async def async_generate(chain, viewpointID, ob_list):
print(f"Summarizing {viewpointID} ...")
tasks = [chain.arun(description=ob) for ob in ob_list]
resp_list = await asyncio.gather(*tasks)
print(f"Summarized {viewpointID}'s observations: {resp_list}\n")
return resp_list
async def generate_concurrently(chain, obs):
tasks = [async_generate(chain, viewpointID, ob) for viewpointID, ob in obs.items()]
results = await asyncio.gather(*tasks)
return results
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("--batch_size", type=int, default=5)
parser.add_argument("--obs_dir", type=str, default="../datasets/R2R/observations_list/")
parser.add_argument("--output_dir", type=str, default="../datasets/R2R/observations_list_summarized/")
parser.add_argument("--sum_type", type=str, default="list", choices=["list", "single"])
args = parser.parse_args()
obs_dir = args.obs_dir
obs_files = os.listdir(obs_dir)
output_dir = args.output_dir
# make sure the output directory exists
os.makedirs(output_dir, exist_ok=True)
llm = OpenAI(
temperature=0.0,
model_name="gpt-3.5-turbo",
)
if args.sum_type == "single":
summarize_prompt = PromptTemplate(
template='Given the description of a viewpoint. Summarize the scene from the viewpoint in one concise sentence.\n\nDescription:\n{description}\n\nSummarization: The scene from the viewpoint is a',
input_variables=["description"],
)
elif args.sum_type == "list":
summarize_prompt = PromptTemplate(
template='Here is a single scene view from top, down and middle:\n{description}\nSummarize the scene in one sentence:',
input_variables=["description"],
)
summarize_chain = LLMChain(llm=llm, prompt=summarize_prompt)
for obs_file in obs_files:
obs_path = os.path.join(obs_dir, obs_file)
with open(obs_path) as f:
obs = json.load(f)
summary = {}
viewpointIDs = list(obs.keys())
# Get the viewpointIDs in batches
for i in range(0, len(viewpointIDs), args.batch_size):
batch = viewpointIDs[i:i+args.batch_size]
print(f"Summarizing scan {obs_file.split('.')[0]} batch [{i//args.batch_size}/{len(viewpointIDs)//args.batch_size}]")
batch_obs = {viewpointID:obs[viewpointID] for viewpointID in batch}
summarized_obs = asyncio.run(generate_concurrently(summarize_chain, batch_obs))
summarized_obs = {viewpointID: summarized_obs[i] for i, viewpointID in enumerate(batch)}
summary.update(summarized_obs)
        output_path = os.path.join(output_dir, obs_file)  # obs_file already ends with .json
with open(output_path, 'w') as f:
json.dump(summary, f, indent=2)

View File

@ -0,0 +1,36 @@
import re
import unittest
def extract_action_and_tool_input(text):
regex = r"Action\s*\d*\s*:[\s]*(.*?)[\s]*Action\s*\d*\s*Input\s*\d*\s*:[\s]*\"?([a-fA-F0-9]{32})\"?"
action_match = re.search(regex, text, re.DOTALL)
if action_match:
action = action_match.group(1).strip()
tool_input = action_match.group(2).strip()
return action, tool_input
else:
return None, None
class TestActionAndToolInputExtraction(unittest.TestCase):
def test_extraction(self):
samples = [
("which tells me ... Action: action_maker\nAction Input: \"f237319a500640d8ac172db225a3ce9c\" (Left viewpoint ID)", "action_maker", "f237319a500640d8ac172db225a3ce9c"),
("which is to turn right ... Action: action_maker\nAction Input: \"06bd0a2d004b454b9e93ddcf08344732\"", "action_maker", "06bd0a2d004b454b9e93ddcf08344732"),
("which is to exit out ... Action: action_maker\nAction Input: \"424bcb744623413f830ece5c68319d70\"\n", "action_maker", "424bcb744623413f830ece5c68319d70")
]
for idx, (sample, expected_action, expected_tool_input) in enumerate(samples, 1):
action, tool_input = extract_action_and_tool_input(sample)
# Print statements
print(f"Testing Sample {idx} ...")
print(f"Expected Action: {expected_action}, Output: {action}")
print(f"Expected Tool Input: {expected_tool_input}, Output: {tool_input}\n")
self.assertEqual(action, expected_action)
self.assertEqual(tool_input, expected_tool_input)
if __name__ == '__main__':
unittest.main()

128
nav_src/utils/data.py Normal file
View File

@ -0,0 +1,128 @@
import os
import json
import networkx as nx
import math
import numpy as np
# class ImageFeaturesDB(object):
# def __init__(self, img_ft_file, image_feat_size):
# self.image_feat_size = image_feat_size
# self.img_ft_file = img_ft_file
# self._feature_store = {}
# def get_image_feature(self, scan, viewpoint):
# key = '%s_%s' % (scan, viewpoint)
# if key in self._feature_store:
# ft = self._feature_store[key]
# else:
# with h5py.File(self.img_ft_file, 'r') as f:
# ft = f[key][...][:, :self.image_feat_size].astype(np.float32)
# self._feature_store[key] = ft
# return ft
class ImageObservationsDB(object):
def __init__(self, img_obs_dir, img_obs_sum_dir, img_obj_dir):
self.img_obs_dir = img_obs_dir
self.img_obs_sum_dir = img_obs_sum_dir
self.img_obj_dir = img_obj_dir
self._obs_store = {}
def get_image_observation(self, scan, viewpoint):
key = '%s_%s' % (scan, viewpoint)
if key in self._obs_store:
obs = self._obs_store[key]
else:
# Load image observation
with open(os.path.join(self.img_obs_dir, f'{scan}.json'), 'r') as f:
obs = json.load(f)[viewpoint]
self._obs_store[key] = {}
self._obs_store[key]['detail'] = obs
# Load image observation summary for history
with open(os.path.join(self.img_obs_sum_dir, f'{scan}_summarized.json'), 'r') as f:
obs_sum = json.load(f)[viewpoint]
self._obs_store[key]['summary'] = obs_sum
# Load image objects
with open(os.path.join(self.img_obj_dir, f'{scan}.json'), 'r') as f:
obj = json.load(f)[viewpoint]
self._obs_store[key]['objects'] = obj
obs = self._obs_store[key]
return obs
def load_nav_graphs(connectivity_dir, scans):
''' Load connectivity graph for each scan '''
def distance(pose1, pose2):
''' Euclidean distance between two graph poses '''
return ((pose1['pose'][3]-pose2['pose'][3])**2\
+ (pose1['pose'][7]-pose2['pose'][7])**2\
+ (pose1['pose'][11]-pose2['pose'][11])**2)**0.5
graphs = {}
for scan in scans:
with open(os.path.join(connectivity_dir, '%s_connectivity.json' % scan)) as f:
G = nx.Graph()
positions = {}
data = json.load(f)
for i,item in enumerate(data):
if item['included']:
for j,conn in enumerate(item['unobstructed']):
if conn and data[j]['included']:
positions[item['image_id']] = np.array([item['pose'][3],
item['pose'][7], item['pose'][11]]);
assert data[j]['unobstructed'][i], 'Graph should be undirected'
G.add_edge(item['image_id'],data[j]['image_id'],weight=distance(item,data[j]))
nx.set_node_attributes(G, values=positions, name='position')
graphs[scan] = G
return graphs
def new_simulator(connectivity_dir, scan_data_dir=None):
import MatterSim
# Simulator image parameters
WIDTH = 640
HEIGHT = 480
VFOV = 60
sim = MatterSim.Simulator()
if scan_data_dir:
sim.setDatasetPath(scan_data_dir)
sim.setNavGraphPath(connectivity_dir)
sim.setRenderingEnabled(False)
sim.setCameraResolution(WIDTH, HEIGHT)
sim.setCameraVFOV(math.radians(VFOV))
sim.setDiscretizedViewingAngles(True)
sim.setBatchSize(1)
sim.initialize()
return sim
def angle_feature(heading, elevation, angle_feat_size):
return np.array(
[math.sin(heading), math.cos(heading), math.sin(elevation), math.cos(elevation)] * (angle_feat_size // 4),
dtype=np.float32)
def get_point_angle_feature(sim, angle_feat_size, baseViewId=0):
feature = np.empty((36, angle_feat_size), np.float32)
base_heading = (baseViewId % 12) * math.radians(30)
base_elevation = (baseViewId // 12 - 1) * math.radians(30)
for ix in range(36):
if ix == 0:
sim.newEpisode(['ZMojNkEp431'], ['2f4d90acd4024c269fb0efe49a8ac540'], [0], [math.radians(-30)])
elif ix % 12 == 0:
sim.makeAction([0], [1.0], [1.0])
else:
sim.makeAction([0], [1.0], [0])
state = sim.getState()[0]
assert state.viewIndex == ix
heading = state.heading - base_heading
elevation = state.elevation - base_elevation
feature[ix, :] = angle_feature(heading, elevation, angle_feat_size)
return feature
def get_all_point_angle_feature(sim, angle_feat_size):
return [get_point_angle_feature(sim, angle_feat_size, baseViewId) for baseViewId in range(36)]
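The graphs returned by load_nav_graphs are plain networkx graphs, so the all-pairs shortest-path distances consumed by the environment and evaluation code can be precomputed directly from them. A minimal sketch, assuming the R2R connectivity files sit at the default location:

import networkx as nx

graphs = load_nav_graphs('../datasets/R2R/connectivity', ['ZMojNkEp431'])
shortest_distances = {
    scan: dict(nx.all_pairs_dijkstra_path_length(G))   # weighted by edge 'weight'
    for scan, G in graphs.items()
}
# shortest_distances[scan][vp_a][vp_b] -> metric distance between two viewpoints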

View File

@ -0,0 +1,164 @@
"""
Distributed tools
"""
import os
from pathlib import Path
from pprint import pformat
import pickle
import torch
import torch.distributed as dist
def load_init_param(opts):
"""
Load parameters for the rendezvous distributed procedure
"""
# sync file
if opts.output_dir != "":
sync_dir = Path(opts.output_dir).resolve()
sync_dir.mkdir(parents=True, exist_ok=True)
sync_file = f"{sync_dir}/.torch_distributed_sync"
else:
raise RuntimeError("Can't find any sync dir")
# world size
if opts.world_size != -1:
world_size = opts.world_size
elif os.environ.get("WORLD_SIZE", "") != "":
world_size = int(os.environ["WORLD_SIZE"])
else:
raise RuntimeError("Can't find any world size")
# rank
if os.environ.get("RANK", "") != "":
        # pytorch.distributed.launch provides this variable no matter what
rank = int(os.environ["RANK"])
else:
if opts.node_rank != -1:
node_rank = opts.node_rank
elif os.environ.get("NODE_RANK", "") != "":
node_rank = int(os.environ["NODE_RANK"])
else:
raise RuntimeError("Can't find any rank or node rank")
if opts.local_rank != -1:
local_rank = opts.local_rank
elif os.environ.get("LOCAL_RANK", "") != "":
local_rank = int(os.environ["LOCAL_RANK"])
else:
raise RuntimeError("Can't find any rank or local rank")
# WARNING: this assumes that each node has the same number of GPUs
n_gpus = torch.cuda.device_count()
rank = local_rank + node_rank * n_gpus
return {
"backend": "nccl",
"init_method": f"file://{sync_file}",
"rank": rank,
"world_size": world_size,
}
def init_distributed(opts):
init_param = load_init_param(opts)
rank = init_param["rank"]
print(f"Init distributed {init_param['rank']} - {init_param['world_size']}")
dist.init_process_group(**init_param)
return rank
def is_default_gpu(opts) -> bool:
return opts.local_rank == -1 or dist.get_rank() == 0
def is_dist_avail_and_initialized():
if not dist.is_available():
return False
if not dist.is_initialized():
return False
return True
def get_world_size():
if not is_dist_avail_and_initialized():
return 1
return dist.get_world_size()
def all_gather(data):
"""
Run all_gather on arbitrary picklable data (not necessarily tensors)
Args:
data: any picklable object
Returns:
list[data]: list of data gathered from each rank
"""
world_size = get_world_size()
if world_size == 1:
return [data]
# serialized to a Tensor
buffer = pickle.dumps(data)
storage = torch.ByteStorage.from_buffer(buffer)
tensor = torch.ByteTensor(storage).to("cuda")
# obtain Tensor size of each rank
local_size = torch.tensor([tensor.numel()], device="cuda")
size_list = [torch.tensor([0], device="cuda") for _ in range(world_size)]
dist.all_gather(size_list, local_size)
size_list = [int(size.item()) for size in size_list]
max_size = max(size_list)
# receiving Tensor from all ranks
# we pad the tensor because torch all_gather does not support
# gathering tensors of different shapes
tensor_list = []
for _ in size_list:
tensor_list.append(torch.empty((max_size,), dtype=torch.uint8, device="cuda"))
if local_size != max_size:
padding = torch.empty(size=(max_size - local_size,), dtype=torch.uint8, device="cuda")
tensor = torch.cat((tensor, padding), dim=0)
dist.all_gather(tensor_list, tensor)
data_list = []
for size, tensor in zip(size_list, tensor_list):
buffer = tensor.cpu().numpy().tobytes()[:size]
data_list.append(pickle.loads(buffer))
return data_list
def reduce_dict(input_dict, average=True):
"""
Args:
input_dict (dict): all the values will be reduced
average (bool): whether to do average or sum
Reduce the values in the dictionary from all processes so that all processes
have the averaged results. Returns a dict with the same fields as
input_dict, after reduction.
"""
world_size = get_world_size()
if world_size < 2:
return input_dict
with torch.no_grad():
names = []
values = []
# sort the keys so that they are consistent across processes
for k in sorted(input_dict.keys()):
names.append(k)
values.append(input_dict[k])
values = torch.stack(values, dim=0)
dist.all_reduce(values)
if average:
values /= world_size
reduced_dict = {k: v for k, v in zip(names, values)}
return reduced_dict
def merge_dist_results(results):
outs = []
for res in results:
outs.extend(res)
return outs
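A hypothetical sketch of how the two helpers above might be combined once init_distributed has run (it requires CUDA and an initialized process group): each rank gathers every other rank's prediction list and flattens them into one list for evaluation.

preds = [{'instr_id': '1234_0', 'trajectory': [['vp_a'], ['vp_b']]}]   # this rank's results (made-up)
all_preds = merge_dist_results(all_gather(preds))   # flat list of dicts from every rank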

View File

@ -0,0 +1,210 @@
from collections import defaultdict, deque
import numpy as np
MAX_DIST = 30
MAX_STEP = 10
def calc_position_distance(a, b):
# a, b: (x, y, z)
dx = b[0] - a[0]
dy = b[1] - a[1]
dz = b[2] - a[2]
dist = np.sqrt(dx**2 + dy**2 + dz**2)
return dist
def calculate_vp_rel_pos_fts(a, b, base_heading=0, base_elevation=0):
# a, b: (x, y, z)
dx = b[0] - a[0]
dy = b[1] - a[1]
dz = b[2] - a[2]
xy_dist = max(np.sqrt(dx**2 + dy**2), 1e-8)
xyz_dist = max(np.sqrt(dx**2 + dy**2 + dz**2), 1e-8)
    # the simulator's api is weird (the x-y axes are transposed)
heading = np.arcsin(dx/xy_dist) # [-pi/2, pi/2]
if b[1] < a[1]:
heading = np.pi - heading
heading -= base_heading
elevation = np.arcsin(dz/xyz_dist) # [-pi/2, pi/2]
elevation -= base_elevation
return heading, elevation, xyz_dist
def get_angle_fts(headings, elevations, angle_feat_size):
ang_fts = [np.sin(headings), np.cos(headings), np.sin(elevations), np.cos(elevations)]
ang_fts = np.vstack(ang_fts).transpose().astype(np.float32)
num_repeats = angle_feat_size // 4
if num_repeats > 1:
ang_fts = np.concatenate([ang_fts] * num_repeats, 1)
return ang_fts
class FloydGraph(object):
def __init__(self):
self._dis = defaultdict(lambda :defaultdict(lambda: 95959595))
self._point = defaultdict(lambda :defaultdict(lambda: ""))
self._visited = set()
def distance(self, x, y):
if x == y:
return 0
else:
return self._dis[x][y]
def add_edge(self, x, y, dis):
if dis < self._dis[x][y]:
self._dis[x][y] = dis
self._dis[y][x] = dis
self._point[x][y] = ""
self._point[y][x] = ""
def update(self, k):
for x in self._dis:
for y in self._dis:
if x != y:
if self._dis[x][k] + self._dis[k][y] < self._dis[x][y]:
self._dis[x][y] = self._dis[x][k] + self._dis[k][y]
self._dis[y][x] = self._dis[x][y]
self._point[x][y] = k
self._point[y][x] = k
self._visited.add(k)
def visited(self, k):
return (k in self._visited)
def path(self, x, y):
"""
:param x: start
:param y: end
:return: the path from x to y [v1, v2, ..., v_n, y]
"""
if x == y:
return []
if self._point[x][y] == "": # Direct edge
return [y]
else:
k = self._point[x][y]
# print(x, y, k)
# for x1 in (x, k, y):
# for x2 in (x, k, y):
# print(x1, x2, "%.4f" % self._dis[x1][x2])
return self.path(x, k) + self.path(k, y)
class GraphMap(object):
def __init__(self, start_vp):
self.start_vp = start_vp # start viewpoint
self.node_positions = {} # viewpoint to position (x, y, z)
self.graph = FloydGraph() # shortest path graph
self.node_embeds = {} # {viewpoint: feature (sum feature, count)}
self.node_stop_scores = {} # {viewpoint: prob}
self.node_nav_scores = {} # {viewpoint: {t: prob}}
self.node_step_ids = {}
def update_graph(self, ob):
self.node_positions[ob['viewpoint']] = ob['position']
for cc in ob['candidate']:
self.node_positions[cc['viewpointId']] = cc['position']
dist = calc_position_distance(ob['position'], cc['position'])
self.graph.add_edge(ob['viewpoint'], cc['viewpointId'], dist)
self.graph.update(ob['viewpoint'])
def update_node_embed(self, vp, embed, rewrite=False):
if rewrite:
self.node_embeds[vp] = [embed, 1]
else:
if vp in self.node_embeds:
self.node_embeds[vp][0] += embed
self.node_embeds[vp][1] += 1
else:
self.node_embeds[vp] = [embed, 1]
def get_node_embed(self, vp):
return self.node_embeds[vp][0] / self.node_embeds[vp][1]
def get_pos_fts(self, cur_vp, gmap_vpids, cur_heading, cur_elevation, angle_feat_size=4):
# dim=7 (sin(heading), cos(heading), sin(elevation), cos(elevation),
# line_dist, shortest_dist, shortest_step)
rel_angles, rel_dists = [], []
for vp in gmap_vpids:
if vp is None:
rel_angles.append([0, 0])
rel_dists.append([0, 0, 0])
else:
rel_heading, rel_elevation, rel_dist = calculate_vp_rel_pos_fts(
self.node_positions[cur_vp], self.node_positions[vp],
base_heading=cur_heading, base_elevation=cur_elevation,
)
rel_angles.append([rel_heading, rel_elevation])
rel_dists.append(
[rel_dist / MAX_DIST, self.graph.distance(cur_vp, vp) / MAX_DIST, \
len(self.graph.path(cur_vp, vp)) / MAX_STEP]
)
rel_angles = np.array(rel_angles).astype(np.float32)
rel_dists = np.array(rel_dists).astype(np.float32)
rel_ang_fts = get_angle_fts(rel_angles[:, 0], rel_angles[:, 1], angle_feat_size)
return np.concatenate([rel_ang_fts, rel_dists], 1)
def save_to_json(self):
nodes = {}
for vp, pos in self.node_positions.items():
nodes[vp] = {
'location': pos, # (x, y, z)
'visited': self.graph.visited(vp),
}
if nodes[vp]['visited']:
nodes[vp]['stop_prob'] = self.node_stop_scores[vp]['stop']
nodes[vp]['og_objid'] = self.node_stop_scores[vp]['og']
else:
nodes[vp]['nav_prob'] = self.node_nav_scores[vp]
edges = []
for k, v in self.graph._dis.items():
for kk in v.keys():
edges.append((k, kk))
return {'nodes': nodes, 'edges': edges}
class NavGraph:
def __init__(self):
self.graph = defaultdict(list)
def add_node(self, node):
if node not in self.graph:
self.graph[node] = []
def update_connection(self, node1, node2):
self.add_node(node1)
self.add_node(node2)
if node2 in self.graph[node1]:
return None
self.graph[node1].append(node2)
self.graph[node2].append(node1)
def bfs_shortest_path(self, start, end):
if start not in self.graph or end not in self.graph:
return None
visited = {start: None}
queue = deque([start])
while queue:
current_node = queue.popleft()
if current_node == end:
path = []
while current_node is not None:
path.append(current_node)
current_node = visited[current_node]
return path[::-1]
for neighbor in self.graph[current_node]:
if neighbor not in visited:
visited[neighbor] = current_node
queue.append(neighbor)
return None
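A minimal sketch of NavGraph with made-up viewpoint IDs: connections are undirected, and bfs_shortest_path returns the hop-wise shortest path (or None if either node is unknown or unreachable).

g = NavGraph()
g.update_connection('vp_a', 'vp_b')
g.update_connection('vp_b', 'vp_c')
print(g.bfs_shortest_path('vp_a', 'vp_c'))   # ['vp_a', 'vp_b', 'vp_c']
print(g.bfs_shortest_path('vp_a', 'vp_z'))   # None (unknown node)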

80
nav_src/utils/logger.py Normal file
View File

@ -0,0 +1,80 @@
import os
import sys
import math
import time
from collections import OrderedDict
def write_to_record_file(data, file_path, verbose=True):
if verbose:
print(data)
record_file = open(file_path, 'a')
record_file.write(data+'\n')
record_file.close()
def asMinutes(s):
m = math.floor(s / 60)
s -= m * 60
return '%dm %ds' % (m, s)
def timeSince(since, percent):
now = time.time()
s = now - since
es = s / (percent)
rs = es - s
return '%s (- %s)' % (asMinutes(s), asMinutes(rs))
class Timer:
def __init__(self):
self.cul = OrderedDict()
self.start = {}
self.iter = 0
def reset(self):
self.cul = OrderedDict()
self.start = {}
self.iter = 0
def tic(self, key):
self.start[key] = time.time()
def toc(self, key):
delta = time.time() - self.start[key]
if key not in self.cul:
self.cul[key] = delta
else:
self.cul[key] += delta
def step(self):
self.iter += 1
def show(self):
total = sum(self.cul.values())
for key in self.cul:
print("%s, total time %0.2f, avg time %0.2f, part of %0.2f" %
(key, self.cul[key], self.cul[key]*1./self.iter, self.cul[key]*1./total))
print(total / self.iter)
def print_progress(iteration, total, prefix='', suffix='', decimals=1, bar_length=100):
"""
Call in a loop to create terminal progress bar
@params:
iteration - Required : current iteration (Int)
total - Required : total iterations (Int)
prefix - Optional : prefix string (Str)
suffix - Optional : suffix string (Str)
decimals - Optional : positive number of decimals in percent complete (Int)
bar_length - Optional : character length of bar (Int)
"""
str_format = "{0:." + str(decimals) + "f}"
percents = str_format.format(100 * (iteration / float(total)))
filled_length = int(round(bar_length * iteration / float(total)))
    bar = '█' * filled_length + '-' * (bar_length - filled_length)
    sys.stdout.write('\r%s |%s| %s%s %s' % (prefix, bar, percents, '%', suffix))
if iteration == total:
sys.stdout.write('\n')
sys.stdout.flush()
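For example, a hypothetical loop over 200 evaluation episodes renders a 40-character bar that is rewritten in place:

for i in range(1, 201):
    print_progress(i, 200, prefix='Eval', suffix='complete', bar_length=40)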

17
nav_src/utils/misc.py Normal file
View File

@ -0,0 +1,17 @@
import random
import numpy as np
import torch
def set_random_seed(seed):
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
random.seed(seed)
np.random.seed(seed)
def length2mask(length, size=None):
batch_size = len(length)
size = int(max(length)) if size is None else size
mask = (torch.arange(size, dtype=torch.int64).unsqueeze(0).repeat(batch_size, 1)
> (torch.LongTensor(length) - 1).unsqueeze(1)).cuda()
return mask

38
nav_src/utils/ops.py Normal file
View File

@ -0,0 +1,38 @@
import numpy as np
import torch
def pad_tensors(tensors, lens=None, pad=0):
"""B x [T, ...]"""
if lens is None:
lens = [t.size(0) for t in tensors]
max_len = max(lens)
bs = len(tensors)
hid = list(tensors[0].size()[1:])
size = [bs, max_len] + hid
dtype = tensors[0].dtype
device = tensors[0].device
output = torch.zeros(*size, dtype=dtype).to(device)
if pad:
output.data.fill_(pad)
for i, (t, l) in enumerate(zip(tensors, lens)):
output.data[i, :l, ...] = t.data
return output
def gen_seq_masks(seq_lens, max_len=None):
if max_len is None:
max_len = max(seq_lens)
if isinstance(seq_lens, torch.Tensor):
device = seq_lens.device
masks = torch.arange(max_len).to(device).repeat(len(seq_lens), 1) < seq_lens.unsqueeze(1)
return masks
if max_len == 0:
        return np.zeros((len(seq_lens), 0), dtype=bool)  # np.bool is removed in recent NumPy
seq_lens = np.array(seq_lens)
batch_size = len(seq_lens)
masks = np.arange(max_len).reshape(-1, max_len).repeat(batch_size, 0)
masks = masks < seq_lens.reshape(-1, 1)
return masks
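A minimal sketch of the two padding helpers on a hypothetical ragged batch of two feature sequences:

import torch

feats = [torch.randn(3, 8), torch.randn(5, 8)]   # lengths 3 and 5, feature dim 8
batch = pad_tensors(feats)                       # tensor of shape (2, 5, 8), zero-padded
masks = gen_seq_masks([3, 5])                    # boolean array of shape (2, 5)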

5
requirements.txt Normal file
View File

@ -0,0 +1,5 @@
langchain==0.0.246
numpy
openai
transformers
networkx