Homework 5

Method (1) - Run on the host

Install Necessary Packages

conda create -n hw5 python=3.11 -y
conda activate hw5
pip install -r requirements.txt
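To avoid accidentally running the scripts outside the `hw5` environment, a quick check can be run first. This is a minimal sketch, assuming a Bash shell and relying on the fact that `conda activate` sets `CONDA_DEFAULT_ENV` to the environment name; `check_hw5_env` is a hypothetical helper, not part of this repository.

```shell
# Hypothetical helper (not part of this repository): verify that the conda
# environment created above is active and that its Python is 3.11.
check_hw5_env() {
  # `conda activate hw5` sets CONDA_DEFAULT_ENV to "hw5".
  if [ "${CONDA_DEFAULT_ENV:-}" != "hw5" ]; then
    echo "Expected conda env 'hw5', got '${CONDA_DEFAULT_ENV:-none}'" >&2
    return 1
  fi
  # Compare major.minor of the active interpreter with the version passed to `conda create`.
  local version
  version="$(python -c 'import sys; print("%d.%d" % sys.version_info[:2])')"
  if [ "$version" != "3.11" ]; then
    echo "Expected Python 3.11, got $version" >&2
    return 1
  fi
  # `pip check` reports installed packages with missing or conflicting dependencies.
  pip check
}
```

Usage: `check_hw5_env && python pacman.py`.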

Training & Evaluation

Training

python pacman.py

Evaluation

python pacman.py --eval --eval_model_path submissions/pacma_dqn.pt
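A wrong checkpoint path is easier to diagnose if it is caught before Python starts. The sketch below assumes a Bash shell; `run_eval` is a hypothetical helper name, and it passes only the two `pacman.py` flags shown above.

```shell
# Hypothetical helper (not part of this repository): fail early with a clear
# message if the checkpoint file is missing, otherwise run the evaluation.
run_eval() {
  # Default to the checkpoint path used in the command above.
  local model="${1:-submissions/pacma_dqn.pt}"
  if [ ! -f "$model" ]; then
    echo "Model checkpoint not found: $model" >&2
    return 1
  fi
  # Only the flags documented in this README are used.
  python pacman.py --eval --eval_model_path "$model"
}
```

Usage: `run_eval` evaluates the default checkpoint; `run_eval path/to/other.pt` evaluates another file.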

Method (2) - Run in a Docker container

Install Docker

Follow the official Docker documentation to install Docker Engine.

Install NVIDIA Container Toolkit (if you haven't installed it yet)

Install the NVIDIA Container Toolkit first so that Docker containers can use the GPU.

The instructions below follow NVIDIA's official installation guide.

  1. Configure the repository
    curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
    && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
        sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
        sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list \
    && sudo apt-get update
    
  2. Install the toolkit
    sudo apt-get install -y nvidia-container-toolkit
    
  3. Configure Docker
    sudo nvidia-ctk runtime configure --runtime=docker
    sudo systemctl restart docker
    
  4. Test by running a sample workload
    sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
    
    If the container runs nvidia-smi successfully and its output lists your NVIDIA graphics card, Docker can run containers with GPU access.
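If step 4 fails, it helps to know whether step 3 actually registered the `nvidia` runtime. This is a sketch, assuming a Bash shell and that `docker info --format '{{json .Runtimes}}'` prints the configured runtimes as a JSON map (which is how recent Docker releases expose them); `has_nvidia_runtime` and `check_docker_gpu` are hypothetical helper names.

```shell
# Hypothetical helper (not part of this repository): succeed if the JSON map of
# Docker runtimes read from stdin contains an entry named "nvidia".
has_nvidia_runtime() {
  grep -q '"nvidia"'
}

# Hypothetical helper: confirm the runtime added by
# `nvidia-ctk runtime configure --runtime=docker`, then run the step 4 workload.
check_docker_gpu() {
  if ! sudo docker info --format '{{json .Runtimes}}' | has_nvidia_runtime; then
    echo "Docker has no 'nvidia' runtime; re-run step 3" >&2
    return 1
  fi
  sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
}
```

Splitting the JSON check into its own function keeps it testable without a GPU or a running Docker daemon.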

Start Docker Container

docker run -it --runtime=nvidia --gpus all snsd0805/ntu-ai-hw5

This Docker image is built from the Dockerfile in this directory.
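If you prefer to build the image yourself instead of pulling `snsd0805/ntu-ai-hw5`, the standard `docker build` and `docker run` pair works from this directory. A sketch, assuming the Dockerfile builds without extra build arguments (not verified here); `build_and_run_local` and the tag `ntu-ai-hw5-local` are hypothetical names. Prefix the Docker commands with `sudo` if your user is not in the `docker` group.

```shell
# Hypothetical helper (not part of this repository): build the image from the
# Dockerfile in the current directory, then start it with GPU access.
build_and_run_local() {
  # "ntu-ai-hw5-local" is an arbitrary example tag.
  local tag="${1:-ntu-ai-hw5-local}"
  if [ ! -f Dockerfile ]; then
    echo "No Dockerfile in $(pwd); run this from the directory containing this README" >&2
    return 1
  fi
  docker build -t "$tag" . && docker run -it --runtime=nvidia --gpus all "$tag"
}
```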

Training & Evaluation (inside the container)

Training

python pacman.py

Evaluation

python pacman.py --eval --eval_model_path submissions/pacma_dqn.pt