Docker & Experiments

Docker is mandatory for all experiments on AIRLab servers. It is not optional, and it is not just for production code. Running experiments inside containers ensures that every result is isolated and reproducible — by you six months from now, by your co-supervisor, and by any future lab member who builds on your work.

Why Docker?

Without containers, every researcher installs packages globally with pip or conda. Over time, incompatible versions accumulate, experiments break, and nobody can tell which version of a library produced which result. Docker fixes this:

  • Isolation — your dependencies cannot interfere with anyone else's, and vice versa.
  • Reproducibility — pin every dependency to an exact version; anyone can rebuild the same environment from scratch.
  • Portability — the same container runs on Westworld, a cloud VM, or your laptop.
  • Clean servers — no global package pollution; everything you install stays inside your image.

The standard development cycle is:

  1. Write a Dockerfile describing your environment
  2. Build it into a Docker image (a binary snapshot)
  3. Run the image as a container, mounting your code and dataset from the host
  4. Develop / debug by attaching to the running container
  5. Stop and remove the container when done; the image can be rebuilt at any time
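
The five steps map onto plain docker commands roughly as sketched below. This Python helper only assembles the commands and runs nothing. The image name, container name, and host paths are hypothetical. On AIRLab servers, step 3 is normally done with run-docker rather than a raw docker run.

```python
import shlex


def dev_cycle_commands(image: str, container: str, code_dir: str, data_dir: str) -> list[list[str]]:
    """Return the docker commands for one build/run/attach/cleanup cycle (not executed)."""
    return [
        # 2. Build the image from the Dockerfile in the current directory
        ["docker", "build", "-t", image, "."],
        # 3. Start a detached container, mounting code and data from the host
        ["docker", "run", "-d", "--name", container,
         "-v", f"{code_dir}:/exp", "-v", f"{data_dir}:/data",
         image, "sleep", "infinity"],
        # 4. Attach an interactive shell to the running container
        ["docker", "exec", "-it", container, "bash"],
        # 5. Stop and remove the container; the image stays and can be rebuilt anytime
        ["docker", "stop", container],
        ["docker", "rm", container],
    ]


if __name__ == "__main__":
    # Print the commands so you can inspect them before running any by hand
    for cmd in dev_cycle_commands("rossi/deep_learning:v1", "rossi_dev",
                                  "/home/rossi/my_project", "/home/rossi/storage"):
        print(shlex.join(cmd))
```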

Project Structure

Every project should follow this layout. It makes your repository self-contained — anyone who clones it can build the environment and run the code without asking you anything.

my_project/
text
my_project/
├── src/                   # your source code
│   ├── __init__.py
│   ├── train.py
│   └── evaluate.py
├── configs/               # YAML/JSON configuration files, if any
│   └── default.yaml
├── Dockerfile
├── requirements.txt       # Python dependencies (pinned versions)
├── .runconfig             # Docker run options
├── .gitignore
└── README.md              # setup & usage instructions
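
A small sanity check like the one below can catch a missing file or an unpinned dependency before you build. This is a sketch, not a lab tool. `check_project` is a hypothetical helper, and it treats any requirement line without `==` as unpinned.

```python
from pathlib import Path

# Entries taken from the layout above (configs/ is optional, so it is not checked)
REQUIRED = ["src", "Dockerfile", "requirements.txt", ".runconfig", ".gitignore", "README.md"]


def check_project(root: str) -> list[str]:
    """Return a list of problems found in the project at `root` (empty list = OK)."""
    base = Path(root)
    problems = [f"missing: {name}" for name in REQUIRED if not (base / name).exists()]

    req_file = base / "requirements.txt"
    if req_file.is_file():
        for raw in req_file.read_text().splitlines():
            line = raw.split("#", 1)[0].strip()  # ignore comments and blank lines
            # Skip pip options such as -r or --index-url; flag anything without '=='
            if line and not line.startswith("-") and "==" not in line:
                problems.append(f"unpinned dependency: {line}")
    return problems
```

Call `check_project(".")` from the project root. An empty list means the layout matches and every dependency is pinned.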

Writing a Dockerfile

A Dockerfile is a script that describes how to build a Docker image. It consists of a series of instructions (e.g., FROM, RUN, COPY) that are executed in order.

The following is an example Dockerfile for a PyTorch project. It starts from an official NVIDIA base image that bundles CUDA and cuDNN. Pick a CUDA version no newer than the one your server's driver supports. To check, run nvidia-smi and read the "CUDA Version" field in the top-right corner of its output.

Dockerfile
dockerfile
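FROM nvidia/cuda:12.2.2-cudnn8-runtime-ubuntu22.04

# 1. Install system packages (Ubuntu 22.04 ships Python 3.10 as python3)
RUN apt-get update && apt-get install -y --no-install-recommends \
      python3 \
      python3-pip \
      git \
      curl \
    && rm -rf /var/lib/apt/lists/*

# 2. Create the "python" alias so that `python` runs python3
RUN ln -s /usr/bin/python3 /usr/bin/python

WORKDIR /exp

# 3. Create world-writable cache/config folders: when the container runs as
#    your host user (who has no home directory inside the image), HOME falls
#    back to "/", so tools such as pip and Jupyter write to these paths
RUN mkdir -p /.local /.cache /.config && \
    chmod -R 777 /.local /.cache /.config

# 4. Install the Python dependencies (pinned in requirements.txt)
COPY requirements.txt /tmp/requirements.txt
RUN python3 -m pip install --no-cache-dir -r /tmp/requirements.txt

ENV SHELL=/bin/bash
CMD ["bash"]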
requirements.txt
text
# Example pins; use the exact versions you developed with
scikit-learn==1.3.2
torch==2.1.2
jupyter==1.0.0
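
The host driver must support the base image's CUDA version, so you can script the comparison. The sketch below assumes two formats: the `CUDA Version: X.Y` field printed by nvidia-smi, and an `nvidia/cuda:X.Y.Z-...` image tag. It applies the conservative rule "image CUDA ≤ driver CUDA" and ignores NVIDIA's finer-grained compatibility options. The function names are hypothetical.

```python
import re
import subprocess


def driver_cuda_version(smi_output: str) -> tuple[int, int]:
    """Extract the 'CUDA Version: X.Y' field from nvidia-smi output."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        raise ValueError("no 'CUDA Version' field found in nvidia-smi output")
    return int(match.group(1)), int(match.group(2))


def image_cuda_version(base_image: str) -> tuple[int, int]:
    """Extract the CUDA major.minor version from an nvidia/cuda image tag."""
    match = re.search(r":(\d+)\.(\d+)", base_image)
    if match is None:
        raise ValueError(f"cannot find a CUDA version in {base_image!r}")
    return int(match.group(1)), int(match.group(2))


def base_image_ok(base_image: str, smi_output: str) -> bool:
    """Conservative check: the image's CUDA must not be newer than the driver supports."""
    # Tuples compare element by element, so (12, 2) <= (12, 4) is True
    return image_cuda_version(base_image) <= driver_cuda_version(smi_output)


def query_nvidia_smi() -> str:
    """Run nvidia-smi on the server and return its text output."""
    return subprocess.run(["nvidia-smi"], capture_output=True, text=True, check=True).stdout
```

On the server, `base_image_ok("nvidia/cuda:12.2.2-cudnn8-runtime-ubuntu22.04", query_nvidia_smi())` should return True before you build.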

Building Images

westworld — project root
bash
# Build the image. Convention: <username>/<project>:<version>
docker build --rm -t rossi/deep_learning:v1 .

# Verify the image was built
docker images rossi/deep_learning
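
You can check the `<username>/<project>:<version>` naming convention mechanically. The sketch below is simplified: its regular expression only approximates Docker's naming rules, and `parse_image_tag` is a hypothetical helper.

```python
import re

# Simplified check of the lab convention <username>/<project>:<version>.
# Docker requires lowercase repository names; tags may also contain uppercase letters.
TAG_PATTERN = re.compile(
    r"^(?P<user>[a-z0-9][a-z0-9_.-]*)/(?P<project>[a-z0-9][a-z0-9_.-]*)"
    r":(?P<version>[A-Za-z0-9_][A-Za-z0-9_.-]*)$"
)


def parse_image_tag(tag: str) -> dict[str, str]:
    """Split an image tag into user, project and version, or raise ValueError."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"{tag!r} does not follow <username>/<project>:<version>")
    return match.groupdict()


if __name__ == "__main__":
    print(parse_image_tag("rossi/deep_learning:v1"))
    # {'user': 'rossi', 'project': 'deep_learning', 'version': 'v1'}
```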

Running Containers

AIRLab servers provide a helper script called run-docker that wraps docker run with sensible defaults (GPU assignment, directory mounts, user permissions). run-docker reads its settings from a .runconfig file in your project root, such as the following:

.runconfig
bash
# Example .runconfig file for run-docker
image_name: rossi/deep_learning:v1  # the image to run (must be built first)
container_name: {user}_GPU{args.gpu}_{date}  # container name template with placeholders

# List of additional arguments to pass to 'docker run' every time.
# -m 16G limits the container's memory to 16 GB (adjust as needed)
# -v mounts a directory from the host into the container (host_path:container_path)
# -p forwards a port from the container to the host (host_port:container_port)
docker_args: -m 16G -v ~/storage:/data -p 8888:8888
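
We have not verified how run-docker parses .runconfig internally. This sketch assumes the placeholders behave like Python format fields, since the `{args.gpu}` syntax suggests attribute access on parsed command-line arguments. Under that assumption, it shows what the example above would expand to. `load_runconfig`, `expand_runconfig`, and the YYYYMMDD date format are illustrative assumptions, not run-docker's actual code.

```python
import os
import shlex
from datetime import date
from types import SimpleNamespace
from typing import Optional


def load_runconfig(text: str) -> dict[str, str]:
    """Parse 'key: value' lines, skipping blank lines and # comments."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")  # split on the first colon only
        config[key.strip()] = value.split(" #", 1)[0].strip()  # drop inline comments
    return config


def expand_runconfig(config: dict[str, str], user: str, gpu: str,
                     today: Optional[date] = None) -> tuple[str, list[str]]:
    """Fill the container-name placeholders and split docker_args into argv form."""
    today = today or date.today()
    name = config["container_name"].format(
        user=user,
        args=SimpleNamespace(gpu=gpu),  # mimics an argparse result exposing .gpu
        date=today.strftime("%Y%m%d"),  # date format is an arbitrary choice here
    )
    # shlex.split does not expand '~', so expand host-side paths explicitly
    docker_args = [os.path.expanduser(arg) for arg in shlex.split(config["docker_args"])]
    return name, docker_args
```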

Once the image is built and .runconfig is in place, start containers with run-docker. Remember to book the GPUs and CPU cores you need before starting a container.

westworld
bash
# Usage: run-docker <gpu_indices> <cpu_cores> <command...>
# Example: GPU 0, 4 CPU cores, run training
run-docker 0 4 python src/train.py --config configs/default.yaml

# Two GPUs (0 and 1), 8 cores
run-docker 0,1 8 python src/train_multi_gpu.py

# Interactive bash shell for debugging on GPU 0, core ranges 4-8 and 16-24 (non-contiguous)
run-docker 0 4-8,16-24 bash

# Interactive bash shell with no GPU
run-docker '' 4-8,16-24 bash

# Interactive bash shell with no GPU on ALL CPUs
run-docker '' '' bash
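
To see exactly which cores a range list such as 4-8,16-24 selects, the sketch below expands it. It assumes inclusive bounds, as in Docker's --cpuset-cpus option and the Linux cpuset list format; check run-docker's own help to confirm it uses the same convention. A lone number like the 4 in run-docker 0 4 means a core count in the examples above, so pass only range lists to this hypothetical helper.

```python
def expand_cpuset(spec: str) -> list[int]:
    """Expand a cpuset list like '4-8,16-24' into core IDs, treating ranges as inclusive."""
    cores: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # skip empty entries; an empty spec yields an empty list
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if end < start:
                raise ValueError(f"invalid range {part!r}")
            cores.extend(range(start, end + 1))  # +1 because the upper bound is included
        else:
            cores.append(int(part))  # a single core ID inside a list, e.g. '4-8,12'
    return cores
```

Under this assumption, `expand_cpuset("4-8,16-24")` selects 14 cores (5 from the first range, 9 from the second).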

To verify the setup, run the following script, which prints the mounted storage contents and the number of visible GPUs from inside the container:

main.py
python
import torch
import os
from datetime import datetime

print(f"Time: {datetime.now()}")
if os.path.exists('/data'):
    print(f"Storage: {os.listdir('/data')}")
else:
    print("!!! Storage not mounted in /data")

gpu_count = torch.cuda.device_count()
print(f"GPUs found: {gpu_count}")

Run the script with run-docker 0 4 python main.py and check the output. If you see the storage contents and the correct number of GPUs, congratulations — your container is set up correctly!

Useful Docker commands for managing containers and images:

westworld
bash
# Useful container management commands
docker ps                           # list running containers
docker ps | grep lastname           # filter to yours
docker exec -it CONTAINER_ID bash   # attach a shell to a running container
docker logs CONTAINER_ID            # view container output
docker stop CONTAINER_ID            # stop a running container
docker rm   CONTAINER_ID            # remove a stopped container
docker rmi  lastname/my_project:v1  # delete an image
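
Container names follow the {user}_GPU{args.gpu}_{date} template from .runconfig, so you can filter on the name prefix instead of grepping the whole docker ps output. The sketch below uses docker ps's --format option with its .ID, .Names, and .Status fields. The helper names are hypothetical.

```python
import subprocess

# Tab-separated columns requested from `docker ps` via its Go-template --format option
PS_FORMAT = "{{.ID}}\t{{.Names}}\t{{.Status}}"


def parse_ps(output: str) -> list[dict[str, str]]:
    """Parse `docker ps --format PS_FORMAT` output into dictionaries."""
    rows = []
    for line in output.splitlines():
        if not line.strip():
            continue
        container_id, name, status = line.split("\t", 2)
        rows.append({"id": container_id, "name": name, "status": status})
    return rows


def filter_user(rows: list[dict[str, str]], user: str) -> list[dict[str, str]]:
    """Keep containers whose name starts with the '<user>_' prefix."""
    return [row for row in rows if row["name"].startswith(f"{user}_")]


def list_my_containers(user: str, include_stopped: bool = False) -> list[dict[str, str]]:
    """Run `docker ps` (adding -a to include stopped containers) and filter by user."""
    cmd = ["docker", "ps", "--format", PS_FORMAT]
    if include_stopped:
        cmd.insert(2, "-a")
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return filter_user(parse_ps(out), user)
```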

VS Code Integration

Install the Remote - SSH extension to edit files on the server directly from your local VS Code. To connect:

  1. Open the Command Palette (Ctrl+Shift+P) → Remote-SSH: Connect to Host…
  2. Type lastname@<server_ip>, press Enter, and enter your password when prompted
  3. Click Yes to add the host to your known hosts, then Add to config — future connections will require no extra setup
  4. Once connected, open your project folder on the server via File → Open Folder or Clone from GitHub

Remote - SSH is only for editing files. To run experiments, still use run-docker, ideally from a tmux session (started in a server terminal or in VS Code's integrated terminal) so the job keeps running if your connection drops.
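
The following is a minimal sketch of starting a job in a detached tmux session from Python; the session name and job are only examples. The tmux session closes when the job exits, so have your script log to a file if you need its output afterwards.

```python
import shlex
import subprocess


def tmux_launch_cmd(session: str, job: list[str]) -> list[str]:
    """Build a tmux command that starts `job` in a new detached session."""
    # tmux hands a single command string to the shell, so quote each argument
    return ["tmux", "new-session", "-d", "-s", session, shlex.join(job)]


def launch_in_tmux(session: str, job: list[str]) -> None:
    """Start the job in the background; reattach later with: tmux attach -t <session>."""
    subprocess.run(tmux_launch_cmd(session, job), check=True)
```

For example, `launch_in_tmux("train_gpu0", ["run-docker", "0", "4", "python", "src/train.py", "--config", "configs/default.yaml"])` starts training, and `tmux attach -t train_gpu0` brings it back to your terminal.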

Jupyter Notebooks

To run Jupyter inside a container:

  1. Install Jupyter in your Docker image (add it to requirements.txt and rebuild the image).
  2. Include -p 8888:8888 in your .runconfig to forward the port. If port 8888 is already in use on the server, map a free host port instead (e.g., -p 8889:8888) and use that port when connecting.
  3. Start the notebook server inside a container:

westworld
bash
run-docker 0 4 jupyter notebook --ip=0.0.0.0 --port=8888 --no-browser

Then connect to http://server_ip:8888 from your local browser and enter the token printed in the container logs to access your notebooks.
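
To pick a free host port before editing .runconfig, one quick heuristic is to try binding candidate ports on the server itself, not inside a container. The port range below is an arbitrary choice. Another process could still grab the port between this check and the moment the container starts.

```python
import socket
from collections.abc import Iterable


def find_free_port(candidates: Iterable[int] = range(8888, 8900)) -> int:
    """Return the first port in `candidates` that can currently be bound on this host."""
    for port in candidates:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind(("", port))
            except OSError:
                continue  # already in use (or not permitted); try the next one
            return port  # the socket is closed on exit, releasing the port
    raise RuntimeError("no free port in the candidate range")
```

If it returns 8889, use -p 8889:8888 in .runconfig and open http://server_ip:8889.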

You can also run .ipynb notebooks directly in the VS Code editor using the Jupyter extension (@tag:notebookKernelJupyterNotebook). To use a kernel inside your Docker container, open the kernel picker in the top-right corner of the notebook editor, choose to connect to an existing Jupyter server, and enter its address as http://server_ip:port (e.g., http://ww:8888), supplying the token when prompted.
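
The browser and VS Code both need the token. However, the URL that Jupyter prints refers to an address as seen from inside the container, which your laptop cannot reach. The sketch below extracts the token from the log text and rebuilds the URL with the server name and the host-side port. `remote_jupyter_url` is hypothetical, and ww and 8888 are the example values used above.

```python
import re


def remote_jupyter_url(log_text: str, server: str, host_port: int) -> str:
    """Build a reachable URL for the forwarded server from the token in Jupyter's log."""
    match = re.search(r"[?&]token=([\w-]+)", log_text)
    if match is None:
        raise ValueError("no token found; is Jupyter running with token authentication?")
    return f"http://{server}:{host_port}/?token={match.group(1)}"
```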