Docker & Experiments
Docker is mandatory for all experiments on AIRLab servers. It is not optional, and it is not just for production code. Running experiments inside containers ensures that every result is isolated and reproducible — by you six months from now, by your co-supervisor, and by any future lab member who builds on your work.
Why Docker?
Without containers, every researcher installs packages globally with pip
or conda. Over time, incompatible versions accumulate, experiments break,
and nobody can tell which version of a library produced which result. Docker fixes this:
- Isolation: your dependencies cannot interfere with anyone else's, and vice versa.
- Reproducibility: pin every dependency to an exact version; anyone can rebuild the same environment from scratch.
- Portability: the same container runs on Westworld, a cloud VM, or your laptop.
- Clean servers: no global package pollution.
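As a rough illustration of the reproducibility point, the sketch below flags lines of a requirements file that are not pinned to an exact version with `==`. The helper name `unpinned_requirements` and the version numbers are made up for the example, and the parsing is deliberately simple (it does not handle URLs or environment markers).

```python
import re

# Matches "name==version", optionally with extras such as "name[extra]==1.2.3".
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[A-Za-z0-9._,-]+\])?==[A-Za-z0-9.+!_-]+$")


def unpinned_requirements(text: str) -> list[str]:
    """Return the requirement lines that are not pinned with '=='."""
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned


# Illustrative content only; the version numbers are placeholders.
example = """
torch==2.1.0
scikit-learn
jupyter>=1.0
"""
print(unpinned_requirements(example))
```

Running a check like this before building an image is one way to catch a dependency that would otherwise drift between rebuilds.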
The standard development cycle is:
- Write a Dockerfile describing your environment
- Build it into a Docker image (a binary snapshot)
- Run the image as a container, mounting your code and dataset from outside
- Develop / debug by attaching to the running container
- Stop and remove the container when done; the image can be rebuilt at any time
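The cycle above maps onto a handful of plain `docker` CLI calls. The sketch below only assembles and prints them; it does not run anything. The image tag, container name, and host path are hypothetical, and keeping the container alive with `sleep infinity` is an assumption made so that a shell can be attached later.

```python
import shlex


def docker_cycle(image: str, name: str, host_dir: str,
                 container_dir: str = "/exp") -> list[list[str]]:
    """Return the docker commands for one build/run/attach/stop/remove cycle."""
    return [
        # Build the image from the Dockerfile in the current directory.
        ["docker", "build", "-t", image, "."],
        # Start a detached container with the code mounted from the host.
        ["docker", "run", "-d", "--name", name,
         "-v", f"{host_dir}:{container_dir}", image, "sleep", "infinity"],
        # Attach an interactive shell for development and debugging.
        ["docker", "exec", "-it", name, "bash"],
        # Stop and remove the container; the image stays available.
        ["docker", "stop", name],
        ["docker", "rm", name],
    ]


# Hypothetical names, for illustration only.
for cmd in docker_cycle("rossi/deep_learning:v1", "rossi_dev", "/home/rossi/my_project"):
    print(shlex.join(cmd))
```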
Project Structure
Every project should follow this layout. It makes your repository self-contained — anyone who clones it can build the environment and run the code without asking you anything.
my_project/
├── src/ # your source code
│ ├── __init__.py
│ ├── train.py
│ └── evaluate.py
├── configs/ # YAML/JSON configuration files, if any
│ └── default.yaml
├── Dockerfile
├── requirements.txt # Python dependencies (pinned versions)
├── .runconfig # Docker run options
├── .gitignore
└── README.md # setup & usage instructions
Writing a Dockerfile
A Dockerfile is a script that describes how to build a Docker image. It consists of
a series of instructions (e.g., FROM, RUN, COPY)
that are executed in order.
The following is an example Dockerfile for a PyTorch project.
Start from an official NVIDIA base image that bundles CUDA and cuDNN. Pick a
CUDA version no newer than the one the server's driver supports
(run nvidia-smi and check the CUDA version in the top-right corner).
FROM nvidia/cuda:12.2.2-cudnn8-runtime-ubuntu22.04
# 1. Install dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
python3.11 \
python3-pip \
git \
curl \
&& rm -rf /var/lib/apt/lists/*
# 2. Create the "python" alias
RUN ln -s /usr/bin/python3 /usr/bin/python
WORKDIR /exp
# 3. Setup folders & permissions
RUN mkdir -p /.local /.cache /.config && \
chmod -R 777 /.local /.cache /.config
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt
ENV SHELL=/bin/bash
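CMD ["bash"]
The matching requirements.txt lists the Python packages. The versions are not pinned in this example; in your own project, pin each one with ==.
scikit-learn
torch
jupyter
Building Images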
# Build the image. Convention: <username>/<project>:<version>
docker build --rm -t rossi/deep_learning:v1 .
# Verify the image was built
docker images | grep deep_learning
Running Containers
AIRLab servers provide a helper script called run-docker that wraps
docker run with sensible defaults (GPU assignment, directory mounts,
user permissions).
run-docker requires the definition of a .runconfig file in your
project root as follows:
# Example .runconfig file for run-docker
image_name: rossi/deep_learning:v1 # the image to run (must be built first)
container_name: {user}_GPU{args.gpu}_{date} # container name template with placeholders
# List of additional arguments to pass to 'docker run' every time.
# -m 16G limits the container's memory to 16 GB (adjust as needed)
# -v mounts a directory from the host into the container (host_path:container_path)
# -p forwards a port from the container to the host (host_port:container_port)
docker_args: -m 16G -v ~/storage:/data -p 8888:8888
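The placeholders in container_name look like Python format fields. As a sketch of how such a template could be expanded (this is an assumption about the mechanism, not run-docker's actual code), `str.format` can resolve `{args.gpu}` through attribute access. The function name and the date format are hypothetical.

```python
from argparse import Namespace
from datetime import date

TEMPLATE = "{user}_GPU{args.gpu}_{date}"


def expand_container_name(template: str, user: str, gpu: str, today: date) -> str:
    """Fill the template; {args.gpu} is resolved as an attribute of `args`."""
    args = Namespace(gpu=gpu)
    # The YYYYMMDD date format is an assumption made for this example.
    return template.format(user=user, args=args, date=today.strftime("%Y%m%d"))


print(expand_container_name(TEMPLATE, "rossi", "0", date(2025, 1, 31)))
# prints: rossi_GPU0_20250131
```

Including the user and GPU index in the name is what makes filters such as `docker ps | grep lastname` practical on a shared server.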
This is how to run containers with run-docker. Remember to book
resources before running your container.
# Usage: run-docker <gpu_indices> <cpu_cores> <command...>
# Example: GPU 0, 4 CPU cores, run training
run-docker 0 4 python src/train.py --config configs/default.yaml
# Two GPUs (0 and 1), 8 cores
run-docker 0,1 8 python src/train_multi_gpu.py
# Interactive bash shell for debugging on GPU 0, pinned to two non-contiguous core ranges
run-docker 0 4-8,16-24 bash
# Interactive bash shell with no GPU
run-docker '' 4-8,16-24 bash
# Interactive bash shell with no GPU on ALL CPUs
run-docker '' '' bash
If the setup is correct, you can try running the following script that prints storage and GPU information from inside the container:
import os
from datetime import datetime

import torch

print(f"Time: {datetime.now()}")

# Check that the host storage is mounted where .runconfig says it should be
if os.path.exists('/data'):
    print(f"Storage: {os.listdir('/data')}")
else:
    print("!!! Storage not mounted in /data")

# Count the GPUs visible from inside the container
gpu_count = torch.cuda.device_count()
print(f"GPUs found: {gpu_count}")
Run the script with run-docker 0 4 python main.py and check the output. If you see the storage contents and the correct number of GPUs, congratulations — your container is set up correctly!
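The CPU argument in the run-docker examples can be a range list such as 4-8,16-24. To see how many cores such a list asks for, the sketch below expands it, assuming ranges are inclusive as in Docker's `--cpuset-cpus` option. Whether run-docker passes the value through unchanged is an assumption; a bare count such as 4 is a different form and is not covered here. The function name is hypothetical.

```python
def parse_core_spec(spec: str) -> list[int]:
    """Expand a list like '4-8,16-24' into sorted core indices (inclusive ranges)."""
    cores = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"invalid range: {part}")
            cores.update(range(start, end + 1))  # both ends included
        else:
            cores.add(int(part))  # a single core index inside a list
    return sorted(cores)


cores = parse_core_spec("4-8,16-24")
print(len(cores), cores)
```

On Linux, `os.sched_getaffinity(0)` called inside the container returns the set of cores the process may actually use, which can be compared against the expanded list.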
Useful Docker commands for managing containers and images:
# Useful container management commands
docker ps # list running containers
docker ps | grep lastname # filter to yours
docker exec -it CONTAINER_ID bash # attach a shell to a running container
docker logs CONTAINER_ID # view container output
docker stop CONTAINER_ID # stop a running container
docker rm CONTAINER_ID # remove a stopped container
docker rmi lastname/my_project:v1 # delete an image
VS Code Integration
Install the Remote - SSH extension to edit files on the server directly from your local VS Code. To connect:
- Open the Command Palette (Ctrl+Shift+P) → Remote-SSH: Connect to Host…
- Type lastname@<server_ip>, press Enter, and enter your password when prompted
- Click Yes to add the host to your known hosts, then Add to config; future connections will require no extra setup
- Once connected, open your project folder on the server via File → Open Folder or Clone from GitHub
Remote SSH is for editing files. To run experiments, still use run-docker from
a tmux session on the server terminal or VS Code's integrated terminal.
Jupyter Notebooks
To run Jupyter inside a container:
- Install Jupyter in your Docker image (add it to requirements.txt and rebuild the image).
- Include -p 8888:8888 in your .runconfig to forward the port. If port 8888 is already in use on the server, change it to an available port (e.g., 8889).
run-docker 0 4 jupyter notebook --ip=0.0.0.0 --port=8888 --no-browser
Then connect to http://server_ip:8888 from your local browser and enter the token printed in the container logs to access your notebooks.
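If you are unsure whether port 8888 is taken, a quick check run on the server itself (not inside the container) can help. The sketch below tries to bind each candidate port and reports the first one that succeeds. The function names and the port range are assumptions for illustration, and another user could still grab the port between the check and the container start.

```python
import socket
from typing import Optional


def is_port_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def first_free_port(start: int = 8888, end: int = 8999) -> Optional[int]:
    """Return the first free port in [start, end], or None if all are taken."""
    for port in range(start, end + 1):
        if is_port_free(port):
            return port
    return None


print(first_free_port())
```

The port it reports would go on the host side of the -p option, for example -p 8889:8888, while the notebook keeps listening on 8888 inside the container.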
You can
also run .ipynb notebooks directly in the VS Code editor using the
Jupyter extension (@tag:notebookKernelJupyterNotebook).
To connect the notebook to a kernel inside your Docker container,
select the kernel picker in the top-right corner of the notebook editor and
use server_ip:port as IP address and port (e.g., ww:8888).