
andrewscouten edited this page Mar 15, 2026 · 2 revisions

Getting Started — Linux


Prerequisites

  • Ubuntu 22.04 or later (other distros may work but are untested)
  • NVIDIA GPU, AMD GPU, or CPU-only fallback
  • Docker Engine (recommended)

Installation

1. Install GPU drivers (AMD only)

ROCm must be installed on the host regardless of whether you use Docker or run directly. The Docker containers mount ROCm libraries from the host at runtime.

Follow the official guide: ROCm for Linux — AMD Radeon/Ryzen install guide

NVIDIA users should install drivers via their distribution's package manager or the NVIDIA CUDA toolkit.

2. Clone the repository

git clone --recurse-submodules https://github.com/collaborativebioinformatics/OncoLearn.git OncoLearn
cd OncoLearn

If you forgot --recurse-submodules, run git submodule update --init --recursive afterwards.
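If you are unsure whether the submodules were initialized, git submodule status marks uninitialized ones with a leading dash. The helper below is a hypothetical sketch (not part of OncoLearn) that filters its output down to the paths that still need git submodule update --init:

```shell
# Hypothetical helper: read `git submodule status` output on stdin and
# print the path of each submodule that is not yet initialized.
# A leading '-' on a status line means "not initialized".
uninitialized_submodules() {
  while IFS= read -r line; do
    case "$line" in
      -*) printf '%s\n' "$line" | awk '{print $2}' ;;
    esac
  done
}
```

Usage: git submodule status | uninitialized_submodules — an empty result means everything is checked out.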


Docker (recommended)

Docker is the recommended way to run OncoLearn. It handles all GPU library paths automatically and avoids manual dependency management.

1. Get the images

You need two images: the dev image (for data download) and the prod image (for training). Either build locally or pull from GHCR — see the Docker guide for both options. To build locally, pick the commands matching your GPU:

# NVIDIA
docker compose --profile dev-cuda build dev-cuda
docker compose --profile prod-cuda build prod-cuda

# CPU only
docker compose --profile dev-cpu build dev-cpu
docker compose --profile prod-cpu build prod-cpu

# AMD (native Linux)
docker compose --profile dev-rocm build dev-rocm
docker compose --profile prod-rocm build prod-rocm
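Since the profile and service names always match, the per-GPU command pairs above can be driven by one small helper. This is a hypothetical sketch, not part of the repo; GPU detection itself (e.g. via lspci) is left to you:

```shell
# Hypothetical helper: map a GPU vendor to the dev/prod compose profile
# pair used in this guide. Anything unrecognized falls back to CPU-only.
compose_profiles() {
  case "$1" in
    nvidia) echo "dev-cuda prod-cuda" ;;
    amd)    echo "dev-rocm prod-rocm" ;;
    *)      echo "dev-cpu prod-cpu" ;;
  esac
}
```

Example: for p in $(compose_profiles nvidia); do docker compose --profile "$p" build "$p"; done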

2. Download data

Data downloads require the dev image (the prod image is training-only). See the CLI guide for all available options. The examples below use dev-cuda; substitute dev-cpu or dev-rocm to match the profile you built in step 1.

# NVIDIA example
docker compose --profile dev-cuda run --rm dev-cuda \
  oncolearn xena download --cohorts BRCA --category mirna_seq --unzip --output ./data/xenabrowser

docker compose --profile dev-cuda run --rm dev-cuda \
  oncolearn xena download --cohorts BRCA --category clinical --unzip --output ./data/xenabrowser
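The two commands above differ only in --category. A hypothetical wrapper (not part of the OncoLearn CLI) can loop over categories; set RUN=echo first to preview the commands without running Docker:

```shell
# Hypothetical wrapper: run the Xena download once per category via the
# dev service you pass in. Prefix with RUN=echo for a dry run.
xena_download() {
  dev=$1; shift
  for category in "$@"; do
    ${RUN:-} docker compose --profile "$dev" run --rm "$dev" \
      oncolearn xena download --cohorts BRCA --category "$category" \
      --unzip --output ./data/xenabrowser
  done
}
```

Example: xena_download dev-cuda mirna_seq clinical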

3. Run training

# NVIDIA — tabular only
docker compose --profile prod-cuda run --rm prod-cuda \
  python -m oncolearn.trainer --variant v2_no_imaging

# NVIDIA — full multimodal
docker compose --profile prod-cuda run --rm prod-cuda \
  python -m oncolearn.trainer --variant v1_imaging

# CPU only — tabular only
docker compose --profile prod-cpu run --rm prod-cpu \
  python -m oncolearn.trainer --variant v2_no_imaging

# AMD — tabular only
docker compose --profile prod-rocm run --rm prod-rocm \
  python -m oncolearn.trainer --variant v2_no_imaging

See the Docker guide for all services, volume mounts, and further details.
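Training will fail if the download step was skipped, so a quick sanity check before launching can save a container start. A minimal sketch, assuming the ./data/xenabrowser path from the download commands above:

```shell
# Sketch: succeed only if the given data directory exists and is non-empty.
data_ready() {
  [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}
```

Example: data_ready ./data/xenabrowser || echo "run the download step first" >&2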


Without Docker (manual setup)

If you prefer to run outside of Docker, install Python dependencies with uv (GPU drivers are already covered in Installation above).

1. Install with uv

# CPU-only (data download and EDA)
uv sync --extra cpu --extra multimodal

# CUDA 12.8
uv sync --extra cu128 --extra multimodal

# CUDA 13.0
uv sync --extra cu130 --extra multimodal

# ROCm (AMD, native Linux)
uv sync --extra rocm --extra multimodal
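Picking the right CUDA extra means matching the toolkit version your driver supports (nvidia-smi reports it). The helper below is a hypothetical sketch mapping a version string to the extras listed above; only cu128 and cu130 exist here, and anything else falls back to the CPU extra:

```shell
# Hypothetical helper: map a CUDA version string to the matching uv extra.
uv_extra_for_cuda() {
  case "$1" in
    12.8*) echo cu128 ;;
    13.0*) echo cu130 ;;
    *)     echo cpu ;;  # unknown version: fall back to CPU-only
  esac
}
```

Example: uv sync --extra "$(uv_extra_for_cuda 12.8)" --extra multimodal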

2. Run training

# Tabular only
python -m oncolearn.trainer --variant v2_no_imaging

# Full multimodal
python -m oncolearn.trainer --variant v1_imaging

Notes

  • Issues: If something doesn't work, please open a GitHub issue with your OS, GPU model, and the full error output.
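To collect the OS and GPU details an issue report asks for, something like the following sketch works on most distributions (lspci may require the pciutils package; this script is an example, not part of OncoLearn):

```shell
# Sketch: print the system details to paste into a GitHub issue.
issue_report() {
  echo "== OS =="
  uname -a
  echo "== GPU =="
  lspci 2>/dev/null | grep -i 'vga\|3d' || echo "(lspci unavailable)"
}
```

Run issue_report and paste its output, along with the full error log, into the issue.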
