This document provides instructions on how to set up, train, and benchmark models on nbody simulations in self-feed mode.
The easiest way to install the project is to use Docker.
```
docker build -t nbody-cuda .
docker run --gpus all -it -v $(pwd):/n_body_approx nbody-cuda
```

To train, run the training script with the desired parameters. Here is an example command:

```
python -m train --config config.yaml --trainer_type trainer_nbody --model_type ponita --dataloader_type ponita_nbody --trainer.learning_rate 1.8955963499765176 --model.hidden_features 128 --model.num_layers 6 --trainer.steps_per_epoch 1000 --trainer.test_macros_every 10
```

(Flags given after `--config` override the defaults from `config.yaml`.)
Alternatively, feel free to make use of the provided config.yaml file.
See utils/config_models.py for other parameters and how to change them.
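The dotted CLI flags map onto nested config keys, so a config.yaml mirroring the example command above would look roughly like the fragment below. The key names here are inferred from the flags, not copied from the repo, so double-check them against utils/config_models.py:

```yaml
# Hypothetical config.yaml fragment -- key names inferred from the CLI flags;
# see utils/config_models.py for the authoritative schema.
trainer:
  learning_rate: 1.8955963499765176
  steps_per_epoch: 1000
  test_macros_every: 10
model:
  hidden_features: 128
  num_layers: 6
```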
Inference, macro plotting, and their evaluation run automatically during training. You can tweak how often this happens with the `--trainer.test_macros_every` flag.
If needed, you can also run this on demand:

```
python -m self_feed --config runs/ponita/2025-08-27_13-09-57/config.yaml --trainer.model_path runs/ponita/2025-08-27_13-09-57/model_best_valid_loss.pth
```

(self-feed)

```
python -m helper_scripts.visualize --folder runs/af3/2025-08-29_08-21-34/checkpoints/14/trajectories_data --sim-index=0-10
```

(macro plotting and evaluation)
(full API reference: https://cloud.lambdalabs.com/api/v1/docs)

- log in to your Lambda Labs account
- create an API key in the Lambda Labs console
- set your Lambda API key as an environment variable for convenience

```
export LAMBDA_API_KEY=YOUR-API-KEY
```

- generate a local ssh key if you don't have one
You can also use the `setup_lambda_full.sh` script to automate launching an instance, syncing a dataset, and preparing the Docker environment. The script accepts various command-line options to override defaults such as the GPU type and dataset name. The Dockerfile is selected automatically based on the GPU type (e.g. `Dockerfile_gh200` for GH200 instances). When a dataset name is provided, any directories whose names start with that dataset name (for example `DATASET_extra`) will also be copied to the remote instance. Large `*.pt` files inside those directories are skipped to reduce bandwidth:

```
./setup_lambda_full.sh -t gpu_1x_a10 -d my_dataset_name
```

Run `./setup_lambda_full.sh -h` for the full list of available parameters.
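The prefix rule for dataset directories can be illustrated with a throwaway layout (paths here are made up; only the prefix matching is the point):

```shell
# Two directories share the dataset-name prefix, one does not.
mkdir -p /tmp/ds_demo/my_dataset_name /tmp/ds_demo/my_dataset_name_extra /tmp/ds_demo/other_data

# The glob the prefix rule implies: matches my_dataset_name and
# my_dataset_name_extra, but not other_data.
(cd /tmp/ds_demo && ls -d my_dataset_name*)
```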
```
ssh-keygen -t ed25519 -C "[email protected]"
```

- add your existing key to lambda (the JSON body is double-quoted here so that `$(cat ...)` actually expands; inside single quotes it would be sent literally)

```
curl -u $LAMBDA_API_KEY: https://cloud.lambdalabs.com/api/v1/ssh-keys \
  -H "Content-Type: application/json" \
  -d "{\"name\": \"my-key\", \"public_key\": \"$(cat ~/.ssh/id_ed25519.pub)\"}"
```

- check available instance types and pricing

```
curl -u $LAMBDA_API_KEY: https://cloud.lambdalabs.com/api/v1/instance-types | jq .
```

- launch an instance
```
curl -u $LAMBDA_API_KEY: https://cloud.lambdalabs.com/api/v1/instance-operations/launch -d '{
  "region_name": "us-east-3",
  "instance_type_name": "gpu_1x_gh200",
  "ssh_key_names": ["my-key"],
  "file_system_names": [],
  "quantity": 1
}' -H "Content-Type: application/json"
```

This prints the instance id; remember it.

- check the instance ip

```
curl -u $LAMBDA_API_KEY: https://cloud.lambdalabs.com/api/v1/instances/YOUR-INSTANCE-ID | jq .
```

- wait for it to boot, then connect and build

```
ssh ubuntu@YOUR-INSTANCE-IP
```
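When scripting these steps, the ip and boot status can be pulled out of the instance-details response with jq. The sample JSON below is made up to show the shape (field names assumed from the API docs); with the live API, pipe the output of the instances curl instead of the echo:

```shell
# Made-up sample of GET /api/v1/instances/<id>; verify against the live response.
RESPONSE='{"data":{"id":"inst-abc123","ip":"203.0.113.7","status":"active"}}'
echo "$RESPONSE" | jq -r '.data.ip'      # the address to ssh into
echo "$RESPONSE" | jq -r '.data.status'  # wait until this reads "active"
```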
- generate a new ssh key exclusive to lambda deployments (on your local machine)

```
ssh-keygen -t ed25519 -f ~/.ssh/lambda_deploy_key -N ""
```

- add the key to your ssh keys on GitHub (https://github.com/settings/keys)
- add the private key to your lambda instance:

```
scp ~/.ssh/lambda_deploy_key ubuntu@YOUR-INSTANCE-IP:~/.ssh/
ssh ubuntu@YOUR-INSTANCE-IP "chmod 600 ~/.ssh/lambda_deploy_key"
```

- set up the GitHub ssh config on lambda

```
ssh ubuntu@YOUR-INSTANCE-IP "echo 'Host github.com
  IdentityFile ~/.ssh/lambda_deploy_key' >> ~/.ssh/config"
```

(the two steps above are combined in the helper_scripts/setup_lambda_ssh.sh script)

- clone the repo

```
git clone [email protected]:Simona-Biosystems/n_body_approx.git && cd n_body_approx
```

(alternatively, if you also want to copy untracked files, rsync the repo to the instance)

```
docker build -t nbody-cuda .
nvidia-smi  # check gpu access
docker run --gpus all -it -v $(pwd):/n_body_approx nbody-cuda
```

- copy the files you want to keep, for example using rsync (dry run first with `--dry-run`)
```
# TODO: verify this works
rsync -avz --include='*/' \
  --include='*/checkpoints/*/generated_trajectories/**/plots/*.json' \
  --include='*/avg_p_values_vs_checkpoints.png' \
  --include='*/individual_p_values_vs_checkpoints.png' \
  --include='*/interactive_avg_p_values_vs_checkpoints.html' \
  --exclude='*' \
  ubuntu@YOUR-INSTANCE-IP:/path/to/n_body_approx/runs/ ./local_backup/runs/
```

If you also want to persist the installed packages, you can just rsync the venv directory:
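The include/exclude ordering above is the standard rsync filter trick: `--include='*/'` keeps directories traversable, the later includes pick the files to keep, and the final `--exclude='*'` drops everything else. A minimal local sketch with throwaway paths:

```shell
# Build a tiny tree with one file we want (.json) and one we don't (.pth).
mkdir -p /tmp/rsync_demo/src/run1/plots
touch /tmp/rsync_demo/src/run1/plots/metrics.json /tmp/rsync_demo/src/run1/model.pth

# The first matching filter wins, so the includes must precede the catch-all exclude.
rsync -a --include='*/' --include='*.json' --exclude='*' \
  /tmp/rsync_demo/src/ /tmp/rsync_demo/dst/

find /tmp/rsync_demo/dst -type f   # only the .json survives
```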
```
rsync -avz ubuntu@YOUR-INSTANCE-IP:/home/ubuntu/venv/ ./local_backup/venv/
```

Note: it's also possible to use Lambda Labs' filesystems to persist data between launches, but it's usually not worth it: an instance is not guaranteed to be available in the same region when you want to launch again, and a filesystem must be in the same region as the instance (neither can be moved to another region, and you can only access a filesystem from a running instance). Read more at https://docs.lambdalabs.com/public-cloud/filesystems/
- get the instance id

```
curl -u $LAMBDA_API_KEY: https://cloud.lambdalabs.com/api/v1/instances | jq .
```

- terminate the instance
```
curl -u $LAMBDA_API_KEY: https://cloud.lambdalabs.com/api/v1/instance-operations/terminate -d '{
  "instance_ids": ["YOUR-INSTANCE-ID"]
}' -H "Content-Type: application/json"
```

- if you want to terminate the instance automatically after some time, you can do so using something like:

```
(sleep 6h && ...) &
```

On GH200 instances, build and run with the dedicated Dockerfile:

```
docker build -t nbody-cuda-gh200 -f Dockerfile_gh200 .
docker run --gpus all -it -v $(pwd):/n_body_approx nbody-cuda-gh200
```