
n_body_approx

This document explains how to set up, train, and benchmark models on N-body simulations in self-feed mode, i.e. rolling out trajectories by feeding the model's own predictions back in as inputs.

Installation

The easiest way to install the project is with Docker.

Building the Docker image

docker build -t nbody-cuda .

Running the Docker container

docker run --gpus all -it -v $(pwd):/n_body_approx nbody-cuda
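To sanity-check GPU visibility from inside the container, you can run nvidia-smi directly (a quick check; it assumes the image ships nvidia-smi, which NVIDIA CUDA base images do):

docker run --gpus all --rm nbody-cuda nvidia-smi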

Training the Models

To train, run the training script with the desired parameters. Here is an example command:

python -m train --config config.yaml --trainer_type trainer_nbody --model_type ponita --dataloader_type ponita_nbody --trainer.learning_rate 1.8955963499765176 --model.hidden_features 128 --model.num_layers 6 --trainer.steps_per_epoch 1000 --trainer.test_macros_every 10

(Flags passed after --config override the defaults from config.yaml.)

Alternatively, you can simply use the provided config.yaml as-is.

See utils/config_models.py for other parameters and how to change them.
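Because any config value can be overridden from the command line, small sweeps are easy to script. A minimal sketch, using only the flags from the example above (the hidden sizes here are arbitrary illustrative values):

# sweep the hidden size, keeping everything else from config.yaml
for h in 64 128 256; do
  python -m train --config config.yaml --trainer_type trainer_nbody \
    --model_type ponita --dataloader_type ponita_nbody \
    --model.hidden_features "$h"
done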

Running Inference and Plotting Macros

Inference, macro plotting, and their evaluation run automatically during training. You can tweak how often this happens with the --trainer.test_macros_every flag.

If needed, you can also run this on demand:

python -m self_feed --config runs/ponita/2025-08-27_13-09-57/config.yaml --trainer.model_path runs/ponita/2025-08-27_13-09-57/model_best_valid_loss.pth

(self-feed)

python -m helper_scripts.visualize --folder runs/af3/2025-08-29_08-21-34/checkpoints/14/trajectories_data --sim-index=0-10

(macro plotting and evaluation)
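To re-run self-feed across many runs at once, a loop over the run directories works. This is a sketch that assumes each run keeps its config.yaml and model_best_valid_loss.pth in the layout shown in the example above:

for cfg in runs/*/*/config.yaml; do
  run_dir=$(dirname "$cfg")
  python -m self_feed --config "$cfg" \
    --trainer.model_path "$run_dir/model_best_valid_loss.pth"
done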

Running in the cloud (Lambda Labs)

(full API reference: https://cloud.lambdalabs.com/api/v1/docs)

Setup

  • log in to your Lambda Labs account

  • create an API key in the Lambda Labs console

  • set your Lambda API key as an environment variable for convenience

export LAMBDA_API_KEY=YOUR-API-KEY

  • generate a local ssh key if you don't have one

ssh-keygen -t ed25519 -C "[email protected]"

  • add your existing key to Lambda

curl -u "$LAMBDA_API_KEY:" https://cloud.lambdalabs.com/api/v1/ssh-keys \
  -H "Content-Type: application/json" \
  -d "{\"name\": \"my-key\", \"public_key\": \"$(cat ~/.ssh/id_ed25519.pub)\"}"

(The JSON body must be double-quoted; inside single quotes, $(cat ...) would not expand and the literal string would be uploaded.)

  • check available instance types and pricing

curl -u "$LAMBDA_API_KEY:" https://cloud.lambdalabs.com/api/v1/instance-types | jq .

Alternatively, you can use the setup_lambda_full.sh script to automate launching an instance, syncing a dataset, and preparing the Docker environment. The script accepts various command-line options to override defaults such as GPU type and dataset name, and the Dockerfile is selected automatically based on the GPU type (e.g. Dockerfile_gh200 for GH200 instances). When a dataset name is provided, any directories whose names start with that name (for example DATASET_extra) are also copied to the remote instance, with large *.pt files inside them skipped to reduce bandwidth:

./setup_lambda_full.sh -t gpu_1x_a10 -d my_dataset_name

Run ./setup_lambda_full.sh -h for the full list of available parameters.
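The manual API calls above and below all share the same shape, so a tiny shell wrapper (a convenience sketch, not part of the repo) saves typing:

lambda_api() {
  # $1 is the endpoint path; any remaining arguments go straight to curl
  curl -s -u "$LAMBDA_API_KEY:" -H "Content-Type: application/json" \
    "https://cloud.lambdalabs.com/api/v1/$1" "${@:2}"
}

# usage, equivalent to the instance-types call above:
lambda_api instance-types | jq .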

Launching an instance and connecting to it

  • launch an instance
curl -u "$LAMBDA_API_KEY:" https://cloud.lambdalabs.com/api/v1/instance-operations/launch \
  -H "Content-Type: application/json" \
  -d '{
    "region_name": "us-east-3",
    "instance_type_name": "gpu_1x_gh200",
    "ssh_key_names": ["my-key"],
    "file_system_names": [],
    "quantity": 1
  }'

This prints the instance ID; note it down for the next steps.

  • check the instance ip
curl -u "$LAMBDA_API_KEY:" https://cloud.lambdalabs.com/api/v1/instances/YOUR-INSTANCE-ID | jq .
  • wait for the instance to boot, then connect
ssh ubuntu@YOUR-INSTANCE-IP
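If you script the launch, a small poll loop avoids re-checking by hand; the .data.ip field name here is assumed from the instance-describe response above:

while true; do
  ip=$(curl -s -u "$LAMBDA_API_KEY:" \
    https://cloud.lambdalabs.com/api/v1/instances/YOUR-INSTANCE-ID \
    | jq -r '.data.ip // empty')
  [ -n "$ip" ] && break
  sleep 15
done
ssh ubuntu@"$ip"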

  • generate a new ssh key exclusive to lambda deployments

(on your local machine)

ssh-keygen -t ed25519 -f ~/.ssh/lambda_deploy_key -N ""
scp ~/.ssh/lambda_deploy_key ubuntu@YOUR-INSTANCE-IP:~/.ssh/
ssh ubuntu@YOUR-INSTANCE-IP "chmod 600 ~/.ssh/lambda_deploy_key"
  • set up GitHub ssh config on Lambda
ssh ubuntu@YOUR-INSTANCE-IP "echo 'Host github.com
IdentityFile ~/.ssh/lambda_deploy_key' >> ~/.ssh/config"

(the two steps above are combined in the helper_scripts/setup_lambda_ssh.sh script)

  • clone the repo
git clone [email protected]:Simona-Biosystems/n_body_approx.git && cd n_body_approx

(alternatively, if you also want to copy untracked files, rsync the repo to the instance)
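If the clone fails to authenticate, the standard GitHub connectivity check confirms whether the deploy key is being picked up (GitHub prints a greeting on success, although the command exits non-zero by design since GitHub does not offer shell access):

ssh ubuntu@YOUR-INSTANCE-IP "ssh -T [email protected]"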

Building and running the container

docker build -t nbody-cuda .
nvidia-smi # check gpu access
docker run --gpus all -it -v $(pwd):/n_body_approx nbody-cuda

Copying files from the instance

  • copy the files you want to keep, for example with rsync (do a dry run first by adding --dry-run)

# TODO: verify this works

rsync -avz --include='*/' \
  --include='*/checkpoints/*/generated_trajectories/**/plots/*.json' \
  --include='*/avg_p_values_vs_checkpoints.png' \
  --include='*/individual_p_values_vs_checkpoints.png' \
  --include='*/interactive_avg_p_values_vs_checkpoints.html' \
  --exclude='*' \
  ubuntu@YOUR-INSTANCE-IP:/path/to/n_body_approx/runs/ ./local_backup/runs/

If you also want to persist the installed packages, you can just rsync the venv directory:

rsync -avz ubuntu@YOUR-INSTANCE-IP:/home/ubuntu/venv/ ./local_backup/venv/

Note: it's also possible to use Lambda Labs filesystems to persist data between launches, but it's usually not worth it: the instance is not guaranteed to be available in the same region next time, a filesystem and an instance must be in the same region, neither can be moved to another region, and a filesystem can only be accessed from a running instance. Read more at https://docs.lambdalabs.com/public-cloud/filesystems/

Shutting down the instance

  • get instance id
curl -u "$LAMBDA_API_KEY:" https://cloud.lambdalabs.com/api/v1/instances | jq .
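If you only need the IDs, a jq filter trims the output (field names assumed from the shape of the list response):

curl -s -u "$LAMBDA_API_KEY:" https://cloud.lambdalabs.com/api/v1/instances | jq -r '.data[].id'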
  • terminate the instance
curl -u "$LAMBDA_API_KEY:" https://cloud.lambdalabs.com/api/v1/instance-operations/terminate \
  -H "Content-Type: application/json" \
  -d '{"instance_ids": ["YOUR-INSTANCE-ID"]}'
  • if you want to terminate the instance automatically after some time, you can use something like:
(sleep 6h && ...) &
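For example, combining the delay with the terminate call above:

(sleep 6h && curl -u "$LAMBDA_API_KEY:" \
  https://cloud.lambdalabs.com/api/v1/instance-operations/terminate \
  -H "Content-Type: application/json" \
  -d '{"instance_ids": ["YOUR-INSTANCE-ID"]}') &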

Running the container on a GH200 GPU

docker build -t nbody-cuda-gh200 -f Dockerfile_gh200 .
docker run --gpus all -it -v $(pwd):/n_body_approx nbody-cuda-gh200

About

Code accompanying the "Extending the N-body Benchmark: A Cross-Model Study of Geometric Deep Learning Architectures" paper.
