We are ByteDance Seed team.
You can get to know us better through the following channels👇
This is the open-source code repository for the paper It Takes Two to Entangle (ASPLOS'26). This project provides a tool that inductively infers relations between tensors in sequential and distributed computation graphs, allowing their equivalence to be checked scalably and efficiently. It supports verification of graphs dumped with our wrapped version of TorchDynamo, and thus covers a wide range of PyTorch programs.
NOTE for Artifact Evaluation Reviewers: This also serves as the artifact for ASPLOS's Artifact Evaluation, targeting the Availability and Evaluated badges. Please follow the instructions in Artifact Evaluation to reproduce the results in the paper.
For best reproducibility, we recommend using the environment we used for the experiments:
- Ubuntu-22.04
- Should run on any other Linux distributions (not tested)
- Runs on macOS, but significantly slower (tested)
- Cannot run on Windows, since we use some Unix-style commands, but it works under WSL.
- Python 3.12
- Rust supporting edition 2018
Other software recommendations are described in pyproject.toml and Cargo.toml.
Assuming you have installed Python >= 3.12 with `pip` available:
# 1. Install Rust (for Linux, check the link for other systems)
# You can refer to https://rust-lang.org/learn/get-started/ to install Rust, or run the command below:
# WARNING: this command may update your shell profile(~/.bashrc or ~/.zshrc) to add Rust to PATH.
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# NOTE: After installing Rust, you usually need to restart your terminal to activate it.
# 2. Install Entangle
# Assume you are starting under the root directory of this repository.
# This command installs `entangle` and `torchgraph`(deprecated) python packages and `tg` and `tginfer` commands.
pip install -e .
# 3. Build egger and then go back.
cd egger && cargo build --release && cd ..

Please refer to the examples/README.md for instructions on running examples.
Thank you for your reviews.
This artifact is our implementation of the inductive relation refinement method described in the paper. We also provide input computation graphs for the open-source frameworks, but we choose not to publish the graphs of the company's proprietary models.
With this repository, the performance results for verifying graphs from open-source frameworks can be reproduced, including
- (Figure 3) End-to-end verification time across different models
- (Figure 4) Scalability on verifying parallelized models.
- (Figure 6) Heatmap showing the number of times each lemma is used for different models.
This repository does NOT contain code to draw Figure 5, since it relies on manual effort to count lines of code (LOC).
We first introduce the Code Structure, then provide Evaluation Instructions to reproduce the open-source portion of the results in the paper.
(Note: only key components are introduced here.)
The roles of the code files/directories are as described below in the comments.
egger
├── extractor # helper to extract output relations
├── src
│ ├── load.rs # load egraph dumped from Python side
│ ├── main.rs # entry point
│ ├── metadata.rs # internal types
│ ├── model.rs # operators
│ ├── rewrite # lemmas
│ ├── special # operators and lemmas for new frameworks
│ ├── symval # symbolic value manager
entangle # the key Python package
├── convert # converts graphs between different formats
├── graph # graph dumping tools wrapping TorchDynamo
├── ops # defines ATen and customized operators
├── pgraph # graph format dumped through our wrapping.
├── sgraph # core graph format used during verification
│ ├── egraph.py # egraph format
│ ├── infer # contains different exploration strategies.
│ ├── sexpr.py # for expressions
│ ├── sgraph.py # for the whole computational graph
│ ├── transform.py # lower distributed operators
examples # examples, see examples/README.md

Set up your environment following the instructions in the previous Setup section.
The example input computation graphs are in examples/data. Run the commands below to perform the refinement inference and verification.
# Assuming you are in `examples`
# gpt starts 16 runs sequentially
python run_all.py gpt --all
# qwen2 starts 2 runs sequentially
python run_all.py qwen2 --all
# aws_llama starts 12 runs sequentially
python run_all.py aws_llama --all

To easily compare with the results in the paper, run the visualization script to generate figures after step 2.
python visualization.py

The figures will be saved to examples/figures, including:
- one_layer_time.pdf: the end-to-end verification time results (Figure 3). The performance results can vary when you use different hardware.
- GPT_scalability.pdf, Llama-3_scalibility.pdf: the scalability results (Figure 4), where you should see slightly super-linear growth of time cost with the parallel degree and linear growth with the model size.
- lemma_applied_count_heatmap.pdf: the heatmap of lemma application counts (Figure 6), which should look similar to the one in the paper.
Refer to Use Your Own Models for instructions on capturing graphs from your own models and running verification.
Our implementation supports the most commonly used operators defined in ATen. Thus, as long as TorchDynamo can capture your PyTorch model completely without graph breaks, our tool can verify your model directly.
We wrapped TorchDynamo to generate computation graphs in the PickleableGraph format (or pgraph, see definition) for convenience. We recommend using it to capture your model.
An example can be found in capture_example.py. Simply run the command below to run this example:
# In `examples` directory
python capture_example.py

The output files will be printed on the terminal for you to check.
Since our implementation doesn't include some rarely used operators, and your model may use customized operators, you might need to:
- Define the new operators. Examples can be found here.
- Define a converter from PickleableGraph (or pgraph) to sexpr. Examples can be found here.
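To make the two steps above concrete, here is a minimal, self-contained sketch of what registering a custom operator and converting a pgraph node to an s-expression might look like. The names (`OP_REGISTRY`, `register_op`, `convert_node`, `my_ns::fused_add_mul`) are all hypothetical and do not match the actual APIs in `entangle/ops` and `entangle/convert`; refer to the linked examples for the real code.

```python
# Hypothetical sketch only: the real registration and conversion APIs live
# in entangle/ops and entangle/convert; the names below are illustrative.

OP_REGISTRY = {}

def register_op(name):
    """Register a symbolic evaluator for a (customized) operator."""
    def wrap(fn):
        OP_REGISTRY[name] = fn
        return fn
    return wrap

@register_op("my_ns::fused_add_mul")
def fused_add_mul(a, b, c):
    # Symbolically, the fused op computes (a + b) * c as an s-expression.
    return ("mul", ("add", a, b), c)

def convert_node(node):
    """Convert one pgraph-style node (op name + inputs) into an s-expression."""
    fn = OP_REGISTRY[node["op"]]
    return fn(*node["inputs"])

sexpr = convert_node({"op": "my_ns::fused_add_mul", "inputs": ("x", "y", "z")})
print(sexpr)  # ('mul', ('add', 'x', 'y'), 'z')
```

The point of the pattern is that each custom operator contributes two small pieces: a symbolic definition and a mapping from the captured graph node to that definition.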
For frameworks other than PyTorch (like TensorFlow, JAX, etc.), you must implement a converter from your framework's computation graph IR to Entangle's pgraph representation.
In this repository, we provide a simple, incomplete example for HLO IR, where only the few operators required by our evaluation are implemented. You can refer to this example to implement your own converter.
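As a rough illustration of what such a converter does, the toy sketch below parses a tiny HLO-like textual IR into a flat list of node dictionaries resembling pgraph. This is not the repository's HLO converter; the line format and the node-dict fields are assumptions made for this sketch.

```python
# Illustrative only: a toy converter from a tiny HLO-like IR into a flat
# node list resembling Entangle's pgraph. The real converter in this repo
# handles actual HLO; the format here is invented for the sketch.

def convert_hlo(hlo_lines):
    """Map lines like '%t = add(%a, %b)' to pgraph-style node dicts."""
    nodes = []
    for line in hlo_lines:
        lhs, rhs = [s.strip() for s in line.split("=", 1)]
        op, args = rhs.split("(", 1)
        inputs = [a.strip() for a in args.rstrip(")").split(",")]
        nodes.append({"name": lhs, "op": op.strip(), "inputs": inputs})
    return nodes

graph = convert_hlo(["%t = add(%a, %b)", "%u = multiply(%t, %c)"])
print(graph)
```

Whatever the source IR, the converter's job is the same: recover, for every node, its name, operator, and input edges, so the rest of the pipeline can treat all frameworks uniformly.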
Besides, new frameworks usually introduce new operators, so you should also follow New Operators to define the operators used in your framework.
To run the tool on your own model, you must provide clean input relations as described in the paper. Examples can be found in verify_gpt.py, verify_aws_llama.py and verify_qwen2.py. The tool automatically finds the first Python class that inherits from ExplorativeConfig.
The command-line arguments are passed to the initializer of that class, so you can define your own arguments (like the TP size).
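A minimal sketch of this pattern, assuming (hypothetically) that extra command-line options reach the config class as keyword arguments: `ExplorativeConfig` is the real base class in this repository, but its constructor signature and the `tp_size` option below are invented for illustration; see the verify_*.py examples for the actual interface.

```python
# Hypothetical sketch of a config module. The stand-in base class below
# only mimics the idea of forwarding extra CLI options; the real
# ExplorativeConfig comes from the entangle package.

class ExplorativeConfig:
    def __init__(self, **kwargs):
        self.options = kwargs

class MyModelConfig(ExplorativeConfig):
    # An extra CLI option (e.g. --tp_size 4) would arrive here as a
    # keyword argument; values come in as strings from the command line.
    def __init__(self, tp_size="1", **kwargs):
        super().__init__(**kwargs)
        self.tp_size = int(tp_size)
        # Clean input relations would be declared here, e.g. mapping a
        # sequential parameter to its distributed shards.

cfg = MyModelConfig(tp_size="4")
print(cfg.tp_size)  # 4
```

The tool picking up the first `ExplorativeConfig` subclass in the module means one config file per model is enough, with parallelism degrees and similar knobs left to the command line.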
With the captured graphs, you can run the tool using the command below:
export OUTDIR=<output directory>
mkdir -p $OUTDIR
tg infer \
--infer_manager greedy \
--stats \
-g <path to your graphs> \
--origin <directory name of sequential graph> \
--target <directory name of distributed graph> \
--graph_prefix <name of the graph break> \
--config_module <Python config file path> \
-o $OUTDIR \
  [other options you defined inside your config module]

Note for performance:
- We use the Python rich package for fancy terminal output. However, we found it significantly slows down overall performance. You may disable it with --disable_rich for better performance.
If your model uses new operators, you will also need to:
- Define new operators. Examples can be found here.
- Define new lemmas regarding new operators if necessary. Examples can be found here.
If you discover a potential security issue in this project, or think you may have discovered a security issue, we ask that you notify Bytedance Security via our security center or vulnerability reporting email.
Please do not create a public GitHub issue.
This project is licensed under the Apache-2.0 License.
About ByteDance Seed Team
Founded in 2023, ByteDance Seed Team is dedicated to crafting the industry's most advanced AI foundation models. The team aspires to become a world-class research team and make significant contributions to the advancement of science and society.
