Quantum Clifford Circuit Synthesis Environment #506
Open
y-richie-y wants to merge 2 commits into PufferAI:3.0 from
This PR adds a native Clifford synthesis env to `pufferlib.ocean`, plus a small reference env used for correctness tests and native/reference parity checks.

Clifford synthesis is a task in quantum computing: given a target Clifford tableau, find a sequence of Clifford gates that implements it.
This can be framed as a reinforcement learning problem where the agent learns to transform a tableau to the identity tableau.
For related RL-based Clifford synthesis work, see Kremer et al., Practical and efficient quantum circuit synthesis and transpiling with Reinforcement Learning (arXiv:2405.13196, 2024): https://arxiv.org/abs/2405.13196
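To make the "transform a tableau to the identity" framing concrete, here is a minimal NumPy sketch of a phase-free binary symplectic tableau with three standard gates. It is not the PR's implementation: the column layout `[x_0..x_{n-1}, z_0..z_{n-1}]`, the choice to act on columns, and all function names are assumptions made for illustration.

```python
import numpy as np

def identity_tableau(n):
    """2n x 2n identity over GF(2); assumed columns: [x_0..x_{n-1}, z_0..z_{n-1}]."""
    return np.eye(2 * n, dtype=np.uint8)

def apply_h(m, q):
    """Hadamard on qubit q: swap the x_q and z_q columns (phases ignored)."""
    n = m.shape[1] // 2
    m[:, [q, n + q]] = m[:, [n + q, q]]

def apply_s(m, q):
    """Phase gate on qubit q: z_q ^= x_q (phases ignored)."""
    n = m.shape[1] // 2
    m[:, n + q] ^= m[:, q]

def apply_cz(m, i, j):
    """CZ on qubits i and j: z_i ^= x_j and z_j ^= x_i (phases ignored)."""
    n = m.shape[1] // 2
    m[:, n + i] ^= m[:, j]
    m[:, n + j] ^= m[:, i]

def is_symplectic(m):
    """Check M * Omega * M^T == Omega (mod 2) for Omega = [[0, I], [I, 0]]."""
    n = m.shape[0] // 2
    omega = np.zeros((2 * n, 2 * n), dtype=np.int64)
    omega[:n, n:] = np.eye(n, dtype=np.int64)
    omega[n:, :n] = np.eye(n, dtype=np.int64)
    mi = m.astype(np.int64)
    return np.array_equal((mi @ omega @ mi.T) % 2, omega)

# Scramble the identity, then undo it by replaying the gates in reverse.
# At the binary (phase-free) level H, S, and CZ are each their own inverse.
m = identity_tableau(3)
apply_h(m, 0)
apply_cz(m, 0, 1)
apply_s(m, 1)
scrambled_ok = is_symplectic(m)
apply_s(m, 1)
apply_cz(m, 0, 1)
apply_h(m, 0)
solved = np.array_equal(m, identity_tableau(3))
```

In this picture an episode starts from a scrambled matrix and the agent picks gates until the matrix is the identity again.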
The env models synthesis over binary symplectic residuals:

- State: `2n x 2n` binary residual matrix
- Actions: `H`, `S`, `V`, `HS`, `HV` on each qubit, plus `CZ(i, j)` for each unordered qubit pair
- Rewards: `-single_qubit_cost` for single-qubit gates, `-1` for `CZ`, optional `goal_bonus` on reaching identity. There is also a Hamming-weight penalty, which works well in practice.

API
Native env:

    Clifford(num_envs=1, n_qubits=6, difficulty=10, max_steps=200, single_qubit_cost=0.01, goal_bonus=0.0, use_reset_pool=True, log_interval=128, buf=None, seed=0, render_mode=None)

Runtime controls kept on the native env:

- `set_difficulty(...)`
- `set_max_steps(...)`
- `set_matrix(...)`
- `flush_logs()`

`difficulty` is part of the public interface because curriculum learning is useful for this task, and adjusting scramble depth during training/evaluation is a practical control point.

Fractional difficulties are supported by interpolating between two integer difficulty levels.
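The PR text does not say how the interpolation is done. One plausible reading, sketched below, is stochastic rounding: a difficulty of `10.25` scrambles with depth `11` on roughly a quarter of resets and depth `10` otherwise. The function name `sample_scramble_depth` is hypothetical and is not part of the env's API.

```python
import math
import random

def sample_scramble_depth(difficulty, rng):
    """Pick an integer scramble depth for one reset (hypothetical helper).

    Assumption: a fractional difficulty d is realised by choosing
    floor(d) + 1 with probability frac(d) and floor(d) otherwise,
    so the expected depth equals d.
    """
    lo = math.floor(difficulty)
    frac = difficulty - lo
    return lo + 1 if rng.random() < frac else lo

rng = random.Random(0)
depths = [sample_scramble_depth(10.25, rng) for _ in range(20000)]
mean_depth = sum(depths) / len(depths)  # close to 10.25 under this assumption
```

A scheme like this lets a curriculum raise difficulty in small increments rather than jumping a whole scramble step at a time.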
Implementation note:

- The native env stores the residual matrix in `uint64_t` form, so the current limit is `2 * n_qubits <= 64`.

Validation
The added tests cover:

- the `H`, `S`, `V`, `HS`, `HV`, and `CZ` gates

Performance
The raw throughput on my machine at `6` qubits is about `8.06M` SPS at `128` envs.

Reset states are produced by random walks, so higher difficulty means more work per reset. To keep this cheap, the native implementation uses a reset pool: resets reuse a common prefix of the walk and only diverge in the final steps. That preserves diversity while cutting most of the reset cost. Measured with `num_envs=2048`, `n_qubits=6`:

- `difficulty=1000`: `33.1` -> `482.3` vec resets/sec with `use_reset_pool=True` (`14.6x`)
- `difficulty=1000`, `max_steps=200`: `5.84M` -> `9.73M` (`1.66x`)
- `difficulty=1000`, `max_steps=16`: `0.98M` -> `6.36M` (`6.47x`)

Possible Extensions