Quantum Clifford Circuit Synthesis Environment by y-richie-y · Pull Request #506 · PufferAI/PufferLib

y-richie-y · 2026-03-25T18:41:57Z

This PR adds a native Clifford synthesis env to pufferlib.ocean, plus a small reference env used for correctness tests and native/reference parity checks.

Clifford synthesis is a task in quantum computing: given a target Clifford tableau, find a sequence of Clifford gates that implements it.

This can be framed as a reinforcement learning problem in which the agent applies gates to reduce the tableau to the identity tableau. The inverse of the gate sequence that reaches the identity then implements the target.

         Initial tableau                          Identity tableau
        x0 x1 x2 | z0 z1 z2                     x0 x1 x2 | z0 z1 z2
      +-------------------+                    +-------------------+
  r0  | 1  0  0 | 0  0  1 |                    | 1  0  0 | 0  0  0 |
  r1  | 0  0  0 | 0  1  0 |                    | 0  1  0 | 0  0  0 |
  r2  | 0  0  1 | 1  0  1 |  --CZ(0,2),        | 0  0  1 | 0  0  0 |
      |---------+---------|      S(2), H(1)->  |---------+---------|
  r3  | 0  0  0 | 1  0  0 |                    | 0  0  0 | 1  0  0 |
  r4  | 0  1  0 | 0  0  0 |                    | 0  0  0 | 0  1  0 |
  r5  | 0  0  0 | 0  0  1 |                    | 0  0  0 | 0  0  1 |
      +-------------------+                    +-------------------+
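As a sanity check on the diagram, here is a minimal dense sketch in plain Python (not the PR's native code). It applies each gate as a column operation on the x|z halves, which is the convention the diagram is consistent with: H swaps x_q and z_q, S does z_q ^= x_q, and CZ(a, b) does z_a ^= x_b and z_b ^= x_a. The helper names are hypothetical.

```python
# Hypothetical dense reference: a tableau is a list of 2n rows, each row a
# list of 2n bits laid out as [x0 .. x_{n-1} | z0 .. z_{n-1}].
# Phases are ignored, matching the binary symplectic residual.

def h(tab, q, n):
    # H swaps the x and z columns of qubit q.
    for row in tab:
        row[q], row[n + q] = row[n + q], row[q]

def s(tab, q, n):
    # S maps X -> Y, i.e. z_q ^= x_q.
    for row in tab:
        row[n + q] ^= row[q]

def cz(tab, a, b, n):
    # CZ maps X_a -> X_a Z_b and X_b -> Z_a X_b.
    for row in tab:
        row[n + a] ^= row[b]
        row[n + b] ^= row[a]

def identity(n):
    return [[int(i == j) for j in range(2 * n)] for i in range(2 * n)]

n = 3
tab = [
    [1, 0, 0, 0, 0, 1],  # r0
    [0, 0, 0, 0, 1, 0],  # r1
    [0, 0, 1, 1, 0, 1],  # r2
    [0, 0, 0, 1, 0, 0],  # r3
    [0, 1, 0, 0, 0, 0],  # r4
    [0, 0, 0, 0, 0, 1],  # r5
]
cz(tab, 0, 2, n)
s(tab, 2, n)
h(tab, 1, n)
print(tab == identity(n))  # True
```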

For related RL-based Clifford synthesis work, see Kremer et al., Practical and efficient quantum circuit synthesis and transpiling with Reinforcement Learning (arXiv:2405.13196, 2024): https://arxiv.org/abs/2405.13196

The env models synthesis over binary symplectic residuals:

- observation: the flattened 2n x 2n binary residual matrix
- action space: H, S, V, HS, HV on each qubit, plus CZ(i, j) for each unordered qubit pair
- reward: -single_qubit_cost per single-qubit gate, -1 per CZ, plus an optional goal_bonus on reaching the identity. There is also a Hamming-weight penalty, which works well in practice.
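The action count follows from the list above: 5 single-qubit actions per qubit plus one CZ per unordered pair, i.e. 5n + n(n-1)/2, which is 45 at the default n_qubits=6. The sketch below shows one possible deterministic ordering (single-qubit gates qubit by qubit, then CZ pairs in lexicographic order) and the per-step gate cost. The ordering and function names are assumptions, and the Hamming-weight penalty is left out because its exact form is not given here.

```python
from itertools import combinations

SINGLE_QUBIT_GATES = ("H", "S", "V", "HS", "HV")

def enumerate_actions(n_qubits):
    # Hypothetical ordering: all single-qubit gates on qubit 0, then qubit 1, ...
    actions = [(g, (q,)) for q in range(n_qubits) for g in SINGLE_QUBIT_GATES]
    # ...then CZ on each unordered pair (i, j) with i < j, lexicographically.
    actions += [("CZ", pair) for pair in combinations(range(n_qubits), 2)]
    return actions

def step_reward(gate, reached_identity, single_qubit_cost=0.01, goal_bonus=0.0):
    # Gate cost only: -1 for CZ, -single_qubit_cost for any single-qubit gate.
    # The Hamming-weight penalty is not modeled in this sketch.
    reward = -1.0 if gate == "CZ" else -single_qubit_cost
    if reached_identity:
        reward += goal_bonus
    return reward

actions = enumerate_actions(6)
print(len(actions))             # 45
print(actions[0], actions[-1])  # ('H', (0,)) ('CZ', (4, 5))
```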

API

Native env:

Clifford(num_envs=1, n_qubits=6, difficulty=10, max_steps=200, single_qubit_cost=0.01, goal_bonus=0.0, use_reset_pool=True, log_interval=128, buf=None, seed=0, render_mode=None)

Runtime controls kept on the native env:

- set_difficulty(...)
- set_max_steps(...)
- set_matrix(...)
- flush_logs()
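The PR does not say which checks set_matrix(...) runs. One natural check, sketched here purely as an assumption, is that an injected matrix is a 2n x 2n binary matrix that preserves the symplectic form, M Ω Mᵀ = Ω (mod 2) with Ω = [[0, I], [I, 0]]: row i must anticommute with row i ± n and commute with every other row. is_valid_residual is a hypothetical name.

```python
def is_valid_residual(m):
    """Hypothetical check: m is a 2n x 2n list of 0/1 rows, each laid out as
    [x0 .. x_{n-1} | z0 .. z_{n-1}], and m preserves the symplectic form."""
    size = len(m)
    if size == 0 or size % 2 or any(len(row) != size for row in m):
        return False
    if any(bit not in (0, 1) for row in m for bit in row):
        return False
    n = size // 2

    def sym(a, b):
        # Symplectic product x_a . z_b + z_a . x_b (mod 2):
        # 1 iff the two Paulis anticommute.
        return sum(a[k] * b[n + k] + a[n + k] * b[k] for k in range(n)) % 2

    for i in range(size):
        for j in range(size):
            # Omega = [[0, I], [I, 0]]: only rows i and i +/- n anticommute.
            expected = 1 if abs(i - j) == n else 0
            if sym(m[i], m[j]) != expected:
                return False
    return True

print(is_valid_residual([[1, 0], [0, 1]]))  # True (n = 1 identity)
print(is_valid_residual([[1, 0], [1, 0]]))  # False (both rows are X, so they commute)
```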

difficulty (the scramble depth used to generate reset states) is part of the public interface because curriculum learning helps on this task, and adjusting scramble depth during training and evaluation is a practical control point.
Fractional difficulties are supported by interpolating between the two neighboring integer difficulty levels.
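The PR does not say how that interpolation works. One plausible reading, sketched here purely as an assumption, is per-reset stochastic rounding: difficulty=10.3 uses depth 11 with probability 0.3 and depth 10 otherwise, so the expected scramble depth matches the requested value and a curriculum can raise difficulty in steps smaller than one gate. sample_scramble_depth is a hypothetical name.

```python
import math
import random

def sample_scramble_depth(difficulty, rng):
    # Hypothetical: pick floor(d) + 1 with probability equal to the
    # fractional part of d, else floor(d). Integer d always returns d.
    lo = math.floor(difficulty)
    frac = difficulty - lo
    return lo + (1 if rng.random() < frac else 0)

rng = random.Random(0)
depths = [sample_scramble_depth(10.3, rng) for _ in range(10_000)]
print(sorted(set(depths)))        # [10, 11]
print(sum(depths) / len(depths))  # close to 10.3
```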

Implementation note:

The native path stores each tableau column as a packed uint64_t, so the current limit is 2 * n_qubits <= 64 (at most 32 qubits).
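A sketch of what the packing implies, assuming one bit per tableau row in each column word: a column has 2n rows, so 2n must fit in 64 bits, and every gate becomes a swap or XOR of whole column words. Python ints stand in for uint64_t here. The class and method names are hypothetical, the gate conventions are the ones the diagram is consistent with, and V is assumed to be sqrt(X).

```python
class PackedTableau:
    """Hypothetical packed layout (Python ints standing in for uint64_t):
    cols[q] is the x_q column and cols[n + q] the z_q column; bit r of a
    column is set iff tableau row r has a 1 there."""

    def __init__(self, n_qubits):
        if 2 * n_qubits > 64:
            raise ValueError("packed columns require 2 * n_qubits <= 64")
        self.n = n_qubits
        # Identity tableau: column c has only bit c set.
        self.cols = [1 << c for c in range(2 * n_qubits)]

    @classmethod
    def from_rows(cls, rows):
        # Pack a dense 2n x 2n list of 0/1 rows column by column.
        tab = cls(len(rows) // 2)
        tab.cols = [
            sum(row[c] << r for r, row in enumerate(rows))
            for c in range(len(rows))
        ]
        return tab

    def is_identity(self):
        return all(col == 1 << c for c, col in enumerate(self.cols))

    # Each gate touches at most two whole column words.
    def h(self, q):
        # H: swap the x_q and z_q columns.
        x, z = q, self.n + q
        self.cols[x], self.cols[z] = self.cols[z], self.cols[x]

    def s(self, q):
        # S: z_q ^= x_q.
        self.cols[self.n + q] ^= self.cols[q]

    def v(self, q):
        # V, assuming V = sqrt(X): x_q ^= z_q.
        self.cols[q] ^= self.cols[self.n + q]

    def cz(self, a, b):
        # CZ(a, b): z_a ^= x_b and z_b ^= x_a.
        self.cols[self.n + a] ^= self.cols[b]
        self.cols[self.n + b] ^= self.cols[a]


rows = [
    [1, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 1, 1, 0, 1],
    [0, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1],
]
tab = PackedTableau.from_rows(rows)
tab.cz(0, 2)
tab.s(2)
tab.h(1)
print(tab.is_identity())  # True
```

In this layout a gate costs a constant number of word operations regardless of n, instead of one update per row.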

Validation

The added tests cover:

- action count and deterministic ordering
- reset behavior at zero and nonzero difficulty
- matrix injection validation
- native/reference parity for H, S, V, HS, HV, and CZ
- native auto-reset behavior after terminal transitions

Performance

Raw throughput on my machine at 6 qubits is about 8.06M steps per second (SPS) with 128 envs.

Reset states are produced by random walks, so higher difficulty means more work per reset. To keep resets cheap, the native implementation uses a reset pool: resets reuse a common prefix of the walk and diverge only in the final steps. This preserves diversity while cutting most of the reset cost. Measured with num_envs=2048, n_qubits=6:

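- pure reset throughput at difficulty=1000: 33.1 -> 482.3 vec resets/sec, without -> with use_reset_pool=True (14.6x)
- rollout SPS at difficulty=1000, max_steps=200: 5.84M -> 9.73M (1.66x)
- rollout SPS at difficulty=1000, max_steps=16: 0.98M -> 6.36M (6.47x)

The gain grows as max_steps shrinks, since shorter episodes reset more often and resets make up a larger share of each rollout.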
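The reset pool idea can be sketched as follows: the long prefix of the random walk is computed once and cached, and each reset starts from that cached state and applies only a short random tail. The class name, the tail length, and when the prefix gets refreshed are assumptions, since the PR does not specify them. The step function is kept generic so the sketch does not depend on the tableau representation.

```python
import random
from typing import Callable, Generic, TypeVar

State = TypeVar("State")


class ResetPool(Generic[State]):
    """Hypothetical reset pool: cache the state after a shared walk prefix of
    difficulty - tail steps, then give each reset its own random tail."""

    def __init__(self, start: State, step: Callable[[State, random.Random], State],
                 difficulty: int, tail: int, seed: int = 0):
        if not 0 <= tail <= difficulty:
            raise ValueError("need 0 <= tail <= difficulty")
        self.start = start
        # `step` must return a new state rather than mutate its input,
        # otherwise resets would corrupt the cached prefix.
        self.step = step
        self.difficulty = difficulty
        self.tail = tail
        self.rng = random.Random(seed)
        self.refresh()

    def refresh(self):
        # Pay for the long shared prefix once; call again to resample it.
        state = self.start
        for _ in range(self.difficulty - self.tail):
            state = self.step(state, self.rng)
        self.prefix = state

    def sample(self) -> State:
        # Each reset only pays for `tail` fresh random steps.
        state = self.prefix
        for _ in range(self.tail):
            state = self.step(state, self.rng)
        return state


# Toy usage: the state is a tuple of bits and a step flips one random bit.
calls = [0]

def toy_step(state, rng):
    calls[0] += 1
    i = rng.randrange(len(state))
    return state[:i] + (state[i] ^ 1,) + state[i + 1:]

pool = ResetPool((0,) * 12, toy_step, difficulty=1000, tail=8)
resets = [pool.sample() for _ in range(100)]
print(calls[0])  # 1792 = 992 prefix steps + 100 * 8 tail steps (vs 100000 fresh)
```

The tail length sets the trade-off: tail=0 makes every reset identical until the next refresh, while tail=difficulty degenerates to a fresh walk per reset.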

Possible Extensions

- Custom coupling graphs in addition to the current fully connected action set
- Richer objective variants beyond the current gate-cost formulation
- Visualisation + interactive demo

y-richie-y added 2 commits March 25, 2026 18:32

- Add Ocean Clifford environment (bf1a910)
- Add minimal Clifford curriculum trainer (db3888a)

y-richie-y wants to merge 2 commits into PufferAI:3.0 from y-richie-y:clifford-ocean-pr
