ReflexiCoder: Teaching Large Language Models to Self-Reflect on Generated Code and Self-Correct It via Reinforcement Learning

(Figure: ReflexiCoder overview)

Installation

Caution

This project requires CUDA 12.4. If you encounter segmentation faults, please verify your CUDA toolchain via nvcc --version.
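As a quick sanity check, the release number can be parsed out of the `nvcc` banner and compared against the pin. A minimal sketch (the banner format is assumed from recent CUDA toolkits; it is not part of this repository):

```shell
# Parse the "release X.Y" number out of nvcc's version banner (banner format
# assumed from recent CUDA toolkits, e.g.
# "Cuda compilation tools, release 12.4, V12.4.131").
parse_cuda_release() {
    printf '%s\n' "$1" | sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Usage: [ "$(parse_cuda_release "$(nvcc --version)")" = "12.4" ] || echo "wrong CUDA"
```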

1) Set up the TRL environment

```shell
conda create -n reflexicoder python=3.11
conda activate reflexicoder

pip install --upgrade pip
pip install vllm==0.8.5.post1
pip install setuptools
pip install flash-attn --no-build-isolation
pip install tensorboard

GIT_LFS_SKIP_SMUDGE=1 pip install -e ".[dev]"
pip install selenium==4.15.2
pip install pillow==10.3.0
```

This installation also pulls in PyTorch v2.6.0. This exact version is required, because the provided vLLM binaries are built against it.
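To confirm the pin after installation, the reported version string can be compared against `2.6.0` while ignoring any local build suffix such as `+cu124`. A minimal sketch (the suffix convention is assumed from typical PyTorch wheels):

```shell
# Compare an installed version string against a pinned version, ignoring a
# local build suffix like "+cu124" (suffix convention assumed from PyTorch wheels).
version_matches() {
    [ "${1%%+*}" = "$2" ]
}

# Usage: version_matches "$(pip show torch | sed -n 's/^Version: //p')" "2.6.0"
```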

Authenticate to Hugging Face and Weights & Biases (optional but recommended):

```shell
huggingface-cli login  # Required for pushing datasets/models to the HF Hub
wandb login            # Enables experiment tracking during training
```

Also install Git LFS if it is not already available:

```shell
sudo apt-get install git-lfs
git-lfs --version
```

2) Install the Firejail sandbox

Firejail is an open-source Linux sandbox that isolates processes via namespaces and seccomp, reducing security risk when executing untrusted code.

```shell
git clone https://github.com/netblue30/firejail.git

cd firejail
chmod +x configure
./configure
find . -name "*.sh" -exec chmod +x {} \;
make
sudo make install
```
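Once installed, generated programs can be executed inside the sandbox with networking disabled and a throwaway home directory. A minimal sketch (the flags shown are standard Firejail options; the fallback to plain `python3` is purely illustrative and is not something the training code is known to do):

```shell
# Run a generated solution inside Firejail with networking disabled and a
# private home directory; fall back to plain python3 where firejail is absent
# (the fallback is illustrative only -- do not use it for untrusted code).
run_sandboxed() {
    if command -v firejail >/dev/null 2>&1; then
        firejail --quiet --net=none --private python3 "$1"
    else
        python3 "$1"
    fi
}
```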

Data Preparation

For dataset download and preprocessing, please follow the Data section in the DeepCoder guideline.
To avoid redundant preprocessing, we provide the preprocessed parquet files under ./data, which can be used directly for training.
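Before launching training, it is worth confirming that the parquet files are actually visible under `./data`. A minimal sketch (the `./data` layout is assumed; adjust the path to your checkout):

```shell
# Count the preprocessed parquet files under a data directory (the ./data
# layout is assumed; adjust the path to your checkout).
count_parquet() {
    find "$1" -name '*.parquet' | wc -l
}

# Usage: [ "$(count_parquet ./data)" -gt 0 ] || echo "no parquet files found"
```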

Training

```shell
GIT_LFS_SKIP_SMUDGE=1 pip install -e ".[dev]"

export TOKENIZERS_PARALLELISM=false
export TIMESTAMP=$(date +"%m-%d-%y-%T")
export CONFIG_GRPO="configs/reflexicoder/config_grpo.yaml"
export MODEL_NAME_OR_PATH="/path_to_your_model/Qwen3-8B"
export DATASET_NAME="./data"
export OUTPUT_DIR="./output/$TIMESTAMP"
export ROLLOUT_FILE="$OUTPUT_DIR"
export LOG_FILE="$OUTPUT_DIR/training.log"

mkdir -p "$OUTPUT_DIR"

ACCELERATE_LOG_LEVEL=info \
    accelerate launch --config_file configs/accelerate_configs/zero2.yaml \
    src/open_r1/grpo.py --config $CONFIG_GRPO \
    --model_name_or_path $MODEL_NAME_OR_PATH \
    --dataset_name $DATASET_NAME \
    --output_dir $OUTPUT_DIR \
    --vllm_mode colocate 2>&1 | tee "$LOG_FILE"
```
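Since each run writes to a fresh timestamped directory, later steps such as log inspection need to locate the most recent one. A minimal sketch (assumes the `./output/$TIMESTAMP` layout from the exports above):

```shell
# Return the most recently modified run directory under an output root
# (assumes the timestamped ./output/$TIMESTAMP layout used above).
latest_run() {
    ls -1dt "$1"/*/ 2>/dev/null | head -n 1
}

# Usage: tail -f "$(latest_run ./output)training.log"
```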

Evaluation

We evaluate all baselines and RL-trained models on HumanEval, HumanEval+, MBPP, MBPP+, LiveCodeBench_v5, and CodeForces using the EvalChemy framework to ensure consistent evaluation.

For the full evaluation pipeline, please refer to the official EvalChemy repository and its README.

Performance & Token Efficiency

(Figure: ReflexiCoder overview)

(Figure: performance on benchmarks)

(Figure: token efficiency)

Citation

If you use the data or code in this repo, please consider citing the following paper.

```bibtex
@article{jiang2026reflexicoder,
  title={ReflexiCoder: Teaching Large Language Models to Self-Reflect on Generated Code and Self-Correct It via Reinforcement Learning},
  author={Jiang, Juyong and Shen, Jiasi and Kim, Sunghun and Yoo, Kang Min and Kim, Jeonghoon and Kim, Sungju},
  journal={arXiv preprint arXiv:2603.05863},
  year={2026}
}
```

About

[ACL'26] Official Code for "ReflexiCoder: Teaching Large Language Models to Self-Reflect on Generated Code and Self-Correct It via Reinforcement Learning"
