This framework generates synthetic tabular data from Structural Causal Models (SCMs). Because an SCM defines the causal relationships between variables explicitly, the generated data reflects those dependencies by construction. This approach is particularly useful for:
- Realistic data generation: Creating datasets that preserve the complex causal interactions present in real-world data.
- Data augmentation: Increasing the size and diversity of existing datasets while maintaining causal consistency.
- Model testing and validation: Generating controlled data to evaluate the behavior of machine learning algorithms in different scenarios.
- Simulations: Conducting "what-if" experiments and analyzing the consequences of interventions on variables.
- Privacy-preserving data sharing: Sharing synthetic data that retains important statistical characteristics without revealing sensitive information.
The framework offers a flexible interface for defining causal graphs, specifying the functions that describe the relationships between variables, and generating datasets of arbitrary sizes.
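
To make the idea concrete, here is a minimal, framework-independent sketch of SCM-based sampling written with plain NumPy. It does not use this repository's API: the `SCM` dictionary, the helper names `topological_order` and `sample`, the three-variable graph, and the linear-Gaussian structural functions are all assumptions chosen for illustration. See the notebook in the example folder for the framework's actual interface.

```python
import numpy as np

# Toy causal graph (hypothetical): x -> y, x -> z, y -> z.
# Each entry maps a variable to (parents, structural function).
# A structural function receives the parent values and a noise vector.
SCM = {
    "x": ((), lambda parents, noise: noise),
    "y": (("x",), lambda parents, noise: 2.0 * parents["x"] + noise),
    "z": (("x", "y"), lambda parents, noise: parents["x"] - 0.5 * parents["y"] + noise),
}


def topological_order(scm):
    """Return the variables ordered so that parents precede children.

    Assumes the graph is acyclic; no cycle detection is performed.
    """
    order, visited = [], set()

    def visit(node):
        if node in visited:
            return
        for parent in scm[node][0]:
            visit(parent)
        visited.add(node)
        order.append(node)

    for node in scm:
        visit(node)
    return order


def sample(scm, n_samples, seed=0, interventions=None):
    """Draw n_samples rows; `interventions` fixes variables to constants (do-operator)."""
    rng = np.random.default_rng(seed)
    interventions = interventions or {}
    data = {}
    for node in topological_order(scm):
        parents, func = scm[node]
        # Noise is drawn even for intervened nodes so that the random
        # streams of the remaining variables stay aligned across runs.
        noise = rng.normal(size=n_samples)
        if node in interventions:
            data[node] = np.full(n_samples, float(interventions[node]))
        else:
            data[node] = func({p: data[p] for p in parents}, noise)
    return data


# Observational data, then a "what-if" dataset under the intervention do(y = 1).
observational = sample(SCM, n_samples=1000)
interventional = sample(SCM, n_samples=1000, interventions={"y": 1.0})
```

Both calls use the same seed, so the two datasets share the same noise terms. Any difference in `z` between them is therefore due only to the intervention on `y`, which is the kind of controlled comparison described in the "Simulations" item above.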
git clone https://github.com/jacons/CausalSDG
cd CausalSDG
pip install virtualenv # (if you don't already have virtualenv installed)
virtualenv venv # to create your new environment (called 'venv' here)
source venv/bin/activate # to activate your new environment
pip install -r requirements.txt # to install the requirements in the current environment

Go to the "example" folder and run the notebook min_example.ipynb to see a minimal example of how to use the framework.
- The code was written entirely by the author.
- The documentation was generated by ChatGPT and later reviewed and validated by the author.
If you use this code or our results in your research, please cite our paper as follows:
@article{iommi2025causal,
title={Causal Synthetic Data Generation in Recruitment},
author={Iommi, Andrea and Mastropietro, Antonio and Guidotti, Riccardo and Monreale, Anna and Ruggieri, Salvatore},
journal={arXiv preprint arXiv:2511.16204},
year={2025}
}