
This framework generates synthetic tabular data using Structural Causal Models, capturing explicit causal relationships. It supports realistic data creation, augmentation, controlled model testing, simulations of interventions, and privacy-preserving data sharing, with flexible graph and function definitions.

jacons/CausalSDG

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Tabular Data Generator with Structural Causal Models


Description

This framework generates synthetic tabular data from Structural Causal Models (SCMs). With an SCM, the causal relationships between variables are defined explicitly, so the generated data reflects these underlying dependencies. This approach is particularly useful for:

  • Realistic data generation: Creating datasets that preserve the complex causal interactions present in real-world data.
  • Data augmentation: Increasing the size and diversity of existing datasets while maintaining causal consistency.
  • Model testing and validation: Generating controlled data to evaluate the behavior of machine learning algorithms in different scenarios.
  • Simulations: Conducting "what-if" experiments and analyzing the consequences of interventions on variables.
  • Privacy-preserving data sharing: Sharing synthetic data that retains important statistical characteristics without revealing sensitive information.
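To make the SCM idea concrete, here is a minimal, self-contained sketch that does not use this framework. It assumes a hypothetical three-variable model (experience, skill, hired): each variable is computed from its parents plus its own noise term, and an intervention do(X = x) simply replaces the structural equation of X with the constant x. The variable names, coefficients and the threshold 3.0 are illustrative assumptions only.

```python
import random

# Toy SCM with hypothetical recruitment-style variables (not the framework's model):
#   experience -> skill, experience -> hired, skill -> hired
def sample_scm(n, do=None, seed=0):
    """Draw n rows by ancestral sampling. `do` maps a variable name to a fixed
    value; that value replaces the variable's structural equation (hard intervention)."""
    do = do or {}
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Exogenous noise terms, drawn in the same order on every run, so a fixed
        # seed gives the same noise with and without an intervention.
        u_exp = rng.gauss(5.0, 2.0)
        u_skill = rng.gauss(0.0, 1.0)
        u_hired = rng.gauss(0.0, 0.5)
        # Structural equations, evaluated parents-first (topological order).
        experience = do.get("experience", u_exp)
        skill = do.get("skill", 0.8 * experience + u_skill)
        score = 0.5 * skill + 0.2 * experience + u_hired
        hired = do.get("hired", int(score > 3.0))
        rows.append({"experience": experience, "skill": skill, "hired": hired})
    return rows

def hire_rate(rows):
    # Fraction of rows with hired == 1.
    return sum(r["hired"] for r in rows) / len(rows)

# "What-if" comparison: force skill to a low vs. a high value for everyone.
low = hire_rate(sample_scm(5000, do={"skill": 2.0}, seed=1))
high = hire_rate(sample_scm(5000, do={"skill": 6.0}, seed=1))
print(f"P(hired | do(skill=2)) = {low:.2f}, P(hired | do(skill=6)) = {high:.2f}")
```

Because the intervention only swaps one equation and leaves the rest of the model untouched, the downstream effect on hired comes purely from the causal path skill -> hired.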

The framework offers a flexible interface for defining causal graphs, specifying the functions that describe the relationships between variables, and generating datasets of arbitrary sizes.
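The three ingredients named above (a causal graph, one function per variable, and a dataset size) can be sketched generically as follows. This is a hypothetical illustration, not the framework's actual API: the names `generate`, `graph` and `functions`, and the signature `f(parents, rng)`, are assumptions. It uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+) to evaluate every variable after its parents.

```python
import random
from graphlib import TopologicalSorter  # Python 3.9+

def generate(graph, functions, n, seed=0):
    """Hypothetical generator, not the framework's API.

    graph:     {node: [parent, ...]} describing a DAG
    functions: {node: f(parents: dict, rng: random.Random) -> value}
    Returns a column-oriented dict {node: [n values]}.
    """
    # Parents-first order; raises graphlib.CycleError (a ValueError) on a cycle.
    order = list(TopologicalSorter(graph).static_order())
    rng = random.Random(seed)
    columns = {node: [] for node in order}
    for _ in range(n):
        row = {}
        for node in order:
            # Pass only this node's parent values to its structural function.
            parents = {p: row[p] for p in graph.get(node, [])}
            row[node] = functions[node](parents, rng)
            columns[node].append(row[node])
    return columns

graph = {"A": [], "B": ["A"], "C": ["A", "B"]}
functions = {
    "A": lambda pa, rng: rng.gauss(0.0, 1.0),                  # root: pure noise
    "B": lambda pa, rng: 2.0 * pa["A"] + rng.gauss(0.0, 0.1),  # B := 2A + noise
    "C": lambda pa, rng: pa["A"] - pa["B"],                    # C := A - B
}
data = generate(graph, functions, n=1000)
print({k: v[:3] for k, v in data.items()})
```

Returning columns keeps the sketch dependency-free; if pandas is installed, `pandas.DataFrame(data)` turns the result into a table. Changing `n` is all that is needed to produce datasets of a different size.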

⚙️ Installation

1. Clone the repository and install the requirements

git clone https://github.com/jacons/CausalSDG
cd CausalSDG
pip install virtualenv # (if you don't already have virtualenv installed)
virtualenv venv # to create your new environment (called 'venv' here)
source venv/bin/activate # to activate your new environment
pip install -r requirements.txt # to install the requirements in the current environment

2. Minimal example

Go to the "example" folder and run the notebook min_example.ipynb for a minimal example of how to use the framework.

License

MIT

Authors

📝 Notes

  • The code was written entirely by the author.
  • The documentation was generated by ChatGPT and then reviewed and validated by the author.

Citation

If you use this code or our results in your research, please cite our paper as follows:

@article{iommi2025causal,
  title={Causal Synthetic Data Generation in Recruitment},
  author={Iommi, Andrea and Mastropietro, Antonio and Guidotti, Riccardo and Monreale, Anna and Ruggieri, Salvatore},
  journal={arXiv preprint arXiv:2511.16204},
  year={2025}
}
