Data Version Control (DVC) repository for SAST AI workflow data.
This repository uses:
- Git for code and metadata versioning
- Git tags for version releases
- DVC for large file storage and versioning
# Install DVC with S3 support
pip install dvc dvc-s3

We use Git tags to mark specific versions of the repository and its data.
# List existing tags
git tag

# After making changes and committing
git tag v1.1.0
git push origin v1.1.0

# Check out a specific version
git checkout v1.0.0

When you have new or updated data:
# 1. Add/update your data files, then refresh their .dvc metadata
# (for paths DVC already tracks, re-run dvc add on the changed path)
dvc add path/to/changed-data
# 2. Push data to DVC remote storage
dvc push
# 3. Commit the updated .dvc files
git add *.dvc
git commit -m "Update data for version X.X.X"
# 4. Create and push a new version tag
git tag vX.X.X
git push origin main
git push origin vX.X.X

To get the complete repository with all data:
# 1. Clone the Git repository
git clone <repository-url>
cd dvc-repo
# 2. Pull all DVC-tracked data
dvc pull

# Clone and checkout a specific tag
git clone <repository-url>
cd dvc-repo
git checkout v1.0.0
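dvc pull

To download just one specific file without cloning the entire repository:
(The examples below use . to refer to a local clone; pass the repository URL in its place to fetch without cloning.)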
# Download a single file to current directory
dvc get . path/to/file
# Download to a specific location
dvc get . path/to/file -o ./where-to-put-file
# Download from a specific version/tag
dvc get . path/to/file --rev v1.0.0 -o ./output-file

# Get a specific ground truth sheet
dvc get . ground_truth_sheets/acl-2.3.2-1.el10.xlsx -o ./acl.xlsx
# Get known non-issues for a specific package
dvc get . known-non-issues-el10/acl/ignore.err -o ./acl-ignore.err
# Get from a specific version
dvc get . config.yaml --rev v1.0.0 -o ./config-v1.0.yaml

.
├── config.yaml # Configuration file
├── ground_truth_sheets/ # DVC-tracked: Ground truth data
├── known-non-issues-el10/ # DVC-tracked: Known non-issues
├── prompts/ # DVC-tracked: AI prompts
├── testing-data-nvrs.yaml # DVC-tracked: Test data
└── *.dvc # DVC metadata files
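
To see which paths in a checkout are DVC-tracked, one option is to look for the *.dvc metadata files shown in the tree. The sketch below assumes each pointer file is named after its target plus a .dvc suffix (the default that dvc add produces); the function name list_dvc_targets is hypothetical and not part of DVC.

```shell
# Hypothetical helper: list DVC-tracked paths under a directory by
# finding *.dvc pointer files and stripping the .dvc suffix.
# The internal .dvc/ directory (config, cache) is skipped.
list_dvc_targets() {
    root="${1:-.}"
    find "$root" -path "$root/.dvc" -prune -o -type f -name '*.dvc' -print |
        sed -e 's/\.dvc$//' |
        sort
}
```

Running list_dvc_targets from the repository root prints one tracked path per line; anything not listed is versioned by Git alone.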
DVC data is stored in MinIO S3-compatible storage. Configuration is in .dvc/config.
Ensure you have access to the DVC remote storage endpoint. You may need VPN access.
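
Before running dvc pull, it can help to confirm which endpoint the remote points at. The sketch below reads the endpointurl setting (the key DVC uses for S3-compatible remotes) straight from the config file. The function name dvc_endpoint is hypothetical, and it assumes the endpoint is set in .dvc/config rather than in a local or global override.

```shell
# Hypothetical helper: print the endpointurl value(s) found in a DVC
# config file. Defaults to .dvc/config in the current directory.
dvc_endpoint() {
    config="${1:-.dvc/config}"
    # Strip "endpointurl = " (with any surrounding whitespace) and
    # print only the lines where that substitution applied.
    sed -n 's/^[[:space:]]*endpointurl[[:space:]]*=[[:space:]]*//p' "$config"
}
```

The printed URL can then be passed to a tool such as curl -I to check that the endpoint is reachable from your network; a timeout there usually points at missing VPN access rather than a DVC problem.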
If you get "No module named 'dvc_s3'":
pip install dvc-s3
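
A quick way to confirm the fix is to try importing the module directly. This is a minimal sketch: the function name check_dvc_s3 and the PYTHON override are hypothetical, and it assumes the interpreter being checked is the same one that DVC runs under.

```shell
# Hypothetical helper: report whether the dvc_s3 module is importable.
# PYTHON is an illustrative override; it defaults to python3.
check_dvc_s3() {
    if "${PYTHON:-python3}" -c "import dvc_s3" >/dev/null 2>&1; then
        echo "dvc-s3 is installed"
    else
        echo "dvc-s3 is missing; run: pip install dvc-s3"
        return 1
    fi
}
```

If check_dvc_s3 still reports the module as missing after installing it, pip may be installing into a different Python environment than the one being checked.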