Skip to content

Ultima scripts related to downstream processing of WGS, MRD, methylation analysis

License

Notifications You must be signed in to change notification settings

Ultimagen/VariantCalling

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1,889 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

VariantCalling

Code style: black

This package provides a set of tools to assist variant calling on Ultima data. The best practice pipeline is published here. The code below is used mostly in the post-GATK filtering step.

In addition, the code provides

  • Tools to perform evaluation of the callset relative to the ground truth.
  • Tools to perform building a database of noisy locaitons (SEC) and filtering callset relative to them - still undocumented
  • Set of tools for MRD (minimal residual disease) - still undocumented.

Setup

  1. Make sure git-lfs is installed on your system if you want to clone test resources along with the code (https://git-lfs.github.com/)
  2. Clone VariantCalling repository to e.g. software/VariantCalling
  3. Install ugbio_utils sub-module using git submodule update --init --recursive

If you want the git to always update submodules - run: git config --global submodule.recurse true. Alternatively, run git config --local submodule.recurse true in VariantCalling folder (to turn this behavior on for a single repo only)

  1. Create the four conda environments:

    • conda env create -f setup/environment.yml
    • conda env create -f setup/other_envs/ucsc.yml
    • conda env create -f setup/other_envs/cutadapt.yml
    • conda env create -f setup/other_envs/cnmops.yml
  2. Activate the main conda environment

    • conda activate genomics.py3
  3. Install the package

    • cd software/VariantCalling
    • pip install .

Using ugvc package

Run through cli

To get a list of available cli tools:

python /path/to/ugvc

To run a specific tool:

python /path/to/ugvc <tool_name> <args>

Run individual tools not through CLI

coverage_analysis:
	Run full coverage analysis of an aligned bam/cram file

evaluate_concordance:
	Calculate precision and recall for compared HDF5

filter_variants_pipeline:
	POST-GATK variant filtering

run_comparison_pipeline:
	Concordance between VCF and ground truth

train_models_pipeline:
	Train filtering models

Documentation of individual tools:

Howtos

Test

Recommended way to run tests for external users

./run_tests.sh

This script will validate that test resources were correctly cloned, and only then run tests

Run all tests

python -m pytest

Run unit-tests

python -m pytest test/unit

Run system-tests

python -m pytest test/system

Git-lfs

Whenever commiting a data-file to the repo, check that it's suffix is tracked by git-lfs in .gitattributes If not, add the new suffix to the .gitattributes file before adding the data-file and commiting it. Also make sure to commit .gitattributes itself.

git-lfs track "*.new_suffix"

Development guidelines

  1. Always develop on a branch, not on master
  2. Public functions/classes should be tested, using either pytest or unittest syntax
  3. commit and push your changes to that branch on the remote repo
  4. Open a pull-request through github
    1. Add at least one code reviewer
    2. Wait for CI tests to pass (green V sign)
  5. to avoid submodule sync issues, cd ugbbio_utils; git checkout main; cd ..; git commit -a -m "update submodules"; git push
  6. scripts that you want to be available on the path should be added to setup.py
  7. scripts that you want to be available to ugvc should be added to __main__.py
  8. Code changes should pass all pre-commit hooks

How To pre-commit

pre-commit hooks are configured within .pre-commit-config.yaml

install: https://pre-commit.com/#installation

After pre-commit package is installed, you need to set git hooks scripts:

pre-commit install
pre-commit install -t pre-commit

After the installation it will run the pre-commit hooks for all files changed as part of the commit. This should look like this, notice mostly the red "Failed" issues that you must fix, the pre-commit verifies the fix before enables the commit:

trim trailing whitespace.................................................Passed
fix end of files.........................................................Passed
check yaml...........................................(no files to check)Skipped
check json...........................................(no files to check)Skipped
check for added large files..............................................Passed
pycln....................................................................Passed
isort....................................................................Passed
black....................................................................Passed
flake8...................................................................Passed
pylint...................................................................Passed
[master 9a1a910e] Test pre-commit
 1 file changed, 1 deletion(-)

For running all pre-commit hooks on all files (used for initial pre-commit run) use: pre-commit run --all-files

The hooks we use are:

pycln - remove unused import statements

isort - Python utility library to sort imports alphabetically, and automatically separated into sections and by type

black - uncompromising Python code formatter

flake8 - python coding style guide for PEP8

pylint - python static code analysis tool

About

Ultima scripts related to downstream processing of WGS, MRD, methylation analysis

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Contributors 23