
marcellobeltrami/VCF_annotator

🧬 VCF Annotator 🧬

A lightweight, portable command-line tool that filters and annotates VCF files for human genomes. It filters chromosomes and mutations and then annotates the VCF file using OpenCravat. Its main benefit is that, unlike other annotators, it does not require downloading and setting up a local database. The tool also integrates seamlessly with MAFTools: it generates a MAF file and the required MAF sample annotation (called Clinical_Data in MAFTools). A starter R script for MAFTools is also generated at runtime, which you can then adapt for your own analysis.
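
The snippet below is not taken from the tool; it is a minimal sketch of how VCF POS/REF/ALT values are commonly mapped to MAF-style coordinates (Start_Position, End_Position, Reference_Allele, Tumor_Seq_Allele2, Variant_Type) when building a MAF file. It assumes biallelic variants in which indels carry a single leading anchor base, as is usual in VCF. The names vcf_to_maf_alleles and MafAlleles are hypothetical.

```python
from typing import NamedTuple


class MafAlleles(NamedTuple):
    """MAF-style coordinates (1-based, inclusive) and alleles for one variant."""
    start: int
    end: int
    ref: str
    alt: str
    variant_type: str


def vcf_to_maf_alleles(pos: int, ref: str, alt: str) -> MafAlleles:
    """Map one VCF POS/REF/ALT triple to MAF start/end positions and alleles."""
    if len(ref) == len(alt):
        # Substitution: no anchor base, so the span is simply POS..POS+len-1.
        vtype = {1: "SNP", 2: "DNP", 3: "TNP"}.get(len(ref), "ONP")
        return MafAlleles(pos, pos + len(ref) - 1, ref, alt, vtype)
    if len(alt) == 1 and ref[0] == alt:
        # Deletion: drop the anchor base; the deleted bases start at POS + 1.
        deleted = ref[1:]
        return MafAlleles(pos + 1, pos + len(deleted), deleted, "-", "DEL")
    if len(ref) == 1 and alt[0] == ref:
        # Insertion: MAF places the new bases between POS and POS + 1.
        return MafAlleles(pos, pos + 1, "-", alt[1:], "INS")
    raise ValueError(f"complex variant not handled by this sketch: {ref}>{alt}")
```

Complex variants (REF and ALT of different lengths without a shared anchor base) raise an error in this sketch; a full converter would need to normalise them first.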

Features

  • Filters VCF chromosomes based on specified criteria.
  • Filters mutations based on sample genotypes.
  • Annotates VCF files using OpenCravat (requires a free account at OpenCravat, https://opencravat.org/, an open-source platform).
  • Supports saving and loading OpenCravat credentials for convenience.
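
The README does not list the exact chromosome criteria. As an illustration only, the sketch below assumes the criterion is "keep the canonical human chromosomes 1-22, X and Y", accepting names with or without the chr prefix, and streams VCF lines while passing every header line through. CANONICAL, is_canonical and filter_chromosomes are hypothetical names.

```python
from collections.abc import Iterable, Iterator

# Hypothetical criterion: canonical human chromosomes only (1-22, X, Y).
# The tool's real filtering criteria may differ.
CANONICAL = {str(n) for n in range(1, 23)} | {"X", "Y"}


def is_canonical(chrom: str) -> bool:
    """True if a CHROM value such as 'chr7' or '7' names a canonical chromosome."""
    name = chrom[3:] if chrom.lower().startswith("chr") else chrom
    return name in CANONICAL


def filter_chromosomes(lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines unchanged, plus data lines on canonical chromosomes."""
    for line in lines:
        if line.startswith("#"):
            yield line  # '##' meta-lines and the '#CHROM' column header
        elif is_canonical(line.split("\t", 1)[0]):
            yield line
```

Because iterating an open text file yields its lines, an open VCF file object can be passed to filter_chromosomes directly.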

Pre-requisites

Python version: 3.11.4. Dependencies are listed in requirements.txt.

Installation

  1. Clone the repository:

    git clone https://github.com/marcellobeltrami/VCF_annotator.git
    cd VCF_annotator
  2. Install the required dependencies:

    pip install -r requirements.txt

Quick-Start

To run VCF annotator, use the following command:

python ./main_cli.py -i <input_file> -o <output_file> -g <sample_groups> [-temp <y/n>] [-Num <normal_mutation_thresholds>] [-Tum <treatment_mutation_thresholds>]
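
For reference, the documented flags can be expressed as an argparse parser. This is a sketch reconstructed from the option descriptions in this README, not the contents of main_cli.py; the choices restriction on -temp is an added assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Mirror the documented command-line options (sketch, not main_cli.py)."""
    p = argparse.ArgumentParser(description="Filter and annotate a VCF file.")
    p.add_argument("-i", "--input_file", required=True, help="input VCF file")
    p.add_argument("-o", "--output_file", required=True,
                   help="identifier included in output file names")
    p.add_argument("-g", "--sample_groups", required=True,
                   help="tab-delimited file: normal samples, treatment samples")
    # choices=["y", "n"] is an assumption; the README only documents y/n usage.
    p.add_argument("-temp", "--temp_keep", type=str, default="n",
                   choices=["y", "n"], help="keep intermediate files (y/n)")
    p.add_argument("-Num", "--normal_mutations", type=int, nargs=2,
                   default=[2, 0], help="two thresholds for normal samples")
    p.add_argument("-Tum", "--treatment_mutations", type=int, nargs=2,
                   default=[0, 2], help="two thresholds for treatment samples")
    return p
```

argparse derives each destination name from the first --long option, so -Num 3 1 is stored as normal_mutations == [3, 1].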

To remove the saved OpenCravat credentials and the result files in the VCF_annotator/results/ directory, use the following command:

python ./cleanup.py 

Detailed CLI explanation

-i, --input_file

Required: Yes
Usage: -i <input_file> or --input_file <input_file>
Description:
    This argument specifies the path to the input VCF (Variant Call Format) file that will be processed by the tool.
    The VCF file contains the genomic variant data to be filtered and annotated.
    Example: -i data/sample.vcf or --input_file data/sample.vcf

-o, --output_file

Required: Yes
Usage: -o <output_file> or --output_file <output_file>
Description:
    This argument specifies an identifier for the output files generated by the tool.
    Note that this is not the full path to the output file, but rather an identifier that will be included in the output file names.
    The actual files will be saved in specified directories with names incorporating this identifier.
    
    ```sh
    Example: -o result1 or --output_file result1
    ```
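
A hypothetical illustration of how an identifier such as result1 might be embedded in output file names. The results directory corresponds to the VCF_annotator/results/ location that cleanup.py clears, but the suffixes (_filtered.vcf, .maf, _clinical_data.tsv) and the function name output_paths are placeholders, not the tool's real file names.

```python
from pathlib import Path


def output_paths(identifier: str, results_dir: str = "results") -> dict[str, Path]:
    """Build output paths that embed the -o identifier (hypothetical suffixes)."""
    base = Path(results_dir)
    return {
        "filtered_vcf": base / f"{identifier}_filtered.vcf",
        "maf": base / f"{identifier}.maf",
        "clinical": base / f"{identifier}_clinical_data.tsv",
    }
```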

-g, --sample_groups

Required: Yes
Usage: -g <sample_groups> or --sample_groups <sample_groups>
Description:
    This argument specifies the path to a tab-delimited file containing sample names found in the VCF file.
    The first column of this file should contain the normal samples, and the second column should contain the treatment samples.
    This file helps in categorizing the samples for the filtering process.
    
    ```sh
    Example: -g data/sample_groups.txt or --sample_groups data/sample_groups.txt
    ```
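
A minimal sketch, assuming the groups file has no header row, of reading the two columns and checking that every listed sample appears in the VCF. In a VCF, sample names are the columns that follow the nine fixed fields (#CHROM through FORMAT) of the header line. read_sample_groups and missing_samples are hypothetical helpers.

```python
import csv
from collections.abc import Iterable


def read_sample_groups(lines: Iterable[str]) -> tuple[list[str], list[str]]:
    """Parse a tab-delimited groups file: column 1 = normal, column 2 = treatment.

    Empty cells are skipped, so the two groups may have different sizes.
    """
    normal: list[str] = []
    treatment: list[str] = []
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) > 0 and row[0].strip():
            normal.append(row[0].strip())
        if len(row) > 1 and row[1].strip():
            treatment.append(row[1].strip())
    return normal, treatment


def missing_samples(groups: tuple[list[str], list[str]], vcf_header: str) -> list[str]:
    """Return group sample names absent from the VCF '#CHROM' header line."""
    # Sample columns start after the 9 fixed columns (CHROM ... FORMAT).
    vcf_samples = set(vcf_header.rstrip("\n").split("\t")[9:])
    return [s for s in groups[0] + groups[1] if s not in vcf_samples]
```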

-temp, --temp_keep

Required: No
Usage: -temp <y/n> or --temp_keep <y/n>
Type: str
Default: "n"
Description:
    This argument determines whether intermediate files generated during the processing should be kept or deleted after the analysis is complete.
    Use y to keep the temporary files and n to delete them.
    ```sh
    Example: -temp y or --temp_keep y
    ```
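
The keep/delete behaviour of -temp can be sketched with a temporary working directory. This illustrates only the y/n decision: the tool's actual location and names for intermediate files are not documented here, so run_with_temp_dir and intermediate.vcf are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def run_with_temp_dir(keep: str) -> Path:
    """Create a working directory for intermediate files and apply -temp y/n."""
    work_dir = Path(tempfile.mkdtemp(prefix="vcf_annotator_"))
    # Stand-in for an intermediate file written by a filtering step.
    (work_dir / "intermediate.vcf").write_text("##fileformat=VCFv4.2\n")
    if keep.lower() != "y":
        shutil.rmtree(work_dir)  # "n" (the default): delete the intermediates
    return work_dir
```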

-Num, --normal_mutations

Required: No
Usage: -Num <threshold1> <threshold2> or --normal_mutations <threshold1> <threshold2>
Type: int (expects two integers)
Default: [2, 0]
Description:
    This argument specifies the thresholds for filtering normal mutations.
    The tool will keep mutations found in the normal samples that are equal to or above the first threshold and equal to or below the second threshold.
    This helps in fine-tuning which mutations are considered significant for the normal samples.
    
    ```sh
    Example: -Num 2 0 or --normal_mutations 2 0
    ```

-Tum, --treatment_mutations

Required: No
Usage: -Tum <threshold1> <threshold2> or --treatment_mutations <threshold1> <threshold2>
Type: int (expects two integers)
Default: [0, 2]
Description:
    This argument specifies the thresholds for filtering treatment mutations.
    The tool will keep mutations found in the treatment samples that are equal to or below the first threshold and equal to or above the second threshold.
    This helps in fine-tuning which mutations are considered significant for the treatment samples.
    
    ```sh
    Example: -Tum 0 2 or --treatment_mutations 0 2
    ```

About

Software aiming at making VCF annotation, filtering and comparison more accessible.
