This is a small pipeline for assembling avian mitogenomes from Illumina paired-end short reads, given deep whole-genome sequencing coverage.
It uses the following programs:
- seqtk 1.2: for subsampling fastq.gz files larger than ~6 GB (https://github.com/lh3/seqtk)
- trimmomatic 0.39: for removing adapters (http://www.usadellab.org/cms/?page=trimmomatic)
- NOVOPlasty 4.3.1: for mitogenome assembly (https://github.com/ndierckx/NOVOPlasty)
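The size check that decides which files go through seqtk could be sketched as follows. This is only an illustration, not the contents of 01_subsample.sh: the function name needs_subsampling and the variable THRESHOLD_BYTES are hypothetical, and the 6 GiB default is one reading of the "~6 GB" cutoff mentioned above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original scripts): report whether a
# fastq.gz file is larger than a size threshold and so is a candidate for
# subsampling with seqtk.

# Assumed cutoff: 6 GiB, an interpretation of the "~6 GB" figure above.
THRESHOLD_BYTES="${THRESHOLD_BYTES:-$((6 * 1024 * 1024 * 1024))}"

needs_subsampling() {
    local file="$1"
    local size
    # wc -c gives the file size in bytes; the arithmetic expansion strips
    # any padding some wc implementations add.
    size=$(( $(wc -c < "$file") ))
    [ "$size" -gt "$THRESHOLD_BYTES" ]
}

# List which raw read files exceed the cutoff.
for fq in 01_rawreads/*.fastq.gz; do
    [ -e "$fq" ] || continue   # skip when the glob matches nothing
    if needs_subsampling "$fq"; then
        echo "subsample: $fq"
    else
        echo "keep as is: $fq"
    fi
done
```

Run from the top of the working directory, this only prints a decision per file. The .fastq.gz extension in the glob is an assumption about how the raw read files are named.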
These scripts are designed to run on the Harvard FAS Cannon Cluster.
The general structure of the working directory is as follows:
antbird_mitogenomes/
-00_scripts/: Contains all scripts, configuration, programs, and other files necessary to run these commands.
-01_rawreads/: Contains raw reads in fastq.gz format for all individuals.
-02_subsampled/: Directory where subsampled fastq files will be generated by seqtk (01_subsample.sh).
-03_trimmed-reads/: Directory where clean reads in fastq format will be generated by trimmomatic (02_trimmomatic.sh).
-04_orphaned-reads/: Directory where trimmomatic will place unpaired reads (02_trimmomatic.sh).
-05_mitogenomes/: Directory where NOVOPlasty will place circularized mitogenomes in fasta format, along with log and report files (03_Novoplasty.sh, 03_config.txt, 03_batch.txt).
-06_annotation/: This folder contains annotation information. I annotate mitogenomes externally using the MITOS web server (http://mitos.bioinf.uni-leipzig.de/index.py) and then copy the GFF file and the nucleotide and amino acid sequences into this folder. Parameters used in MITOS can be found in the file 04_annotation_parameters.txt.
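The directory layout above can be created in one step. The sketch below is a convenience, not one of the numbered pipeline scripts. The PROJECT_DIR variable is an assumption and defaults to the antbird_mitogenomes name used above.

```shell
#!/usr/bin/env bash
# Create the working directory skeleton described above.
# PROJECT_DIR is a hypothetical variable; override it to use another location.
PROJECT_DIR="${PROJECT_DIR:-antbird_mitogenomes}"

for sub in 00_scripts 01_rawreads 02_subsampled 03_trimmed-reads \
           04_orphaned-reads 05_mitogenomes 06_annotation; do
    # -p creates parent directories and does not fail if the folder exists
    mkdir -p "$PROJECT_DIR/$sub"
done

# Show the resulting layout
ls -1 "$PROJECT_DIR"
```

Because mkdir -p is idempotent, the snippet can be re-run safely on an existing project directory.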