popdisp

popdisp (population dispersals) is a method for estimating allele frequencies in a region. It accounts for the geographical locations of sampling sites, unequal numbers of samples across locations, and, most crucially, the possible dispersal routes within the region. It combines hierarchical Bayesian modeling, the compositional nature of allele frequencies, and the Wright-Fisher model of genetic drift.

If you use popdisp in your work, please cite: Historical Routes for Diversification of Domesticated Chickpea Inferred from Landrace Genomics
Anna A. Igolkina, Nina V. Noujdina, Margarita Vishnyakova, Travis Longcore, Eric von Wettberg, Sergey V. Nuzhdin, Maria G. Samsonova
Molecular Biology and Evolution, Volume 40, Issue 6, June 2023, msad110.

Bayesian model

The popdisp model is Bayesian, with the following structure:

f_A — characteristic allele frequency of the population
n_i — number of haplotypes
y_i — number of alternative alleles
f_i — frequency of the alternative allele
V — covariance matrix
s — scale parameter
f_A = ilr^{-1}(x_A) — inverse isometric log-ratio transform
x_i = ilr(f_i) — isometric log-ratio transform
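
To make the notation above concrete, here is a minimal sketch in Python with NumPy. It treats a biallelic frequency as the two-part composition (f, 1 - f), for which the isometric log-ratio reduces to a scaled log-odds. The generative step is an assumed reading of the symbols listed above, not a confirmed description of the package: it supposes that the transformed frequencies x_i are jointly normal around x_A with covariance s * V, and that the counts y_i are binomial given n_i and f_i. The function names are hypothetical, and the package's own ilr basis or sign convention may differ.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def ilr(f):
    """Two-part ilr: map a frequency in (0, 1) to the real line."""
    f = np.asarray(f, dtype=float)
    return np.log(f / (1.0 - f)) / SQRT2


def ilr_inv(x):
    """Inverse two-part ilr: map a real coordinate back to (0, 1)."""
    x = np.asarray(x, dtype=float)
    return 1.0 / (1.0 + np.exp(-SQRT2 * x))


def simulate_counts(f_A, V, s, n, rng):
    """Draw local frequencies and allele counts under the ASSUMED model.

    Assumption (not taken from the package): x ~ Normal(x_A, s * V) and
    y_i ~ Binomial(n_i, f_i) with f_i = ilr_inv(x_i).
    """
    x_A = ilr(f_A)
    mean = np.full(len(n), x_A)          # same characteristic value at every site
    x = rng.multivariate_normal(mean, s * np.asarray(V, dtype=float))
    f = ilr_inv(x)                       # local frequencies, back in (0, 1)
    y = rng.binomial(n, f)               # observed alternative-allele counts
    return f, y


rng = np.random.default_rng(0)
V = np.array([[1.0, 0.5],
              [0.5, 1.0]])               # toy covariance between two sites
n = np.array([20, 8])                    # haplotypes sampled per site
f, y = simulate_counts(f_A=0.3, V=V, s=0.1, n=n, rng=rng)
```

Working in ilr coordinates keeps the normal distribution on an unbounded scale, so every simulated frequency maps back into (0, 1) without clipping.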

Flowchart

The following flowchart illustrates how the package works:

Input data

The popdisp model requires two main input components:

Covariance matrix
Defines relationships between sampling sites, based on hypothetical dispersal routes within the region.
Allele frequency data
Provides observed allele counts and frequencies for each sampling location within the region.

Both components must be provided together, as they jointly describe the geographic and genetic structure used by the model.
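
As an illustration of how hypothetical dispersal routes can translate into a covariance matrix, the sketch below uses a common construction for drift along a tree: the covariance between two sites is the total length of the route segments they share on the way from the origin. This is an assumption made for illustration only. The route names, branch lengths, and the function shared_path_cov are hypothetical, and the matrices shipped in data/cov_mx may be built differently.

```python
import numpy as np

# Hypothetical dispersal routes: node -> (parent, length of the branch
# leading to this node). The origin of dispersal has no parent.
routes = {
    "origin": (None, 0.0),
    "hub": ("origin", 1.0),
    "site_a": ("hub", 0.5),
    "site_b": ("hub", 2.0),
    "site_c": ("origin", 1.5),
}


def path_to_root(node, routes):
    """Return {node: branch length} for every node from `node` up to the root."""
    path = {}
    while node is not None:
        parent, length = routes[node]
        path[node] = length
        node = parent
    return path


def shared_path_cov(sites, routes):
    """Covariance between sites = summed length of their shared branches."""
    paths = [path_to_root(site, routes) for site in sites]
    k = len(sites)
    cov = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            shared = paths[i].keys() & paths[j].keys()
            cov[i, j] = sum(paths[i][node] for node in shared)
    return cov


sites = ["site_a", "site_b", "site_c"]
V = shared_path_cov(sites, routes)
# site_a and site_b share the origin -> hub segment, so their covariance is 1.0;
# site_c branches off at the origin and is uncorrelated with both.
```

Comparing alternative route hypotheses then amounts to building one such matrix per hypothesis and pairing each with the same allele frequency data.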

Chickpea data

This repository includes the chickpea datasets used in the article mentioned above:

data/cov_mx
data/samples

Quick Start

A typical workflow with popdisp consists of four main steps:

from popdisp import PopData, ClusterCov, HistOpt

# 1. Load allele frequency data
data = PopData(file_qdata, file_ndata)  
# file_qdata – allele frequencies by location
# file_ndata – number of haplotypes per location

# 2. Load covariance matrix
mx_cov = ClusterCov(file_cov)  
# file_cov – covariance matrix describing dispersal routes

# 3. Initialize the optimizer
opt_hist = HistOpt(mx_cov, data, ab_flag=True, ilr_flag=True)  
# ab_flag – use A/B parametrization
# ilr_flag – apply isometric log-ratio transform

# 4. Run the optimization
opt_hist.optimise(path_res, n_thr=30)  
# path_res – path to save the results
# n_thr – number of threads to use

Pipeline for reproducing the chickpea results

If you want to reproduce the results of the chickpea study reported in our paper, please run the following scripts provided in this repository:

# 1. Estimate allele frequencies
python estim_routes.py

# 2. Collect MCMC statistics
python get_stat.py
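
The two steps depend on each other, so it can help to run them from a single driver that stops as soon as one step fails. The helper below is a hypothetical convenience sketch and is not part of the repository. It assumes the scripts take no command-line arguments, as in the commands above.

```python
import subprocess
import sys


def run_steps(scripts):
    """Run each script with the current Python interpreter, in order."""
    for script in scripts:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later steps never run on incomplete results.
        subprocess.run([sys.executable, script], check=True)
```

Calling run_steps(["estim_routes.py", "get_stat.py"]) from the repository root would run the estimation first and collect the MCMC statistics only if the estimation finished successfully.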

Requirements

To run popdisp, you need Python 3.4 or later. The required Python packages are listed in requirements.txt.

Authors

Anna Igolkina developed the popdisp package, e-mail.

License information

The popdisp package is open-source software licensed under the MIT license.
