Legofit
infers population history from nucleotide site patterns.
Calculate reference allele frequency, raf.
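The input file should consist of tab-separated columns: chromosome, position, reference allele, alternate allele, and then one genotype column per sample.
It can be generated from a VCF file as follows: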
bcftools query -f '%CHROM\t%POS\t%REF\t%ALT[\t%GT]\n' fname.vcf.gz
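To illustrate the quantity being computed, the following Python sketch parses one line in the format produced by the command above and returns the reference allele frequency. It is not raf's own code. The function name `ref_allele_freq` is hypothetical. The sketch assumes the usual VCF genotype coding (0 for the reference allele, 1 for the alternate allele, separated by "/" or "|") and assumes that any other allele code, such as ".", is skipped as missing.

```python
import re

def ref_allele_freq(genotypes):
    """Fraction of called alleles that are reference (coded 0).

    Returns None when no allele is called. Hypothetical helper,
    not part of Legofit.
    """
    n_ref = 0
    n_called = 0
    for gt in genotypes:
        # Genotypes look like "0/1" (unphased) or "0|1" (phased).
        for allele in re.split(r"[/|]", gt):
            if allele == "0":
                n_ref += 1
                n_called += 1
            elif allele == "1":
                n_called += 1
            # Assumption: anything else (e.g. ".") is missing and skipped.
    return n_ref / n_called if n_called else None

# One made-up input line: chromosome, position, ref, alt, then genotypes.
line = "1\t1000\tA\tG\t0/0\t0|1\t./.\t1/1"
chrom, pos, ref, alt, *gts = line.split("\t")
print(ref_allele_freq(gts))  # 3 reference alleles out of 6 called: 0.5
```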
Output is in 5 columns, separated by tabs.
The input files should include all sites at which derived alleles are present in any of the populations under study. For example, consider an analysis involving modern humans and Neanderthals. The modern human data must include all sites at which Neanderthals carry derived alleles, even if these sites do not vary among modern humans. To accomplish this, it is best to use whole-genome data for all populations.
The input should not contain duplicate nucleotide sites. Chromosomes should be sorted in lexical order (so that, for example, "10" precedes "2"), and within each chromosome, nucleotide positions should be in ascending numerical order. Otherwise, raf will abort with an error.
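The ordering rules above can be checked before running raf. The Python sketch below is one way to do so. The function name `check_order` is hypothetical, and the sketch assumes that "lexical order" means plain character-by-character string comparison, which is what Python's `<` does for ASCII strings.

```python
def check_order(sites):
    """Raise ValueError at the first duplicate or out-of-order site.

    `sites` is an iterable of (chromosome, position) pairs, with the
    chromosome as a string and the position as an integer.
    Hypothetical helper, not part of Legofit.
    """
    prev = None
    for chrom, pos in sites:
        if prev is not None:
            prev_chrom, prev_pos = prev
            if chrom < prev_chrom:
                raise ValueError(
                    f"chromosome {chrom} follows {prev_chrom}: not in lexical order")
            if chrom == prev_chrom:
                if pos == prev_pos:
                    raise ValueError(f"duplicate site {chrom}:{pos}")
                if pos < prev_pos:
                    raise ValueError(
                        f"position {pos} follows {prev_pos} on chromosome {chrom}")
        prev = (chrom, pos)

# Lexical order puts "10" before "2", so this made-up input is accepted.
check_order([("1", 5), ("1", 9), ("10", 1), ("2", 3)])

# A repeated site is reported as an error.
try:
    check_order([("1", 5), ("1", 5)])
except ValueError as err:
    print(err)
```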
Sites are rejected unless they have a single reference allele. Missing values are allowed for the alternate allele. At the end of the job, a summary of rejected sites is written to stderr.
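As a rough illustration of this filtering, the Python sketch below keeps sites whose reference field is a single nucleotide, accepts a missing alternate allele, and prints a summary of rejections to stderr at the end. It is an assumption-laden sketch rather than raf's actual logic. The function name `filter_sites` and the wording of the rejection reason are hypothetical, as is the reading of "a single reference allele" as exactly one of A, C, G, or T.

```python
import sys
from collections import Counter

NUCLEOTIDES = {"A", "C", "G", "T"}

def filter_sites(rows, log=sys.stderr):
    """Split rows into kept sites and a tally of rejections.

    `rows` is an iterable of (chromosome, position, ref, alt) tuples.
    Hypothetical helper, not part of Legofit.
    """
    kept = []
    rejected = Counter()
    for chrom, pos, ref, alt in rows:
        # Assumption: a valid ref is exactly one of A, C, G, T.
        if ref.upper() not in NUCLEOTIDES:
            rejected["ref is not a single nucleotide"] += 1
            continue
        # The alt allele is not checked, so a missing value (".") passes.
        kept.append((chrom, pos, ref, alt))
    # Summary of rejected sites, written once at the end.
    for reason, count in rejected.items():
        print(f"rejected {count} site(s): {reason}", file=log)
    return kept, rejected

# Made-up rows for demonstration only.
rows = [
    ("1", 100, "A", "G"),
    ("1", 200, "AT", "A"),   # rejected: more than one nucleotide in ref
    ("1", 300, "C", "."),    # kept: missing alt is allowed
    ("1", 400, ".", "T"),    # rejected: missing ref
]
kept, rejected = filter_sites(rows)
```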