Legofit infers population history from nucleotide site patterns.
Tabulate site pattern frequencies from scrm output.
Scrmpat reads data generated by scrm (with option -transpose-segsites) and tabulates counts of nucleotide site patterns, writing the result to standard output.
Usage: scrmpat [options] <x> <y> ...
   where <x>, <y>, etc. are arbitrary labels, whose number and order
   must agree with that of the populations specified in the scrm command
   line (using scrm arguments -I and -eI). Labels may not include the
   character ":". Reads standard input; writes to standard output.
   Max number of input files: 32.

Options may include:
   -F or --logFixed
      log fixed sites to scrmpat.log
   -a or --logAll
      log all sites to scrmpat.log
   --version
      Print version and exit
   -h or --help
      Print this message
scrmpat parses a file generated using scrm. The scrm command should include the option -transpose-segsites. Let us assume you have done this, that file foo.scrm contains the output simulated by scrm, and that these simulated data included genotypes referring to four populations, labeled "x", "y", "n", and "d". The scrmpat command would look like this:
scrmpat x y n d < foo.scrm
scrmpat's notion of a "population" differs from that of scrm, in that scrmpat treats samples of different ages as separate populations, even if they reside in the same population on the scrm command line. For example, consider the following scrm command line:
scrm 3 -I 2 1 1 -eI 0.5 0 1
This specifies three haploid samples distributed across two populations. The -I argument says that each population has a sample at time 0. The -eI argument says that, in addition, population 2 has a sample at time 0.5. All three samples would be treated as separate populations by scrmpat. Thus, the scrmpat command line should list three labels, as in "scrmpat x y z".
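The counting rule described above can be sketched in a few lines of Python. This is only an illustration, not part of Legofit: the function name count_scrmpat_labels is hypothetical, and the sketch assumes that -I appears before any -eI, that each -eI lists one sample size per population, and that every nonzero entry corresponds to one scrmpat label.

```python
def count_scrmpat_labels(scrm_command):
    """Count the labels scrmpat would expect for a given scrm command line.

    Assumption: each population with a nonzero sample size in -I, and each
    nonzero entry in an -eI argument, counts as one separate "population".
    """
    tokens = scrm_command.split()
    npop = None
    n_labels = 0
    i = 0
    while i < len(tokens):
        if tokens[i] == "-I":
            # -I <npop> <s1> ... <s_npop>: samples taken at time 0
            npop = int(tokens[i + 1])
            sizes = [int(t) for t in tokens[i + 2:i + 2 + npop]]
            n_labels += sum(1 for s in sizes if s > 0)
            i += 2 + npop
        elif tokens[i] == "-eI":
            # -eI <t> <s1> ... <s_npop>: samples taken at time t
            if npop is None:
                raise ValueError("this sketch expects -I before -eI")
            sizes = [int(t) for t in tokens[i + 2:i + 2 + npop]]
            n_labels += sum(1 for s in sizes if s > 0)
            i += 2 + npop
        else:
            i += 1
    return n_labels


print(count_scrmpat_labels("scrm 3 -I 2 1 1 -eI 0.5 0 1"))
```

For the command line discussed above, the two nonzero entries of -I and the single nonzero entry of -eI add up to three labels.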
In the output, site pattern "x:y" refers to the pattern in which the derived allele is present on haploid samples from "x" and "y" but not on those from other populations. The order of the command-line arguments determines the order in which labels are sorted on output. Given the labels "x y n d" from the first example above, we would get a site pattern labeled "x:y:d" rather than, say, "y:x:d".
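The labeling convention can be illustrated with a short Python sketch. The helper names pattern_label and enumerate_patterns are hypothetical, not scrmpat functions. The enumeration assumes that only patterns with at least two populations, and fewer than all of them, are tabulated, which is inferred from the sample output (10 patterns for 4 labels) rather than stated in the text.

```python
from itertools import combinations


def pattern_label(labels, carriers):
    """Name the site pattern for the populations carrying the derived allele.

    Labels are joined with ":" in command-line order, whatever the order
    in which the carriers are supplied.
    """
    carriers = set(carriers)
    unknown = carriers - set(labels)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return ":".join(lab for lab in labels if lab in carriers)


def enumerate_patterns(labels):
    """List site patterns of size 2 .. len(labels)-1 (assumed default)."""
    return [":".join(subset)
            for k in range(2, len(labels))
            for subset in combinations(labels, k)]


labels = ["x", "y", "n", "d"]
print(pattern_label(labels, ["d", "y", "x"]))
print(enumerate_patterns(labels))
```

With labels "x y n d", the carriers "d", "y", "x" map to "x:y:d", and the enumeration yields six two-population patterns followed by four three-population patterns.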
The output looks like this:
# scrmpat version 1.3
# Population labels: x y n d
# Number of site patterns: 10
# Tabulated 12327755 SNPs
#       SitePat             E[count]
            x:y       340952.4592501
            x:n        46874.1307236
            x:d        46034.4670204
            y:n        55137.4236715
            y:d        43535.5248078
            n:d       231953.3372578
          x:y:n        91646.1277991
          x:y:d        88476.9619569
          x:n:d        96676.3877423
          y:n:d       100311.4411513
The left column lists the site patterns that occur in the data. The right column gives the expected count of each site pattern. These are not integers, because they represent averages over all possible subsamples consisting of a single haploid genome from each population.
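One way to read "averages over all possible subsamples" is sketched below in Python. At a single site, suppose each population carries the derived allele on some fraction of its haploid samples. If one genome is drawn from each population, the chance of seeing a given pattern is the product of those fractions for the populations in the pattern and of their complements for the rest. Summing these products over sites would then give a non-integer count. This is an interpretation under that assumption, not scrmpat's actual code. The function name site_contributions and its input format are hypothetical.

```python
from itertools import combinations
from math import prod


def site_contributions(labels, derived_freq):
    """Per-site contribution to each pattern of size 2 .. len(labels)-1.

    derived_freq maps each label to the fraction of that population's
    haploid samples carrying the derived allele at this site (assumed input).
    """
    out = {}
    for k in range(2, len(labels)):
        for subset in combinations(labels, k):
            inside = set(subset)
            # Derived in every population of the pattern, ancestral elsewhere.
            out[":".join(subset)] = prod(
                derived_freq[lab] if lab in inside else 1.0 - derived_freq[lab]
                for lab in labels
            )
    return out


# Hypothetical site: derived allele on half of x's samples and all of y's.
freqs = {"x": 0.5, "y": 1.0, "n": 0.0, "d": 0.0}
contrib = site_contributions(["x", "y", "n", "d"], freqs)
print(contrib["x:y"])
```

In this made-up example the site adds 0.5 to "x:y" and nothing to the other tabulated patterns. The remaining 0.5 corresponds to subsamples in which only "y" carries the derived allele, a singleton pattern that the sketch does not tabulate.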