Legofit
infers population history from nucleotide site patterns.
scrmpat

Tabulate site pattern frequencies from scrm output.

Scrmpat: tabulates site patterns

Scrmpat reads data generated by scrm (with option -transpose-segsites) and tabulates counts of nucleotide site patterns, writing the result to standard output.

Usage

Usage: scrmpat [options] <x> <y> ...
  where <x>, <y>, etc. are arbitrary labels, whose number and order
  must agree with that of the populations specified in the scrm command
  line (using scrm arguments -I and -eI). Labels may not include the
  character ":". Reads standard input; writes to standard
  output. Max number of input files: 32.

Options may include:
   -F or --logFixed
      log fixed sites to scrmpat.log
   -a or --logAll
      log all sites to scrmpat.log
   --version
      Print version and exit
   -h or --help
      Print this message

Example

scrmpat parses a file generated using scrm. The scrm command should include the option -transpose-segsites. Let us assume you have done this, that file foo.scrm contains the output simulated by scrm, and that these simulated data included genotypes referring to four populations, labeled "x", "y", "n", and "d". The scrmpat command would look like this:

scrmpat x y n d < foo.scrm

scrmpat's notion of a "population" differs from that of scrm, in that scrmpat treats samples of different ages as separate populations, even if they reside in the same population on the scrm command line. For example, consider the following scrm command line:

scrm 3 -I 2 1 1 -eI 0.5 0 1

This specifies three haploid samples distributed across two populations. The -I argument says that each population has a sample at time 0. The -eI argument says that, in addition, population 2 has a sample at time 0.5. All three samples would be treated as separate populations by scrmpat. Thus, the scrmpat command line should list three labels, as in "scrmpat x y z".
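
To make the label count concrete, here is a minimal Python sketch that lists the sample groups implied by a scrm command line, one per label that scrmpat would expect. It is not part of Legofit or scrm: the function name scrm_sample_groups is hypothetical, and the sketch assumes that groups with zero samples need no label and that groups are ordered as they appear on the command line (-I first, then each -eI).

```python
import shlex


def scrm_sample_groups(command):
    """Return (population, time) pairs, one per expected scrmpat label.

    Hypothetical helper. Assumes zero-sample groups need no label and
    that groups follow command-line order: -I first, then each -eI.
    """
    args = shlex.split(command)
    groups = []
    npops = None
    i = 0
    while i < len(args):
        if args[i] == "-I":
            # -I <npops> <s1> ... <sn>: samples taken at time 0
            npops = int(args[i + 1])
            sizes = [int(s) for s in args[i + 2 : i + 2 + npops]]
            groups += [(pop + 1, 0.0) for pop, n in enumerate(sizes) if n > 0]
            i += 2 + npops
        elif args[i] == "-eI":
            # -eI <t> <s1> ... <sn>: samples taken at time t
            if npops is None:
                raise ValueError("-eI must follow -I in this sketch")
            time = float(args[i + 1])
            sizes = [int(s) for s in args[i + 2 : i + 2 + npops]]
            groups += [(pop + 1, time) for pop, n in enumerate(sizes) if n > 0]
            i += 2 + npops
        else:
            i += 1
    return groups


groups = scrm_sample_groups("scrm 3 -I 2 1 1 -eI 0.5 0 1")
print(groups)       # [(1, 0.0), (2, 0.0), (2, 0.5)]
print(len(groups))  # 3
```

The three pairs correspond to the three labels in "scrmpat x y z": population 1 at time 0, population 2 at time 0, and population 2 at time 0.5.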

In the output, site pattern "x:y" refers to the pattern in which the derived allele is present in haploid samples from "x" and "y" but not in those from other populations. The order of the command-line arguments determines the order in which labels are sorted on output. Given the labels x, y, n, and d in the example above, we would get a site pattern labeled "x:y:d" rather than, say, "y:x:d".
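
The labeling rule can be sketched in Python as follows. This is an illustration, not scrmpat's own code: the function name site_pattern_labels is hypothetical, and the sketch assumes that singleton patterns and the pattern containing every population are omitted, as in the sample output on this page.

```python
from itertools import combinations


def site_pattern_labels(labels):
    """Build site pattern labels in command-line order.

    Hypothetical helper. Assumes singleton patterns and the pattern
    that includes every population are omitted.
    """
    for label in labels:
        if ":" in label:
            raise ValueError(f'label "{label}" may not contain ":"')
    patterns = []
    # Subsets of size 2 .. n-1; combinations() preserves input order,
    # so "x" always precedes "y" when x comes first on the command line.
    for size in range(2, len(labels)):
        for combo in combinations(labels, size):
            patterns.append(":".join(combo))
    return patterns


patterns = site_pattern_labels(["x", "y", "n", "d"])
print(len(patterns))  # 10
print(patterns[:3])   # ['x:y', 'x:n', 'x:d']
```

With four labels this yields six two-population patterns and four three-population patterns, and "x:y:d" appears while "y:x:d" does not.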

The output looks like this:

# scrmpat version 1.3
# Population labels: x y n d
# Number of site patterns: 10
# Tabulated 12327755 SNPs
#       SitePat             E[count]
            x:y       340952.4592501
            x:n        46874.1307236
            x:d        46034.4670204
            y:n        55137.4236715
            y:d        43535.5248078
            n:d       231953.3372578
          x:y:n        91646.1277991
          x:y:d        88476.9619569
          x:n:d        96676.3877423
          y:n:d       100311.4411513

The left column lists the site patterns that occur in the data. The right column gives the expected count of each site pattern. These are not integers, because they represent averages over all possible subsamples consisting of a single haploid genome from each population.
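
The averaging can be illustrated with a short Python sketch. It assumes that averaging over all single-genome subsamples is equivalent to multiplying, for each population, the derived allele frequency (if the population is in the pattern) or one minus that frequency (if it is not). The function name pattern_contributions and the example counts are hypothetical, and this is not scrmpat's actual implementation.

```python
from itertools import combinations
from math import prod  # requires Python 3.8 or later


def pattern_contributions(derived, total, labels):
    """Contribution of one site to each site pattern.

    derived[i] / total[i] is the derived allele frequency in the sample
    from population i. Hypothetical helper; singletons and the pattern
    with every population are omitted, as in the sample output.
    """
    freqs = [d / t for d, t in zip(derived, total)]
    idx = range(len(labels))
    out = {}
    for size in range(2, len(labels)):
        for combo in combinations(idx, size):
            # Probability that one random genome per population shows
            # the derived allele in exactly the populations in combo.
            p = prod(freqs[i] if i in combo else 1 - freqs[i] for i in idx)
            out[":".join(labels[i] for i in combo)] = p
    return out


# Made-up site: sample sizes 2, 2, 1, 1 and derived counts 1, 2, 0, 1.
c = pattern_contributions([1, 2, 0, 1], [2, 2, 1, 1], ["x", "y", "n", "d"])
print({k: v for k, v in c.items() if v > 0})  # {'y:d': 0.5, 'x:y:d': 0.5}
```

In this made-up site, half of the possible subsamples draw the derived copy from "x" and half draw the ancestral copy, so the site adds 0.5 to "x:y:d" and 0.5 to "y:d". Summing such fractional contributions over sites is what produces non-integer totals.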