Legofit
infers population history from nucleotide site patterns.
Functions for a moving blocks bootstrap.
#include "boot.h"
#include "misc.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <assert.h>
#include <gsl/gsl_rng.h>
Data Structures
struct | Boot |
Contains all the data involved in a moving blocks bootstrap.
Functions
double | interpolate (double p, double *v, long len) |
Interpolate in order to approximate the value v[p*(len-1)].
long | LInt_div_round (long num, long denom) |
Divide num by denom and round the result to the nearest integer.
long | Boot_multiplicity (const Boot *self, long snpndx, long rep) |
How many copies of snp with index snpndx are present in a given repetition (rep)?
Boot * | Boot_new (int nchr, long nsnpvec[nchr], long nrep, int npat, long blocksize, gsl_rng *rng) |
Constructor for class Boot.
void | Boot_sanityCheck (const Boot *self, const char *file, int line) |
void | Boot_add (Boot *self, int chr, long snpndx, int pat, double z) |
Add one site pattern contribution to a Boot structure.
void | Boot_free (Boot *self) |
Destructor.
void | Boot_aggregate (Boot *self, int rep, int npat, double count[npat]) |
Add to an array the site pattern counts from a bootstrap replicate.
void | confidenceBounds (double *lowBnd, double *highBnd, double confidence, long len, double v[len]) |
Calculate confidence bounds from a vector of values representing samples drawn from the sampling distribution of some estimator.
void | Boot_print (const Boot *self, FILE *ofp) |
Print a Boot object.
Functions for a moving blocks bootstrap.
This bootstrap treats all nucleotides as a single array, ignoring the distinction between chromosomes. Thus, blocks may span chromosomes. This should not cause problems with high-quality genomes, in which chromosomes are known. When a block spans two chromosomes, the sites on opposite sides of the chromosome boundary are less correlated than sites on the same chromosome, but this doesn't violate any assumption.
I'm more worried about genomes that consist of many small contigs. With such genomes, nucleotides in different contigs may be tightly linked, and these linked contigs may end up in different blocks. This violates the assumption (of the moving-blocks bootstrap) that observations in different blocks are only weakly correlated. I'm not sure what effect this will have, but I fear it will make the confidence intervals too narrow.
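The resampling idea described above can be illustrated with a minimal sketch. It assumes, hypothetically, that each replicate draws round(nsnp/blocksize) block start positions uniformly over the single concatenated SNP array, and that the multiplicity of a SNP is the number of drawn blocks that cover it. The names sketch_div_round, sketch_draw_starts, and sketch_multiplicity are invented for this illustration, and the standard rand() stands in for the gsl_rng generator that boot.c uses. This is not the Legofit implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: divide and round to the nearest integer.
 * Assumes num >= 0 and denom > 0. */
long sketch_div_round(long num, long denom) {
    assert(num >= 0 && denom > 0);
    return (num + denom / 2) / denom;
}

/* Draw block start positions for one bootstrap replicate.
 * All SNPs are treated as one array of length nsnp, so a block may
 * span a chromosome boundary. Returns a malloc'd array of *nblock
 * start positions (caller frees), or NULL on allocation failure.
 * Assumption: nsnp - blocksize + 1 does not exceed RAND_MAX; the
 * modulo also introduces a slight bias, which a sketch can ignore. */
long *sketch_draw_starts(long nsnp, long blocksize, long *nblock,
                         unsigned seed) {
    assert(0 < blocksize && blocksize <= nsnp);
    *nblock = sketch_div_round(nsnp, blocksize);
    long *start = malloc((size_t)*nblock * sizeof *start);
    if (start == NULL)
        return NULL;
    srand(seed);
    long nstart = nsnp - blocksize + 1; /* number of legal start positions */
    for (long i = 0; i < *nblock; ++i)
        start[i] = rand() % nstart;
    return start;
}

/* Count how many of the drawn blocks cover the SNP at index snpndx. */
long sketch_multiplicity(long nblock, const long *start, long blocksize,
                         long snpndx) {
    long n = 0;
    for (long i = 0; i < nblock; ++i)
        if (start[i] <= snpndx && snpndx < start[i] + blocksize)
            ++n;
    return n;
}
```

Because every block lies entirely inside the array, the multiplicities in one replicate sum to nblock * blocksize. A SNP covered by no block contributes nothing to that replicate, and one covered by several overlapping blocks is counted several times.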
void Boot_add (Boot *self, int chr, long snpndx, int pat, double z)
Add one site pattern contribution to a Boot structure.
[in,out] | self | The Boot structure to modify. |
[in] | chr | The index of the chromosome to modify. |
[in] | snpndx | The index of the current SNP. |
[in] | pat | The index of the current site pattern. |
[in] | z | The contribution of the SNP to the site pattern. |
References Boot_multiplicity(), Boot::count, Boot::cum, and Boot::nrep.
void Boot_aggregate (Boot *self, int rep, int npat, double count[npat])
Add to an array the site pattern counts from a bootstrap replicate.
[in] | self | Points to a Boot object. |
[in] | rep | The index of the bootstrap replicate. |
[in] | npat | The number of site patterns. |
[out] | count | An array of doubles. The function will add to count[i] the contribution of site pattern i in bootstrap replicate rep. |
References Boot::npat.
void confidenceBounds (double *lowBnd, double *highBnd, double confidence, long len, double v[len])
Calculate confidence bounds from a vector of values representing samples drawn from the sampling distribution of some estimator.
To calculate the lower bound (*lowBnd), the function calculates the total probability mass in the tails (1 - confidence) and divides this into two equal parts to find p, the probability mass in each tail. It then estimates a value L such that a fraction p of the data values are less than or equal to L. To find this value, the function uses linear interpolation between adjacent values in the sorted list of data values.
The upper bound (*highBnd) is calculated in an analogous fashion.
[out] | lowBnd,highBnd | Calculated results will be written into these memory locations. |
[in] | confidence | Fraction of sampling distribution that lies inside the confidence bounds. |
[in] | len | The number of values in v. |
[in] | v | The vector of values. |
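The procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in boot.c: the names sketch_cmp_double, sketch_quantile, and sketch_confidence_bounds are hypothetical, the sketch sorts v in place with qsort, and it assumes 0 < confidence < 1 and len > 0.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* Comparison function for qsort: ascending order of doubles. */
int sketch_cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Approximate v[p*(len-1)] by linear interpolation between the two
 * neighbouring entries of the sorted array v. Assumes 0 <= p <= 1. */
double sketch_quantile(double p, const double *v, long len) {
    if (len == 0)
        return NAN;
    double pos = p * (double)(len - 1);     /* fractional index */
    long lo = (long)pos;                    /* floor, since pos >= 0 */
    long hi = (lo + 1 < len) ? lo + 1 : lo; /* clamp at the last entry */
    double w = pos - (double)lo;            /* weight of the upper entry */
    return (1.0 - w) * v[lo] + w * v[hi];
}

/* Split the tail mass (1 - confidence) equally between the two tails
 * and read off the matching quantiles. Sorts v in place. */
void sketch_confidence_bounds(double *lowBnd, double *highBnd,
                              double confidence, long len, double *v) {
    assert(0.0 < confidence && confidence < 1.0);
    assert(len > 0);
    double tail = (1.0 - confidence) / 2.0; /* mass in each tail */
    qsort(v, (size_t)len, sizeof v[0], sketch_cmp_double);
    *lowBnd = sketch_quantile(tail, v, len);
    *highBnd = sketch_quantile(1.0 - tail, v, len);
}
```

For example, with the five values 5, 1, 4, 2, 3 and confidence 0.5, each tail holds mass 0.25. The fractional indices into the sorted array are 1 and 3, so the sketch yields bounds of 2 and 4.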
double interpolate (double p, double *v, long len)
Interpolate in order to approximate the value v[p*(len-1)].
Return NaN if len==0.
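A small sketch may help show what approximating v[p*(len-1)] means when p*(len-1) is not an integer. The function name sketch_interpolate is hypothetical, and the sketch assumes 0 <= p <= 1. It illustrates the idea and is not the Legofit source.

```c
#include <assert.h>
#include <math.h>

/* Linear interpolation at the fractional index p*(len-1).
 * Returns NaN for an empty array, as the documentation states. */
double sketch_interpolate(double p, const double *v, long len) {
    if (len == 0)
        return NAN;
    assert(0.0 <= p && p <= 1.0);
    double pos = p * (double)(len - 1);     /* fractional index into v */
    long lo = (long)pos;                    /* entry at or below pos */
    long hi = (lo + 1 < len) ? lo + 1 : lo; /* entry above, clamped */
    double w = pos - (double)lo;            /* distance from lo, in [0,1) */
    return (1.0 - w) * v[lo] + w * v[hi];
}
```

With v = {10, 20, 30, 40} and p = 0.5, the fractional index is 1.5, so the sketch returns the midpoint of v[1] and v[2], which is 25. At p = 0 and p = 1 it returns the first and last entries.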