Functions for a moving blocks bootstrap.

#include "boot.h"
#include "misc.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <assert.h>
#include <gsl/gsl_rng.h>

Data Structures
struct	Boot
	Contains all the data involved in a moving blocks bootstrap.

Functions
double	interpolate (double p, double *v, long len)
	Interpolate in order to approximate the value v[p*(len-1)].

long	LInt_div_round (long num, long denom)
	Divide num by denom and round the result to the nearest integer.

long	Boot_multiplicity (const Boot *self, long snpndx, long rep)
	Return the number of copies of the SNP with index snpndx that are present in bootstrap replicate rep.

Boot *	Boot_new (int nchr, long nsnpvec[nchr], long nrep, int npat, long blocksize, gsl_rng *rng)
	Constructor for class Boot.

void	Boot_sanityCheck (const Boot *self, const char *file, int line)

void	Boot_add (Boot *self, int chr, long snpndx, int pat, double z)
	Add one site pattern contribution to a Boot structure.

void	Boot_free (Boot *self)
	Destructor.

void	Boot_aggregate (Boot *self, int rep, int npat, double count[npat])
	Add to an array the site pattern counts from a bootstrap replicate.

void	confidenceBounds (double *lowBnd, double *highBnd, double confidence, long len, double v[len])
	Calculate confidence bounds from a vector of values representing samples drawn from the sampling distribution of some estimator.

void	Boot_print (const Boot *self, FILE *ofp)
	Print a Boot object.
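The summary describes LInt_div_round only as dividing and rounding to the nearest integer. The sketch below shows one way to do that with integer arithmetic alone. It is not the library's source: the name div_round_nearest is hypothetical, and rounding exact halves away from zero (including for negative operands) is an assumption.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for LInt_div_round: divide num by denom and
 * round to the nearest integer. Assumption: exact halves are rounded
 * away from zero. */
long div_round_nearest(long num, long denom) {
    assert(denom != 0);
    long q = num / denom;   /* C99 and later: truncates toward zero */
    long r = num % denom;   /* remainder has the sign of num */

    /* |r| / |denom| >= 1/2 means truncation discarded at least half a unit. */
    if (2 * labs(r) >= labs(denom)) {
        /* Step q one unit away from zero, in the direction of the quotient. */
        q += ((num < 0) == (denom < 0)) ? 1 : -1;
    }
    return q;
}
```

For example, div_round_nearest(1050, 100) returns 11 because 10.5 rounds away from zero, whereas plain integer division would give 10.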

Detailed Description

Functions for a moving blocks bootstrap.

Author: Alan R. Rogers

Copyright: Copyright (c) 2016, Alan R. Rogers rogers@anthro.utah.edu. This file is released under the Internet Systems Consortium License, which can be found in file "LICENSE".

This bootstrap treats all nucleotides as a single array, ignoring the distinction between chromosomes. Thus, blocks may span chromosomes. This should not cause problems with high-quality genomes, in which the chromosome of each site is known. When a block spans two chromosomes, the sites within it are less strongly correlated than those in a block confined to one chromosome, but this violates no assumption of the method.

I'm more worried about genomes that consist of many small contigs. With such genomes, nucleotides in different contigs may be tightly linked, and these linked contigs may end up in different blocks. This violates the assumption (of the moving-blocks bootstrap) that observations in different blocks are only weakly correlated. I'm not sure what effect this will have, but I fear it will make the confidence intervals too narrow.
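Treating all nucleotides as one array amounts to giving each SNP a global index: its index within its chromosome plus the number of SNPs on all earlier chromosomes. The sketch below illustrates that idea together with moving-blocks resampling and the multiplicity count that Boot_multiplicity reports. All names are hypothetical, block starts are drawn uniformly from every position at which a full block fits (an assumption about the real code), and stdlib rand() stands in for the gsl_rng generator that Boot_new actually takes.

```c
#include <assert.h>
#include <stdlib.h>

/* Global index of SNP snpndx on chromosome chr, given the number of SNPs
 * on each chromosome. SNPs from all chromosomes are concatenated, so a
 * block near the end of one chromosome can run into the next. */
long global_index(int nchr, const long nsnpvec[], int chr, long snpndx) {
    assert(chr >= 0 && chr < nchr);
    assert(snpndx >= 0 && snpndx < nsnpvec[chr]);
    long offset = 0;
    for (int i = 0; i < chr; ++i)
        offset += nsnpvec[i];       /* SNPs on earlier chromosomes */
    return offset + snpndx;
}

/* Draw nblock block starts uniformly from [0, nsnp - blocksize].
 * rand() % nstart has slight modulo bias and assumes nstart <= RAND_MAX;
 * acceptable for a sketch, not for production use. */
void draw_starts(long nsnp, long blocksize, long nblock, long start[]) {
    assert(blocksize > 0 && blocksize <= nsnp);
    long nstart = nsnp - blocksize + 1;   /* number of possible starts */
    for (long i = 0; i < nblock; ++i)
        start[i] = rand() % nstart;
}

/* Number of sampled blocks covering the SNP with global index g, i.e. how
 * many copies of that SNP the replicate contains. */
long multiplicity(long g, long blocksize, long nblock, const long start[]) {
    long m = 0;
    for (long i = 0; i < nblock; ++i)
        if (start[i] <= g && g < start[i] + blocksize)
            ++m;
    return m;
}
```

Because blocks can overlap, a SNP may appear zero, one, or several times in a replicate; the multiplicities over all SNPs always sum to nblock * blocksize.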

Function Documentation

◆ Boot_add()

void Boot_add (Boot *self, int chr, long snpndx, int pat, double z)

Add one site pattern contribution to a Boot structure.

Parameters

[in,out]	self	The Boot structure to modify.
[in]	chr	The index of the chromosome to modify.
[in]	snpndx	The index of the current snp.
[in]	pat	The index of the current site pattern.
[in]	z	The contribution of the SNP to the site pattern.

References Boot_multiplicity(), Boot::count, Boot::cum, and Boot::nrep.
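The references to Boot_multiplicity(), Boot::count, Boot::cum and Boot::nrep suggest that Boot_add adds each SNP's contribution to every replicate, weighted by the number of copies of that SNP the replicate contains. The sketch below illustrates that reading with a simplified, hypothetical struct. Its field layout, and the interpretation of cum as per-chromosome SNP offsets, are assumptions, not the real definition of struct Boot.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct Boot. */
typedef struct {
    long nrep, npat, nblock, blocksize;
    long *cum;      /* cum[c]: SNPs on chromosomes 0..c-1 (assumed meaning) */
    long *start;    /* start[rep*nblock + i]: first SNP of block i in rep */
    double *count;  /* count[rep*npat + pat]: weighted site-pattern count */
} BootSketch;

/* Copies of global SNP g in replicate rep (plays the role of Boot_multiplicity). */
static long mult(const BootSketch *b, long g, long rep) {
    long m = 0;
    const long *s = b->start + rep * b->nblock;
    for (long i = 0; i < b->nblock; ++i)
        if (s[i] <= g && g < s[i] + b->blocksize)
            ++m;
    return m;
}

/* Add contribution z of one SNP to site pattern pat in every replicate,
 * weighted by the number of copies of that SNP in the replicate. */
void boot_add_sketch(BootSketch *b, int chr, long snpndx, int pat, double z) {
    assert(pat >= 0 && pat < b->npat);
    long g = b->cum[chr] + snpndx;      /* global SNP index */
    for (long rep = 0; rep < b->nrep; ++rep) {
        long m = mult(b, g, rep);
        if (m > 0)
            b->count[rep * b->npat + pat] += m * z;
    }
}
```

A SNP absent from a replicate contributes nothing to it, and one covered by two overlapping blocks contributes 2 * z.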

◆ Boot_aggregate()

void Boot_aggregate (Boot *self, int rep, int npat, double count[npat])

Add to an array the site pattern counts from a bootstrap replicate.

Parameters

[in]	self	Points to a Boot object.
[in]	rep	The index of the bootstrap replicate.
[in]	npat	The number of site patterns.
[out]	count	An array of doubles. The function will add to count[i] the contribution of site pattern i in bootstrap replicate rep.

References Boot::npat.
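Because Boot_aggregate adds to count rather than overwriting it, the caller must zero the array first, and repeated calls accumulate. The sketch below shows that behavior with a hypothetical minimal struct holding counts row-major by replicate; the layout, and the check that npat matches the object's own npat (suggested by the reference to Boot::npat), are assumptions.

```c
#include <assert.h>

/* Hypothetical minimal layout: counts stored row-major by replicate. */
typedef struct {
    int nrep, npat;
    const double *count;   /* count[rep * npat + pat] */
} BootCounts;

/* Add (not copy) replicate rep's site-pattern counts into count[]. */
void boot_aggregate_sketch(const BootCounts *b, int rep, int npat,
                           double count[npat]) {
    assert(rep >= 0 && rep < b->nrep);
    assert(npat == b->npat);   /* caller's array must match the object */
    for (int i = 0; i < npat; ++i)
        count[i] += b->count[rep * npat + i];
}
```

Calling it once per replicate on the same array therefore sums the replicates, which is useful only if that is what the caller intends.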

◆ confidenceBounds()

void confidenceBounds (double *lowBnd, double *highBnd, double confidence, long len, double v[len])

Calculate confidence bounds from a vector of values representing samples drawn from the sampling distribution of some estimator.

To calculate the lower bound (*lowBnd), the function computes the total probability mass in the two tails (1 - confidence) and divides it into two equal parts to find p = (1 - confidence)/2, the probability mass in each tail. It then estimates a value L such that a fraction p of the data values are less than or equal to L. To find this value, the function uses linear interpolation between adjacent values in the sorted data.

The upper bound (*highBnd) is calculated in an analogous fashion.

Parameters

[out]	lowBnd,highBnd	Calculated results will be written into these memory locations.
[in]	confidence	Fraction of sampling distribution that lies inside the confidence bounds.
[in]	len	The number of values in v.
[in]	v	The vector of values.

Side Effects:
Sorts the vector v in place.
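The procedure described above can be sketched as follows: sort the samples, then interpolate at fractions p and 1 - p of the way through the sorted array. This is an illustrative reimplementation under stated assumptions, not the library's source; confidenceBounds_sketch, cmp_double and interp_sorted are hypothetical names, and the boundary handling in interp_sorted is an assumption.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* qsort comparator for ascending doubles. */
static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Linear interpolation at fractional index p*(len-1) of sorted v;
 * assumes 0 <= p <= 1. Returns NaN for an empty vector. */
static double interp_sorted(double p, const double *v, long len) {
    if (len == 0)
        return NAN;
    double x = p * (len - 1);
    long i = (long)x;                /* x >= 0, so truncation == floor */
    if (i >= len - 1)
        return v[len - 1];           /* p == 1, or a single value */
    double w = x - i;                /* weight on the upper neighbor */
    return (1.0 - w) * v[i] + w * v[i + 1];
}

void confidenceBounds_sketch(double *lowBnd, double *highBnd,
                             double confidence, long len, double v[len]) {
    assert(confidence > 0.0 && confidence < 1.0);
    double p = 0.5 * (1.0 - confidence);      /* mass in each tail */
    qsort(v, len, sizeof v[0], cmp_double);   /* side effect: v is sorted */
    *lowBnd = interp_sorted(p, v, len);
    *highBnd = interp_sorted(1.0 - p, v, len);
}
```

With v = {5, 1, 4, 2, 3} and confidence 0.5, each tail holds p = 0.25, so the bounds fall at fractional indices 1 and 3 of the sorted array, giving 2 and 4.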

◆ interpolate()

double interpolate (double p, double *v, long len)

Interpolate in order to approximate the value v[p*(len-1)].

Return NaN if len==0.
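Assuming standard linear interpolation between adjacent elements (which is what the description implies; the exact boundary handling in the source is an assumption), the approximated value can be written as follows, with v indexed from 0 and assumed sorted when used by confidenceBounds:

```latex
\begin{aligned}
x &= p\,(\mathrm{len}-1), \qquad i = \lfloor x \rfloor, \qquad w = x - i,\\
\mathrm{interpolate}(p, v, \mathrm{len}) &=
\begin{cases}
(1-w)\,v_i + w\,v_{i+1}, & i < \mathrm{len}-1,\\
v_{\mathrm{len}-1}, & i = \mathrm{len}-1 \ (\text{i.e. } p = 1).
\end{cases}
\end{aligned}
```

For example, with len = 5 and p = 0.3, x = 1.2, so i = 1, w = 0.2, and the result is 0.8 v_1 + 0.2 v_2.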
