hierarch.stats.confidence_interval

hierarch.stats.confidence_interval(data_array, treatment_col, interval=95.0, iterations=7, tolerance=1, compare='corr', skip=None, bootstraps=50, permutations=100, kind='bayesian', random_state=None)

Compute a confidence inverval via test inversion.

Confidence interval can be calculated by inverting the acceptance region of a hypothesis test. Using a test statistic that is approximately normally distributed under the null makes this much easier.

Parameters:

data_array2D numpy array or pandas DataFrame: Array-like containing both the independent and dependent variables to be analyzed. It’s assumed that the final (rightmost) column contains the dependent variable values.
treatment_colint or str: The index number of the column containing “two samples” to be compared. Indexing starts at 0. If input data is a pandas DataFrame, this can be the column name.
intervalfloat, optional: Percentage value indicating the confidence interval’s coverage, by default 95
iterationsint, optional: Maximum number of times the interval will be refined, by default 7
tolerancefloat, optional: If the delta between the current bounds and the target interval is less than this value, refinement will stop. Setting this number too close to the Monte Carlo error of the underlying hypothesis test will have a negative effect on coverage.
comparestr, optional: The test statistic to use to perform the hypothesis test, by default “corr” which automatically calls the studentized covariance test statistic.
skiplist of ints, optional: Columns to skip in the bootstrap. Skip columns that were sampled without replacement from the prior column, by default None
bootstrapsint, optional: Number of bootstraps to perform, by default 100. Can be set to 1 for a permutation test without any bootstrapping.
permutationsint or “all”, optional: Number of permutations to perform PER bootstrap sample. “all” for exact test (only works if there are only two treatments), by default 1000
kindstr, optional: Bootstrap algorithm - see Bootstrapper class, by default “bayesian”
random_stateint or numpy random Generator, optional: Seedable for reproducibility., by default None

Returns:

tuple of floats: Confidence interval spanning the specified interval.

Notes

While the Efron bootstrap is the default in most of hierarch’s statistical functions, using the Bayesian bootstrap here helps get tighter confidence intervals with the correct coverage without having to massively increase the number of resamples.

The inversion procedure performed by this function is described in detail in “Randomization, Bootstrap and Monte Carlo Methods in Biology” by Bryan FJ Manly. https://doi.org/10.1201/9781315273075.

Examples

Specify the parameters of a dataset with a difference of means of 2.

>>> from hierarch.power import DataSimulator
>>> import scipy.stats as stats
>>> paramlist = [[0, 2], [stats.norm], [stats.norm]]
>>> hierarchy = [2, 4, 3]
>>> datagen = DataSimulator(paramlist, random_state=2)
>>> datagen.fit(hierarchy)
>>> data = datagen.generate()
>>> print(data.shape)
(24, 4)

>>> confidence_interval(data, treatment_col=0, interval=95,
...    bootstraps=1000, permutations='all', random_state=1)
(1.314807450602109, 6.124658302189696)

The true difference is 2, which falls within the interval. We can examine the p-value for the corresponding dataset:

>>> from hierarch.stats import hypothesis_test
>>> hypothesis_test(data, treatment_col=0, compare='corr',
...                 bootstraps=1000, permutations='all',
...                 random_state=1)
0.013714285714285714

This suggests that while the 95% confidence interval does not contain 0, the 99.5% confidence interval should.

>>> confidence_interval(data, treatment_col=0, interval=99.5,
...    bootstraps=1000, permutations='all', random_state=1)
(-0.12320618535452432, 7.56267193814634)

A permutation t-test can be used to generate the null distribution by specifying compare = “means”. This should return the same or a very similar interval.

>>> confidence_interval(data, treatment_col=0, interval=95,
...    compare='means', bootstraps=1000,
...    permutations='all', random_state=1)
(1.314807450602109, 6.124658302189696)

Setting compare = “corr” will generate a confidence interval for the slope in a regression equation.

>>> paramlist = [[0, 1, 2, 3], [stats.norm], [stats.norm]]
>>> hierarchy = [4, 4, 3]
>>> datagen = DataSimulator(paramlist, random_state=2)
>>> datagen.fit(hierarchy)
>>> data = datagen.generate()

>>> confidence_interval(data, treatment_col=0, interval=95,
...                 compare='corr', bootstraps=100,
...                 permutations=1000, random_state=1)
(0.8317584051133191, 1.6192897830814992)

The dataset was specified to have a true slope of 1, which is within the interval.