hierarch.stats.confidence_interval
- hierarch.stats.confidence_interval(data_array, treatment_col, interval=95.0, iterations=7, tolerance=1, compare='corr', skip=None, bootstraps=50, permutations=100, kind='bayesian', random_state=None)
Compute a confidence inverval via test inversion.
Confidence interval can be calculated by inverting the acceptance region of a hypothesis test. Using a test statistic that is approximately normally distributed under the null makes this much easier.
- Parameters:
- data_array2D numpy array or pandas DataFrame
Array-like containing both the independent and dependent variables to be analyzed. It’s assumed that the final (rightmost) column contains the dependent variable values.
- treatment_colint or str
The index number of the column containing “two samples” to be compared. Indexing starts at 0. If input data is a pandas DataFrame, this can be the column name.
- intervalfloat, optional
Percentage value indicating the confidence interval’s coverage, by default 95
- iterationsint, optional
Maximum number of times the interval will be refined, by default 7
- tolerancefloat, optional
If the delta between the current bounds and the target interval is less than this value, refinement will stop. Setting this number too close to the Monte Carlo error of the underlying hypothesis test will have a negative effect on coverage.
- comparestr, optional
The test statistic to use to perform the hypothesis test, by default “corr” which automatically calls the studentized covariance test statistic.
- skiplist of ints, optional
Columns to skip in the bootstrap. Skip columns that were sampled without replacement from the prior column, by default None
- bootstrapsint, optional
Number of bootstraps to perform, by default 100. Can be set to 1 for a permutation test without any bootstrapping.
- permutationsint or “all”, optional
Number of permutations to perform PER bootstrap sample. “all” for exact test (only works if there are only two treatments), by default 1000
- kindstr, optional
Bootstrap algorithm - see Bootstrapper class, by default “bayesian”
- random_stateint or numpy random Generator, optional
Seedable for reproducibility., by default None
- Returns:
- tuple of floats
Confidence interval spanning the specified interval.
Notes
While the Efron bootstrap is the default in most of hierarch’s statistical functions, using the Bayesian bootstrap here helps get tighter confidence intervals with the correct coverage without having to massively increase the number of resamples.
The inversion procedure performed by this function is described in detail in “Randomization, Bootstrap and Monte Carlo Methods in Biology” by Bryan FJ Manly. https://doi.org/10.1201/9781315273075.
Examples
Specify the parameters of a dataset with a difference of means of 2.
>>> from hierarch.power import DataSimulator >>> import scipy.stats as stats >>> paramlist = [[0, 2], [stats.norm], [stats.norm]] >>> hierarchy = [2, 4, 3] >>> datagen = DataSimulator(paramlist, random_state=2) >>> datagen.fit(hierarchy) >>> data = datagen.generate() >>> print(data.shape) (24, 4)
>>> confidence_interval(data, treatment_col=0, interval=95, ... bootstraps=1000, permutations='all', random_state=1) (1.314807450602109, 6.124658302189696)
The true difference is 2, which falls within the interval. We can examine the p-value for the corresponding dataset:
>>> from hierarch.stats import hypothesis_test >>> hypothesis_test(data, treatment_col=0, compare='corr', ... bootstraps=1000, permutations='all', ... random_state=1) 0.013714285714285714
This suggests that while the 95% confidence interval does not contain 0, the 99.5% confidence interval should.
>>> confidence_interval(data, treatment_col=0, interval=99.5, ... bootstraps=1000, permutations='all', random_state=1) (-0.12320618535452432, 7.56267193814634)
A permutation t-test can be used to generate the null distribution by specifying compare = “means”. This should return the same or a very similar interval.
>>> confidence_interval(data, treatment_col=0, interval=95, ... compare='means', bootstraps=1000, ... permutations='all', random_state=1) (1.314807450602109, 6.124658302189696)
Setting compare = “corr” will generate a confidence interval for the slope in a regression equation.
>>> paramlist = [[0, 1, 2, 3], [stats.norm], [stats.norm]] >>> hierarchy = [4, 4, 3] >>> datagen = DataSimulator(paramlist, random_state=2) >>> datagen.fit(hierarchy) >>> data = datagen.generate()
>>> confidence_interval(data, treatment_col=0, interval=95, ... compare='corr', bootstraps=100, ... permutations=1000, random_state=1) (0.8317584051133191, 1.6192897830814992)
The dataset was specified to have a true slope of 1, which is within the interval.