hierarch.stats.hypothesis_test
- hierarch.stats.hypothesis_test(data_array, treatment_col, compare='corr', alternative='two-sided', skip=None, bootstraps=100, permutations=1000, kind='weights', return_null=False, random_state=None)
Hierarchical permutation test for change in location with any number of samples. The test is two-tailed by default; see the alternative parameter for one-sided tests.
Equivalent to calculating a p-value for a slope coefficient in a linear model.
- Parameters:
- data_array : 2D numpy array or pandas DataFrame
Array-like containing both the independent and dependent variables to be analyzed. It’s assumed that the final (rightmost) column contains the dependent variable values.
- treatment_col : int or str
The index number of the column containing the treatment labels that define the samples to be compared. Indexing starts at 0. If the input data is a pandas DataFrame, this can instead be the name of the column.
- compare : str or function, optional
The test statistic used for the hypothesis test, by default “corr”, which uses the studentized covariance test statistic. A custom statistic can be passed as a function.
- alternative : {“two-sided”, “less”, “greater”}
The alternative hypothesis for the test, “two-sided” by default.
- skip : list of ints, optional
Columns to skip in the bootstrap. Skip columns that were sampled without replacement from the prior column, by default None
- bootstraps : int, optional
Number of bootstraps to perform, by default 100. Can be set to 1 for a permutation test without any bootstrapping.
- permutations : int or “all”, optional
Number of permutations to perform PER bootstrap sample. “all” for exact test (only works if there are only two treatments), by default 1000
- kind : str, optional
Bootstrap algorithm - see Bootstrapper class, by default “weights”
- return_null : bool, optional
Return the null distribution as well as the p value, by default False
- random_state : int or numpy random Generator, optional
Seedable for reproducibility, by default None
- Returns:
- float64
p-value for the hypothesis test
- list
Empirical null distribution used to calculate the p-value. Returned only if return_null is True.
- Raises:
- TypeError
Raised if input data is not ndarray or DataFrame.
- KeyError
If compare is a string, it must be a key in the TEST_STATISTICS dictionary.
- AttributeError
If compare is a custom statistic, it must be a function.
Examples
Specify the parameters of a dataset with a difference of means of 2.
>>> from hierarch.power import DataSimulator
>>> import scipy.stats as stats
>>> paramlist = [[0, 2], [stats.norm], [stats.norm]]
>>> hierarchy = [2, 4, 3]
>>> datagen = DataSimulator(paramlist, random_state=2)
>>> datagen.fit(hierarchy)
>>> data = datagen.generate()
>>> print(data.shape)
(24, 4)
>>> from hierarch.stats import hypothesis_test
>>> hypothesis_test(data, treatment_col=0,
...                 bootstraps=1000, permutations='all',
...                 random_state=1)
0.013714285714285714
By setting compare to “means”, this function will perform a permutation t-test. “corr”, which is based on a studentized covariance test statistic, should give the same or a very similar p-value to the permutation t-test for datasets with two treatment groups.
>>> hypothesis_test(data, treatment_col=0, compare='means',
...                 bootstraps=1000, permutations='all',
...                 random_state=1)
0.013714285714285714
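For intuition about what an exact permutation test on a difference of means computes, the following is a minimal sketch in plain Python. It is not hierarch's implementation: it ignores the hierarchical structure and the bootstrap step entirely, and the helper name exact_permutation_pvalue and the toy data are hypothetical.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_pvalue(group_a, group_b):
    """Two-sided exact permutation p-value for a difference of means.

    Hypothetical helper for illustration only; not part of hierarch.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_b) - mean(group_a))

    extreme = 0
    total = 0
    # Enumerate every way of assigning n_a of the pooled values to group A.
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        perm_a = [pooled[i] for i in chosen]
        perm_b = [v for i, v in enumerate(pooled) if i not in chosen]
        stat = abs(mean(perm_b) - mean(perm_a))
        # A small tolerance guards against floating-point near-ties.
        if stat >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


p_value = exact_permutation_pvalue([1, 2, 3, 4], [5, 6, 7, 8])
print(p_value)
```

With these toy groups, only 2 of the 70 possible splits are at least as extreme as the observed one, so the sketch returns 2/70. A real hierarchical test would permute treatment labels at the appropriate level of the hierarchy rather than individual measurements.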
This test can handle data with multiple treatment groups that have a hypothesized linear relationship.
>>> paramlist = [[0, 2/3, 4/3, 2], [stats.norm], [stats.norm]]
>>> hierarchy = [4, 2, 3]
>>> datagen = DataSimulator(paramlist, random_state=2)
>>> datagen.fit(hierarchy)
>>> data = datagen.generate()
>>> print(data.shape)
(24, 4)
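With four treatments of two units each, there are 8!/(2!)^4 = 2,520 possible permutations, and permutations=“all” only works with two treatments, so test a random subset instead.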
>>> hypothesis_test(data, treatment_col=0,
...                 bootstraps=100, permutations=1000,
...                 random_state=1)
0.0067
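The count of 2,520 permutations quoted for the four-treatment example can be checked with a multinomial coefficient. The sketch below assumes that treatment labels are permuted among the units directly below the treatment level (two per treatment when hierarchy = [4, 2, 3]). The helper name count_label_permutations is hypothetical and not part of hierarch.

```python
from math import factorial


def count_label_permutations(group_sizes):
    """Number of distinct ways to assign treatment labels to units.

    Hypothetical helper: computes n! / (n_1! * n_2! * ... * n_k!),
    where n_i is the number of units in treatment i.
    """
    total = factorial(sum(group_sizes))
    for size in group_sizes:
        total //= factorial(size)
    return total


# Four treatments with two units each, as in hierarchy = [4, 2, 3].
n_perms = count_label_permutations([2, 2, 2, 2])
print(n_perms)  # 2520
```

The same helper gives 70 for the earlier two-treatment example (two treatments of four units each), which is small enough to enumerate exactly.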