hierarch.stats.hypothesis_test

hierarch.stats.hypothesis_test(data_array, treatment_col, compare='corr', alternative='two-sided', skip=None, bootstraps=100, permutations=1000, kind='weights', return_null=False, random_state=None)

Hierarchical permutation test for a change in location with any number of samples. The test is two-tailed by default; see the alternative parameter for one-sided tests.

Equivalent to calculating a p-value for a slope coefficient in a linear model.

Parameters:
data_array : 2D numpy array or pandas DataFrame

Array-like containing both the independent and dependent variables to be analyzed. It’s assumed that the final (rightmost) column contains the dependent variable values.

treatment_col : int or str

The index of the column containing the treatment labels that define the samples to be compared. Indexing starts at 0. If the input data is a pandas DataFrame, this can instead be the name of the column.

compare : str or function, optional

The test statistic used for the hypothesis test, given either as the name of a built-in statistic or as a custom function. By default “corr”, which selects the studentized covariance test statistic.

alternative : {“two-sided”, “less”, “greater”}

The alternative hypothesis for the test, “two-sided” by default.

skip : list of ints, optional

Columns to skip in the bootstrap, by default None. Skip columns that were sampled without replacement from the prior column.

bootstraps : int, optional

Number of bootstraps to perform, by default 100. Can be set to 1 for a permutation test without any bootstrapping.

permutations : int or “all”, optional

Number of permutations to perform per bootstrap sample, by default 1000. Use “all” for an exact test, which only works if there are exactly two treatments.

kind : str, optional

Bootstrap algorithm, by default “weights”. See the Bootstrapper class for the available options.

return_null : bool, optional

Return the null distribution as well as the p-value, by default False.

random_state : int or numpy random Generator, optional

Seed or Generator used to make the results reproducible, by default None.

Returns:
float64

p-value for the hypothesis test

list

Empirical null distribution used to calculate the p-value. Only returned if return_null is True.
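As a generic illustration of how a p-value relates to an empirical null distribution, the sketch below counts the null statistics that are at least as extreme as the observed one. The helper name p_value_from_null and the direction conventions used for “less” and “greater” are assumptions made for this example; they are not taken from hierarch's source code.

```python
def p_value_from_null(observed, null_distribution, alternative="two-sided"):
    """Fraction of null statistics at least as extreme as `observed`.

    Hypothetical helper for illustration only, not part of hierarch.
    """
    if alternative == "two-sided":
        # Extremeness is judged by absolute value in either direction.
        hits = sum(abs(s) >= abs(observed) for s in null_distribution)
    elif alternative == "greater":
        # Assumed convention: large statistics support the alternative.
        hits = sum(s >= observed for s in null_distribution)
    elif alternative == "less":
        # Assumed convention: small statistics support the alternative.
        hits = sum(s <= observed for s in null_distribution)
    else:
        raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")
    return hits / len(null_distribution)


# Toy null distribution and observed statistic, made up for the example.
null = [-3, -1, 0, 1, 2, 4]
p_two_sided = p_value_from_null(2, null)
p_greater = p_value_from_null(2, null, alternative="greater")
p_less = p_value_from_null(2, null, alternative="less")
```

With return_null=True, the returned list could in principle be inspected in a similar way, although the exact bookkeeping inside hierarch may differ.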

Raises:
TypeError

Raised if input data is not ndarray or DataFrame.

KeyError

Raised if compare is a string that is not a key in the TEST_STATISTICS dictionary.

AttributeError

Raised if compare is a custom statistic that is not a function.

Examples

Specify the parameters of a dataset with a difference of means of 2.

>>> from hierarch.power import DataSimulator
>>> from hierarch.stats import hypothesis_test
>>> import scipy.stats as stats
>>> paramlist = [[0, 2], [stats.norm], [stats.norm]]
>>> hierarchy = [2, 4, 3]
>>> datagen = DataSimulator(paramlist, random_state=2)
>>> datagen.fit(hierarchy)
>>> data = datagen.generate()
>>> print(data.shape)
(24, 4)
>>> hypothesis_test(data, treatment_col=0,
...                 bootstraps=1000, permutations='all',
...                 random_state=1)
0.013714285714285714

Setting compare to “means” makes this function perform a permutation t-test. For datasets with two treatment groups, “corr”, which is based on a studentized covariance test statistic, should give the same or a very similar p-value.

>>> hypothesis_test(data, treatment_col=0, compare='means',
...                 bootstraps=1000, permutations='all',
...                 random_state=1)
0.013714285714285714
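One way to see why the two statistics agree for two treatment groups is a simplified, non-hierarchical sketch: with 0/1 treatment labels and fixed group sizes, the covariance between labels and values is a constant multiple of the difference of means, so both rank every permutation identically. The code below uses unstudentized statistics on made-up flat data and exhaustive enumeration; the function names are hypothetical and this is not hierarch's algorithm.

```python
from itertools import combinations
from statistics import mean


def diff_of_means(labels, values):
    """Mean of the group labelled 1 minus mean of the group labelled 0."""
    treated = [v for l, v in zip(labels, values) if l == 1]
    control = [v for l, v in zip(labels, values) if l == 0]
    return mean(treated) - mean(control)


def covariance(labels, values):
    """Population covariance between the 0/1 labels and the values."""
    l_bar, v_bar = mean(labels), mean(values)
    return sum((l - l_bar) * (v - v_bar) for l, v in zip(labels, values)) / len(values)


def exact_p_value(statistic, labels, values):
    """Two-sided exact permutation p-value over all label assignments."""
    n, n_treated = len(labels), sum(labels)
    observed = abs(statistic(labels, values))
    hits = total = 0
    for treated in combinations(range(n), n_treated):
        permuted = [1 if i in treated else 0 for i in range(n)]
        total += 1
        # Small tolerance so exact ties survive floating-point rounding.
        if abs(statistic(permuted, values)) >= observed - 1e-12:
            hits += 1
    return hits / total


# Made-up flat data: four control and four treated observations.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
values = [0.1, -0.4, 0.8, 0.3, 2.2, 1.7, 2.9, 1.1]

p_means = exact_p_value(diff_of_means, labels, values)
p_cov = exact_p_value(covariance, labels, values)
```

Here both p-values equal 2/70, because only the observed labelling and its mirror image are as extreme as the observed statistic. The studentized statistics used by hierarch need not match exactly, which is consistent with the “same or very similar” wording above.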

This test can handle data with multiple treatment groups that have a hypothesized linear relationship.

>>> paramlist = [[0, 2/3, 4/3, 2], [stats.norm], [stats.norm]]
>>> hierarchy = [4, 2, 3]
>>> datagen = DataSimulator(paramlist, random_state=2)
>>> datagen.fit(hierarchy)
>>> data = datagen.generate()
>>> print(data.shape)
(24, 4)

There are 2,520 possible permutations of the treatment labels, so sample a random subset of them rather than enumerating them all.

>>> hypothesis_test(data, treatment_col=0,
...                 bootstraps=100, permutations=1000,
...                 random_state=1)
0.0067
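The figure of 2,520 permutations quoted for the four-treatment dataset can be reproduced with a multinomial coefficient. The sketch below assumes that treatment labels are shuffled among the units one level below the treatment column (8 units, 2 per treatment, for hierarchy = [4, 2, 3]); the helper name count_label_permutations is hypothetical and not part of hierarch.

```python
from math import factorial


def count_label_permutations(group_sizes):
    """Number of distinct ways to assign labels with the given group sizes.

    This is the multinomial coefficient n! / (n_1! * n_2! * ... * n_k!).
    """
    total = factorial(sum(group_sizes))
    for size in group_sizes:
        total //= factorial(size)
    return total


# Four treatments with two units each, as in hierarchy = [4, 2, 3].
four_groups = count_label_permutations([2, 2, 2, 2])

# Two treatments with four units each, as in hierarchy = [2, 4, 3].
two_groups = count_label_permutations([4, 4])
```

Under that assumption the four-treatment design gives 2,520 assignments, while the two-treatment design gives only 70, which is small enough to enumerate with permutations='all'.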