hierarch.stats.two_sample_test

hierarch.stats.two_sample_test(data_array, treatment_col: int, compare='means', skip=None, bootstraps=100, permutations=1000, kind='weights', return_null=False, random_state=None)

Two-tailed two-sample hierarchical permutation test.

Parameters

data_array (2D numpy array or pandas DataFrame): Array-like containing both the independent and dependent variables to be analyzed. The final (rightmost) column is assumed to contain the dependent variable values.
treatment_col (int): Index of the column containing the labels of the two samples to be compared. Indexing starts at 0.
compare (str, optional): Test statistic used for the hypothesis test. “means” uses the Welch t-statistic for a difference-of-means test. By default “means”.
skip (list of ints, optional): Columns to skip in the bootstrap. Skip columns that were sampled without replacement from the prior column. By default None.
bootstraps (int, optional): Number of bootstraps to perform, by default 100. Can be set to 1 for a permutation test without any bootstrapping.
permutations (int or “all”, optional): Number of permutations to perform PER bootstrap sample. Use “all” for an exact test. By default 1000.
kind (str, optional): Bootstrap algorithm (see the Bootstrapper class), by default “weights”.
return_null (bool, optional): Return the null distribution as well as the p-value, by default False.
random_state (int or numpy random Generator, optional): Seed for reproducibility, by default None.
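
To make the “means” option concrete, here is a minimal sketch of the textbook Welch t-statistic, written with the standard library only. The function name `welch_t` and the sample values are illustrative assumptions; this is not hierarch's internal implementation, which may differ in detail.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Textbook Welch t-statistic for two samples with unequal variances.

    Hypothetical helper for illustration; not part of hierarch.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    # statistics.variance is the unbiased (n - 1) sample variance.
    standard_error = sqrt(variance(sample_a) / n_a + variance(sample_b) / n_b)
    return (mean(sample_a) - mean(sample_b)) / standard_error


# Made-up data: the second sample has a larger mean and a larger spread.
t_stat = welch_t([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
print(t_stat)
```

The sign of the statistic depends on the order of the two samples, which does not matter for a two-tailed test.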

Returns

float64: p-value for the hypothesis test.
list: Empirical null distribution used to calculate the p-value. Returned only if return_null is True.
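
As a rough illustration of how a two-tailed p-value relates to an empirical null distribution, the sketch below counts the fraction of null statistics at least as extreme as the observed one. The helper `two_tailed_p_value` and the numbers are assumptions made for this example; hierarch's own calculation may differ (for instance, in how results are combined across bootstrap samples).

```python
def two_tailed_p_value(observed, null_distribution):
    """Fraction of null statistics at least as extreme as the observed one.

    Hypothetical helper for illustration; not part of hierarch.
    """
    extreme = sum(1 for stat in null_distribution if abs(stat) >= abs(observed))
    return extreme / len(null_distribution)


# Made-up null distribution of eight test statistics.
null = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 3.0]
p_value = two_tailed_p_value(2.0, null)
print(p_value)
```

Taking absolute values is what makes the test two-tailed: large negative and large positive statistics both count as extreme.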

Raises

TypeError: Raised if input data is not ndarray or DataFrame.
ValueError: Raised if treatment_col has more than two different labels in it.
KeyError: Raised if compare is a string that is not in the TEST_STATISTICS dictionary.
AttributeError: Raised if compare is a custom statistic that is not a function.

Examples

Specify the parameters of a dataset with a difference of means of 2.

>>> from hierarch.power import DataSimulator
>>> from hierarch.stats import two_sample_test
>>> import scipy.stats as stats
>>> paramlist = [[0, 2], [stats.norm], [stats.norm]]
>>> hierarchy = [2, 4, 3]
>>> datagen = DataSimulator(paramlist, random_state=123)
>>> datagen.fit(hierarchy)
>>> data = datagen.generate()
>>> print(data.shape)
(24, 4)

>>> two_sample_test(data, treatment_col=0,
...                 bootstraps=1000, permutations='all',
...                 random_state=1)
0.03402857142857143

Instead of an exact test, a number of random permutations can be specified. In this case there are 70 possible permutations.

>>> two_sample_test(data, treatment_col=0,
...                 bootstraps=1000, permutations=70,
...                 random_state=1)
0.03357142857142857
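
The figure of 70 possible permutations can be checked by hand. The sketch below assumes that treatment labels are shuffled among the clusters directly below the treatment column, so with hierarchy = [2, 4, 3] there are 8 clusters, 4 per treatment. The variable names are illustrative only.

```python
from itertools import combinations
from math import comb

n_clusters = 8        # 2 treatment labels x 4 clusters each
n_per_treatment = 4

# Choosing which 4 of the 8 clusters carry the first treatment label
# determines the whole relabeling.
n_permutations = comb(n_clusters, n_per_treatment)

# Enumerating the assignments explicitly gives the same count.
assignments = list(combinations(range(n_clusters), n_per_treatment))

print(n_permutations, len(assignments))
```

This count grows quickly with the number of clusters, which is why random permutations are the practical choice for larger designs.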

The treatment column does not have to be the outermost column.

>>> paramlist = [[stats.norm], [0, 1]*3, [stats.norm], [stats.norm]]
>>> hierarchy = [3, 2, 4, 3]
>>> datagen = DataSimulator(paramlist, random_state=123)
>>> datagen.fit(hierarchy)
>>> data = datagen.generate()
>>> print(data.shape)
(72, 5)

Because of the larger number of possible permutations, it is usually better to reduce the number of bootstraps and increase the number of permutations.

>>> two_sample_test(data, treatment_col=0,
...                 bootstraps=100, permutations=1000,
...                 random_state=1)
Traceback (most recent call last):
    ...
ValueError: Needs 2 samples.

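Make sure that treatment_col is set to the correct column index. In this dataset, column 0 holds three labels, while the two treatment labels are in column 1.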

>>> two_sample_test(data, treatment_col=1,
...                 bootstraps=100, permutations=1000,
...                 random_state=1)
0.00276
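
One way to find a usable treatment_col is to count the distinct labels in each column. The sketch below builds a hypothetical design matrix for hierarchy = [3, 2, 4, 3] with itertools; the labeling scheme (labels restarting at 0 within each parent) is an assumption and may not match what DataSimulator produces.

```python
from itertools import product

hierarchy = [3, 2, 4, 3]

# Hypothetical design matrix: one row per measurement, one column per level.
rows = list(product(*(range(n) for n in hierarchy)))

# Number of distinct labels found in each column.
labels_per_column = [
    len({row[col] for row in rows}) for col in range(len(hierarchy))
]

# Only columns with exactly two labels can serve as treatment_col.
two_label_columns = [col for col, n in enumerate(labels_per_column) if n == 2]

print(len(rows), labels_per_column, two_label_columns)
```

Under this labeling, only column 1 qualifies, which agrees with the call that succeeds above.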