hierarch.stats.two_sample_test
- hierarch.stats.two_sample_test(data_array, treatment_col: int, compare='means', skip=None, bootstraps=100, permutations=1000, kind='weights', return_null=False, random_state=None)
Two-tailed two-sample hierarchical permutation test.
- Parameters
- data_array : 2D numpy array or pandas DataFrame
Array-like containing both the independent and dependent variables to be analyzed. It’s assumed that the final (rightmost) column contains the dependent variable values.
- treatment_col : int
Index of the column containing the labels of the two samples to be compared. Indexing starts at 0.
- compare : str, optional
Test statistic used for the hypothesis test. “means” uses the Welch t-statistic for a difference-of-means test. By default “means”.
- skip : list of ints, optional
Columns to skip in the bootstrap. Skip any column that was sampled without replacement from the prior column. By default None.
- bootstraps : int, optional
Number of bootstraps to perform, by default 100. Can be set to 1 for a permutation test without any bootstrapping.
- permutations : int or “all”, optional
Number of permutations to perform per bootstrap sample. Use “all” for an exact test. By default 1000.
- kind : str, optional
Bootstrap algorithm; see the Bootstrapper class. By default “weights”.
- return_null : bool, optional
Return the null distribution as well as the p-value, by default False
- random_state : int or numpy random Generator, optional
Seed or Generator used for reproducibility, by default None
- Returns
- float64
p-value for the hypothesis test
- list
Empirical null distribution used to calculate the p-value. Returned only if return_null is True.
- Raises
- TypeError
Raised if input data is not ndarray or DataFrame.
- ValueError
Raised if treatment_col has more than two different labels in it.
- KeyError
Raised if compare is a string that is not a key in the TEST_STATISTICS dictionary.
- AttributeError
Raised if compare is a custom statistic that is not a function.
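To illustrate what a two-tailed permutation p-value and its empirical null distribution are, here is a minimal sketch of a plain (non-hierarchical) exact permutation test using a Welch t-statistic. It is not hierarch's implementation: it ignores the hierarchy and the bootstrap entirely, and the helper names welch_t and exact_two_tailed_p are hypothetical.

```python
from itertools import combinations
from statistics import mean, variance


def welch_t(a, b):
    """Welch t-statistic for two samples with possibly unequal variances."""
    standard_error = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / standard_error


def exact_two_tailed_p(a, b):
    """Exact two-tailed permutation p-value for flat (non-hierarchical) data."""
    pooled = a + b
    observed = abs(welch_t(a, b))
    indices = range(len(pooled))
    null = []
    # Enumerate every way of relabeling len(a) observations as sample "a".
    for chosen in combinations(indices, len(a)):
        chosen_set = set(chosen)
        perm_a = [pooled[i] for i in chosen]
        perm_b = [pooled[i] for i in indices if i not in chosen_set]
        null.append(welch_t(perm_a, perm_b))
    # Two-tailed: count relabelings at least as extreme in absolute value.
    p_value = sum(abs(t) >= observed for t in null) / len(null)
    return p_value, null


# Toy data (made up for illustration): two well-separated samples.
a = [1.0, 2.0, 3.0, 4.0]
b = [6.0, 7.0, 8.0, 9.0]
p_value, null_distribution = exact_two_tailed_p(a, b)
print(p_value, len(null_distribution))
```

The second returned value plays the same role as the list described under Returns: it is the set of test statistics computed under relabeling, against which the observed statistic is compared.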
Examples
Specify the parameters of a dataset with a difference of means of 2.
>>> from hierarch.power import DataSimulator
>>> from hierarch.stats import two_sample_test
>>> import scipy.stats as stats
>>> paramlist = [[0, 2], [stats.norm], [stats.norm]]
>>> hierarchy = [2, 4, 3]
>>> datagen = DataSimulator(paramlist, random_state=123)
>>> datagen.fit(hierarchy)
>>> data = datagen.generate()
>>> print(data.shape)
(24, 4)
>>> two_sample_test(data, treatment_col=0,
...                 bootstraps=1000, permutations='all',
...                 random_state=1)
0.03402857142857143
Instead of an exact test, a number of random permutations can be specified. In this case there are 70 possible permutations.
>>> two_sample_test(data, treatment_col=0,
...                 bootstraps=1000, permutations=70,
...                 random_state=1)
0.03357142857142857
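The figure of 70 possible permutations is consistent with choosing which 4 of the 8 second-level units receive the first treatment label. The following quick check assumes that labels are shuffled among those 8 units; that is a reading of the hierarchy [2, 4, 3], not a statement about hierarch's internals.

```python
from math import comb

# hierarchy = [2, 4, 3]: 2 treatment groups, each containing 4 second-level units.
n_treatments = 2
units_per_treatment = 4
total_units = n_treatments * units_per_treatment  # 8 units in total

# Number of distinct ways to assign 4 of the 8 units to the first treatment.
n_permutations = comb(total_units, units_per_treatment)
print(n_permutations)  # 70
```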
The treatment column does not have to be the outermost column.
>>> paramlist = [[stats.norm], [0, 1]*3, [stats.norm], [stats.norm]]
>>> hierarchy = [3, 2, 4, 3]
>>> datagen = DataSimulator(paramlist, random_state=123)
>>> datagen.fit(hierarchy)
>>> data = datagen.generate()
>>> print(data.shape)
(72, 5)
Because of the larger number of possible permutations, it is usually better to reduce the number of bootstraps and increase the number of permutations.
>>> two_sample_test(data, treatment_col=0,
...                 bootstraps=100, permutations=1000,
...                 random_state=1)
Traceback (most recent call last):
    ...
ValueError: Needs 2 samples.
Make sure that treatment_col is set to the correct column index.
>>> two_sample_test(data, treatment_col=1,
...                 bootstraps=100, permutations=1000,
...                 random_state=1)
0.00276
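One way to find a valid treatment_col before running the test is to count the distinct labels in each independent-variable column and look for the column with exactly two. The sketch below uses made-up toy rows and a hypothetical helper, count_labels; it does not call hierarch.

```python
def count_labels(rows, col):
    """Return the number of distinct labels found in column `col`."""
    return len({row[col] for row in rows})


# Hypothetical toy rows: (batch, treatment, replicate, measurement).
rows = [
    (1, 0, 1, 0.3), (1, 1, 1, 1.2),
    (2, 0, 1, 0.1), (2, 1, 1, 0.9),
    (3, 0, 1, 0.4), (3, 1, 1, 1.5),
]

# The final column holds the dependent variable, so it is excluded.
n_label_columns = len(rows[0]) - 1
label_counts = {col: count_labels(rows, col) for col in range(n_label_columns)}

# Columns with exactly two labels are candidates for treatment_col.
candidates = [col for col, n in label_counts.items() if n == 2]
print(label_counts)
print(candidates)
```

In this toy layout, column 0 holds three labels, which corresponds to the situation that raised ValueError in the example above, while column 1 holds exactly two.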