hierarch.stats.welch_statistic

hierarch.stats.welch_statistic(data, col: int, treatment_labels)

Calculates Welch’s t statistic.

Takes a 2D data matrix, the index of the column to classify rows by, and the labels in that column that identify the two groups to compare. Assumes that the last (index -1) column of the data matrix holds the dependent variable.

Parameters
data : 2D array

Data matrix. Assumes last column contains dependent variable values.

col : int

Target column to be used to divide the dependent variable into two groups.

treatment_labels : 1D array-like

Labels in the target column that identify the two groups to compare.

Returns
float64

Welch’s t statistic.

Notes

Details on the validity of this test statistic can be found in “Studentized permutation tests for non-i.i.d. hypotheses and the generalized Behrens-Fisher problem” by Arnold Janssen (https://doi.org/10.1016/S0167-7152(97)00043-6).
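The following is a minimal standard-library sketch of the textbook Welch statistic: the difference in group means divided by the unpooled standard error. It is not hierarch's implementation. The name welch_t_sketch is hypothetical, and the sign convention (first label minus second) is an assumption inferred from the example output in this page.

```python
import math
from statistics import mean, variance


def welch_t_sketch(rows, col, treatment_labels):
    """Hypothetical illustration of Welch's t; not hierarch's own code."""
    first, second = treatment_labels
    # Split the dependent variable (last column) by the label found in `col`.
    a = [row[-1] for row in rows if row[col] == first]
    b = [row[-1] for row in rows if row[col] == second]
    # Unpooled standard error: each group contributes s^2 / n,
    # where s^2 is the sample variance (n - 1 in the denominator).
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    # Assumed sign convention: mean of first label minus mean of second.
    return (mean(a) - mean(b)) / se


rows = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5),
        (1, 10), (1, 11), (1, 12), (1, 13), (1, 14)]
t = welch_t_sketch(rows, 0, (0, 1))
```

Because the variances are not pooled, the two groups may have different sizes and different spreads.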

Examples

>>> import numpy as np
>>> from hierarch.stats import welch_statistic
>>> x = np.array([[0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
...               [1, 2, 3, 4, 5, 10, 11, 12, 13, 14]])
>>> x.T
array([[ 0,  1],
       [ 0,  2],
       [ 0,  3],
       [ 0,  4],
       [ 0,  5],
       [ 1, 10],
       [ 1, 11],
       [ 1, 12],
       [ 1, 13],
       [ 1, 14]])
>>> welch_statistic(x.T, 0, (0, 1))
-9.0

This is the same calculation that scipy.stats.ttest_ind performs when equal_var=False.

>>> import scipy.stats as stats
>>> a = [1, 2, 3, 4, 5]
>>> b = [10, 11, 12, 13, 14]
>>> stats.ttest_ind(a, b, equal_var=False)[0]
-9.0