hierarch.resampling.Permuter

class hierarch.resampling.Permuter(random_state: Generator | int | None = None)

Bases: object

Class for performing cluster-aware permutation on a target column.

Parameters:
random_state : int or numpy.random.Generator instance, optional

Seed or generator used for reproducibility, by default None.
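The accepted values for random_state (an int, a Generator, or None) match what numpy.random.default_rng accepts. The following is a minimal sketch of how such a value can be normalized into a Generator. It assumes Permuter does something similar internally, and make_rng is a hypothetical helper, not part of hierarch.

```python
import numpy as np

def make_rng(random_state=None):
    """Hypothetical helper: normalize an int, Generator, or None into a Generator."""
    # default_rng draws fresh OS entropy for None, seeds a new Generator from an
    # int, and returns an existing Generator unaltered.
    return np.random.default_rng(random_state)

# The same integer seed reproduces the same shuffle.
first = make_rng(1).permutation(6)
second = make_rng(1).permutation(6)
print(np.array_equal(first, second))  # True

# An existing Generator is reused rather than reseeded.
shared = np.random.default_rng(1)
print(make_rng(shared) is shared)  # True
```

Passing a shared Generator lets several resampling objects draw from one reproducible stream, while an int gives each object its own independent, repeatable stream.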

Examples

When the column to permute is the first column (col_to_permute=0), Permuter performs an ordinary, unstratified shuffle of that column.

>>> from hierarch.power import DataSimulator
>>> from hierarch.internal_functions import GroupbyMean
>>> paramlist = [[1]*2, [0]*6, [0]*18]
>>> hierarchy = [2, 3, 3]
>>> datagen = DataSimulator(paramlist)
>>> datagen.fit(hierarchy)
>>> data = datagen.generate()
>>> agg = GroupbyMean()
>>> test = agg.fit_transform(data)
>>> test
array([[1., 1., 1.],
       [1., 2., 1.],
       [1., 3., 1.],
       [2., 1., 1.],
       [2., 2., 1.],
       [2., 3., 1.]])

Permuter.transform shuffles the target column of the array in place and returns the modified array.

>>> permute = Permuter(random_state=1)
>>> permute.fit(test, col_to_permute=0, exact=False)
>>> permute.transform(test)
array([[2., 1., 1.],
       [2., 2., 1.],
       [1., 3., 1.],
       [2., 1., 1.],
       [1., 2., 1.],
       [1., 3., 1.]])
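In the output above, only column 0 changes; columns 1 and 2 keep their original order. The following is a minimal NumPy sketch of that behavior. shuffle_first_column is a hypothetical helper, not hierarch's implementation, and with the same seed it is not guaranteed to reproduce the exact array shown.

```python
import numpy as np

def shuffle_first_column(data, rng):
    """Hypothetical sketch: permute column 0 in place, leaving other columns untouched."""
    # Generator.permutation returns a shuffled copy of the column's values;
    # assigning it back through the slice modifies `data` in place.
    data[:, 0] = rng.permutation(data[:, 0])
    return data

test = np.array([[1., 1., 1.], [1., 2., 1.], [1., 3., 1.],
                 [2., 1., 1.], [2., 2., 1.], [2., 3., 1.]])
before = test.copy()  # keep a copy, since the shuffle is in place
out = shuffle_first_column(test, np.random.default_rng(1))
```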

If exact=True, Permuter will not repeat a permutation until all possible permutations have been exhausted.

>>> test = agg.fit_transform(data)
>>> permute = Permuter(random_state=1)
>>> permute.fit(test, col_to_permute=0, exact=True)
>>> permute.transform(test)
array([[2., 1., 1.],
       [2., 2., 1.],
       [2., 3., 1.],
       [1., 1., 1.],
       [1., 2., 1.],
       [1., 3., 1.]])
>>> next(permute.iterator)
[1.0, 2.0, 2.0, 2.0, 1.0, 1.0]
>>> next(permute.iterator)
[2.0, 1.0, 2.0, 2.0, 1.0, 1.0]
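Column 0 in this example holds three 1s and three 2s, so it has 6! / (3! * 3!) = 20 distinct arrangements. The following sketch enumerates each distinct arrangement exactly once using the classic next-lexicographic-permutation algorithm. This is an illustration, not hierarch's implementation. The iterator shown above does not yield arrangements in lexicographic order, so the sequence here differs from it.

```python
from math import factorial

def distinct_permutations(values):
    """Yield each distinct arrangement of `values` exactly once, in lexicographic order."""
    seq = sorted(values)
    n = len(seq)
    while True:
        yield list(seq)
        # Find the rightmost position i with seq[i] < seq[i + 1].
        i = n - 2
        while i >= 0 and seq[i] >= seq[i + 1]:
            i -= 1
        if i < 0:
            return  # seq is non-increasing: this was the last arrangement
        # Find the rightmost j > i with seq[j] > seq[i], swap, then reverse the tail.
        j = n - 1
        while seq[j] <= seq[i]:
            j -= 1
        seq[i], seq[j] = seq[j], seq[i]
        seq[i + 1:] = reversed(seq[i + 1:])

labels = [1.0, 1.0, 1.0, 2.0, 2.0, 2.0]
perms = list(distinct_permutations(labels))
print(len(perms))  # 20, i.e. factorial(6) // (factorial(3) * factorial(3))
print(perms[0])    # [1.0, 1.0, 1.0, 2.0, 2.0, 2.0]
print(perms[-1])   # [2.0, 2.0, 2.0, 1.0, 1.0, 1.0]
```

Because the comparisons use >= and <=, tied values are never swapped with each other, which is what prevents repeated arrangements.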

If col_to_permute is not 0, Permuter performs a within-cluster shuffle. In the output below, the values in column 1 are shuffled only within their column 0 cluster.

>>> test = agg.fit_transform(data)
>>> permute = Permuter(random_state=2)
>>> permute.fit(test, col_to_permute=1, exact=False)
>>> permute.transform(test)
array([[1., 1., 1.],
       [1., 2., 1.],
       [1., 3., 1.],
       [2., 2., 1.],
       [2., 1., 1.],
       [2., 3., 1.]])
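The following is a minimal NumPy sketch of a within-cluster shuffle. It assumes a cluster is a unique combination of the values in all columns to the left of the target column. It also assumes, as in the aggregated example above, that each row is a single unit at the target level. Data with several rows per unit would need the unit labels, not individual rows, to be shuffled. shuffle_within_clusters is a hypothetical helper, not hierarch's implementation.

```python
import numpy as np

def shuffle_within_clusters(data, col, rng):
    """Hypothetical sketch: permute column `col` separately inside each cluster, in place."""
    # A cluster is a unique row of columns 0..col-1; return_inverse maps each row
    # to its cluster index (ravel guards against NumPy versions returning 2D).
    _, cluster_ids = np.unique(data[:, :col], axis=0, return_inverse=True)
    cluster_ids = cluster_ids.ravel()
    for cid in np.unique(cluster_ids):
        rows = np.flatnonzero(cluster_ids == cid)
        # Shuffle only the target values belonging to this cluster.
        data[rows, col] = rng.permutation(data[rows, col])
    return data

test = np.array([[1., 1., 1.], [1., 2., 1.], [1., 3., 1.],
                 [2., 1., 1.], [2., 2., 1.], [2., 3., 1.]])
before = test.copy()
out = shuffle_within_clusters(test, 1, np.random.default_rng(2))
```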

Exact within-cluster permutation is not implemented, because the number of such permutations is typically too large to enumerate.

>>> permute = Permuter(random_state=2)
>>> permute.fit(test, col_to_permute=1, exact=True)
Traceback (most recent call last):
    ...
NotImplementedError: Exact permutation only available for col_to_permute = 0.
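The following is a quick calculation of why exact within-cluster enumeration is impractical. It assumes the labels inside each cluster are distinct. Under that assumption, each cluster can be rearranged independently, so the number of within-cluster arrangements is the product of the factorials of the cluster sizes. n_within_cluster_permutations is a hypothetical helper used only for illustration.

```python
from math import factorial, prod

def n_within_cluster_permutations(cluster_sizes):
    """Count within-cluster arrangements, assuming labels inside each cluster are distinct."""
    # Clusters are permuted independently, so the per-cluster counts multiply.
    return prod(factorial(n) for n in cluster_sizes)

# The example above: two column-0 clusters with three column-1 labels each.
print(n_within_cluster_permutations([3, 3]))  # 36

# Ten clusters of ten units each already give a 66-digit count.
print(len(str(n_within_cluster_permutations([10] * 10))))  # 66
```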

Methods

fit(data, col_to_permute[, exact])

Fit the permuter to the target data.

transform(data)

Permute target column in-place.

fit(data: ndarray, col_to_permute: int, exact: bool = False) → None

Fit the permuter to the target data.

Parameters:
data : 2D numeric ndarray

Target data.

col_to_permute : int

Index of target column.

exact : bool, optional

If True, enumerate all possible permutations of the target column and iterate through them one at a time, by default False. Only supported when col_to_permute is 0; any other value raises NotImplementedError.

transform(data: ndarray) → ndarray

Permute target column in-place.

Parameters:
data : 2D numeric ndarray

Target data.

Returns:
data : 2D numeric ndarray

The input array with the target column shuffled. When col_to_permute is not 0, values are shuffled only within their clusters.
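Because transform works in place, comparing its output with the input requires keeping a copy beforehand. The following sketch checks the output contract described above. It assumes, as in the examples, that clusters are unique combinations of the columns to the left of the target. is_stratified_permutation is a hypothetical helper, not part of hierarch.

```python
import numpy as np

def is_stratified_permutation(original, permuted, col):
    """Hypothetical check: `permuted` differs from `original` only by a within-cluster shuffle of `col`."""
    others = [c for c in range(original.shape[1]) if c != col]
    # Every column other than the target must be unchanged.
    if not np.array_equal(original[:, others], permuted[:, others]):
        return False
    if col == 0:
        # Column 0 has no parent level: the whole column is a single cluster.
        cluster_ids = np.zeros(len(original), dtype=int)
    else:
        _, cluster_ids = np.unique(original[:, :col], axis=0, return_inverse=True)
        cluster_ids = cluster_ids.ravel()
    # Within each cluster, the multiset of target values must be preserved.
    for cid in np.unique(cluster_ids):
        rows = cluster_ids == cid
        if not np.array_equal(np.sort(original[rows, col]), np.sort(permuted[rows, col])):
            return False
    return True
```

For example, saving `before = test.copy()` ahead of `permute.transform(test)` and then calling `is_stratified_permutation(before, test, 1)` should return True for the within-cluster example above.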