hierarch.resampling.Permuter

class hierarch.resampling.Permuter(random_state: Generator | int | None = None)

Bases: object

Class for performing cluster-aware permutation on a target column.

Parameters:

random_state (int or numpy.random.Generator instance, optional): Seed or generator used for reproducibility. Defaults to None.

Examples

When the column to resample is the first column, Permuter performs an ordinary shuffle.

>>> from hierarch.power import DataSimulator
>>> from hierarch.internal_functions import GroupbyMean
>>> from hierarch.resampling import Permuter
>>> paramlist = [[1]*2, [0]*6, [0]*18]
>>> hierarchy = [2, 3, 3]
>>> datagen = DataSimulator(paramlist)
>>> datagen.fit(hierarchy)
>>> data = datagen.generate()
>>> agg = GroupbyMean()
>>> test = agg.fit_transform(data)
>>> test
array([[1., 1., 1.],
       [1., 2., 1.],
       [1., 3., 1.],
       [2., 1., 1.],
       [2., 2., 1.],
       [2., 3., 1.]])

Permuter performs an in-place shuffle on the fitted data.

>>> permute = Permuter(random_state=1)
>>> permute.fit(test, col_to_permute=0, exact=False)
>>> permute.transform(test)
array([[2., 1., 1.],
       [2., 2., 1.],
       [1., 3., 1.],
       [2., 1., 1.],
       [1., 2., 1.],
       [1., 3., 1.]])

If exact=True, Permuter will not repeat a permutation until all possible permutations have been exhausted.

>>> test = agg.fit_transform(data)
>>> permute = Permuter(random_state=1)
>>> permute.fit(test, col_to_permute=0, exact=True)
>>> permute.transform(test)
array([[2., 1., 1.],
       [2., 2., 1.],
       [2., 3., 1.],
       [1., 1., 1.],
       [1., 2., 1.],
       [1., 3., 1.]])
>>> next(permute.iterator)
[1.0, 2.0, 2.0, 2.0, 1.0, 1.0]
>>> next(permute.iterator)
[2.0, 1.0, 2.0, 2.0, 1.0, 1.0]
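
To make "all possible permutations" concrete, here is a minimal standard-library sketch that enumerates every distinct ordering of the column 0 labels from the example. It is an illustration only: `distinct_permutations` is a hypothetical helper, not part of hierarch, and the order in which it yields arrangements need not match the order of `permute.iterator`.

```python
from itertools import permutations

def distinct_permutations(labels):
    """Yield each distinct ordering of `labels` exactly once (hypothetical helper)."""
    seen = set()
    for perm in permutations(sorted(labels)):
        # Repeated labels make itertools emit duplicate tuples; skip them.
        if perm not in seen:
            seen.add(perm)
            yield list(perm)

labels = [1.0, 1.0, 1.0, 2.0, 2.0, 2.0]
all_perms = list(distinct_permutations(labels))
print(len(all_perms))  # 20 distinct arrangements: 6! / (3! * 3!)
```

With only 20 distinct arrangements, cycling through every one before repeating is cheap, which is the situation exact=True targets.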

If the column to permute is not 0, Permuter performs a within-cluster shuffle. Note that values of column 1 were shuffled within their column 0 cluster.

>>> test = agg.fit_transform(data)
>>> permute = Permuter(random_state=2)
>>> permute.fit(test, col_to_permute=1, exact=False)
>>> permute.transform(test)
array([[1., 1., 1.],
       [1., 2., 1.],
       [1., 3., 1.],
       [2., 2., 1.],
       [2., 1., 1.],
       [2., 3., 1.]])
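
The within-cluster shuffle can be sketched in plain Python as follows. This is not hierarch's implementation: `shuffle_within_clusters` is a hypothetical helper, it works on lists of rows rather than ndarrays, it returns a copy rather than shuffling in place, and it assumes a single column defines the clusters.

```python
import random

def shuffle_within_clusters(rows, cluster_col, target_col, rng):
    """Return a copy of `rows` with `target_col` shuffled inside each cluster."""
    out = [list(r) for r in rows]
    # Group row indices by their cluster label.
    groups = {}
    for i, row in enumerate(out):
        groups.setdefault(row[cluster_col], []).append(i)
    # Shuffle the target values separately within each group of rows.
    for idx in groups.values():
        vals = [out[i][target_col] for i in idx]
        rng.shuffle(vals)
        for i, v in zip(idx, vals):
            out[i][target_col] = v
    return out

rows = [
    [1.0, 1.0, 1.0], [1.0, 2.0, 1.0], [1.0, 3.0, 1.0],
    [2.0, 1.0, 1.0], [2.0, 2.0, 1.0], [2.0, 3.0, 1.0],
]
shuffled = shuffle_within_clusters(rows, cluster_col=0, target_col=1,
                                   rng=random.Random(2))
for row in shuffled:
    print(row)
```

Column 0 is left untouched, and each cluster keeps the same set of column 1 values, only in a different order.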

Exact within-cluster permutations are not implemented. There are typically too many of them for exhaustive enumeration to be worth attempting.

>>> permute = Permuter(random_state=2)
>>> permute.fit(test, col_to_permute=1, exact=True)
Traceback (most recent call last):
    ...
NotImplementedError: Exact permutation only available for col_to_permute = 0.
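
A rough count shows how quickly within-cluster permutations multiply. The sketch below assumes the values inside each cluster are all distinct, so a cluster of size n contributes n! orderings and the total is the product over clusters. `count_within_cluster_permutations` is a hypothetical helper, not a hierarch function.

```python
from math import factorial, prod

def count_within_cluster_permutations(cluster_sizes):
    """Number of within-cluster arrangements, assuming distinct values per cluster."""
    return prod(factorial(n) for n in cluster_sizes)

# The example above: two clusters of three values each.
print(count_within_cluster_permutations([3, 3]))    # 36
# A slightly larger design: six clusters of five values each.
print(count_within_cluster_permutations([5] * 6))   # 2985984000000
```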

Methods

`fit`(data, col_to_permute[, exact])	Fit the permuter to the target data.
`transform`(data)	Permute target column in-place.

fit(data: ndarray, col_to_permute: int, exact: bool = False) → None

Fit the permuter to the target data.

Parameters:

data (2D numeric ndarray): Target data.
col_to_permute (int): Index of target column.
exact (bool, optional): If True, enumerate all possible permutations and iterate through them one by one. Only available when col_to_permute is 0. Defaults to False.

transform(data: ndarray) → ndarray

Permute target column in-place.

Parameters:

data (2D numeric ndarray): Target data.

Returns:

data (2D numeric ndarray): Original data with target column shuffled, in a stratified fashion if necessary.