palma.utils.utils#

Classes#

AverageEstimator

A simple ensemble estimator that computes the average prediction of a list of estimators.

Functions#

`_clone`(estimator)	Create and return a clone of the input estimator.
`get_splitting_matrix`(→ pandas.DataFrame)	Generate a splitting matrix based on cross-validation iterations.
`check_splitting_strategy`(X, iter_cross_validation)
`hash_dataframe`(data[, how])
`get_hash`(→ str)	Return a hash of parameters
`get_estimator_name`(→ str)
`check_started`(→ Callable)	Decorator for methods that must be called on a built or unbuilt `Project`.
`interpolate_roc`(roc_curve_metric[, mean_fpr])
`_get_processing_pipeline`(estimators)
`_get_and_check_var_importance`(estimator)

Module Contents#

class palma.utils.utils.AverageEstimator(estimator_list: list)#

A simple ensemble estimator that computes the average prediction of a list of estimators.

Parameters:

estimator_list (list): A list of individual estimators to be averaged.

Attributes:

estimator_list (list): The list of individual estimators.
n (int): The number of estimators in the list.

Methods

predict(*args, **kwargs)	Compute the average prediction across all estimators.
predict_proba(*args, **kwargs)	Compute the average class probabilities across all estimators.

Returns:

numpy.ndarray: The averaged prediction or class probabilities.

estimator_list#

n#

predict(*args, **kwargs) → iter#

predict_proba(*args, **kwargs) → iter#
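
The averaging behaviour described for this class can be illustrated with a minimal sketch. This is not palma's source code: `AverageEstimatorSketch` and `ConstantModel` are hypothetical names introduced here, and the sketch assumes every estimator returns arrays of the same shape.

```python
import numpy as np


class AverageEstimatorSketch:
    """Hypothetical stand-in for AverageEstimator, not palma's actual code."""

    def __init__(self, estimator_list):
        self.estimator_list = estimator_list
        self.n = len(estimator_list)

    def predict(self, *args, **kwargs):
        # One prediction array per estimator, averaged over the estimator axis.
        predictions = [est.predict(*args, **kwargs) for est in self.estimator_list]
        return np.mean(predictions, axis=0)

    def predict_proba(self, *args, **kwargs):
        # Same idea for class probabilities (shape: n_samples x n_classes).
        probas = [est.predict_proba(*args, **kwargs) for est in self.estimator_list]
        return np.mean(probas, axis=0)


class ConstantModel:
    """Toy estimator (hypothetical) that always predicts the same value."""

    def __init__(self, value):
        self.value = value

    def predict(self, X):
        return np.full(len(X), self.value, dtype=float)

    def predict_proba(self, X):
        return np.tile([1.0 - self.value, self.value], (len(X), 1))


ensemble = AverageEstimatorSketch([ConstantModel(0.2), ConstantModel(0.6)])
X = [[0], [1], [2]]
averaged = ensemble.predict(X)
averaged_proba = ensemble.predict_proba(X)
```

With these two toy models, each averaged prediction is the mean of 0.2 and 0.6, and each probability row is the mean of the two models' rows.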

palma.utils.utils._clone(estimator)#

Create and return a clone of the input estimator.

Parameters:

estimator (object): The estimator object to be cloned.

Returns:

object: A cloned copy of the input estimator.

Notes

This function attempts to create a clone of the input estimator using the clone function. If the clone function is not available or raises a TypeError, it falls back to using deepcopy. If both methods fail, the original estimator is returned.

Examples

>>> from sklearn.linear_model import LinearRegression
>>> original_estimator = LinearRegression()
>>> cloned_estimator = _clone(original_estimator)
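
The fallback chain described in the Notes can be sketched as follows. This is an illustration, not the library's implementation: `clone_with_fallback` is a hypothetical name, and the sketch assumes that "the clone function" refers to scikit-learn's `sklearn.base.clone`.

```python
from copy import deepcopy

try:
    # Assumption: "the clone function" in the Notes is scikit-learn's clone.
    from sklearn.base import clone
except ImportError:
    clone = None  # clone function not available


def clone_with_fallback(estimator):
    """Hypothetical re-creation of the documented fallback order."""
    if clone is not None:
        try:
            return clone(estimator)
        except TypeError:
            pass  # not clonable this way, fall through to deepcopy
    try:
        return deepcopy(estimator)
    except Exception:
        # Both methods failed: hand back the original object.
        return estimator


class PlainModel:
    """Toy object without get_params, so clone() cannot handle it."""

    def __init__(self):
        self.weights = [1, 2, 3]


class Uncopyable:
    """Toy object whose deep copy always fails."""

    def __deepcopy__(self, memo):
        raise RuntimeError("cannot be copied")


original = PlainModel()
copied = clone_with_fallback(original)        # falls back to deepcopy
stubborn = Uncopyable()
returned = clone_with_fallback(stubborn)      # falls back to the original
```

One consequence of the last fallback is that the caller may receive the very same object back, so mutating the "clone" can also mutate the original.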

palma.utils.utils.get_splitting_matrix(X: pandas.DataFrame, iter_cross_validation: iter, expand=False) → pandas.DataFrame#

Generate a splitting matrix based on cross-validation iterations.

Parameters:

X (pd.DataFrame): The input dataframe.
iter_cross_validation (Iterable): An iterable containing cross-validation splits as (train, test) pairs.
expand (bool, optional): If True, the output matrix has separate train and test columns for each iteration. If False (default), the output matrix has one column per iteration, with 1 for train and 2 for test.

Returns:

pd.DataFrame: A matrix indicating the train (1) and test (2) splits for each iteration. Rows represent data points, and columns represent iterations.

Examples

>>> import pandas as pd
>>> X = pd.DataFrame({'feature1': [1, 2, 3, 4, 5],
...                   'feature2': ['A', 'B', 'C', 'D', 'E']})
>>> iter_cv = [(range(3), range(3, 5)), (range(2), range(2, 5))]
>>> get_splitting_matrix(X, iter_cv)
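
To make the 1/2 encoding concrete, here is a minimal sketch of the default (`expand=False`) layout for the example above. It is not palma's implementation: `splitting_matrix_sketch` is a hypothetical name, the column labels are left as pandas defaults, the splits are assumed to hold positional row indices, and rows that belong to neither split are marked 0 as an assumption.

```python
import numpy as np
import pandas as pd


def splitting_matrix_sketch(X, iter_cross_validation):
    """Hypothetical sketch: rows are data points, columns are CV iterations."""
    splits = list(iter_cross_validation)
    # Assumption: 0 marks rows that appear in neither train nor test.
    matrix = np.zeros((len(X), len(splits)), dtype=int)
    for i, (train, test) in enumerate(splits):
        matrix[list(train), i] = 1  # train rows (positional indices)
        matrix[list(test), i] = 2   # test rows (positional indices)
    return pd.DataFrame(matrix, index=X.index)


X = pd.DataFrame({'feature1': [1, 2, 3, 4, 5],
                  'feature2': ['A', 'B', 'C', 'D', 'E']})
iter_cv = [(range(3), range(3, 5)), (range(2), range(2, 5))]
result = splitting_matrix_sketch(X, iter_cv)
```

Under these assumptions, the first column marks rows 0 to 2 as train and rows 3 to 4 as test, while the second column marks rows 0 to 1 as train and rows 2 to 4 as test.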

palma.utils.utils.check_splitting_strategy(X: pandas.DataFrame, iter_cross_validation: iter)#

palma.utils.utils.hash_dataframe(data: pandas.DataFrame, how='whole')#

palma.utils.utils.get_hash(**kwargs) → str#

Return a hash of parameters.

palma.utils.utils.get_estimator_name(estimator) → str#

palma.utils.utils.check_started(message: str, need_build: bool = False) → Callable#

Decorator for methods that must be called on a built or unbuilt Project. If the Project's is_built attribute does not have the expected value, an AttributeError is raised with the message passed as argument.

Parameters:

message (str): Error message.
need_build (bool): Expected value for the Project is_built attribute.

Returns:

Callable
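
A decorator factory with this behaviour could look like the sketch below. It follows only what is stated above (compare `is_built` with `need_build`, raise `AttributeError` with `message`); `check_started_sketch` and `ToyProject` are hypothetical names and the real implementation may differ.

```python
import functools
from typing import Callable


def check_started_sketch(message: str, need_build: bool = False) -> Callable:
    """Hypothetical sketch of a build-state guard, not palma's actual code."""

    def decorator(method: Callable) -> Callable:
        @functools.wraps(method)
        def wrapper(project, *args, **kwargs):
            # Refuse to run when the build state is not the expected one.
            if project.is_built != need_build:
                raise AttributeError(message)
            return method(project, *args, **kwargs)

        return wrapper

    return decorator


class ToyProject:
    """Toy class (hypothetical) exposing an is_built attribute."""

    def __init__(self):
        self.is_built = False

    @check_started_sketch("Project is already built", need_build=False)
    def start(self):
        self.is_built = True
        return "started"

    @check_started_sketch("Project must be built first", need_build=True)
    def report(self):
        return "report"


project = ToyProject()
try:
    project.report()            # not built yet, so the guard raises
    early_error = None
except AttributeError as exc:
    early_error = str(exc)
start_result = project.start()  # allowed on an unbuilt project
report_result = project.report()
```

Using `functools.wraps` keeps the decorated method's name and docstring intact, which matters when the documentation is generated from docstrings.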

palma.utils.utils.interpolate_roc(roc_curve_metric: dict[dict[tuple[dict[numpy.array]]]], mean_fpr=np.linspace(0, 1, 100))#

palma.utils.utils._get_processing_pipeline(estimators: list)#

palma.utils.utils._get_and_check_var_importance(estimator)#