palma.utils.utils

Module Contents

Classes

AverageEstimator

A simple ensemble estimator that computes the average prediction of a list of estimators.

Functions

_clone(estimator)

Create and return a clone of the input estimator.

get_splitting_matrix(→ pandas.DataFrame)

Generate a splitting matrix based on cross-validation iterations.

check_splitting_strategy(X, iter_cross_validation)

hash_dataframe(data[, how])

get_hash(→ str)

Return a hash of the keyword parameters.

get_estimator_name(→ str)

check_started(→ Callable)

Decorator for methods that must be called on a built (or unbuilt) Project.

interpolate_roc(roc_curve_metric[, mean_fpr])

_get_processing_pipeline(estimators)

_get_and_check_var_importance(estimator)

class palma.utils.utils.AverageEstimator(estimator_list: list)

A simple ensemble estimator that computes the average prediction of a list of estimators.

Parameters:
estimator_list : list

A list of individual estimators to be averaged.

Returns:
numpy.ndarray

The averaged prediction or class probabilities.

Attributes:
estimator_list : list

The list of individual estimators.

n : int

The number of estimators in the list.

Methods

predict(*args, **kwargs)

Compute the average prediction across all estimators.

predict_proba(*args, **kwargs)

Compute the average class probabilities across all estimators.

predict(*args, **kwargs) → iter

predict_proba(*args, **kwargs) → iter
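
The averaging behaviour described above can be sketched with NumPy. This is a minimal illustration under stated assumptions, not palma's implementation: AverageEstimatorSketch and ConstantModel are hypothetical names, each estimator is assumed to be already fitted and to expose scikit-learn-style predict and predict_proba methods, and the mean is taken element-wise over the stacked outputs.

```python
import numpy as np


class AverageEstimatorSketch:
    """Hypothetical re-implementation: average the outputs of fitted estimators."""

    def __init__(self, estimator_list: list):
        self.estimator_list = estimator_list
        self.n = len(estimator_list)

    def predict(self, *args, **kwargs) -> np.ndarray:
        # Stack each estimator's predictions, then take the element-wise mean.
        preds = [est.predict(*args, **kwargs) for est in self.estimator_list]
        return np.mean(np.stack(preds), axis=0)

    def predict_proba(self, *args, **kwargs) -> np.ndarray:
        # Same averaging, applied to the (n_samples, n_classes) probability matrices.
        probas = [est.predict_proba(*args, **kwargs) for est in self.estimator_list]
        return np.mean(np.stack(probas), axis=0)


class ConstantModel:
    """Hypothetical stand-in for a fitted binary classifier."""

    def __init__(self, value: float):
        self.value = value

    def predict(self, X):
        return np.full(len(X), self.value, dtype=float)

    def predict_proba(self, X):
        return np.tile([1 - self.value, self.value], (len(X), 1))


avg = AverageEstimatorSketch([ConstantModel(0.25), ConstantModel(0.75)])
X = [[0], [1], [2]]
print(avg.predict(X))  # → [0.5 0.5 0.5]
```

Averaging probabilities element-wise keeps each row summing to 1 whenever every estimator's rows do, so the same mean works for both methods.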

palma.utils.utils._clone(estimator)

Create and return a clone of the input estimator.

Parameters:
estimator : object

The estimator object to be cloned.

Returns:
object

A cloned copy of the input estimator.

Notes

This function first tries to clone the input estimator with the clone function. If the clone function is unavailable or raises a TypeError, it falls back to copy.deepcopy. If both methods fail, the original estimator is returned unchanged.

Examples

>>> from sklearn.linear_model import LinearRegression
>>> from palma.utils.utils import _clone
>>> original_estimator = LinearRegression()
>>> cloned_estimator = _clone(original_estimator)
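
The fallback chain in the Notes can be sketched as follows. The sketch assumes that "the clone function" is scikit-learn's sklearn.base.clone, which the source does not state, and clone_with_fallback, Plain and Uncopyable are hypothetical names. Catching any Exception on the deepcopy step is also an assumption about what "both methods fail" covers.

```python
import copy

try:
    # Assumption: "the clone function" refers to scikit-learn's clone.
    from sklearn.base import clone
except ImportError:
    clone = None  # scikit-learn not installed: go straight to deepcopy


def clone_with_fallback(estimator):
    """Hypothetical mirror of the fallback chain described for _clone."""
    if clone is not None:
        try:
            return clone(estimator)
        except TypeError:
            # sklearn's clone raises TypeError for objects without get_params().
            pass
    try:
        return copy.deepcopy(estimator)
    except Exception:
        # Both strategies failed: return the original object unchanged.
        return estimator


class Plain:
    """Hypothetical object that is not a scikit-learn estimator."""

    def __init__(self):
        self.params = {"alpha": 1.0}


class Uncopyable:
    """Hypothetical object that refuses to be deep-copied."""

    def __deepcopy__(self, memo):
        raise RuntimeError("cannot copy")


original = Plain()
copied = clone_with_fallback(original)  # clone fails or is absent, deepcopy succeeds
```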

palma.utils.utils.get_splitting_matrix(X: pandas.DataFrame, iter_cross_validation: iter, expand=False) → pandas.DataFrame

Generate a splitting matrix based on cross-validation iterations.

Parameters:
X : pd.DataFrame

The input dataframe.

iter_cross_validation : Iterable

An iterable of cross-validation splits, each a (train, test) pair.

expand : bool, optional

If True, the output matrix has separate train and test columns for each iteration. If False (default), it has one column per iteration, with 1 marking train rows and 2 marking test rows.

Returns:
pd.DataFrame

A matrix indicating the train (1) and test (2) splits for each iteration. Rows represent data points, and columns represent iterations.

Examples

>>> import pandas as pd
>>> from palma.utils.utils import get_splitting_matrix
>>> X = pd.DataFrame({'feature1': [1, 2, 3, 4, 5],
...                   'feature2': ['A', 'B', 'C', 'D', 'E']})
>>> iter_cv = [(range(3), range(3, 5)), (range(2), range(2, 5))]
>>> get_splitting_matrix(X, iter_cv)
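
A minimal sketch of the default (expand=False) layout, assuming the splits carry positional row indices, as scikit-learn splitters return. splitting_matrix_sketch is a hypothetical name, and marking rows that belong to neither split with 0 is an assumption; the documentation only defines 1 (train) and 2 (test). On the example data above, the sketch yields column 0 = [1, 1, 1, 2, 2] and column 1 = [1, 1, 2, 2, 2].

```python
import numpy as np
import pandas as pd


def splitting_matrix_sketch(X: pd.DataFrame, iter_cross_validation) -> pd.DataFrame:
    """Hypothetical sketch of the expand=False layout: one column per iteration."""
    columns = {}
    for i, (train, test) in enumerate(iter_cross_validation):
        col = np.zeros(len(X), dtype=int)  # 0 = row in neither split (assumed)
        col[list(train)] = 1  # positional indices of training rows
        col[list(test)] = 2   # positional indices of test rows
        columns[i] = col
    # Reuse X's index so each row lines up with its data point.
    return pd.DataFrame(columns, index=X.index)


X = pd.DataFrame({"feature1": [1, 2, 3, 4, 5],
                  "feature2": ["A", "B", "C", "D", "E"]})
iter_cv = [(range(3), range(3, 5)), (range(2), range(2, 5))]
matrix = splitting_matrix_sketch(X, iter_cv)
print(matrix)
```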

palma.utils.utils.check_splitting_strategy(X: pandas.DataFrame, iter_cross_validation: iter)

palma.utils.utils.hash_dataframe(data: pandas.DataFrame, how='whole')

palma.utils.utils.get_hash(**kwargs) → str

Return a hash of the keyword parameters.
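
One common way to hash keyword parameters deterministically is to serialise them with sorted keys and digest the result. The sketch below uses json and hashlib.sha256; it is an assumption, not necessarily the scheme get_hash uses, and params_hash_sketch is a hypothetical name.

```python
import hashlib
import json


def params_hash_sketch(**kwargs) -> str:
    """Hypothetical: order-independent SHA-256 digest of keyword parameters."""
    # sort_keys=True makes the hash independent of argument order;
    # default=str converts values JSON cannot encode (e.g. objects) via str().
    payload = json.dumps(kwargs, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


h1 = params_hash_sketch(model="rf", n_estimators=100)
h2 = params_hash_sketch(n_estimators=100, model="rf")  # same params, other order
```

Because default=str falls back to str() for values JSON cannot encode, objects whose string form contains a memory address would hash differently across runs.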


palma.utils.utils.get_estimator_name(estimator) → str

palma.utils.utils.check_started(message: str, need_build: bool = False) → Callable

Decorator for methods that must be called on a built (or unbuilt) Project. If the Project's is_built attribute does not have the expected value (need_build), an AttributeError is raised with the message passed as argument.

Parameters:
message : str

Error message of the raised AttributeError.

need_build : bool

Expected value of the Project's is_built attribute.

Returns:
Callable

A decorator that checks is_built before calling the wrapped method.
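
The contract described above (raise AttributeError with the given message when is_built does not equal need_build) can be sketched as a decorator factory. check_started_sketch and ToyProject are hypothetical names, and treating a missing is_built attribute as False is an assumption.

```python
import functools
from typing import Callable


def check_started_sketch(message: str, need_build: bool = False) -> Callable:
    """Hypothetical decorator factory following the documented contract."""
    def decorator(method: Callable) -> Callable:
        @functools.wraps(method)  # keep the wrapped method's name and docstring
        def wrapper(self, *args, **kwargs):
            # Refuse to run unless is_built has the expected value.
            if getattr(self, "is_built", False) != need_build:
                raise AttributeError(message)
            return method(self, *args, **kwargs)
        return wrapper
    return decorator


class ToyProject:
    """Hypothetical minimal stand-in for palma's Project."""

    def __init__(self):
        self.is_built = False

    @check_started_sketch("Project must be built first", need_build=True)
    def run(self):
        return "running"

    @check_started_sketch("Project is already built", need_build=False)
    def configure(self):
        return "configured"
```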

palma.utils.utils.interpolate_roc(roc_curve_metric: dict[dict[tuple[dict[numpy.array]]]], mean_fpr=np.linspace(0, 1, 100))
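
interpolate_roc has no docstring. Its mean_fpr default of np.linspace(0, 1, 100) suggests the usual technique for averaging ROC curves from several folds: resample each curve's true-positive rate onto a shared false-positive-rate grid with numpy.interp, then average. The sketch below shows only that technique under this assumption; it works on plain (fpr, tpr) pairs rather than the nested roc_curve_metric structure, and interpolate_tpr_sketch is a hypothetical name.

```python
import numpy as np


def interpolate_tpr_sketch(fpr, tpr, mean_fpr=np.linspace(0, 1, 100)):
    """Hypothetical: resample one ROC curve onto a shared FPR grid."""
    # np.interp assumes fpr is sorted in ascending order, as ROC curves are.
    interp_tpr = np.interp(mean_fpr, fpr, tpr)
    interp_tpr[0] = 0.0  # pin the curve to start at (0, 0)
    return interp_tpr


# Two toy per-fold curves (fpr, tpr); values are illustrative only.
curves = [
    (np.array([0.0, 0.5, 1.0]), np.array([0.0, 0.8, 1.0])),
    (np.array([0.0, 0.25, 1.0]), np.array([0.0, 0.5, 1.0])),
]
grid = np.linspace(0, 1, 5)
resampled = [interpolate_tpr_sketch(f, t, grid) for f, t in curves]
mean_tpr = np.mean(resampled, axis=0)  # average ROC curve across folds
```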

palma.utils.utils._get_processing_pipeline(estimators: list)

palma.utils.utils._get_and_check_var_importance(estimator)