palma.components.performance#

Module Contents#

Classes#

Analyser

Analyser class for performing analysis on a model.

ShapAnalysis

Analyser class for performing analysis on a model.

ScoringAnalysis

The ScoringAnalysis class provides methods for analyzing the performance of a machine learning model.

RegressionAnalysis

Analyser class for performing analysis on a regression model.

PermutationFeatureImportance

Class for computing permutation feature importance.

Attributes#

__fpr_sampling__

palma.components.performance.__fpr_sampling__#
class palma.components.performance.Analyser(on)#

Bases: palma.components.base.ModelComponent

Analyser class for performing analysis on a model.

Parameters:
on : str

The type of analysis to perform. Possible values are “indexes_train_test” or “indexes_val”.

property metrics#
__call__(project: Project, model: ModelEvaluation)#
_add(project, model)#
variable_importance()#

Compute the feature importance for each estimator.

Returns:
feature_importance : pandas.DataFrame

DataFrame containing the feature importance values for each estimator.

compute_metrics(metric: dict)#

Compute the specified metrics for each estimator.

Parameters:
metric : dict

Dictionary containing the metric name as key and the metric function as value.
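As an illustration of the dictionary shape described above, the sketch below maps metric names to callables and applies each one to per-estimator predictions. It assumes each metric function takes `(y_true, y_pred)`; the `accuracy` and `error_rate` helpers and the `folds` data are hypothetical stand-ins written in plain Python, and the loop is not palma's implementation.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions equal to the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def error_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Complement of the accuracy."""
    return 1.0 - accuracy(y_true, y_pred)


# Assumed shape of the `metric` argument: {metric name: metric function}.
metric: Dict[str, Callable] = {"accuracy": accuracy, "error_rate": error_rate}

# Hypothetical (y_true, y_pred) pairs, one per fitted estimator (fold).
folds: List[Tuple[List[int], List[int]]] = [
    ([1, 0, 1, 1], [1, 0, 0, 1]),
    ([0, 0, 1, 0], [0, 0, 1, 0]),
]


def compute_metrics(
    metric: Dict[str, Callable], folds: List[Tuple[List[int], List[int]]]
) -> Dict[str, List[float]]:
    """Evaluate every metric on every fold and collect the values by name."""
    return {
        name: [fun(y_true, y_pred) for y_true, y_pred in folds]
        for name, fun in metric.items()
    }


results = compute_metrics(metric, folds)
```

In practice the callables would more likely be scikit-learn metric functions; any function with the same two-argument signature fits this sketch.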

_compute_metric(name: str, fun: Callable)#

Compute a specific metric and add it to the metrics attribute.

Parameters:
name : str

The name of the metric.

fun : callable

The function to compute the metric.

get_train_metrics() → pandas.DataFrame#

Get the computed metrics for the training set.

Returns:
pd.DataFrame

DataFrame containing the computed metrics for the training set.

get_test_metrics() → pandas.DataFrame#

Get the computed metrics for the test set.

Returns:
pd.DataFrame

DataFrame containing the computed metrics for the test set.

__get_metrics_helper(identifier) → pandas.DataFrame#
plot_variable_importance(mode='minmax', color='darkblue', cmap='flare', **kwargs)#

Plot the variable importance.

Parameters:
mode : str, optional

The mode for plotting the variable importance, by default “minmax”.

color : str, optional

The color for the plot, by default “darkblue”.

cmap : str, optional

The colormap for the plot, by default “flare”.

class palma.components.performance.ShapAnalysis(on, n_shap, compute_interaction=False)#

Bases: Analyser

Analyser class for performing analysis on a model.

Parameters:
on : str

The type of analysis to perform. Possible values are “indexes_train_test” or “indexes_val”.

__call__(project: Project, model: ModelEvaluation)#
__select_explainer()#
_compute_shap_values(n, is_regression, explainer_method=shap.TreeExplainer, compute_interaction=False)#
__change_features_name_to_string()#
plot_shap_summary_plot()#
plot_shap_decision_plot(**kwargs)#
plot_shap_interaction(feature_x, feature_y)#
class palma.components.performance.ScoringAnalysis(on)#

Bases: Analyser

The ScoringAnalysis class provides methods for analyzing the performance of a machine learning model.

property threshold#
confusion_matrix(in_percentage=False)#

Compute the confusion matrix.

Parameters:
in_percentage : bool, optional

Whether to return the confusion matrix in percentage, by default False

Returns:
pandas.DataFrame

The confusion matrix
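The effect of `in_percentage` can be sketched as follows. This is a minimal plain-Python illustration, not palma's code: it returns a nested dict rather than a pandas.DataFrame, and it assumes that percentages are taken over the total number of samples (the documentation does not say whether normalization is global or per row).

```python
from collections import Counter
from typing import Dict, Sequence


def confusion_matrix(
    y_true: Sequence[int], y_pred: Sequence[int], in_percentage: bool = False
) -> Dict[int, Dict[int, float]]:
    """Return matrix[true_label][predicted_label] as counts or percentages."""
    labels = sorted(set(y_true) | set(y_pred))
    counts = Counter(zip(y_true, y_pred))
    total = len(y_true)
    matrix: Dict[int, Dict[int, float]] = {}
    for true_label in labels:
        matrix[true_label] = {}
        for pred_label in labels:
            n = counts[(true_label, pred_label)]
            # Assumption: percentages are relative to all samples.
            matrix[true_label][pred_label] = 100.0 * n / total if in_percentage else n
    return matrix


# Hypothetical labels and thresholded predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_matrix(y_true, y_pred)
percentages = confusion_matrix(y_true, y_pred, in_percentage=True)
```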

__interpolate_roc(_)#
plot_roc_curve(plot_method='mean', plot_train: bool = False, c='C0', cmap: str = 'inferno', label: str = '', mode: str = 'std', label_iter: iter = None, plot_base: bool = True, **kwargs)#

Plot the ROC curve.

Parameters:
plot_method : str

Select the type of plot for the ROC curve:

  • “beam” to plot all the curves using shades

  • “all” to plot each ROC curve

  • “mean” (default) to plot the mean ROC curve

plot_train : bool

If True, the train ROC curves are also plotted, by default False.

c : str

Color of the ROC curve. Not used with plot_method=”all”.

cmap : str
label : str
mode : str
label_iter : iter
plot_base : bool

If True, plot the basic ROC curve helper.

kwargs:

Deprecated
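One common way to obtain a mean ROC curve from several estimators is to interpolate each curve onto a shared grid of false positive rates and average the true positive rates point by point. The sketch below illustrates that idea in plain Python. It is an assumption about the general technique, not a description of what `plot_roc_curve` or `__fpr_sampling__` actually do; the `curves` and `grid` values are hypothetical, and each curve's FPR values are assumed to be strictly increasing.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def interpolate(x: float, xp: Sequence[float], fp: Sequence[float]) -> float:
    """Linear interpolation of the points (xp, fp) at x; xp strictly increasing."""
    if x <= xp[0]:
        return fp[0]
    if x >= xp[-1]:
        return fp[-1]
    i = bisect_right(xp, x)  # xp[i - 1] <= x < xp[i]
    x0, x1 = xp[i - 1], xp[i]
    y0, y1 = fp[i - 1], fp[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def mean_roc(
    curves: List[Tuple[Sequence[float], Sequence[float]]], grid: Sequence[float]
) -> List[float]:
    """Average the TPR of several (fpr, tpr) curves on a common FPR grid."""
    per_curve = [[interpolate(x, fpr, tpr) for x in grid] for fpr, tpr in curves]
    return [sum(column) / len(column) for column in zip(*per_curve)]


# Hypothetical ROC curves from two cross-validation folds, as (fpr, tpr).
curves = [
    ([0.0, 0.5, 1.0], [0.0, 0.8, 1.0]),
    ([0.0, 0.25, 1.0], [0.0, 0.5, 1.0]),
]
grid = [0.0, 0.25, 0.5, 0.75, 1.0]  # shared FPR sampling points
mean_tpr = mean_roc(curves, grid)
```

A spread around the mean (for example one standard deviation per grid point) can be computed from the same `per_curve` columns.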

Returns:
compute_threshold(method: str = 'total_population', value: float = 0.5, metric: Callable = None)#

Compute threshold using various heuristics

Parameters:
method : str, optional

The method to compute the threshold, by default “total_population”

  • total_population: compute the threshold so that the percentage of positive predictions is equal to value

  • fpr: compute the threshold so that the false positive rate is equal to value

  • optimize_metric: compute the threshold so that the metric is optimized; the value parameter is ignored and the metric parameter must be provided

value : float, optional

The value to use for the threshold computation, by default 0.5

metric : typing.Callable, optional

The metric function to use for the threshold computation, by default None

Returns:
float

The computed threshold
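The three heuristics can be sketched in plain Python as below. This is an illustrative reimplementation under stated assumptions, not palma's code: the function takes `y_true` and `y_score` explicitly (the documented method reads them from the analysed model), a sample is predicted positive when its score is greater than or equal to the threshold, “optimized” is taken to mean maximized, and the `accuracy` helper and data are hypothetical.

```python
import math
from typing import Callable, Optional, Sequence


def _kth_largest(scores: Sequence[float], k: int) -> float:
    """Threshold such that k of the (distinct) scores are >= the threshold."""
    if k <= 0:
        return math.inf  # nothing is predicted positive
    return sorted(scores, reverse=True)[min(k, len(scores)) - 1]


def compute_threshold(
    y_true: Sequence[int],
    y_score: Sequence[float],
    method: str = "total_population",
    value: float = 0.5,
    metric: Optional[Callable] = None,
) -> float:
    if method == "total_population":
        # Share of positive predictions equals `value`.
        return _kth_largest(y_score, round(value * len(y_score)))
    if method == "fpr":
        # Share of true negatives predicted positive equals `value`.
        negatives = [s for t, s in zip(y_true, y_score) if t == 0]
        return _kth_largest(negatives, round(value * len(negatives)))
    if method == "optimize_metric":
        if metric is None:
            raise ValueError("metric must be provided for 'optimize_metric'")

        def score_at(threshold: float) -> float:
            return metric(y_true, [int(s >= threshold) for s in y_score])

        # Try every observed score as a candidate threshold.
        return max(sorted(set(y_score)), key=score_at)
    raise ValueError(f"unknown method: {method!r}")


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels and predicted probabilities.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_score = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]

t_population = compute_threshold(y_true, y_score, "total_population", value=0.25)
t_fpr = compute_threshold(y_true, y_score, "fpr", value=0.25)
t_metric = compute_threshold(y_true, y_score, "optimize_metric", metric=accuracy)
```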

plot_threshold(**plot_kwargs)#

Plot the threshold on fpr/tpr axes

Parameters:
plot_kwargs : dict, optional

Additional keyword arguments to pass to the scatter plot function

Returns:
matplotlib.pyplot

The threshold plot

class palma.components.performance.RegressionAnalysis(on)#

Bases: Analyser

Analyser class for performing analysis on a regression model.

Parameters:
on : str

The type of analysis to perform. Possible values are “indexes_train_test” or “indexes_val”.

Attributes:
_hidden_metrics : dict

Dictionary to store additional metrics that are not displayed.

Methods

variable_importance()

Compute the feature importance for each estimator.

compute_metrics(metric: dict)

Compute the specified metrics for each estimator.

get_train_metrics() -> pd.DataFrame

Get the computed metrics for the training set.

get_test_metrics() -> pd.DataFrame

Get the computed metrics for the test set.

plot_variable_importance(mode='minmax', color='darkblue', cmap='flare')

Plot the variable importance.

plot_prediction_versus_real

Plot prediction versus real values

plot_errors_pairgrid

Plot pair grid errors

compute_predictions_errors(fun=None)#
plot_prediction_versus_real(colormap=plot.get_cmap('rainbow'))#
plot_errors_pairgrid(fun=None, number_percentiles=4, palette='rocket_r', features=None)#
class palma.components.performance.PermutationFeatureImportance(n_repeat: int = 5, random_state: int = 42, n_job: int = 2, scoring: str = None, max_samples: int | float = 0.7, color: str = 'darkblue')#

Bases: palma.components.base.ModelComponent

Class for computing permutation feature importance.

Parameters:
n_repeat: int

The number of times to permute a feature.

random_state: int

The pseudo-random number generator to control the permutations of each feature.

n_job: int

The number of jobs to run in parallel. If n_job = -1, all processors are used.

max_samples: int or float

The number of samples to draw from X to compute feature importance in each repeat (without replacement). If int, then draw max_samples samples. If float, then draw max_samples * X.shape[0] samples.

color: str

The color for bar plot.
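To show how `n_repeat`, `random_state` and `max_samples` interact, here is a minimal plain-Python sketch of permutation feature importance. It is not palma's implementation (which presumably delegates to a library routine and can run in parallel via `n_job`); the `predict` and `score` callables, the toy data, and the choice to measure importance as the mean drop in score are assumptions for illustration.

```python
import random
from typing import Callable, List, Sequence, Union


def permutation_importance(
    predict: Callable,
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    score: Callable,
    n_repeat: int = 5,
    random_state: int = 42,
    max_samples: Union[int, float] = 0.7,
) -> List[float]:
    """Mean drop in score when each feature column is shuffled."""
    rng = random.Random(random_state)
    n_rows = len(X)
    # int: absolute number of rows; float: fraction of the rows.
    n_draw = max_samples if isinstance(max_samples, int) else int(max_samples * n_rows)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeat):
            idx = rng.sample(range(n_rows), n_draw)  # without replacement
            X_sub = [list(X[i]) for i in idx]
            y_sub = [y[i] for i in idx]
            baseline = score(y_sub, predict(X_sub))
            column = [row[j] for row in X_sub]
            rng.shuffle(column)  # break the link between feature j and y
            for row, shuffled_value in zip(X_sub, column):
                row[j] = shuffled_value
            drops.append(baseline - score(y_sub, predict(X_sub)))
        importances.append(sum(drops) / n_repeat)
    return importances


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data: the label equals feature 0, feature 1 is constant.
X = [[i % 2, 0.3] for i in range(20)]
y = [i % 2 for i in range(20)]


def predict(rows: Sequence[Sequence[float]]) -> List[int]:
    """Hypothetical fitted model that only looks at feature 0."""
    return [int(row[0] > 0.5) for row in rows]


importances = permutation_importance(predict, X, y, accuracy)
```

A feature the model ignores gets an importance of zero, while shuffling an informative feature lowers the score and yields a positive value.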

Methods

plot_permutation_feature_importance()

Plot the result of feature permutation, computed only on the training set.

__call__(project: Project, model: ModelEvaluation)#
plot_permutation_feature_importance()#