palma.components.performance
Module Contents#
Classes#
Analyser | Analyser class for performing analysis on a model.
ShapAnalysis | Analyser class for performing analysis on a model.
ScoringAnalysis | The ScoringAnalysis class provides methods for analyzing the performance of a machine learning model.
RegressionAnalysis | Analyser class for performing analysis on a regression model.
PermutationFeatureImportance | Class for doing permutation feature importance
Attributes#
- palma.components.performance.__fpr_sampling__#
- class palma.components.performance.Analyser(on)#
Bases:
palma.components.base.ModelComponent
Analyser class for performing analysis on a model.
- Parameters:
- on: str
The type of analysis to perform. Possible values are “indexes_train_test” or “indexes_val”.
- property metrics#
- __call__(project: Project, model: ModelEvaluation)#
- _add(project, model)#
- variable_importance()#
Compute the feature importance for each estimator.
- Returns:
- feature_importance: pandas.DataFrame
DataFrame containing the feature importance values for each estimator.
- compute_metrics(metric: dict)#
Compute the specified metrics for each estimator.
- Parameters:
- metric: dict
Dictionary containing the metric name as key and the metric function as value.
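As a rough illustration of the shape of the `metric` argument, the sketch below builds a dictionary that maps metric names to functions and applies each function to the predictions of several estimators. The functions `accuracy` and `error_rate`, the `splits` data and the final loop are hypothetical stand-ins written in plain Python. They do not reproduce palma's internals, and the `(y_true, y_pred)` call signature is an assumption.

```python
# Hypothetical metric functions; the (y_true, y_pred) signature is an assumption.
def accuracy(y_true, y_pred):
    """Fraction of predictions equal to the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def error_rate(y_true, y_pred):
    """Fraction of predictions that differ from the true label."""
    return 1.0 - accuracy(y_true, y_pred)


# Metric name as key, metric function as value.
metric = {"accuracy": accuracy, "error_rate": error_rate}

# Toy (y_true, y_pred) pairs standing in for the predictions of two estimators.
splits = {
    0: ([1, 0, 1, 1], [1, 0, 0, 1]),
    1: ([0, 0, 1, 1], [0, 0, 1, 1]),
}

# One value per metric and per estimator.
results = {
    name: {i: fun(y_true, y_pred) for i, (y_true, y_pred) in splits.items()}
    for name, fun in metric.items()
}
print(results)
```

Keeping the name as the key makes it straightforward to label the columns of a metrics table afterwards.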
- _compute_metric(name: str, fun: Callable)#
Compute a specific metric and add it to the metrics attribute.
- Parameters:
- name: str
The name of the metric.
- fun: callable
The function to compute the metric.
- get_train_metrics() -> pandas.DataFrame#
Get the computed metrics for the training set.
- Returns:
- pd.DataFrame
DataFrame containing the computed metrics for the training set.
- get_test_metrics() -> pandas.DataFrame#
Get the computed metrics for the test set.
- Returns:
- pd.DataFrame
DataFrame containing the computed metrics for the test set.
- __get_metrics_helper(identifier) -> pandas.DataFrame#
- plot_variable_importance(mode='minmax', color='darkblue', cmap='flare', **kwargs)#
Plot the variable importance.
- Parameters:
- mode: str, optional
The mode for plotting the variable importance, by default “minmax”.
- color: str, optional
The color for the plot, by default “darkblue”.
- cmap: str, optional
The colormap for the plot, by default “flare”.
- class palma.components.performance.ShapAnalysis(on, n_shap, compute_interaction=False)#
Bases:
Analyser
Analyser class for performing analysis on a model.
- Parameters:
- on: str
The type of analysis to perform. Possible values are “indexes_train_test” or “indexes_val”.
- __call__(project: Project, model: ModelEvaluation)#
- __select_explainer()#
- _compute_shap_values(n, is_regression, explainer_method=shap.TreeExplainer, compute_interaction=False)#
- __change_features_name_to_string()#
- plot_shap_summary_plot()#
- plot_shap_decision_plot(**kwargs)#
- plot_shap_interaction(feature_x, feature_y)#
- class palma.components.performance.ScoringAnalysis(on)#
Bases:
Analyser
The ScoringAnalysis class provides methods for analyzing the performance of a machine learning model.
- property threshold#
- confusion_matrix(in_percentage=False)#
Compute the confusion matrix.
- Parameters:
- in_percentage: bool, optional
Whether to return the confusion matrix as percentages, by default False.
- Returns:
- pandas.DataFrame
The confusion matrix
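The sketch below illustrates the difference between a confusion matrix in counts and in percentages for a binary problem. It is plain Python rather than palma code. The helper names and the toy data are hypothetical, and normalising by the total number of samples is an assumption, since the page does not say how `in_percentage` normalises.

```python
def confusion_counts(y_true, y_pred):
    """Count samples per (actual, predicted) cell for binary labels 0/1."""
    counts = {(a, p): 0 for a in (0, 1) for p in (0, 1)}
    for a, p in zip(y_true, y_pred):
        counts[(a, p)] += 1
    return counts


def to_percentage(counts):
    """Express each cell as a percentage of all samples (assumed normalisation)."""
    total = sum(counts.values())
    return {cell: 100.0 * n / total for cell, n in counts.items()}


# Toy labels and scores; predictions are obtained by applying a threshold.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.2, 0.4, 0.7, 0.6, 0.1, 0.8, 0.3]
threshold = 0.5
y_pred = [int(s >= threshold) for s in y_score]

counts = confusion_counts(y_true, y_pred)
percentages = to_percentage(counts)
print(counts)
print(percentages)
```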
- __interpolate_roc(_)#
- plot_roc_curve(plot_method='mean', plot_train: bool = False, c='C0', cmap: str = 'inferno', label: str = '', mode: str = 'std', label_iter: iter = None, plot_base: bool = True, **kwargs)#
Plot the ROC curve.
- Parameters:
- plot_method: str
Type of plot for the ROC curve, by default “mean”:
“beam” plots all the curves using shades
“all” plots each ROC curve
“mean” plots the mean ROC curve
- plot_train: bool
If True, the train ROC curves are plotted, by default False.
- c: str
Color of the ROC curve. Not used when plot_method=“all”.
- cmap: str
- label
- mode
- label_iter
- plot_base: bool
Whether to plot the basic ROC curve helper.
- kwargs:
Deprecated
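A mean ROC curve over several estimators is commonly obtained by resampling each curve on a shared false positive rate grid and averaging the true positive rates point by point. The sketch below shows that idea in plain Python. It is an assumption that palma works this way (the `__fpr_sampling__` attribute and the `__interpolate_roc` method suggest it, but the page does not confirm it), and `interpolate`, `curves` and `fpr_grid` are hypothetical names.

```python
def interpolate(x, xp, fp):
    """Linearly interpolate the points (xp, fp) at x; xp must be strictly increasing."""
    if x <= xp[0]:
        return fp[0]
    for x0, y0, x1, y1 in zip(xp, fp, xp[1:], fp[1:]):
        if x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return fp[-1]


# Toy (fpr, tpr) curves standing in for the ROC curves of two estimators.
curves = [
    ([0.0, 0.5, 1.0], [0.0, 0.8, 1.0]),
    ([0.0, 0.25, 1.0], [0.0, 0.5, 1.0]),
]

# Common false positive rate grid (hypothetical; a real grid would be finer).
fpr_grid = [0.0, 0.25, 0.5, 0.75, 1.0]

# Resample every curve on the grid, then average the TPR values point by point.
resampled = [[interpolate(x, fpr, tpr) for x in fpr_grid] for fpr, tpr in curves]
mean_tpr = [sum(values) / len(values) for values in zip(*resampled)]
print([round(v, 3) for v in mean_tpr])
```

Resampling first is what makes the average well defined, because the raw curves generally do not share the same false positive rate values.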
- Returns:
- compute_threshold(method: str = 'total_population', value: float = 0.5, metric: Callable = None)#
Compute threshold using various heuristics
- Parameters:
- method: str, optional
The method to compute the threshold, by default “total_population”:
total_population: compute the threshold so that the percentage of positive predictions is equal to value
fpr: compute the threshold so that the false positive rate is equal to value
optimize_metric: compute the threshold so that the metric is optimized; the value parameter is ignored and the metric parameter must be provided
- value: float, optional
The value to use for the threshold computation, by default 0.5
- metric: typing.Callable, optional
The metric function to use for the threshold computation, by default None
- Returns:
- float
The computed threshold
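The three heuristics can be sketched in plain Python as follows. This is not palma's implementation: the function names and toy data are hypothetical, "optimized" is assumed to mean "maximized", and predictions are assumed to be positive when the score is greater than or equal to the threshold.

```python
def threshold_for_positive_rate(scores, value):
    """Cut such that a fraction `value` of `scores` is >= the cut (no ties assumed)."""
    ranked = sorted(scores, reverse=True)
    k = round(value * len(ranked))
    if k == 0:
        return float("inf")  # nothing is predicted positive
    return ranked[k - 1]


def threshold_for_fpr(y_true, scores, value):
    """Same rule restricted to the negatives, so that the FPR equals `value`."""
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    return threshold_for_positive_rate(negatives, value)


def threshold_optimizing(y_true, scores, metric):
    """Try every observed score as a cut and keep the one maximizing `metric`."""
    def metric_at(cut):
        return metric(y_true, [int(s >= cut) for s in scores])
    # max() keeps the first best candidate, so ties go to the lowest cut.
    return max(sorted(set(scores)), key=metric_at)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.4, 0.7, 0.6, 0.1, 0.8, 0.3]

t_population = threshold_for_positive_rate(scores, 0.5)
t_fpr = threshold_for_fpr(y_true, scores, 0.25)
t_metric = threshold_optimizing(y_true, scores, accuracy)
print(t_population, t_fpr, t_metric)
```

In this sketch the `fpr` rule is the `total_population` rule applied to the negative samples only, which is why one helper serves both.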
- plot_threshold(**plot_kwargs)#
Plot the threshold on fpr/tpr axes
- Parameters:
- plot_kwargs: dict, optional
Additional keyword arguments to pass to the scatter plot function
- Returns:
- matplotlib.pyplot
The threshold plot
- class palma.components.performance.RegressionAnalysis(on)#
Bases:
Analyser
Analyser class for performing analysis on a regression model.
- Parameters:
- on: str
The type of analysis to perform. Possible values are “indexes_train_test” or “indexes_val”.
- Attributes:
- _hidden_metrics: dict
Dictionary to store additional metrics that are not displayed.
Methods
variable_importance()
Compute the feature importance for each estimator.
compute_metrics(metric: dict)
Compute the specified metrics for each estimator.
get_train_metrics() -> pd.DataFrame
Get the computed metrics for the training set.
get_test_metrics() -> pd.DataFrame
Get the computed metrics for the test set.
plot_variable_importance(mode='minmax', color='darkblue', cmap='flare')
Plot the variable importance.
plot_prediction_versus_real
Plot prediction versus real values
plot_errors_pairgrid
Plot pair grid errors
- compute_predictions_errors(fun=None)#
- plot_prediction_versus_real(colormap=plot.get_cmap('rainbow'))#
- plot_errors_pairgrid(fun=None, number_percentiles=4, palette='rocket_r', features=None)#
- class palma.components.performance.PermutationFeatureImportance(n_repeat: int = 5, random_state: int = 42, n_job: int = 2, scoring: str = None, max_samples: int | float = 0.7, color: str = 'darkblue')#
Bases:
palma.components.base.ModelComponent
Class for doing permutation feature importance
- Parameters:
- n_repeat: int
The number of times to permute a feature.
- random_state: int
Seed of the pseudo-random number generator that controls the permutations of each feature.
- n_job: int
The number of jobs to run in parallel. If n_job = -1, all processors are used.
- max_samples: int or float
The number of samples to draw from X to compute feature importance in each repeat (without replacement). If int, then draw max_samples samples. If float, then draw max_samples * X.shape[0] samples.
- color: str
The color for bar plot.
Methods
plot_permutation_feature_importance()
Plot the result of feature permutation, computed ONLY on the TRAINING SET.
- __call__(project: Project, model: ModelEvaluation)#
- plot_permutation_feature_importance()#
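For readers unfamiliar with the technique, the sketch below shows the general idea of permutation feature importance: shuffle one feature at a time, repeat `n_repeat` times, and record the average drop in score. It is a plain Python illustration, not the code behind this class. The `predict` function, the `accuracy` score and the toy data are hypothetical, and the `max_samples`, `n_job` and `scoring` parameters are not modelled.

```python
import random


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, score, n_repeat=5, random_state=42):
    """Mean drop in score when each feature column is shuffled in turn."""
    rng = random.Random(random_state)
    baseline = score(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeat):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - score(y, [predict(row) for row in X_perm]))
        importances.append(sum(drops) / n_repeat)
    return importances


def predict(row):
    """Hypothetical fitted model that only looks at the first feature."""
    return int(row[0] > 0.5)


# Feature 0 determines the label; feature 1 is unrelated noise.
X = [[0.9, 3.0], [0.1, 1.0], [0.8, 2.0], [0.2, 5.0],
     [0.7, 4.0], [0.3, 0.0], [0.6, 7.0], [0.4, 6.0]]
y = [1, 0, 1, 0, 1, 0, 1, 0]

importances = permutation_importance(predict, X, y, accuracy)
print(importances)
```

Shuffling the unused feature leaves every prediction unchanged, so its importance is exactly zero, while shuffling the informative feature lowers the score.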