palma.components#
Package Contents#
Classes#
- FileSystemLogger: A logger for saving artifacts and metadata to the file system.
- MLFlowLogger: MLFlowLogger class for logging experiments using MLflow.
- ProfilerYData: Base Project Component class.
- ExplainerDashboard: Wrapper that builds an explainerdashboard.ExplainerDashboard for a model.
- RegressionAnalysis: Analyser class for performing analysis on a regression model.
- ScoringAnalysis: Provides methods for analyzing the performance of a machine learning model.
- ShapAnalysis: Analyser class for performing analysis on a model.
- PermutationFeatureImportance: Class for doing permutation feature importance.
- DeepCheck: A wrapper of the Deepchecks library that audits the data through various checks.
- Leakage: Class for detecting data leakage in a classification project.
- class palma.components.FileSystemLogger(uri: str = tempfile.gettempdir(), **kwargs)#
Bases:
Logger
A logger for saving artifacts and metadata to the file system.
- Parameters:
- uri : str, optional
The root path or directory where artifacts and metadata will be saved. Defaults to the system temporary directory.
- **kwargs : dict
Additional keyword arguments to pass to the base logger.
- Attributes:
- path_project : str
The path to the project directory.
- path_study : str
The path to the study directory within the project.
Methods
log_project(project: Project) -> None
Performs the first level of backup by creating folders and saving an instance of Project.
log_metrics(metrics: dict, path: str) -> None
Saves metrics in JSON format at the specified path.
log_artifact(obj, path: str) -> None
Saves an artifact at the specified path, handling different types of objects.
log_params(parameters: dict, path: str) -> None
Saves model parameters in JSON format at the specified path.
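Examples
A minimal usage sketch; the URI and values below are hypothetical, and relative paths are resolved against the study directory set up when the project is logged:
>>> from palma.components import FileSystemLogger
>>> logger = FileSystemLogger(uri="/tmp/palma_artifacts")
>>> logger.log_project(project)  # creates folders and saves the Project instance
>>> logger.log_metrics({"auc": 0.92}, path="metrics.json")
>>> logger.log_params({"n_estimators": 100}, path="params.json")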
- log_project(project: palma.base.project.Project) None #
log_project performs the first level of backup as described in the object description. This method creates the needed folders and saves an instance of Project.
- Parameters:
- project: :class:`~palma.Project`
An instance of Project.
- log_metrics(metrics: dict, path: str) None #
Logs metrics to a JSON file.
- Parameters:
- metrics : dict
The metrics to be logged.
- path : str
The relative path (from the study directory) where the metrics JSON file will be saved.
- log_artifact(obj, path: str) None #
Logs an artifact, handling different types of objects.
- Parameters:
- obj : any
The artifact to be logged.
- path : str
The relative path (from the study directory) where the artifact will be saved.
- log_params(parameters: dict, path: str) None #
Logs model parameters to a JSON file.
- Parameters:
- parameters : dict
The model parameters to be logged.
- path : str
The relative path (from the study directory) where the parameters JSON file will be saved.
- __create_directories()#
Creates the study directory, along with any necessary parent directories, if it does not already exist.
- class palma.components.MLFlowLogger(uri: str, artifact_location: str = '.mlruns')#
Bases:
Logger
MLFlowLogger class for logging experiments using MLflow.
- Parameters:
- uri : str
The URI of the MLflow tracking server.
- artifact_location : str
The file system location where the temporary logger saves artifacts before they are logged to MLflow.
- Raises:
- ImportError: If mlflow is not installed.
- Attributes:
- tmp_logger : FileSystemLogger
Temporary logger for local logging before MLflow logging.
Methods
log_project(project: 'Project') -> None:
Logs the project information to MLflow, including project name and parameters.
log_metrics(metrics: dict[str, typing.Any]) -> None:
Logs metrics to MLflow.
log_artifact(artifact: dict, path) -> None:
Logs artifacts to MLflow using the temporary logger.
log_params(params: dict) -> None:
Logs parameters to MLflow.
log_model(model, path) -> None:
Logs the model to MLflow using the temporary logger.
- log_project(project: palma.base.project.Project) None #
- log_metrics(metrics: dict[str, Any], path=None) None #
- log_artifact(artifact: dict, path) None #
- log_params(params: dict) None #
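Examples
A minimal sketch, assuming an MLflow tracking server is reachable at the given URI (the URI and values are hypothetical):
>>> from palma.components import MLFlowLogger
>>> logger = MLFlowLogger(uri="http://localhost:5000", artifact_location=".mlruns")
>>> logger.log_project(project)  # registers the project name and parameters in MLflow
>>> logger.log_metrics({"accuracy": 0.87})
>>> logger.log_params({"learning_rate": 0.1})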
- class palma.components.ProfilerYData(**config)#
Bases:
palma.components.base.ProjectComponent
Base Project Component class
This object ensures that all subclasses of ProjectComponent implement a __call__ method.
- class palma.components.ExplainerDashboard(dashboard_config: str | Dict = default_config_path, n_sample: int = None)#
Bases:
palma.components.base.Component
- __call__(project: Project, model: Model) explainerdashboard.ExplainerDashboard #
This method returns a dashboard instance, which is to be run using its run method.
- Parameters:
- project: Project
Instance of project used to compute explainer.
- model: Model
Current run to use in explainer.
Examples
>>> from palma.components import ExplainerDashboard as ExpDash
>>> db = ExpDash(dashboard_config="path_to_my_config")
>>> explainer_dashboard = db(project, model)
>>> explainer_dashboard.run(port="8050", host="0.0.0.0", use_waitress=False)
- update_config(dict_value: Dict[str, Dict])#
Update specific parameters from the actual configuration.
- Parameters:
- dict_value: dict
- explainer_parameters: dict
Parameters to be used in explainerdashboard.RegressionExplainer or explainerdashboard.ClassifierExplainer.
- dashboard_parameters: dict
Parameters used to compose dashboard tabs, items, or themes for explainerdashboard.ExplainerDashboard. Tabs and components of the dashboard can be hidden; see the customize dashboard section for more detail.
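Examples
A hedged sketch of updating the two documented sections (the specific parameter values are illustrative, not prescribed defaults):
>>> db.update_config({
...     "explainer_parameters": {"shap": "guess"},
...     "dashboard_parameters": {"title": "My dashboard", "whatif": False},
... })
>>> explainer_dashboard = db(project, model)  # rebuild with the updated config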
- _prepare_dataset() None #
- This function performs the following processing steps:
Ensure that column names are str (works around a bug encountered in the dashboard)
Get the integer codes from categorical columns in case of category data types
Sample the data if specified by the user
- _get_dashboard(explainer: explainerdashboard.explainers.BaseExplainer) ExplainerDashboard #
- class palma.components.RegressionAnalysis(on)#
Bases:
Analyser
Analyser class for performing analysis on a regression model.
- Parameters:
- on : str
The type of analysis to perform. Possible values are "indexes_train_test" or "indexes_val".
- Attributes:
- _hidden_metrics : dict
Dictionary to store additional metrics that are not displayed.
Methods
variable_importance()
Compute the feature importance for each estimator.
compute_metrics(metric: dict)
Compute the specified metrics for each estimator.
get_train_metrics() -> pd.DataFrame
Get the computed metrics for the training set.
get_test_metrics() -> pd.DataFrame
Get the computed metrics for the test set.
plot_variable_importance(mode="minmax", color="darkblue", cmap="flare")
Plot the variable importance.
plot_prediction_versus_real()
Plot predictions versus real values.
plot_errors_pairgrid()
Plot a pair grid of errors.
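Examples
A minimal usage sketch. It assumes the analyser is bound to a project and model via its component __call__, as with the other analysers, and that metrics are supplied as a mapping from name to a scikit-learn style callable (both are assumptions about usage, not documented guarantees):
>>> from sklearn.metrics import mean_absolute_error, r2_score
>>> analysis = RegressionAnalysis(on="indexes_val")
>>> analysis(project, model)  # assumed binding step
>>> analysis.compute_metrics({"mae": mean_absolute_error, "r2": r2_score})
>>> analysis.get_test_metrics()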
- compute_predictions_errors(fun=None)#
- plot_prediction_versus_real(colormap=plot.get_cmap('rainbow'))#
- plot_errors_pairgrid(fun=None, number_percentiles=4, palette='rocket_r', features=None)#
- class palma.components.ScoringAnalysis(on)#
Bases:
Analyser
The ScoringAnalysis class provides methods for analyzing the performance of a machine learning model.
- property threshold#
- confusion_matrix(in_percentage=False)#
Compute the confusion matrix.
- Parameters:
- in_percentage : bool, optional
Whether to return the confusion matrix as percentages, by default False.
- Returns:
- pandas.DataFrame
The confusion matrix
- __interpolate_roc(_)#
- plot_roc_curve(plot_method='mean', plot_train: bool = False, c='C0', cmap: str = 'inferno', label: str = '', mode: str = 'std', label_iter: iter = None, plot_base: bool = True, **kwargs)#
Plot the ROC curve.
- Parameters:
- plot_method : str, optional
Select the type of plot for the ROC curve:
"beam" to plot all the curves using shades
"all" to plot each ROC curve
"mean" (default) to plot the mean ROC curve
- plot_train : bool
If True, the train ROC curves will be plotted, by default False.
- c : str
Used only with plot_method="all". Sets the color of the ROC curve.
- cmap : str
- label
- mode
- label_iter
- plot_base : bool
Plot the basic ROC curve helper.
- kwargs :
Deprecated.
- compute_threshold(method: str = 'total_population', value: float = 0.5, metric: Callable = None)#
Compute threshold using various heuristics
- Parameters:
- method : str, optional
The method to compute the threshold, by default "total_population":
"total_population": compute the threshold so that the percentage of positive predictions is equal to value
"fpr": compute the threshold so that the false positive rate is equal to value
"optimize_metric": compute the threshold so that the metric is optimized; the value parameter is ignored and the metric parameter must be provided
- value : float, optional
The value to use for the threshold computation, by default 0.5.
- metric : typing.Callable, optional
The metric function to use for the threshold computation, by default None.
- Returns:
- float
The computed threshold
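Examples
A sketch of the three heuristics on an already-bound ScoringAnalysis instance (the values and metric choice are illustrative):
>>> analysis.compute_threshold(method="total_population", value=0.3)
>>> analysis.compute_threshold(method="fpr", value=0.05)
>>> from sklearn.metrics import f1_score
>>> analysis.compute_threshold(method="optimize_metric", metric=f1_score)
>>> analysis.plot_threshold()  # visualize the chosen threshold on the ROC axes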
- plot_threshold(**plot_kwargs)#
Plot the threshold on fpr/tpr axes
- Parameters:
- plot_kwargs : dict, optional
Additional keyword arguments to pass to the scatter plot function.
- Returns:
- matplotlib.pyplot
The threshold plot
- class palma.components.ShapAnalysis(on, n_shap, compute_interaction=False)#
Bases:
Analyser
Analyser class for performing analysis on a model.
- Parameters:
- on : str
The type of analysis to perform. Possible values are "indexes_train_test" or "indexes_val".
- __call__(project: Project, model: ModelEvaluation)#
- __select_explainer()#
- _compute_shap_values(n, is_regression, explainer_method=shap.TreeExplainer, compute_interaction=False)#
- __change_features_name_to_string()#
- plot_shap_summary_plot()#
- plot_shap_decision_plot(**kwargs)#
- plot_shap_interaction(feature_x, feature_y)#
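Examples
A minimal sketch (the number of samples and the feature names are hypothetical):
>>> shap_analysis = ShapAnalysis(on="indexes_val", n_shap=100)
>>> shap_analysis(project, model)  # computes the SHAP values
>>> shap_analysis.plot_shap_summary_plot()
>>> shap_analysis.plot_shap_interaction("feature_a", "feature_b")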
- class palma.components.PermutationFeatureImportance(n_repeat: int = 5, random_state: int = 42, n_job: int = 2, scoring: str = None, max_samples: int | float = 0.7, color: str = 'darkblue')#
Bases:
palma.components.base.ModelComponent
Class for doing permutation feature importance
- Parameters:
- n_repeat: int
The number of times to permute a feature.
- random_state: int
The pseudo-random number generator to control the permutations of each feature.
- n_job: int
The number of jobs to run in parallel. If n_job = -1, all processors are used.
- max_samples: int or float
The number of samples to draw from X to compute feature importance in each repeat (without replacement). If int, then draw max_samples samples. If float, then draw max_samples * X.shape[0] samples.
- color: str
The color for bar plot.
Methods
plot_permutation_feature_importance()
Plot the result of feature permutation, computed on the training set only.
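Examples
A minimal usage sketch (the scoring name follows scikit-learn scorer conventions; the values are illustrative):
>>> pfi = PermutationFeatureImportance(n_repeat=10, scoring="r2", max_samples=0.5)
>>> pfi(project, model)  # runs the permutation importance computation
>>> pfi.plot_permutation_feature_importance()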
- __call__(project: Project, model: ModelEvaluation)#
- plot_permutation_feature_importance()#
- class palma.components.DeepCheck(name: str = 'Data Checker', dataset_parameters: dict = None, dataset_checks: List[deepchecks.core.BaseCheck] | deepchecks.core.BaseSuite = data_integrity(), train_test_datasets_checks: List[deepchecks.core.BaseCheck] | deepchecks.core.BaseSuite = Suite('Checks train test', train_test_validation()), raise_on_fail=True)#
Bases:
palma.components.base.ProjectComponent
This object is a wrapper of the Deepchecks library and allows auditing the data through various checks such as data drift, duplicate values, etc.
- Parameters:
- dataset_parameters : dict, optional
Parameters and their values that will be used to generate deepchecks.Dataset instances (required to run the checks on).
- dataset_checks : Union[List[BaseCheck], BaseSuite], optional
List of checks or suite of checks that will be run on the whole dataset. By default, the data_integrity suite is used to detect integrity issues.
- train_test_datasets_checks : Union[List[BaseCheck], BaseSuite], optional
List of checks or suite of checks to detect issues related to the train-test split, such as feature drift or data leakage. By default, the train_test_validation suite is used.
- raise_on_fail : bool, optional
If True, raise an error when a check fails.
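Examples
A hedged sketch of passing custom checks, assuming deepchecks is installed (the chosen checks and column name are illustrative):
>>> from deepchecks.tabular.checks import DataDuplicates, TrainTestSamplesMix
>>> checker = DeepCheck(
...     dataset_parameters={"cat_features": ["color"]},
...     dataset_checks=[DataDuplicates()],
...     train_test_datasets_checks=[TrainTestSamplesMix()],
...     raise_on_fail=False,
... )
>>> checker(project)  # runs both suites on the project data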
- __call__(project: palma.base.project.Project) None #
Run suite of checks on the project data.
- Parameters:
- project: :class:`~palma.Project`
- __generate_datasets(project: palma.base.project.Project, **kwargs) None #
Generate deepchecks.Dataset instances.
- Parameters:
- project: :class:`~palma.Project`
- static __generate_suite(checks: List[deepchecks.core.BaseCheck] | deepchecks.core.BaseSuite, name: str) deepchecks.tabular.Suite #
Generate a Suite of checks from a list of checks or a suite of checks
- Parameters:
- checks: Union[List[BaseCheck], BaseSuite], optional
List of checks or suite of checks
- name: str
Name of the suite to be returned.
- Returns:
- suite : deepchecks.Suite
An instance of deepchecks.Suite.
- class palma.components.Leakage#
Bases:
palma.components.base.ProjectComponent
Class for detecting data leakage in a classification project.
This class implements a component that checks for data leakage in a given project. It uses the FLAML optimizer for model selection and performs a scoring analysis to check for the presence of data leakage based on the AUC metric.
- property metrics#
- __call__(project: palma.base.project.Project) None #
- cross_validation_leakage(project)#
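Examples
A minimal usage sketch, invoking the component directly on a project as with the other project components:
>>> leakage = Leakage()
>>> leakage(project)  # fits models via the FLAML optimizer and scores with AUC
>>> leakage.metrics  # scores used to assess the presence of leakage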