palma.components.data_checker

Module Contents

Classes

DeepCheck

This object is a wrapper of the Deepchecks library and allows auditing the data through various checks such as data drift, duplicate values, …

Leakage

Class for detecting data leakage in a classification project.

class palma.components.data_checker.DeepCheck(name: str = 'Data Checker', dataset_parameters: dict = None, dataset_checks: List[deepchecks.core.BaseCheck] | deepchecks.core.BaseSuite = data_integrity(), train_test_datasets_checks: List[deepchecks.core.BaseCheck] | deepchecks.core.BaseSuite = Suite('Checks train test', train_test_validation()), raise_on_fail=True)

Bases: palma.components.base.ProjectComponent

This object is a wrapper of the Deepchecks library and allows auditing the data through various checks such as data drift, duplicate values, …

Parameters:
dataset_parameters: dict, optional

Parameters and their values used to generate the deepchecks.Dataset instances on which the checks are run.

dataset_checks: Union[List[BaseCheck], BaseSuite], optional

List of checks or suite of checks that will be run on the whole dataset. By default, the data_integrity suite is used to detect integrity issues.

train_test_datasets_checks: Union[List[BaseCheck], BaseSuite], optional

List of checks or suite of checks to detect issues related to the train-test split, such as feature drift, data leakage, … By default, a suite built from train_test_validation is used.

raise_on_fail: bool, optional

If True, raise an error when at least one check fails.
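The effect of raise_on_fail can be illustrated with a minimal sketch. It does not use palma or Deepchecks: CheckOutcome, ChecksFailedError and report are hypothetical stand-ins, and the exception type that palma actually raises is not given on this page.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CheckOutcome:
    """Hypothetical stand-in for the result of one check."""
    name: str
    passed: bool


class ChecksFailedError(RuntimeError):
    """Hypothetical error type; the real exception is not documented here."""


def report(outcomes: List[CheckOutcome], raise_on_fail: bool = True) -> List[str]:
    """Return the names of failed checks, raising instead if requested."""
    failed = [o.name for o in outcomes if not o.passed]
    if failed and raise_on_fail:
        raise ChecksFailedError("failed checks: " + ", ".join(failed))
    return failed


outcomes = [CheckOutcome("duplicates", True), CheckOutcome("drift", False)]
print(report(outcomes, raise_on_fail=False))  # ['drift']
```

With raise_on_fail=True the same call would raise as soon as the list of failed checks is non-empty, which is useful when the checks gate the rest of a pipeline.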

__call__(project: palma.base.project.Project) → None

Run suite of checks on the project data.

Parameters:
project: palma.Project

__generate_datasets(project: palma.base.project.Project, **kwargs) → None

Generate deepchecks.Dataset

Parameters:
project: Project

Project whose data is used to generate the deepchecks.Dataset instances.

static __generate_suite(checks: List[deepchecks.core.BaseCheck] | deepchecks.core.BaseSuite, name: str) → deepchecks.tabular.Suite

Generate a Suite of checks from a list of checks or a suite of checks

Parameters:
checks: Union[List[BaseCheck], BaseSuite]

List of checks or suite of checks

name: str

Name of the suite to be returned

Returns:
suite: deepchecks.Suite

instance of deepchecks.Suite
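The list-or-suite normalisation described above can be sketched as follows. This is an illustration under stated assumptions, not palma's code: Check and Suite are hypothetical stand-ins for the Deepchecks classes, and returning an existing suite unchanged (rather than renaming it) is an assumption, since this page does not say which behaviour palma implements.

```python
from typing import List, Union


class Check:
    """Hypothetical stand-in for deepchecks.core.BaseCheck."""

    def __init__(self, label: str) -> None:
        self.label = label


class Suite:
    """Hypothetical stand-in for a suite: a named list of checks."""

    def __init__(self, name: str, *checks: Check) -> None:
        self.name = name
        self.checks = list(checks)


def generate_suite(checks: Union[List[Check], Suite], name: str) -> Suite:
    """Normalise a list of checks or a suite into a single suite."""
    if isinstance(checks, Suite):
        # Assumption: an existing suite is passed through untouched.
        return checks
    # A plain list is wrapped into a new suite carrying the given name.
    return Suite(name, *checks)


suite = generate_suite([Check("duplicates"), Check("drift")], "Integrity")
print(suite.name, [c.label for c in suite.checks])
```

Accepting both forms lets callers pass a ready-made suite or a hand-picked list of checks without changing the code that later runs them.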

class palma.components.data_checker.Leakage

Bases: palma.components.base.ProjectComponent

Class for detecting data leakage in a classification project.

This class implements a component that checks for data leakage in a given project. It uses the FLAML optimizer for model selection and performs a scoring analysis based on the AUC metric to check for the presence of data leakage.
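To make the AUC-based scoring step concrete, here is a minimal sketch in plain Python. It leaves out FLAML and model selection entirely and only shows how an AUC can be computed from labels and scores and then compared with a threshold. The function names and the 0.99 threshold are assumptions; this page does not state which labels, scores, or cut-off the component uses.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def looks_leaky(auc: float, threshold: float = 0.99) -> bool:
    """Flag an AUC suspiciously close to perfect (threshold is an assumption)."""
    return auc >= threshold


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
print(auc, looks_leaky(auc))  # 0.75 False
```

The pairwise formulation is quadratic in the number of samples, which is fine for an illustration; a rank-based computation would be preferable on large datasets.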

property metrics
__call__(project: palma.base.project.Project) → None
cross_validation_leakage(project)