palma.components.data_checker
Module Contents
Classes
DeepCheck | This object is a wrapper of the Deepchecks library and allows auditing the data through various checks such as data drift, duplicate values, …
Leakage | Class for detecting data leakage in a classification project.
- class palma.components.data_checker.DeepCheck(name: str = 'Data Checker', dataset_parameters: dict = None, dataset_checks: List[deepchecks.core.BaseCheck] | deepchecks.core.BaseSuite = data_integrity(), train_test_datasets_checks: List[deepchecks.core.BaseCheck] | deepchecks.core.BaseSuite = Suite('Checks train test', train_test_validation()), raise_on_fail=True)
Bases: palma.components.base.ProjectComponent
This object wraps the Deepchecks library and audits the data through various checks such as data drift, duplicate values, …
- Parameters:
- dataset_parameters: dict, optional
Parameters and their values used to generate the deepchecks.Dataset instances on which the checks are run.
- dataset_checks: Union[List[BaseCheck], BaseSuite], optional
List of checks or suite of checks to run on the whole dataset. By default, the data_integrity suite is used to detect integrity issues.
- train_test_datasets_checks: Union[List[BaseCheck], BaseSuite], optional
List of checks or suite of checks that detect issues related to the train-test split, such as feature drift or data leakage. By default, the train_test_validation suite is used.
- raise_on_fail: bool, optional
If True (the default), raise an error when a check fails.
- __call__(project: palma.base.project.Project) → None
Run the suite of checks on the project data.
- Parameters:
- project: palma.Project
- __generate_datasets(project: palma.base.project.Project, **kwargs) → None
Generate the deepchecks.Dataset instances for the project.
- Parameters:
- project: palma.base.project.Project
- static __generate_suite(checks: List[deepchecks.core.BaseCheck] | deepchecks.core.BaseSuite, name: str) → deepchecks.tabular.Suite
Generate a Suite of checks from a list of checks or a suite of checks.
- Parameters:
- checks: Union[List[BaseCheck], BaseSuite]
List of checks or suite of checks
- name: str
Name of the suite to return
- Returns:
- suite: deepchecks.Suite
Instance of deepchecks.Suite
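The "list of checks or suite of checks" normalization described for __generate_suite can be sketched without Deepchecks installed. The block below is only an illustration under stated assumptions: Check and Suite are hypothetical stand-ins, not the Deepchecks classes, and whether the real method renames an existing suite or returns it unchanged is not stated in this reference (the sketch renames it).

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Check:
    """Hypothetical stand-in for deepchecks.core.BaseCheck."""
    label: str


@dataclass
class Suite:
    """Hypothetical stand-in for a Deepchecks suite: a named list of checks."""
    name: str
    checks: List[Check] = field(default_factory=list)


def generate_suite(checks: Union[List[Check], Suite], name: str) -> Suite:
    """Return a Suite whether the caller passed a list of checks or a suite."""
    if isinstance(checks, Suite):
        # Already a suite: copy its checks under the requested name (assumption).
        return Suite(name, list(checks.checks))
    # A plain list of checks: wrap it in a new suite.
    return Suite(name, list(checks))


from_list = generate_suite([Check("duplicates"), Check("drift")], "Checks dataset")
from_suite = generate_suite(Suite("original", [Check("leakage")]), "Checks train test")
```

Accepting either form at the boundary and converting once means the rest of the component only ever has to deal with a single suite type.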
- class palma.components.data_checker.Leakage
Bases: palma.components.base.ProjectComponent
Class for detecting data leakage in a classification project.
This class implements a component that checks for data leakage in a given project. It uses the FLAML optimizer for model selection and performs a scoring analysis, based on the AUC metric, to check for the presence of data leakage.
- property metrics
- __call__(project: palma.base.project.Project) → None
- cross_validation_leakage(project)
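To make the AUC-based scoring analysis mentioned for Leakage more concrete, here is a small standard-library sketch of how AUC can be computed and compared against a cut-off. It is not the Leakage implementation: this reference does not say how the class interprets the score, so the looks_like_leakage rule and its 0.99 threshold are hypothetical, and no FLAML model is involved.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outscores a negative (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def looks_like_leakage(auc: float, threshold: float = 0.99) -> bool:
    """Hypothetical rule: treat an AUC at or above the threshold as suspicious."""
    return auc >= threshold


labels = [0, 0, 1, 1]
honest_auc = roc_auc(labels, [0.1, 0.4, 0.35, 0.8])  # one positive ranked below a negative
leaky_auc = roc_auc(labels, [0.0, 0.0, 1.0, 1.0])    # perfect separation
```

A score that separates the classes perfectly is often a hint that some feature encodes the target, which is why an unusually high AUC is worth investigating rather than celebrating.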