Components

This library lets experienced users attach components to a project or a model in order to run optional analyses. Components integrate these analyses into a project through the provided framework.

Adding Components

To add a component, call the .add method of a project or a model:

import pandas as pd
import tempfile
from sklearn import model_selection
from sklearn.datasets import make_classification

from palma import Project
from palma import components

X, y = make_classification(n_informative=2, n_features=100)
X, y = pd.DataFrame(X), pd.Series(y).astype(bool)
splitter = model_selection.ShuffleSplit(n_splits=10, random_state=42)
project = Project(problem="classification", project_name="default")

project.add(components.FileSystemLogger(tempfile.gettempdir()))  # Add a file-system logger component (pointed at the temp directory)
project.start(X, y, splitter=splitter)                           # Execute the __call__ of all added components
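
As the comment above notes, start executes the __call__ of every added component. The sketch below illustrates that pattern with a small stand-in class. MiniProject and CountRows are hypothetical names used only for illustration, and storing components under their class name is an assumption; this is not palma's implementation.

```python
# Hypothetical stand-in illustrating the add/start pattern; not palma's code.
class MiniProject:
    def __init__(self):
        self.components = {}  # registered components, keyed by class name (assumption)
        self.X = None
        self.y = None

    def add(self, component):
        # Register the component; nothing is executed yet
        self.components[type(component).__name__] = component

    def start(self, X, y):
        self.X, self.y = X, y
        # Call every registered component, passing the project itself
        return {name: comp(self) for name, comp in self.components.items()}


class CountRows:
    """Toy component: reports the number of rows in the project data."""

    def __call__(self, project):
        return len(project.X)


mini = MiniProject()
mini.add(CountRows())
results = mini.start([[0.1], [0.2], [0.3]], [0, 1, 0])
print(results)
```

The point of the design is that a component only needs to be callable with the project: registration is cheap, and all analysis work is deferred until the data is available.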

Accessing the Components

Warning

This API is likely to change!

To access the components (for instance, those providing analysis methods and plots), do:

component = model.components["Component"]
component

Implemented Components

Associated with `Project`

  • Profiler: creates a data report using ydata-profiling. Requires a logger.

  • DeepChecks: runs checks to detect data drift and data leakage

Associated with `ModelEvaluation`

  • ScoringAnalyser: analysis tools for a classification problem

  • RegressionAnalyser: analysis tools for a regression problem

  • ShapAnalyser: local explanation using SHAP values

Create a component

Assume you have a function fun(X, y, **parameters) that performs some analysis. You can wrap it in a component as follows:

from palma.components.base import ProjectComponent  # import base class
from palma import logger

def fun(X, y, **kwargs):
    return {"average": y.mean()}

class MyComponent(ProjectComponent):
    def __init__(self, **parameters):
        self.parameters = parameters

    def __call__(self, project):
        print("Component properly called")
        ret = fun(project.X, project.y, **self.parameters)
        logger.logger.log_metrics(ret, path="my_component")  # log the result as metrics
        return ret

And then use it in the following fashion:

project = Project(problem="classification", project_name="default")
project.add(MyComponent())   # Add the custom component
project.start(X, y, splitter=splitter)

To create a component for ModelEvaluation, note that the signature of __call__ is slightly different: it takes both the project and the model objects:

def __call__(self, project, model):
    print("Component properly called")
    return fun(project.X, project.y, **self.parameters)
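
Before wiring such a component into a real run, it can be convenient to exercise its __call__ in isolation. The sketch below does this with plain stand-in objects. MyModelComponent, fake_project and fake_model are hypothetical names, the palma base class is deliberately omitted so the snippet runs without palma installed, and fun is rewritten with built-ins so it accepts plain lists.

```python
from types import SimpleNamespace


def fun(X, y, **kwargs):
    # Same toy analysis as above, written for plain Python lists
    return {"average": sum(y) / len(y)}


class MyModelComponent:  # in palma, this would inherit from the library's base class
    def __init__(self, **parameters):
        self.parameters = parameters

    def __call__(self, project, model):
        # `model` is received but unused in this toy example
        return fun(project.X, project.y, **self.parameters)


# Stand-ins exposing only the attributes the component reads (assumption)
fake_project = SimpleNamespace(X=[[0], [1], [2], [3]], y=[True, False, True, True])
fake_model = SimpleNamespace()

result = MyModelComponent()(fake_project, fake_model)
print(result)
```

Because the component only reads project.X and project.y, any object exposing those two attributes is enough for a quick check.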

Log data

Once the logger is configured, information can be recorded in the form of metrics, artifacts, or parameters.

  • A metric typically represents a score, a p-value, an error, or a rate.

  • An artifact can take the form of a figure (created with Matplotlib), a model, or any picklable object.

  • A parameter is logged like a metric, but typically describes algorithm parameters, contextual information, and similar attributes.

from palma import logger
from matplotlib.pyplot import figure
logger.logger.log_metrics({"name_of_the_metric":1}, path="metric")
logger.logger.log_artifact(figure(), path="figure")
logger.logger.log_params({"name_of_the_parameter":1}, path="param")
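
To make the three kinds of records concrete, here is a minimal sketch of what a file-based logger with the same three method names could do. TinyFileLogger is a hypothetical stand-in, and the choice of JSON for metrics and parameters and pickle for artifacts is an assumption made for illustration; it does not describe how palma's FileSystemLogger stores data.

```python
import json
import pickle
import tempfile
from pathlib import Path


class TinyFileLogger:
    """Hypothetical stand-in; not palma's FileSystemLogger."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _dump_json(self, data, path):
        target = self.root / f"{path}.json"
        target.write_text(json.dumps(data))
        return target

    def log_metrics(self, metrics, path):
        # Metrics: scores, p-values, errors, rates
        return self._dump_json(metrics, path)

    def log_params(self, params, path):
        # Parameters: algorithm settings and context information
        return self._dump_json(params, path)

    def log_artifact(self, obj, path):
        # Artifacts: any picklable object
        target = self.root / f"{path}.pkl"
        with open(target, "wb") as fh:
            pickle.dump(obj, fh)
        return target


tiny = TinyFileLogger(tempfile.mkdtemp())
metric_file = tiny.log_metrics({"name_of_the_metric": 1}, path="metric")
param_file = tiny.log_params({"name_of_the_parameter": 1}, path="param")
artifact_file = tiny.log_artifact([1, 2, 3], path="artifact")
```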