palma.preprocessing package

Submodules

palma.preprocessing.na_encoder module

class palma.preprocessing.na_encoder.NA_encoder(numerical_strategy='mean', categorical_strategy='<NULL>')

Bases: object

Encodes missing values for both numerical and categorical features.

Several strategies are possible in each case.

Parameters:
numerical_strategy : str, float or int, default = “mean”

The strategy used to encode NA for numerical features. Available strategies: “mean”, “median”, “most_frequent”, or a float/int value used as a constant.

categorical_strategy : str, default = ‘<NULL>’

The strategy used to encode NA for categorical features. Available strategies: “most_frequent”, or any other string, which is used as the replacement value.
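To make the strategy names concrete, here is a minimal sketch of the replacement value each one would plausibly produce for a single column. It is written in plain Python on lists, with `None` standing in for NA. The helpers `numerical_fill`, `categorical_fill` and `encode` are hypothetical and only mirror the documented strategy names; they are not palma's implementation.

```python
from statistics import mean, median, mode


def numerical_fill(values, strategy="mean"):
    """Replacement value for missing entries (None) in a numerical column.

    Hypothetical helper: follows the documented strategy names only.
    """
    observed = [v for v in values if v is not None]
    if isinstance(strategy, (int, float)):
        return strategy  # constant strategy: the number itself is the fill
    if strategy == "mean":
        return mean(observed)
    if strategy == "median":
        return median(observed)
    if strategy == "most_frequent":
        return mode(observed)
    raise ValueError(f"unknown numerical strategy: {strategy!r}")


def categorical_fill(values, strategy="<NULL>"):
    """Replacement value for missing entries (None) in a categorical column."""
    observed = [v for v in values if v is not None]
    # Assumption: any string other than "most_frequent" is used verbatim.
    return mode(observed) if strategy == "most_frequent" else strategy


def encode(values, fill):
    """Replace every missing entry with the chosen fill value."""
    return [fill if v is None else v for v in values]


column = [1.0, None, 1.0, 2.0, None, 4.0, 12.0]
colors = ["red", None, "blue", "red"]

print(encode(column, numerical_fill(column, "mean")))
print(encode(column, numerical_fill(column, "median")))
print(encode(colors, categorical_fill(colors)))
```

On this column the three named numerical strategies give different fills, which is why the choice matters for skewed features.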

Methods

fit(df_train[, y_train])

Fits NA Encoder.

fit_transform(df_train[, y_train])

Fits NA Encoder and transforms the dataset.

get_params([deep])

Get parameters of a NA_encoder object.

set_params(**params)

Set parameters for a NA_encoder object.

transform(df)

Transform the dataset.

fit(df_train, y_train=None)

Fits NA Encoder.

Parameters:
df_train : pandas.DataFrame of shape = (n_train, n_features)

The train dataset with numerical and categorical features.

y_train : pandas.Series of shape = (n_train, ), default = None

The target for classification or regression tasks.

Returns:
object

self

fit_transform(df_train, y_train=None)

Fits NA Encoder and transforms the dataset.

Parameters:
df_train : pandas.DataFrame of shape = (n_train, n_features)

The train dataset with numerical and categorical features.

y_train : pandas.Series of shape = (n_train, ), default = None

The target for classification or regression tasks.

Returns:
pandas.DataFrame of shape = (n_train, n_features)

The train dataset with no missing values.

get_params(deep=True)

Get parameters of a NA_encoder object.

set_params(**params)

Set parameters for a NA_encoder object.

Sets the numerical strategy and the categorical strategy.

Parameters:
numerical_strategy : str, float or int, default = “mean”

The strategy to encode NA for numerical features.

categorical_strategy : str, default = ‘<NULL>’

The strategy to encode NA for categorical features.

transform(df)

Transform the dataset.

Parameters:
df : pandas.DataFrame of shape = (n, n_features)

The dataset with numerical and categorical features.

Returns:
pandas.DataFrame of shape = (n, n_features)

The dataset with no missing values.
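The split between fit and transform suggests the usual contract: replacement values are learned from the training data and then reused on any other dataset. The sketch below illustrates that contract with a hypothetical stand-in class, `MeanFiller`, operating on lists of dicts. It assumes the "mean" strategy and is not palma's code.

```python
from statistics import mean


class MeanFiller:
    """Hypothetical stand-in exposing the same method names as NA_encoder."""

    def fit(self, rows_train, y_train=None):
        # Learn one fill value per column from the training rows only.
        columns = rows_train[0].keys()
        self.fill_ = {
            c: mean(r[c] for r in rows_train if r[c] is not None)
            for c in columns
        }
        return self  # returning self allows chaining, as fit() documents

    def transform(self, rows):
        # Reuse the learned values; nothing is recomputed from `rows`.
        return [
            {c: self.fill_[c] if v is None else v for c, v in r.items()}
            for r in rows
        ]

    def fit_transform(self, rows_train, y_train=None):
        return self.fit(rows_train, y_train).transform(rows_train)


train = [
    {"age": 20.0, "income": 100.0},
    {"age": None, "income": 300.0},
    {"age": 40.0, "income": None},
]
test = [{"age": None, "income": None}]

encoder = MeanFiller()
train_clean = encoder.fit_transform(train)
test_clean = encoder.transform(test)
print(test_clean)
```

With palma installed, the corresponding calls per the signatures above would be `NA_encoder(...).fit_transform(df_train)` followed by `.transform(df)` on the held-out DataFrame.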

palma.preprocessing.pca module

class palma.preprocessing.pca.PCA(data: DataFrame, prefix_name='pc')

Bases: object

Attributes:
nb_component

Methods

get_correlation

get_individual_contributions

get_variables_contributions

plot_circle_corr

plot_correlation_matrix

plot_cumulated_variance

plot_eigen_values

plot_factorial_plan

plot_var_cp

plot_variance_bar

set_nb_components

transform

get_correlation(n_components=None) → DataFrame

get_individual_contributions(n_components=None) → DataFrame

get_variables_contributions(n_components=None) → DataFrame

property nb_component

plot_circle_corr() → None

plot_correlation_matrix() → None

plot_cumulated_variance(color='tab:blue') → None

plot_eigen_values() → None

plot_factorial_plan(X: DataFrame, x_axis='pc1', y_axis='pc2', c=None, cmap=None) → None

plot_var_cp(X: DataFrame, n_col=3, figsize=(10, 10), x_axis='pc1', y_axis='pc2') → None

plot_variance_bar(separator=0.5) → None

set_nb_components(n=None, variance_threshold: float = None, **kwargs)

transform(X: DataFrame) → DataFrame
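The `variance_threshold` argument of `set_nb_components` is not described here. A common reading is "keep the smallest number of components whose cumulative explained variance reaches the threshold". The sketch below shows that rule on a list of eigenvalues. The function `components_for_threshold` is hypothetical, and whether palma applies exactly this rule is an assumption.

```python
from itertools import accumulate


def components_for_threshold(eigenvalues, variance_threshold):
    """Smallest k such that the top-k eigenvalues explain at least
    `variance_threshold` (a fraction in [0, 1]) of the total variance.

    Hypothetical helper illustrating one plausible rule, not palma's code.
    """
    ordered = sorted(eigenvalues, reverse=True)  # largest variance first
    total = sum(ordered)
    for k, cumulated in enumerate(accumulate(ordered), start=1):
        if cumulated / total >= variance_threshold:
            return k
    return len(ordered)  # threshold above 1: keep every component


eigenvalues = [4.0, 2.0, 1.0, 1.0]
# Cumulative explained variance ratios: 0.5, 0.75, 0.875, 1.0
print(components_for_threshold(eigenvalues, 0.8))
```

The same cumulative ratios are presumably what `plot_cumulated_variance` displays, which makes that plot a natural way to choose a threshold.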

Module contents