`palma.preprocessing.na_encoder`

Module Contents

Encodes missing values for both numerical and categorical features.

class palma.preprocessing.na_encoder.NA_encoder(numerical_strategy='mean', categorical_strategy='<NULL>')

Fills missing values (NA) in a dataset, using one strategy for numerical features and another for categorical features.

Parameters:

numerical_strategy (str, float or int, default = "mean"): The strategy used to encode NA in numerical features. Available strategies: "mean", "median", "most_frequent", or a float/int value.
categorical_strategy (str, default = "<NULL>"): The strategy used to encode NA in categorical features. Available strategies: "most_frequent", or any other string.
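
As a rough illustration of what the strategy names above refer to, the following pandas-only sketch fills a small column by hand. It does not call `NA_encoder` and is not taken from palma's implementation; the sample data and variable names are made up for the example.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): one numerical and one categorical column, each with one NA.
df = pd.DataFrame({
    "age": [20.0, np.nan, 40.0, 40.0],
    "city": ["Paris", None, "Lyon", "Paris"],
})

# Numerical strategies: a statistic of the observed values, or a constant.
mean_filled = df["age"].fillna(df["age"].mean())
median_filled = df["age"].fillna(df["age"].median())
most_frequent_filled = df["age"].fillna(df["age"].mode().iloc[0])
constant_filled = df["age"].fillna(-1)

# Categorical strategies: a placeholder string, or the most frequent category.
null_filled = df["city"].fillna("<NULL>")
city_most_frequent_filled = df["city"].fillna(df["city"].mode().iloc[0])
```

A placeholder string such as "<NULL>" keeps missingness visible as its own category, whereas "most_frequent" merges missing rows into an existing category.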

set_params(**params)

Sets the numerical and categorical strategies of an NA_encoder object.

Parameters:

numerical_strategy (str, float or int, default = "mean"): The strategy used to encode NA in numerical features.
categorical_strategy (str, default = "<NULL>"): The strategy used to encode NA in categorical features.

fit(df_train, y_train=None)

Fits NA Encoder.

Parameters:

df_train (pandas.DataFrame of shape (n_train, n_features)): The training dataset with numerical and categorical features.
y_train (pandas.Series of shape (n_train,), default = None): The target for classification or regression tasks.

Returns:

fit_transform(df_train, y_train=None)

Fits NA Encoder and transforms the dataset.

Parameters:

df_train (pandas.DataFrame of shape (n_train, n_features)): The training dataset with numerical and categorical features.
y_train (pandas.Series of shape (n_train,), default = None): The target for classification or regression tasks.

Returns:

pandas.DataFrame of shape (n_train, n_features): The training dataset with no missing values.

transform(df)

Transforms the dataset.

Parameters:

df (pandas.DataFrame of shape (n, n_features)): The dataset with numerical and categorical features.

Returns:

pandas.DataFrame of shape (n, n_features): The dataset with no missing values.
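
The separate `fit` and `transform` methods follow the usual fit-then-transform pattern, in which fill values are learned on the training data and then reused on any other dataset. The sketch below illustrates that pattern with a hypothetical stand-in class, `SimpleNAFiller`, written in plain pandas. It is not palma's `NA_encoder`, and the assumption that fill values are computed only at fit time is the common convention, not something this page states.

```python
import numpy as np
import pandas as pd


class SimpleNAFiller:
    """Hypothetical stand-in used for illustration only (not palma's NA_encoder)."""

    def __init__(self, categorical_fill="<NULL>"):
        self.categorical_fill = categorical_fill
        self.fill_values_ = {}

    def fit(self, df_train, y_train=None):
        # Learn one fill value per column from the training data only:
        # the mean for numerical columns, a placeholder string otherwise.
        num_cols = df_train.select_dtypes(include="number").columns
        self.fill_values_ = {
            col: df_train[col].mean() if col in num_cols else self.categorical_fill
            for col in df_train.columns
        }
        return self

    def transform(self, df):
        # Reuse the values learned in fit; nothing is recomputed from df.
        return df.fillna(value=self.fill_values_)


train = pd.DataFrame({"x": [1.0, 3.0, np.nan], "c": ["a", None, "b"]})
test = pd.DataFrame({"x": [np.nan, 10.0], "c": [None, "a"]})

filler = SimpleNAFiller().fit(train)
test_filled = filler.transform(test)
```

Here the missing `x` in `test` is filled with the training mean rather than a statistic of `test` itself, which keeps the preprocessing of new data consistent with what was seen during training.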