palma.preprocessing.na_encoder

Module Contents

Classes

NA_encoder

Encodes missing values for both numerical and categorical features.

class palma.preprocessing.na_encoder.NA_encoder(numerical_strategy='mean', categorical_strategy='<NULL>')

Encodes missing values in both numerical and categorical features.

Several strategies are available for each feature type.

Parameters:
numerical_strategy : str or float or int, default = "mean"

The strategy used to fill missing (NA) values in numerical features. Available strategies: "mean", "median", "most_frequent", or a constant float/int value.

categorical_strategy : str, default = "<NULL>"

The strategy used to fill missing (NA) values in categorical features. Available strategies: "most_frequent", or any other string, which is used as a constant fill value.
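As a rough illustration of what each strategy computes, the sketch below derives a fill value from the observed (non-missing) entries of a single column. It is a stdlib-only approximation, not palma's implementation: the helper names `numerical_fill_value` and `categorical_fill_value` are hypothetical, a column is modeled as a plain list with `None` marking a missing entry, and the tie-break for "most_frequent" (first value encountered wins) is an assumption.

```python
from collections import Counter
from statistics import mean, median


def numerical_fill_value(column, strategy="mean"):
    """Return the value used to fill missing entries of a numerical column.

    Hypothetical helper: `None` marks a missing entry.
    """
    observed = [v for v in column if v is not None]
    if strategy == "mean":
        return mean(observed)
    if strategy == "median":
        return median(observed)
    if strategy == "most_frequent":
        # Counter.most_common keeps first-seen order among ties (assumed tie-break).
        return Counter(observed).most_common(1)[0][0]
    if isinstance(strategy, (int, float)) and not isinstance(strategy, bool):
        # A numeric strategy is used directly as a constant fill value.
        return strategy
    raise ValueError(f"invalid numerical strategy: {strategy!r}")


def categorical_fill_value(column, strategy="<NULL>"):
    """Return the value used to fill missing entries of a categorical column."""
    if strategy == "most_frequent":
        observed = [v for v in column if v is not None]
        return Counter(observed).most_common(1)[0][0]
    # Any other string is treated as a constant placeholder category.
    return strategy


ages = [22, None, 38, 26, None]
print(numerical_fill_value(ages, "median"))  # 26
print(numerical_fill_value(ages, -1))        # -1
print(categorical_fill_value(["a", None, "b", "a"], "most_frequent"))  # a
print(categorical_fill_value(["a", None, "b"]))  # <NULL>
```

The default "<NULL>" keeps missing categories visible as their own level instead of blending them into an existing one, which is one reason a constant can be preferable to "most_frequent".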

get_params(deep=True)

Gets the parameters of an NA_encoder object.

set_params(**params)

Sets the parameters of an NA_encoder object.

Sets the numerical strategy and the categorical strategy.

Parameters:
numerical_strategy : str or float or int, default = "mean"

The strategy used to fill missing values in numerical features.

categorical_strategy : str, default = "<NULL>"

The strategy used to fill missing values in categorical features.
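The get_params/set_params pair mirrors the scikit-learn estimator convention, in which get_params returns the constructor arguments as a dict and set_params updates them by name. The page does not state that palma follows that convention exactly, so the class below is a hypothetical stand-in (`NAEncoderParamsSketch` is not a palma name) that shows only the parameter-handling pattern. Whether palma's set_params returns the encoder, or warns rather than raises on an unknown name, is not documented here.

```python
class NAEncoderParamsSketch:
    """Hypothetical stand-in showing only the parameter-handling pattern."""

    def __init__(self, numerical_strategy="mean", categorical_strategy="<NULL>"):
        self.numerical_strategy = numerical_strategy
        self.categorical_strategy = categorical_strategy

    def get_params(self, deep=True):
        # `deep` matters only for nested estimators; there are none here.
        return {
            "numerical_strategy": self.numerical_strategy,
            "categorical_strategy": self.categorical_strategy,
        }

    def set_params(self, **params):
        valid = self.get_params()
        for name, value in params.items():
            # This sketch rejects unknown names; the real method may warn instead.
            if name not in valid:
                raise ValueError(f"unknown parameter: {name}")
            setattr(self, name, value)
        # Returning self follows scikit-learn; assumed, not confirmed for palma.
        return self


enc = NAEncoderParamsSketch()
enc.set_params(numerical_strategy="median")
print(enc.get_params())
# {'numerical_strategy': 'median', 'categorical_strategy': '<NULL>'}
```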

fit(df_train, y_train=None)

Fits the NA encoder to the training data.

Parameters:
df_train : pandas.DataFrame of shape = (n_train, n_features)

The training dataset with numerical and categorical features.

y_train : pandas.Series of shape = (n_train,), default = None

The target for classification or regression tasks.

Returns:
object

self
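Conceptually, fitting computes one fill value per column from the training data and stores it for later use. The sketch below is an assumption-laden, stdlib-only illustration rather than palma's code: `fit_fill_values` is a hypothetical name, a dict of equal-length lists stands in for the DataFrame, a column counts as categorical when its observed values are strings (palma more likely inspects the pandas dtype), and columns with no observed values are out of scope.

```python
from collections import Counter
from statistics import mean, median


def fit_fill_values(df_train, numerical_strategy="mean", categorical_strategy="<NULL>"):
    """Learn one fill value per column (hypothetical helper).

    `df_train` maps column names to lists; `None` marks a missing entry.
    """
    fill_values = {}
    for name, column in df_train.items():
        observed = [v for v in column if v is not None]
        is_categorical = any(isinstance(v, str) for v in observed)
        if is_categorical:
            if categorical_strategy == "most_frequent":
                fill_values[name] = Counter(observed).most_common(1)[0][0]
            else:
                # Any other string is a constant placeholder category.
                fill_values[name] = categorical_strategy
        elif numerical_strategy == "mean":
            fill_values[name] = mean(observed)
        elif numerical_strategy == "median":
            fill_values[name] = median(observed)
        elif numerical_strategy == "most_frequent":
            fill_values[name] = Counter(observed).most_common(1)[0][0]
        else:
            # Assumed: any other value is a numeric constant (not validated here).
            fill_values[name] = numerical_strategy
    return fill_values


train = {"age": [22, None, 38, 26], "city": ["Paris", "Lyon", None, "Paris"]}
print(fit_fill_values(train, numerical_strategy="median"))
# {'age': 26, 'city': '<NULL>'}
```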

fit_transform(df_train, y_train=None)

Fits the NA encoder, then transforms the training dataset.

Parameters:
df_train : pandas.DataFrame of shape = (n_train, n_features)

The training dataset with numerical and categorical features.

y_train : pandas.Series of shape = (n_train,), default = None

The target for classification or regression tasks.

Returns:
pandas.DataFrame of shape = (n_train, n_features)

The training dataset with its missing values filled.

transform(df)

Fills the missing values of a dataset using an encoder already fitted with fit or fit_transform.

Parameters:
df : pandas.DataFrame of shape = (n, n_features)

The dataset with numerical and categorical features.

Returns:
pandas.DataFrame of shape = (n, n_features)

The dataset with no missing values.
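Assuming the usual fit/transform contract, transform reuses the values computed during fit rather than recomputing them from df, so a test set is filled with training statistics. A minimal stdlib sketch of that step follows; `fill_missing` is a hypothetical helper, a dict of lists stands in for the DataFrame, and the fill values are hard-coded as if a prior fit had produced them.

```python
def fill_missing(df, fill_values):
    """Return a copy of `df` with each None replaced by its column's fill value.

    Hypothetical helper: `df` maps column names to lists, and `fill_values`
    maps the same names to values learned on the training data.
    """
    return {
        name: [fill_values[name] if v is None else v for v in column]
        for name, column in df.items()
    }


# Fill values as a prior fit on training data might have produced them.
learned = {"age": 26, "city": "<NULL>"}
test = {"age": [None, 41], "city": ["Nice", None]}
print(fill_missing(test, learned))
# {'age': [26, 41], 'city': ['Nice', '<NULL>']}
```

The missing age is filled with 26 from the training data, not with 41, the only age observed in the test set; this keeps training and inference preprocessing consistent and avoids leaking test statistics.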