palma.preprocessing package
Submodules
palma.preprocessing.na_encoder module
- class palma.preprocessing.na_encoder.NA_encoder(numerical_strategy='mean', categorical_strategy='<NULL>')
Bases: object
Encodes missing values for both numerical and categorical features.
Several strategies are possible in each case.
- Parameters:
- numerical_strategy : str, float or int, default = "mean"
The strategy used to encode NA in numerical features. Available strategies: "mean", "median", "most_frequent", or a constant float/int value.
- categorical_strategy : str, default = "<NULL>"
The strategy used to encode NA in categorical features. Available strategies: "most_frequent" or a constant string.
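The strategies listed above can be illustrated with plain pandas. This is a minimal sketch of what each option plausibly means, not palma's implementation: the helpers `fill_numerical` and `fill_categorical` and the toy columns are hypothetical names introduced only for this example.

```python
import numpy as np
import pandas as pd

# Toy frame with one numerical and one categorical column, each with a gap.
df = pd.DataFrame({
    "age": [20.0, np.nan, 40.0],
    "city": ["Paris", None, "Lyon"],
})


def fill_numerical(col: pd.Series, strategy="mean") -> pd.Series:
    """Hypothetical helper: fill NaN with a named statistic or a constant."""
    if strategy == "mean":
        value = col.mean()
    elif strategy == "median":
        value = col.median()
    elif strategy == "most_frequent":
        value = col.mode().iloc[0]
    else:
        # Any float/int is used directly as the fill value.
        value = strategy
    return col.fillna(value)


def fill_categorical(col: pd.Series, strategy="<NULL>") -> pd.Series:
    """Hypothetical helper: fill NaN with the mode or with a constant string."""
    value = col.mode().iloc[0] if strategy == "most_frequent" else strategy
    return col.fillna(value)


filled = pd.DataFrame({
    "age": fill_numerical(df["age"], "mean"),
    "city": fill_categorical(df["city"], "<NULL>"),
})
print(filled)
```

With palma installed, the documented signature suggests the equivalent call would be `NA_encoder(numerical_strategy="mean", categorical_strategy="<NULL>").fit_transform(df)`.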
Methods
fit(df_train[, y_train]): Fits NA Encoder.
fit_transform(df_train[, y_train]): Fits NA Encoder and transforms the dataset.
get_params([deep]): Get parameters of an NA_encoder object.
set_params(**params): Set parameters for an NA_encoder object.
transform(df): Transform the dataset.
- fit(df_train, y_train=None)
Fits NA Encoder.
- Parameters:
- df_train : pandas.DataFrame of shape = (n_train, n_features)
The train dataset with numerical and categorical features.
- y_train : pandas.Series of shape = (n_train, ), default = None
The target for classification or regression tasks.
- Returns:
- object
self
- fit_transform(df_train, y_train=None)
Fits NA Encoder and transforms the dataset.
- Parameters:
- df_train : pandas.DataFrame of shape = (n_train, n_features)
The train dataset with numerical and categorical features.
- y_train : pandas.Series of shape = (n_train, ), default = None
The target for classification or regression tasks.
- Returns:
- pandas.DataFrame of shape = (n_train, n_features)
The train dataset with no missing values.
- get_params(deep=True)
Get parameters of an NA_encoder object.
- set_params(**params)
Set the parameters of an NA_encoder object, namely the numerical strategy and the categorical strategy.
- Parameters:
- numerical_strategy : str, float or int, default = "mean"
The strategy to encode NA for numerical features.
- categorical_strategy : str, default = "<NULL>"
The strategy to encode NA for categorical features.
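The get_params / set_params pair follows the accessor convention familiar from scikit-learn estimators. The sketch below shows that convention on a hypothetical stand-in class, `ToyEncoder`. Returning `self` from `set_params` and raising on unknown names are assumptions borrowed from scikit-learn, not behaviour confirmed for NA_encoder.

```python
class ToyEncoder:
    """Hypothetical stand-in that mirrors the two documented parameters."""

    def __init__(self, numerical_strategy="mean", categorical_strategy="<NULL>"):
        self.numerical_strategy = numerical_strategy
        self.categorical_strategy = categorical_strategy

    def get_params(self, deep=True):
        # `deep` is accepted only to match the documented signature; unused here.
        return {
            "numerical_strategy": self.numerical_strategy,
            "categorical_strategy": self.categorical_strategy,
        }

    def set_params(self, **params):
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                # Assumption: unknown names are rejected rather than ignored.
                raise ValueError(f"Unknown parameter: {name!r}")
            setattr(self, name, value)
        return self


enc = ToyEncoder()
enc.set_params(numerical_strategy="median")
params = enc.get_params()
print(params)
```

Exposing parameters through a dictionary like this is what lets generic tooling, such as a hyperparameter search, reconfigure an object without knowing its constructor.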
- transform(df)
Transform the dataset.
- Parameters:
- df : pandas.DataFrame of shape = (n, n_features)
The dataset with numerical and categorical features.
- Returns:
- pandas.DataFrame of shape = (n, n_features)
The dataset with no missing values.
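Separating fit from transform matters when the fill values should come from the training data only. The following sketch reproduces that pattern with plain pandas. It assumes the encoder stores its fill values at fit time and reuses them in transform, which the signatures suggest but this page does not state. The frames `train` and `test` and the `fill_values` dictionary are hypothetical.

```python
import numpy as np
import pandas as pd

train = pd.DataFrame({"x": [1.0, 3.0, np.nan], "c": ["a", None, "b"]})
test = pd.DataFrame({"x": [np.nan, 10.0], "c": [None, "a"]})

# "fit" step: derive one fill value per column from the training data only.
num_cols = train.select_dtypes(include="number").columns
fill_values = {col: train[col].mean() for col in num_cols}
fill_values.update({col: "<NULL>" for col in train.columns.difference(num_cols)})

# "transform" step: reuse the stored values on any frame with the same columns.
train_filled = train.fillna(fill_values)
test_filled = test.fillna(fill_values)
print(test_filled)
```

Here the missing `x` in `test` is filled with 2.0, the training mean, rather than with a statistic computed from `test` itself, so no information from the test set leaks into preprocessing.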
palma.preprocessing.pca module
- class palma.preprocessing.pca.PCA(data: DataFrame, prefix_name='pc')
Bases: object
- Attributes:
- nb_component
Methods
get_correlation
get_individual_contributions
get_variables_contributions
plot_circle_corr
plot_correlation_matrix
plot_cumulated_variance
plot_eigen_values
plot_factorial_plan
plot_var_cp
plot_variance_bar
set_nb_components
transform
- get_correlation(n_components=None) -> DataFrame
- get_individual_contributions(n_components=None) -> DataFrame
- get_variables_contributions(n_components=None) -> DataFrame
- property nb_component
- plot_circle_corr() -> None
- plot_correlation_matrix() -> None
- plot_cumulated_variance(color='tab:blue') -> None
- plot_eigen_values() -> None
- plot_factorial_plan(X: DataFrame, x_axis='pc1', y_axis='pc2', c=None, cmap=None) -> None
- plot_var_cp(X: DataFrame, n_col=3, figsize=(10, 10), x_axis='pc1', y_axis='pc2') -> None
- plot_variance_bar(separator=0.5) -> None
- set_nb_components(n=None, variance_threshold: float = None, **kwargs)
- transform(X: DataFrame) -> DataFrame
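The PCA class is documented by signature only, so the sketch below does not call palma. It uses NumPy to show one common reading of a variance threshold: keep the smallest number of components whose cumulated explained variance reaches the threshold, then project the data onto them. The standardization step, the toy data, and the link to `set_nb_components(variance_threshold=...)` and `transform` are all assumptions. Only the "pc" column prefix comes from the documented defaults.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
base = rng.normal(size=200)
data = pd.DataFrame({
    "a": base,
    "b": base + 0.1 * rng.normal(size=200),  # nearly a copy of "a"
    "c": rng.normal(size=200),               # independent noise
})

# Standardize each column, then decompose with an SVD.
Z = ((data - data.mean()) / data.std(ddof=0)).to_numpy()
U, s, Vt = np.linalg.svd(Z, full_matrices=False)

# Share of total variance carried by each component, and its running sum.
explained = s**2 / np.sum(s**2)
cumulated = np.cumsum(explained)

# Smallest number of components whose cumulated variance reaches the threshold.
variance_threshold = 0.9
n_components = int(np.searchsorted(cumulated, variance_threshold) + 1)

# Project the standardized data onto the retained components.
scores = pd.DataFrame(
    Z @ Vt[:n_components].T,
    columns=[f"pc{i + 1}" for i in range(n_components)],
    index=data.index,
)
print(n_components, scores.shape)
```

Because "a" and "b" are almost collinear, two components are enough to pass the 0.9 threshold here. The `cumulated` array is the quantity that a cumulated-variance plot would display.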