core.preprocessing package
Submodules
core.preprocessing.auto_clean module
core.preprocessing.auto_clean.auto_clean(dataset, target_frequency)
Perform the full preprocessing on the given core.data.dataset.Dataset.
- Parameters
dataset (core.data.dataset.Dataset) – Dataset object to preprocess
target_frequency (int) – target sampling frequency of the dataset, in seconds
- Returns
None
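The source does not list the steps that auto_clean runs. Since it takes the same target_frequency argument as downsample and upsample, one decision a pipeline of this kind has to make is which direction to resample. The sketch below illustrates only that decision, assuming target_frequency means seconds between consecutive rows; resampling_direction and its parameter names are hypothetical and not part of core.preprocessing.

```python
def resampling_direction(current_period_s: int, target_period_s: int) -> str:
    """Return which resampling step would reach the target sampling period.

    Both arguments are seconds between consecutive rows (hypothetical names,
    not taken from the library).
    """
    if current_period_s <= 0 or target_period_s <= 0:
        raise ValueError("sampling periods must be positive")
    if target_period_s > current_period_s:
        return "downsample"  # longer period between rows -> fewer rows
    if target_period_s < current_period_s:
        return "upsample"    # shorter period between rows -> more rows
    return "none"            # already at the target period


print(resampling_direction(60, 300))  # downsample
print(resampling_direction(300, 60))  # upsample
```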
core.preprocessing.change module
core.preprocessing.change.change_to_binary(dataset)
Change the occupancy data in the given core.data.dataset.Dataset to binary encoding.
- Parameters
dataset (core.data.dataset.Dataset) – Dataset object whose occupancy encoding is to be changed
- Returns
None
core.preprocessing.change.change_to_label(dataset)
Change the occupancy data in the given core.data.dataset.Dataset to label encoding.
- Parameters
dataset (core.data.dataset.Dataset) – Dataset object whose occupancy encoding is to be changed
- Returns
None
core.preprocessing.change.change_to_one_hot(dataset)
Change the occupancy data in the given core.data.dataset.Dataset to one-hot encoding.
- Parameters
dataset (core.data.dataset.Dataset) – Dataset object whose occupancy encoding is to be changed
- Returns
None
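To make the three occupancy encodings concrete, here is a minimal sketch on a plain list of occupant counts. It does not use core.data.dataset.Dataset and is not the library's implementation; the helper names and the rule that any count above zero means "occupied" are assumptions made for illustration.

```python
def to_binary(counts):
    """Map occupant counts to 0 (vacant) or 1 (occupied)."""
    return [1 if c > 0 else 0 for c in counts]


def to_label(values):
    """Map each distinct value to an integer label, ordered by sorted value."""
    classes = sorted(set(values))
    index = {v: i for i, v in enumerate(classes)}
    return [index[v] for v in values], classes


def to_one_hot(values):
    """Map each value to a 0/1 vector with a single 1 at its class position."""
    labels, classes = to_label(values)
    vectors = [[1 if i == label else 0 for i in range(len(classes))]
               for label in labels]
    return vectors, classes


counts = [0, 2, 5, 2]
print(to_binary(counts))      # [0, 1, 1, 1]
print(to_label(counts)[0])    # [0, 1, 2, 1]
print(to_one_hot(counts)[0])  # [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
```

Binary encoding discards the count, label encoding keeps one integer per class, and one-hot encoding avoids implying an order between classes at the cost of one column per class.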
core.preprocessing.downsample module
core.preprocessing.downsample.downsample(dataset, target_frequency, algorithm='mean')
Downsample the given core.data.dataset.Dataset, lowering its sampling frequency (fewer rows).
- Parameters
dataset (core.data.dataset.Dataset) – Dataset object to downsample
target_frequency (int) – target sampling frequency of the dataset, in seconds
algorithm (str) – downsampling algorithm. Only 'mean' is currently available
- Returns
None
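Mean downsampling can be sketched as grouping rows into fixed-width time bins and averaging each bin. The example below works on plain lists rather than a Dataset and is an illustration of the technique, not the library's code; downsample_mean, the integer-second timestamps, and labelling each bin by its start time are assumptions.

```python
from collections import defaultdict


def downsample_mean(timestamps, values, target_period_s):
    """Average the values that fall into each target_period_s-wide bin.

    timestamps are integer seconds; each bin is labelled by its start time.
    """
    bins = defaultdict(list)
    for t, v in zip(timestamps, values):
        bins[(t // target_period_s) * target_period_s].append(v)
    starts = sorted(bins)
    return starts, [sum(bins[s]) / len(bins[s]) for s in starts]


ts = [0, 60, 120, 180, 240, 300]
temp = [20.0, 21.0, 22.0, 23.0, 24.0, 25.0]
print(downsample_mean(ts, temp, 120))
# ([0, 120, 240], [20.5, 22.5, 24.5])
```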
core.preprocessing.fill module
core.preprocessing.fill.fill(dataset)
Fill all NaN values in the sensor data of the given core.data.dataset.Dataset.
- Parameters
dataset (core.data.dataset.Dataset) – Dataset object whose missing values are to be filled
- Returns
None
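The source does not say which strategy fill uses. As one common approach, the sketch below forward-fills gaps with the last valid reading and then back-fills any gap left at the start of the series; fill_gaps is a hypothetical helper on a plain list, not the library's method.

```python
import math


def fill_gaps(values):
    """Forward-fill NaNs, then back-fill any NaNs left at the start."""
    filled = list(values)
    last = math.nan
    for i, v in enumerate(filled):            # forward pass
        if math.isnan(v):
            filled[i] = last
        else:
            last = v
    nxt = math.nan
    for i in range(len(filled) - 1, -1, -1):  # backward pass for leading gaps
        if math.isnan(filled[i]):
            filled[i] = nxt
        else:
            nxt = filled[i]
    return filled


print(fill_gaps([math.nan, 1.0, math.nan, math.nan, 4.0]))
# [1.0, 1.0, 1.0, 1.0, 4.0]
```

A series that contains no valid reading at all stays NaN, since there is nothing to copy from.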
core.preprocessing.ontology module
core.preprocessing.ontology.ontology(dataset)
Rename features to the names used in the standard glossary.
- Parameters
dataset (core.data.dataset.Dataset or list(str)) – Dataset object, or list of feature names, to map to the standard glossary
- Return type
list
- Returns
The edited list of feature names (feature_list)
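Mapping feature names to a standard glossary can be pictured as a dictionary lookup that leaves unknown names unchanged. The glossary entries and the standardize_names helper below are invented for illustration; the actual glossary used by core.preprocessing.ontology is not given in the source.

```python
# Hypothetical glossary: raw column name (lower-cased) -> standard name.
GLOSSARY = {
    "temp": "Temperature",
    "temperature": "Temperature",
    "rh": "Humidity",
    "humidity": "Humidity",
    "co2": "CO2",
}


def standardize_names(feature_list):
    """Return a new list with known names replaced by their standard form."""
    return [GLOSSARY.get(name.strip().lower(), name) for name in feature_list]


print(standardize_names(["Temp", "RH", "co2", "Door"]))
# ['Temperature', 'Humidity', 'CO2', 'Door']
```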
core.preprocessing.outlier module
core.preprocessing.outlier.remove_outlier(dataset, auto_fill=True, ratio=1.5)
Remove potential outliers using the interquartile range (IQR) and replace them with numpy.nan. Within each column, a value is an outlier if it is less than Q1 - ratio * IQR or greater than Q3 + ratio * IQR, where Q1 and Q3 are the first and third quartiles of that column and IQR = Q3 - Q1.
- Parameters
dataset (core.data.dataset.Dataset) – Dataset object from which to remove outliers
auto_fill (bool) – whether to fill the removed outliers automatically or leave them as NaN
ratio (float) – multiplier applied to the IQR when setting the outlier thresholds
- Returns
None
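The IQR rule described above can be sketched for a single column as follows. This is a standard-library illustration rather than the library's code: mask_outliers_iqr is a hypothetical name, and the choice of the "inclusive" quartile method is an assumption, since the source does not say how quartiles are computed.

```python
import math
import statistics


def mask_outliers_iqr(values, ratio=1.5):
    """Replace values outside [Q1 - ratio*IQR, Q3 + ratio*IQR] with NaN."""
    clean = [v for v in values if not math.isnan(v)]  # ignore existing gaps
    q1, _, q3 = statistics.quantiles(clean, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - ratio * iqr, q3 + ratio * iqr
    return [v if math.isnan(v) or low <= v <= high else math.nan
            for v in values]


readings = [20.0, 21.0, 22.0, 23.0, 24.0, 95.0]
print(mask_outliers_iqr(readings))
# [20.0, 21.0, 22.0, 23.0, 24.0, nan]
```

Here Q1 = 21.25 and Q3 = 23.75, so the accepted range is 17.5 to 27.5 and only 95.0 is masked. With auto_fill=True, the documented function would then fill such gaps instead of leaving them as NaN.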
core.preprocessing.upsample module
core.preprocessing.upsample.upsample(dataset, target_frequency, algorithm='linear')
Upsample the given core.data.dataset.Dataset, raising its sampling frequency (more rows).
- Parameters
dataset (core.data.dataset.Dataset) – Dataset object to upsample
target_frequency (int) – target sampling frequency of the dataset, in seconds
algorithm (str) – upsampling algorithm. Only 'linear' is currently available
- Returns
None
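Linear upsampling can be sketched as interpolating between neighbouring readings onto a finer time grid. The example below uses plain lists and is not the library's implementation; upsample_linear, the integer-second timestamps, and the grid starting at the first timestamp are assumptions.

```python
def upsample_linear(timestamps, values, target_period_s):
    """Linearly interpolate values onto a grid spaced target_period_s apart.

    timestamps must be strictly increasing integer seconds, with at least
    two points.
    """
    new_ts = list(range(timestamps[0], timestamps[-1] + 1, target_period_s))
    new_vals = []
    j = 0  # index of the left neighbour in the original series
    for t in new_ts:
        # Advance to the segment [timestamps[j], timestamps[j + 1]] holding t.
        while j < len(timestamps) - 2 and timestamps[j + 1] <= t:
            j += 1
        t0, t1 = timestamps[j], timestamps[j + 1]
        v0, v1 = values[j], values[j + 1]
        new_vals.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return new_ts, new_vals


print(upsample_linear([0, 60, 120], [20.0, 23.0, 26.0], 30))
# ([0, 30, 60, 90, 120], [20.0, 21.5, 23.0, 24.5, 26.0])
```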