core.preprocessing package

Submodules

core.preprocessing.auto_clean module

core.preprocessing.auto_clean.auto_clean(dataset, target_frequency)

Run the full preprocessing pipeline on the given core.data.dataset.Dataset.

Parameters

dataset (core.data.dataset.Dataset) – Dataset object to preprocess
target_frequency (int) – target sampling frequency for the dataset, in seconds

Returns

None

core.preprocessing.change module

core.preprocessing.change.change_to_binary(dataset)

Convert the occupancy data in the given core.data.dataset.Dataset to binary encoding.

Parameters: dataset (core.data.dataset.Dataset) – Dataset object whose occupancy encoding should be changed
Returns: None

core.preprocessing.change.change_to_label(dataset)

Convert the occupancy data in the given core.data.dataset.Dataset to label encoding.

Parameters: dataset (core.data.dataset.Dataset) – Dataset object whose occupancy encoding should be changed
Returns: None

core.preprocessing.change.change_to_one_hot(dataset)

Convert the occupancy data in the given core.data.dataset.Dataset to one-hot encoding.

Parameters: dataset (core.data.dataset.Dataset) – Dataset object whose occupancy encoding should be changed
Returns: None
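
The three encodings can be compared on a small array. This is a minimal sketch using NumPy only, not the package's implementation: the sample occupancy counts are hypothetical, and "binary" is assumed to mean occupied versus unoccupied.

```python
import numpy as np

# Hypothetical occupancy counts per time step (not taken from the package).
occupancy = np.array([0, 2, 1, 0, 3])

# Binary encoding (assumed meaning): 1 if anyone is present, 0 otherwise.
binary = (occupancy > 0).astype(int)

# Label encoding: map each distinct value to an integer class index.
classes, label = np.unique(occupancy, return_inverse=True)

# One-hot encoding: one column per class, with a single 1 in each row.
one_hot = np.eye(len(classes), dtype=int)[label]
```

Here one_hot has one row per time step and one column per distinct occupancy value.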

core.preprocessing.downsample module

core.preprocessing.downsample.downsample(dataset, target_frequency, algorithm='mean')

Downsample the given core.data.dataset.Dataset, lowering its sampling frequency (decreasing the number of rows).

Parameters

dataset (core.data.dataset.Dataset) – Dataset object to downsample
target_frequency (int) – target sampling frequency for the dataset, in seconds
algorithm (str) – downsampling algorithm. Only 'mean' is currently available

Returns

None
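
Mean downsampling can be illustrated with pandas on a plain DataFrame. This is a sketch under assumptions, not the package's code: the readings are hypothetical, and target_frequency is assumed to be the number of seconds between rows after resampling.

```python
import pandas as pd

# Hypothetical sensor readings sampled every 30 seconds.
index = pd.date_range("2020-01-01", periods=6, freq=pd.Timedelta(seconds=30))
readings = pd.DataFrame(
    {"temperature": [20.0, 22.0, 21.0, 23.0, 24.0, 26.0]}, index=index
)

target_frequency = 60  # assumed meaning: seconds per row after downsampling

# Group rows into 60-second bins and average the values in each bin.
downsampled = readings.resample(pd.Timedelta(seconds=target_frequency)).mean()
```

Six rows at 30-second spacing become three rows at 60-second spacing, each holding the mean of its bin.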

core.preprocessing.fill module

core.preprocessing.fill.fill(dataset)

Fill all NaN values in the sensor data of the given core.data.dataset.Dataset.

Parameters: dataset (core.data.dataset.Dataset) – Dataset object whose missing values should be filled
Returns: None
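
The documentation does not state which filling strategy fill uses. As one possible approach, the sketch below interpolates interior gaps linearly and copies the nearest valid value into leading and trailing gaps; both the strategy and the sample data are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical sensor column with gaps at the start, middle, and end.
readings = pd.DataFrame({"humidity": [np.nan, 40.0, np.nan, 44.0, np.nan]})

# Assumed strategy: interpolate only gaps surrounded by valid values,
# then forward-fill the trailing gap and backward-fill the leading gap.
filled = (
    readings.interpolate(method="linear", limit_area="inside")
    .ffill()
    .bfill()
)
```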

core.preprocessing.ontology module

core.preprocessing.ontology.ontology(dataset)

Rename features to match the standard glossary.

Parameters: dataset (core.data.dataset.Dataset or list(str)) – Dataset object or list of feature names to map to the standard glossary
Return type: list(str)
Returns: The edited list of feature names
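
Mapping feature names to a glossary can be pictured as a dictionary lookup. The glossary entries, the standardize helper, and the normalization rule below are all hypothetical; the package's actual glossary and matching logic are not described here.

```python
# Hypothetical glossary mapping raw feature names to standardized names.
GLOSSARY = {"temp": "Temperature", "co2": "CO2", "hum": "Humidity"}


def standardize(feature_list):
    """Return a new list with known names replaced by their glossary form."""
    # Unknown names are passed through unchanged.
    return [GLOSSARY.get(name.strip().lower(), name) for name in feature_list]


renamed = standardize(["Temp", "co2 ", "door"])
```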

core.preprocessing.outlier module

core.preprocessing.outlier.remove_outlier(dataset, auto_fill=True, ratio=1.5)

Remove potential outliers using the interquartile range (IQR) and replace them with numpy.nan. A value is an outlier if it is less than the first quartile - ratio * IQR or greater than the third quartile + ratio * IQR of its column.

Parameters

dataset (core.data.dataset.Dataset) – Dataset object from which to remove outliers
auto_fill (bool) – whether to automatically fill the removed outliers or leave them as NaN
ratio (float) – IQR multiplier used to mark a value as an outlier

Returns

None
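
The IQR rule described above can be sketched on a plain pandas DataFrame. The mask_outliers helper and the sample readings are hypothetical and stand in for the package's Dataset handling; the sketch only marks outliers as NaN and does not fill them.

```python
import pandas as pd


def mask_outliers(frame, ratio=1.5):
    """Replace values outside [Q1 - ratio*IQR, Q3 + ratio*IQR] with NaN."""
    q1 = frame.quantile(0.25)  # first quartile of each column
    q3 = frame.quantile(0.75)  # third quartile of each column
    iqr = q3 - q1
    lower = q1 - ratio * iqr
    upper = q3 + ratio * iqr
    # where() keeps values that satisfy the condition and sets the rest to NaN.
    return frame.where((frame >= lower) & (frame <= upper))


# Hypothetical CO2 readings with one obvious spike.
readings = pd.DataFrame({"co2": [400.0, 410.0, 405.0, 415.0, 5000.0, 408.0]})
cleaned = mask_outliers(readings)
```

Only the 5000.0 reading falls outside the bounds, so it is the only value replaced with NaN.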

core.preprocessing.upsample module

core.preprocessing.upsample.upsample(dataset, target_frequency, algorithm='linear')

Upsample the given core.data.dataset.Dataset, raising its sampling frequency (increasing the number of rows).

Parameters

dataset (core.data.dataset.Dataset) – Dataset object to upsample
target_frequency (int) – target sampling frequency for the dataset, in seconds
algorithm (str) – upsampling algorithm. Only 'linear' is currently available

Returns

None
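
Linear upsampling can be illustrated with pandas on a plain DataFrame. This is a sketch under assumptions, not the package's code: the readings are hypothetical, and target_frequency is assumed to be the number of seconds between rows after resampling.

```python
import pandas as pd

# Hypothetical sensor readings sampled every 60 seconds.
index = pd.date_range("2020-01-01", periods=3, freq=pd.Timedelta(seconds=60))
readings = pd.DataFrame({"temperature": [20.0, 22.0, 26.0]}, index=index)

target_frequency = 30  # assumed meaning: seconds per row after upsampling

# Insert empty rows on the finer 30-second grid, then fill them linearly.
upsampled = (
    readings.resample(pd.Timedelta(seconds=target_frequency))
    .asfreq()
    .interpolate(method="linear")
)
```

Each new row sits halfway between its neighbours, so it receives the midpoint of the two surrounding readings.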
