core.preprocessing package

Submodules

core.preprocessing.auto_clean module

core.preprocessing.auto_clean.auto_clean(dataset, target_frequency)[source]

Run the full preprocessing on the given core.data.dataset.Dataset

Parameters
  • dataset (core.data.dataset.Dataset) – Dataset object to preprocess

  • target_frequency (int) – target sampling frequency of the dataset, in seconds

Returns

None

core.preprocessing.change module

core.preprocessing.change.change_to_binary(dataset)[source]

Convert the occupancy data in the given core.data.dataset.Dataset to binary encoding

Parameters

dataset (core.data.dataset.Dataset) – Dataset object whose occupancy encoding will be changed

Returns

None

core.preprocessing.change.change_to_label(dataset)[source]

Convert the occupancy data in the given core.data.dataset.Dataset to label encoding

Parameters

dataset (core.data.dataset.Dataset) – Dataset object whose occupancy encoding will be changed

Returns

None

core.preprocessing.change.change_to_one_hot(dataset)[source]

Convert the occupancy data in the given core.data.dataset.Dataset to one-hot encoding

Parameters

dataset (core.data.dataset.Dataset) – Dataset object whose occupancy encoding will be changed

Returns

None
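
The three occupancy encodings above can be illustrated on a plain array. This is a minimal sketch with NumPy, not the module's implementation: it assumes occupancy is stored as a head count per time step and that "binary" means occupied versus unoccupied. The `counts` array and the variable names are hypothetical.

```python
import numpy as np

# Hypothetical occupancy counts, one value per time step.
counts = np.array([0, 5, 2, 0, 5])

# Binary encoding (assumed meaning): 1 if anyone is present, 0 otherwise.
binary = (counts > 0).astype(int)

# Label encoding: each distinct value is replaced by its class index.
classes, labels = np.unique(counts, return_inverse=True)

# One-hot encoding: one column per class, exactly one 1 in each row.
one_hot = np.eye(len(classes), dtype=int)[labels]
```

Here `classes` holds the distinct values in sorted order, so `labels` indexes into it and `one_hot` has one row per time step and one column per class.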

core.preprocessing.downsample module

core.preprocessing.downsample.downsample(dataset, target_frequency, algorithm='mean')[source]

Downsample the given core.data.dataset.Dataset, lowering its sampling frequency (fewer rows)

Parameters
  • dataset (core.data.dataset.Dataset) – Dataset object to downsample

  • target_frequency (int) – target sampling frequency of the dataset, in seconds

  • algorithm (str) – downsampling algorithm. Only 'mean' is currently available

Returns

None
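
Mean downsampling can be sketched on a single column as follows. This is an illustration under stated assumptions, not the module's code: it assumes the frequency values are the number of seconds between samples, that the target is a whole multiple of the current value, and that a trailing partial window is dropped. The function `downsample_mean` and its parameters are hypothetical.

```python
import numpy as np

def downsample_mean(values, current_period, target_period):
    """Average consecutive samples so each output row spans target_period seconds."""
    if target_period % current_period != 0:
        raise ValueError("target_period must be a multiple of current_period")
    factor = target_period // current_period
    values = np.asarray(values, dtype=float)
    # Assumption: samples that do not fill a complete window are discarded.
    usable = len(values) - len(values) % factor
    return values[:usable].reshape(-1, factor).mean(axis=1)

# Hypothetical readings taken every 60 s, reduced to one row per 120 s.
readings = [20.0, 22.0, 21.0, 23.0, 30.0, 32.0]
downsampled = downsample_mean(readings, 60, 120)
```

Each output value is the mean of two consecutive readings, so six rows become three.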

core.preprocessing.fill module

core.preprocessing.fill.fill(dataset)[source]

Fill all NaN values in the sensor data of the given core.data.dataset.Dataset

Parameters

dataset (core.data.dataset.Dataset) – Dataset object whose missing values will be filled

Returns

None

core.preprocessing.ontology module

core.preprocessing.ontology.ontology(dataset)[source]

Rename features to match the standard glossary

Parameters

dataset (core.data.dataset.Dataset or list(str)) – Dataset object, or list of feature names, to map to the standard glossary

Return type

list(str)

Returns

The edited list of feature names
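
Mapping feature names to a glossary can be pictured as a dictionary lookup. The sketch below is purely illustrative: the glossary entries, the normalization by stripping and lowercasing, and the function `to_standard_names` are all hypothetical, since the source does not list the actual glossary or matching rules.

```python
# Hypothetical glossary: raw name (lowercase) -> standard name.
GLOSSARY = {
    "temp": "Temperature",
    "rh": "Humidity",
    "co2": "CO2",
}

def to_standard_names(feature_list):
    """Return a new list with known names replaced by their glossary entry."""
    # Assumption: unknown names are passed through unchanged.
    return [GLOSSARY.get(name.strip().lower(), name) for name in feature_list]

renamed = to_standard_names(["Temp", "RH ", "door"])
```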

core.preprocessing.outlier module

core.preprocessing.outlier.remove_outlier(dataset, auto_fill=True, ratio=1.5)[source]

Remove potential outliers using the interquartile range (IQR) and replace them with numpy.nan. A value is an outlier if it is less than the first quartile - ratio * IQR or greater than the third quartile + ratio * IQR of its column

Parameters
  • dataset (core.data.dataset.Dataset) – Dataset object from which to remove outliers

  • auto_fill (bool) – whether to fill the outliers automatically or leave them as NaN

  • ratio (float) – IQR multiplier used to mark a value as an outlier

Returns

None
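
The IQR rule described above can be sketched on one column with NumPy. This is a minimal illustration, not the module's implementation: the function `mask_outliers_iqr` is hypothetical, quartiles are computed with NumPy's default linear interpolation, and the fill step controlled by auto_fill is left out.

```python
import numpy as np

def mask_outliers_iqr(column, ratio=1.5):
    """Return a copy of column with IQR outliers replaced by NaN."""
    column = np.asarray(column, dtype=float)
    # Quartiles of the column, ignoring values that are already NaN.
    q1, q3 = np.nanpercentile(column, [25, 75])
    iqr = q3 - q1
    lower = q1 - ratio * iqr
    upper = q3 + ratio * iqr
    cleaned = column.copy()
    cleaned[(column < lower) | (column > upper)] = np.nan
    return cleaned

# Hypothetical temperature readings with one implausible spike.
temps = [21.0, 22.0, 21.5, 22.5, 95.0, 21.8]
cleaned = mask_outliers_iqr(temps)
```

With the default ratio of 1.5, only the 95.0 reading falls outside the bounds and becomes NaN. A larger ratio widens the bounds and flags fewer values.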

core.preprocessing.upsample module

core.preprocessing.upsample.upsample(dataset, target_frequency, algorithm='linear')[source]

Upsample the given core.data.dataset.Dataset, raising its sampling frequency (more rows)

Parameters
  • dataset (core.data.dataset.Dataset) – Dataset object to upsample

  • target_frequency (int) – target sampling frequency of the dataset, in seconds

  • algorithm (str) – upsampling algorithm. Only 'linear' is currently available

Returns

None
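
Linear upsampling can be sketched on a single column with numpy.interp. This is an illustration under stated assumptions, not the module's code: it assumes the frequency values are the number of seconds between samples, that samples are evenly spaced starting at time zero, and that the input is non-empty. The function `upsample_linear` and its parameters are hypothetical.

```python
import numpy as np

def upsample_linear(values, current_period, target_period):
    """Resample values onto a finer time grid using linear interpolation."""
    values = np.asarray(values, dtype=float)
    # Timestamps (in seconds) of the original samples.
    old_times = np.arange(len(values)) * current_period
    # Finer grid that ends at the last original timestamp.
    new_times = np.arange(0, old_times[-1] + 1, target_period)
    return np.interp(new_times, old_times, values)

# Hypothetical readings taken every 60 s, resampled to one row per 30 s.
readings = [20.0, 22.0, 26.0]
upsampled = upsample_linear(readings, 60, 30)
```

Halving the period inserts one interpolated value between each pair of readings, so three rows become five.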

Module contents