core.preprocessing package

This package contains functions for modifying the values stored in a core.data.dataset.Dataset object. core.preprocessing.auto_clean.auto_clean() runs all of the preprocessing functions automatically.

core.preprocessing.change module

core.preprocessing.change.change_to_binary(dataset)
core.preprocessing.change.change_to_label(dataset)
core.preprocessing.change.change_to_one_hot(dataset)

change_to_binary(), change_to_label() and change_to_one_hot() change the encoding used for occupancy in the core.data.dataset.Dataset object:

>>> import core
>>> import numpy as np
>>> data = np.array([[1, 2], [3, 4]])
>>> occupancy = np.array([[0], [3]])
>>> dataset = core.data.Dataset()
>>> dataset.add_room(data, occupancy=occupancy, room_name="test", header=["name 1", "name 2"])
>>> dataset.occupancy
array([[0.],
       [3.]])
>>> core.preprocessing.change_to_one_hot(dataset)
>>> dataset.occupancy
array([[1., 0., 0., 0.],
       [0., 0., 0., 1.]])
>>> core.preprocessing.change_to_label(dataset)
>>> dataset.occupancy
array([[0],
       [3]], dtype=int64)
>>> core.preprocessing.change_to_binary(dataset)
>>> dataset.occupancy
array([[0],
       [1]], dtype=int64)
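
The three encodings in the session above can also be illustrated without the library. The following is a minimal sketch in plain Python, not the package's implementation. The helper names to_one_hot, to_label and to_binary are hypothetical, and the sketch assumes that occupancy is a list of non-negative integer counts and that the one-hot width is the largest count plus one.

```python
def to_one_hot(labels):
    """Encode integer counts as one-hot rows (assumed width: max count + 1)."""
    width = max(labels) + 1
    return [[1.0 if col == label else 0.0 for col in range(width)]
            for label in labels]


def to_label(one_hot_rows):
    """Recover the integer count from each one-hot row (position of the 1)."""
    return [row.index(max(row)) for row in one_hot_rows]


def to_binary(labels):
    """Collapse counts to occupied (1) or unoccupied (0)."""
    return [1 if label > 0 else 0 for label in labels]


occupancy = [0, 3]
one_hot = to_one_hot(occupancy)
labels = to_label(one_hot)
binary = to_binary(labels)
```

Note that the binary encoding discards the head count, so a dataset converted to binary cannot be converted back to labels without the original values.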

core.preprocessing.fill module

core.preprocessing.fill.fill(dataset)

fill() imputes all missing data in the core.data.dataset.Dataset object:

>>> import core
>>> import numpy as np
>>> data = np.array([[1, np.nan], [100, 4], [200, np.nan], [20, np.nan]])
>>> occupancy = np.array([[0], [3], [4], [7]])
>>> dataset = core.data.Dataset()
>>> dataset.add_room(data, occupancy=occupancy, room_name="test", header=["name 1", "name 2"])
>>> dataset.data
array([[  1.,  nan],
       [100.,   4.],
       [200.,  nan],
       [ 20.,  nan]])
>>> core.preprocessing.fill(dataset)
>>> dataset.data
array([[  1.,   4.],
       [100.,   4.],
       [200.,   4.],
       [ 20.,   4.]])
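
The documentation does not state which imputation strategy fill() uses. The output above is consistent with copying the nearest valid value into each gap, so the sketch below shows that idea for a single column: a forward fill followed by a backward fill. This is an assumption for illustration only, and fill_column is a hypothetical helper, not part of the package.

```python
import math


def fill_column(values):
    """Forward-fill missing entries, then back-fill any leading gaps."""
    filled = list(values)

    # Forward pass: copy the most recent valid value into each gap.
    last_seen = None
    for i, value in enumerate(filled):
        if math.isnan(value):
            if last_seen is not None:
                filled[i] = last_seen
        else:
            last_seen = value

    # Backward pass: gaps before the first valid value take the next one.
    next_seen = None
    for i in range(len(filled) - 1, -1, -1):
        if math.isnan(filled[i]):
            if next_seen is not None:
                filled[i] = next_seen
        else:
            next_seen = filled[i]

    return filled


column = [math.nan, 4.0, math.nan, math.nan]
result = fill_column(column)
```

A column with no valid values at all is left unchanged by this sketch, since there is nothing to copy from.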

core.preprocessing.ontology module

core.preprocessing.ontology.ontology(dataset)

ontology() can be used to change the feature names in the core.data.dataset.Dataset object:

>>> import core
>>> import numpy as np
>>> data = np.array([[1, 1], [2, 4]])
>>> occupancy = np.array([[0], [3]])
>>> dataset = core.data.Dataset()
>>> dataset.add_room(data, occupancy=occupancy, room_name="test", header=["Tem.", "damp"])
>>> core.preprocessing.ontology(dataset)
['temperature', 'damper']
>>> dataset.feature_list
['temperature', 'damper']
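
How ontology() decides on the new names is not described here. One simple way to get the behavior shown above is a lookup table of known spellings. The sketch below assumes such a table; ALIASES and normalize are hypothetical names, and the entries are made up for the example rather than taken from the package.

```python
# Hypothetical alias table: canonical name -> accepted spellings
# (lowercase, punctuation and spaces removed).
ALIASES = {
    "temperature": {"tem", "temp", "temperature"},
    "damper": {"damp", "damper"},
}


def normalize(name):
    """Map a raw column header to its canonical feature name, if known."""
    key = "".join(ch for ch in name.lower() if ch.isalnum())
    for canonical, spellings in ALIASES.items():
        if key in spellings:
            return canonical
    return name  # unknown headers are left untouched


headers = ["Tem.", "damp", "name 1"]
renamed = [normalize(header) for header in headers]
```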

core.preprocessing.outlier module

core.preprocessing.outlier.remove_outlier(dataset, auto_fill=True, ratio=1.5)

remove_outlier() removes outliers from the core.data.dataset.Dataset object and fills in the gaps this leaves:

>>> import core
>>> import numpy as np
>>> data = np.array([[1, 1], [2, 4], [200, 4], [3, 7]])
>>> occupancy = np.array([[0], [3], [4], [7]])
>>> dataset = core.data.Dataset()
>>> dataset.add_room(data, occupancy=occupancy, room_name="test", header=["name 1", "name 2"])
>>> dataset.data
array([[  1.,   1.],
       [  2.,   4.],
       [200.,   4.],
       [  3.,   7.]])
>>> core.preprocessing.remove_outlier(dataset)
>>> dataset.data
array([[1., 1.],
       [2., 4.],
       [2., 4.],
       [3., 7.]])
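
The detection rule behind remove_outlier() is not named in this documentation. The default ratio=1.5 suggests the common interquartile-range rule (values further than ratio times the IQR from the quartiles are outliers), so the sketch below assumes that rule and fills each removed value with the previous kept value. Both choices are assumptions, and replace_outliers is a hypothetical helper.

```python
import statistics


def replace_outliers(values, ratio=1.5):
    """Replace IQR outliers with the previous kept value (assumed rule)."""
    # Quartiles with linear interpolation between the sorted data points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - ratio * spread, q3 + ratio * spread

    cleaned = []
    for value in values:
        if low <= value <= high:
            cleaned.append(value)
        elif cleaned:
            cleaned.append(cleaned[-1])  # reuse the previous kept value
        else:
            cleaned.append(median)  # no earlier value: fall back to median
    return cleaned


first_column = replace_outliers([1.0, 2.0, 200.0, 3.0])
second_column = replace_outliers([1.0, 4.0, 4.0, 7.0])
```

A larger ratio widens the accepted range and flags fewer values; a smaller one is stricter.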

core.preprocessing.upsample module

core.preprocessing.upsample.upsample(dataset, target_frequency, algorithm='linear')
core.preprocessing.downsample.downsample(dataset, target_frequency, algorithm='mean')

upsample() and downsample() respectively increase and decrease the number of data points (i.e., the number of rows) in the core.data.dataset.Dataset object:

>>> import core
>>> import numpy as np
>>> data = np.array([[1, 2], [100, 4], [200, 6]])
>>> occupancy = np.array([[0], [3], [4]])
>>> dataset = core.data.Dataset()
>>> dataset.add_room(data, occupancy=occupancy, room_name="test", header=["name 1", "name 2"])
>>> dataset.data
array([[  1.,   2.],
       [100.,   4.],
       [200.,   6.]])
>>> dataset.time_column_index = 0
>>> core.preprocessing.upsample(dataset, 10)
>>> dataset.data
array([[  1.        ,   2.        ],
       [ 11.        ,   2.22222222],
       [ 21.        ,   2.44444444],
       [ 31.        ,   2.66666667],
       [ 41.        ,   2.88888889],
       [ 51.        ,   3.11111111],
       [ 61.        ,   3.33333333],
       [ 71.        ,   3.55555556],
       [ 81.        ,   3.77777778],
       [ 91.        ,   4.        ],
       [101.        ,   4.2       ],
       [111.        ,   4.4       ],
       [121.        ,   4.6       ],
       [131.        ,   4.8       ],
       [141.        ,   5.        ],
       [151.        ,   5.2       ],
       [161.        ,   5.4       ],
       [171.        ,   5.6       ],
       [181.        ,   5.8       ],
       [191.        ,   6.        ]])
>>> core.preprocessing.downsample(dataset, 60)
>>> dataset.data
array([[  1.        ,   2.        ],
       [ 61.        ,   3.33333333],
       [121.        ,   4.6       ],
       [181.        ,   5.8       ]])
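
The default algorithm names 'linear' and 'mean' point to linear interpolation and window averaging. The sketch below shows generic versions of both for a single value column with a separate list of timestamps. It is an illustration under those assumptions, not the package's code, and it is not guaranteed to reproduce the exact values shown above. The names upsample_linear and downsample_mean are hypothetical.

```python
def upsample_linear(times, values, step):
    """Linearly interpolate (times, values) onto a grid with spacing step.

    Assumes times is sorted and holds at least two points.
    """
    new_times, new_values = [], []
    segment = 0
    t = times[0]
    while t <= times[-1]:
        # Move to the pair of original samples that brackets t.
        while segment < len(times) - 2 and t > times[segment + 1]:
            segment += 1
        t0, t1 = times[segment], times[segment + 1]
        v0, v1 = values[segment], values[segment + 1]
        new_times.append(t)
        new_values.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
        t += step
    return new_times, new_values


def downsample_mean(times, values, step):
    """Average the values that fall into each window of width step."""
    windows = {}
    for t, v in zip(times, values):
        start = times[0] + ((t - times[0]) // step) * step
        windows.setdefault(start, []).append(v)
    starts = sorted(windows)
    return starts, [sum(windows[s]) / len(windows[s]) for s in starts]


fine_t, fine_v = upsample_linear([0, 10, 20], [0.0, 100.0, 200.0], 5)
coarse_t, coarse_v = downsample_mean(fine_t, fine_v, 10)
```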