core.preprocessing package¶
This package contains functions for modifying the values stored in a core.data.dataset.Dataset object.

core.preprocessing.auto_clean.auto_clean()
Automatically runs all of the preprocessing functions below.
core.preprocessing.change module¶

core.preprocessing.change.change_to_binary(dataset)
core.preprocessing.change.change_to_label(dataset)
core.preprocessing.change.change_to_one_hot(dataset)

change_to_binary(), change_to_label(), and change_to_one_hot() change the encoding of the occupancy values in the core.data.dataset.Dataset object:

>>> import core
>>> import numpy as np
>>> data = np.array([[1, 2], [3, 4]])
>>> occupancy = np.array([[0], [3]])
>>> dataset = core.data.Dataset()
>>> dataset.add_room(data, occupancy=occupancy, room_name="test", header=["name 1", "name 2"])
>>> dataset.occupancy
array([[0.],
       [3.]])
>>> core.preprocessing.change_to_one_hot(dataset)
>>> dataset.occupancy
array([[1., 0., 0., 0.],
       [0., 0., 0., 1.]])
>>> core.preprocessing.change_to_label(dataset)
>>> dataset.occupancy
array([[0],
       [3]], dtype=int64)
>>> core.preprocessing.change_to_binary(dataset)
>>> dataset.occupancy
array([[0],
       [1]], dtype=int64)
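The label-to-one-hot conversion shown above can be illustrated with plain NumPy. This is a conceptual sketch, not the library's implementation; the matrix width is inferred here from the maximum label value:

```python
import numpy as np

# Occupancy counts stored as integer labels, one per row.
labels = np.array([0, 3])

# One column per possible label value 0..max.
n_classes = labels.max() + 1

# Index an identity matrix to build the one-hot rows.
one_hot = np.eye(n_classes)[labels]

# Recover the labels with argmax, and a binary flag with a comparison.
recovered = one_hot.argmax(axis=1)
binary = (labels > 0).astype(int)
```

argmax undoes the one-hot encoding (as change_to_label() does), and the labels > 0 comparison mirrors change_to_binary(), which only records whether the room is occupied at all.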
core.preprocessing.fill module¶

core.preprocessing.fill.fill(dataset)

The fill() function imputes all missing values in the core.data.dataset.Dataset object:

>>> import core
>>> import numpy as np
>>> data = np.array([[1, np.nan], [100, 4], [200, np.nan], [20, np.nan]])
>>> occupancy = np.array([[0], [3], [4], [7]])
>>> dataset = core.data.Dataset()
>>> dataset.add_room(data, occupancy=occupancy, room_name="test", header=["name 1", "name 2"])
>>> dataset.data
array([[  1.,  nan],
       [100.,   4.],
       [200.,  nan],
       [ 20.,  nan]])
>>> core.preprocessing.fill(dataset)
>>> dataset.data
array([[  1.,   4.],
       [100.,   4.],
       [200.,   4.],
       [ 20.,   4.]])
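The filled values above are consistent with simple per-column imputation. Below is a minimal NumPy sketch of column-mean filling; this is an illustration of the idea only, as fill() may use a different strategy internally:

```python
import numpy as np

data = np.array([[  1., np.nan],
                 [100., 4.],
                 [200., np.nan],
                 [ 20., np.nan]])

# Per-column means computed while ignoring NaNs.
col_means = np.nanmean(data, axis=0)

# Replace every NaN with its column's mean; keep observed values as-is.
filled = np.where(np.isnan(data), col_means, data)
```

With only one observed value (4.) in the second column, that column's mean is 4., which reproduces the output shown above.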
core.preprocessing.ontology module¶

core.preprocessing.ontology.ontology(dataset)

The ontology() function maps the raw feature names in the core.data.dataset.Dataset object to standardized names:

>>> import core
>>> import numpy as np
>>> data = np.array([[1, 1], [2, 4]])
>>> occupancy = np.array([[0], [3]])
>>> dataset = core.data.Dataset()
>>> dataset.add_room(data, occupancy=occupancy, room_name="test", header=["Tem.", "damp"])
>>> core.preprocessing.ontology(dataset)
['temperature', 'damper']
>>> dataset.feature_list
['temperature', 'damper']
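Conceptually, the renaming behaves like an alias lookup on normalized header strings. The alias table and canonical_name() helper below are hypothetical, invented only to illustrate the idea:

```python
# Hypothetical alias table mapping normalized header strings to canonical names.
ALIASES = {
    "tem": "temperature",
    "temp": "temperature",
    "damp": "damper",
}

def canonical_name(raw: str) -> str:
    """Normalize a raw header (trim, drop trailing dot, lowercase), then look it up.

    Unknown headers are returned unchanged.
    """
    key = raw.strip().rstrip(".").lower()
    return ALIASES.get(key, raw)

renamed = [canonical_name(h) for h in ["Tem.", "damp"]]
```

This reproduces the renaming in the example ("Tem." → "temperature", "damp" → "damper") while leaving unrecognized headers untouched.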
core.preprocessing.outlier module¶

core.preprocessing.outlier.remove_outlier(dataset, auto_fill=True, ratio=1.5)

The remove_outlier() function removes outliers from the core.data.dataset.Dataset object and fills the gaps that removing them produces:

>>> import core
>>> import numpy as np
>>> data = np.array([[1, 1], [2, 4], [200, 4], [3, 7]])
>>> occupancy = np.array([[0], [3], [4], [7]])
>>> dataset = core.data.Dataset()
>>> dataset.add_room(data, occupancy=occupancy, room_name="test", header=["name 1", "name 2"])
>>> dataset.data
array([[  1.,   1.],
       [  2.,   4.],
       [200.,   4.],
       [  3.,   7.]])
>>> core.preprocessing.remove_outlier(dataset)
>>> dataset.data
array([[1., 1.],
       [2., 4.],
       [2., 4.],
       [3., 7.]])
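The ratio=1.5 default suggests the classic interquartile-range (Tukey) rule. The sketch below pairs IQR detection with previous-value filling; this is an assumption based only on the signature and the example output, not a description of the actual implementation:

```python
import numpy as np

values = np.array([1.0, 2.0, 200.0, 3.0])
ratio = 1.5

# Tukey's fences: flag points more than ratio * IQR outside the quartiles.
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
outlier = (values < q1 - ratio * iqr) | (values > q3 + ratio * iqr)

# Fill each flagged point with the most recent preceding value.
filled = values.copy()
for i in np.flatnonzero(outlier):
    if i > 0:
        filled[i] = filled[i - 1]
```

For [1., 2., 200., 3.] the upper fence is 128., so 200. is flagged and replaced by 2., matching the first column of the example above.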
core.preprocessing.upsample module¶

core.preprocessing.upsample.upsample(dataset, target_frequency, algorithm='linear')
core.preprocessing.downsample.downsample(dataset, target_frequency, algorithm='mean')

upsample() and downsample() respectively increase and decrease the number of data points (i.e., the number of rows) in the core.data.dataset.Dataset object:

>>> import core
>>> import numpy as np
>>> data = np.array([[1, 2], [100, 4], [200, 6]])
>>> occupancy = np.array([[0], [3], [4]])
>>> dataset = core.data.Dataset()
>>> dataset.add_room(data, occupancy=occupancy, room_name="test", header=["name 1", "name 2"])
>>> dataset.data
array([[  1.,   2.],
       [100.,   4.],
       [200.,   6.]])
>>> dataset.time_column_index = 0
>>> core.preprocessing.upsample(dataset, 10)
>>> dataset.data
array([[  1.        ,   2.        ],
       [ 11.        ,   2.22222222],
       [ 21.        ,   2.44444444],
       [ 31.        ,   2.66666667],
       [ 41.        ,   2.88888889],
       [ 51.        ,   3.11111111],
       [ 61.        ,   3.33333333],
       [ 71.        ,   3.55555556],
       [ 81.        ,   3.77777778],
       [ 91.        ,   4.        ],
       [101.        ,   4.2       ],
       [111.        ,   4.4       ],
       [121.        ,   4.6       ],
       [131.        ,   4.8       ],
       [141.        ,   5.        ],
       [151.        ,   5.2       ],
       [161.        ,   5.4       ],
       [171.        ,   5.6       ],
       [181.        ,   5.8       ],
       [191.        ,   6.        ]])
>>> core.preprocessing.downsample(dataset, 60)
>>> dataset.data
array([[  1.        ,   2.        ],
       [ 61.        ,   3.33333333],
       [121.        ,   4.6       ],
       [181.        ,   5.8       ]])
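Linear upsampling (the default algorithm='linear') can be sketched with np.interp. This is a conceptual illustration of time-based linear interpolation and mean downsampling, not a reproduction of upsample()'s exact resampling grid:

```python
import numpy as np

# First column of the example dataset is time, second is the measured value.
times = np.array([1.0, 100.0, 200.0])
values = np.array([2.0, 4.0, 6.0])

# Resample onto a denser, evenly spaced time grid (step of 50 here).
new_times = np.arange(times[0], times[-1] + 1, 50.0)

# Piecewise-linear interpolation between the known points.
new_values = np.interp(new_times, times, values)

# Mean downsampling: average consecutive pairs of the resampled rows.
pair_means = new_values.reshape(-1, 2).mean(axis=1)
```

Between two known points, each interpolated value lies on the straight line joining them; mean downsampling then collapses each group of rows back into a single averaged row, as downsample() does with algorithm='mean'.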