core.data package

This package contains multiple functions and one class related to read and write operations.

core.data.dataset module

class core.data.dataset.Dataset

Dataset class is the standard data structure for data sets.

Sample code for loading built-in data in class core.data.dataset.Dataset:

>>> import core
>>> dataset_built_in = core.data.load_sample("lbl-all")
>>> type(dataset_built_in)
<class 'core.data.dataset.Dataset'>

Sample code for creating a new core.data.dataset.Dataset object:

>>> import core
>>> dataset_new = core.data.Dataset()
>>> type(dataset_new)
<class 'core.data.dataset.Dataset'>

Sample code for importing data to a new core.data.dataset.Dataset object:

>>> import core
>>> dataset_import = core.data.import_data("Your-data-file.csv", 0, room_name="Your-room")
>>> type(dataset_import)
<class 'core.data.dataset.Dataset'>
add_room(data, occupancy=None, room_name=None, header=True)

add_room() allows user to add a single room to core.data.dataset.Dataset object. If no occupancy data, core will assign numpy.NaN values to the room. If room_name is not given, core will assign the room name to the first available natural number. If header is not provided, core will assume the name for each column is a sequence of positive integer.:

>>> import numpy
>>> data = numpy.array([[1, 2], [3, 4]])
>>> occ = numpy.array([[0], [2]])
>>> dataset_new.add_room(data, occupancy=occ, room_name="test", header=["name 1", "name 2"])
>>> dataset_new.room_list
['test']
>>> dataset_new.data
array([[1., 2.],
       [3., 4.]])
change_feature_mapping(feature_mapping)

change_feature_mapping() method enables the user to completely change the header. The new header must be a bidirectional dictionary.:

>>> dataset_new.feature_list
['name 1', 'name 2']
>>> dataset_new.change_feature_mapping({0: "New 1", 1: "New 2", "New 1": 0, "New 2": 1})
>>> dataset_new.feature_list
['New 1', 'New 2']
change_feature_name(old, new)

change_feature_name() method changes the name of one feature in the header list.:

>>> dataset_new.feature_list
['New 1', 'New 2']
>>> dataset_new.change_feature_name("New 1", "Old 1")
>>> dataset_new.feature_list
['Old 1', 'New 2']
set_feature_name(feature_list)

set_feature_name() method modifies the name of every feature without using a bidirectional dictionary.:

>>> dataset_new.feature_list
['Old 1', 'New 2']
>>> dataset_new.set_feature_name(["New 1", "Old 2"])
>>> dataset_new.feature_list
["New 1", "Old 2"]
change_occupancy(occupancy)

change_occupancy() method updates the occupancy states completely.:

>>> dataset_new.occupancy
array([[0.],
       [2.]])
>>> dataset_new.change_occupancy(numpy.array([[8],[1]]))
>>> dataset_new.occupancy
array([[8],
       [1]])
change_room_mapping(room)

change_room_mapping() method updates the room mapping. The new mapping must be a bidirectional dictionary.:

>>> dataset_new.room_list
['test']
>>> dataset_new.change_room_mapping({0: "Room 1", "Room 1": 0})
>>> dataset_new.room_list
['Room 1']
change_values(data)

change_values() method updates all sensor values in the data set.:

>>> dataset_new.data
array([[1., 2.],
       [3., 4.]])
>>> dataset_new.change_values(numpy.array([[3, 4], [1, 2]]))
>>> dataset_new.data
array([[3, 4],
       [1, 2]])
remove_feature(features, error=True)

remove_feature() method removes one feature or a list of features from the whole data set.:

>>> dataset.feature_list
['time', 'indoor-temp', 'indoor-humidity', 'indoor-flow', 'indoor-radiant', 'indoor-lumen', 'indoor-co2', 'outdoor-temp', 'outdoor-humidity', 'ourdoor-flow']
>>> dataset.remove_feature("indoor-temp")
>>> dataset.feature_list
['time', 'indoor-humidity', 'indoor-flow', 'indoor-radiant', 'indoor-lumen', 'indoor-co2', 'outdoor-temp', 'outdoor-humidity', 'ourdoor-flow']
>>> dataset.remove_feature(["indoor-humidity", "indoor-flow"])
>>> dataset.feature_list
['time', 'indoor-radiant', 'indoor-lumen', 'indoor-co2', 'outdoor-temp', 'outdoor-humidity', 'ourdoor-flow']
>>> dataset.remove_feature(["indoor-humidity", "indoor-flow"])
Traceback (most recent call last):
  File "<input>", line 1, in remove_feature
    raise KeyError("The feature {} does not exist in the dataset!".format(feature))
KeyError: 'The feature indoor-humidity does not exist in the dataset!'
>>> dataset.remove_feature(["indoor-humidity", "indoor-radiant"], error=False)
>>> dataset.feature_list
['time', 'indoor-lumen', 'indoor-co2', 'outdoor-temp', 'outdoor-humidity', 'ourdoor-flow']
select_feature(features, error=True)

select_feature() method selects one feature or a list of features in the data set similar to remove_feature().

split(percentage)

split() method separates the data set into two data sets given a split point.:

>>> dataset.room_list
['data_1', 'data_2', 'data_3', 'data_4', 'data_5', 'data_6', 'data_7', 'data_8', 'data_9', 'data_10', 'data_11', 'data_12', 'data_13', 'data_14', 'data_15', 'data_16', 'data_17', 'data_18', 'data_19', 'data_20', 'data_21', 'data_22', 'data_23', 'data_24']
>>> dataset1, dataset2 = dataset.split(0.5)
>>> dataset1.room_list
['data_1', 'data_2', 'data_3', 'data_4', 'data_5', 'data_6', 'data_7', 'data_8', 'data_9', 'data_10', 'data_11', 'data_12', 'Partially data_13']
>>> dataset2.room_list
['Partially data_13', 'data_14', 'data_15', 'data_16', 'data_17', 'data_18', 'data_19', 'data_20', 'data_21', 'data_22', 'data_23', 'data_24']

core.data.import_data module

core.data.import_data.import_data(file_name, time_column_index=None, mode='csv', header=True, room_name=None, tz=0)

import_data() function reads a raw data set. It is assumed that each row represents a data point and each column represents a feature. Features must be numerical values.:

>>> import core
>>> dataset = core.data.import_data("Your-data-file.csv", 0, room_name="Your-room")
>>> type(dataset)
<class 'core.data.dataset.Dataset'>

core.data.io module

core.data.io.read_dataset(file_name)
core.data.io.save_dataset(dataset, file_name)

read_dataset() and save_dataset() functions dump core.data.dataset.Dataset to a binary file stored on the disk.:

>>> import core
>>> dataset = core.data.load_sample("lbl-all")
>>> core.data.save_dataset(dataset, "temp")
>>> dataset_2 = core.data.read_dataset("temp")

core.data.load_sample module

core.data.load_sample.load_sample(sample_name)

load_sample() function can load binary file from core/data/binary_dataset/. Use '-' to represent the directory, e.g., to load data named core/data/binary_dataset/sdu/all, the sample_name should be set to ‘sdu-all’.:

>>> import core
>>> dataset = core.data.load_sample("lbl-all")