core.data package

Submodules

core.data.dataset module

class core.data.dataset.Dataset[source]

Bases: object

Core data set format. The standard data structure for occupancy and sensor data.

Note

All attributes are returned as copies of the underlying values, so a change takes effect only when it is made through the methods of self.
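The copy semantics described in this note can be sketched as follows. This is a minimal illustration with plain Python lists, not the library's implementation: `TinyDataset` is a hypothetical stand-in class, and only the `data` property and the `change_values` method name are taken from this page.

```python
import copy


class TinyDataset:
    """Hypothetical stand-in for Dataset; not the library's implementation."""

    def __init__(self, rows):
        self._data = copy.deepcopy(rows)

    @property
    def data(self):
        # Hand out a copy so callers cannot change the stored values by accident.
        return copy.deepcopy(self._data)

    def change_values(self, rows):
        # Updates only reach the stored values when they go through a method.
        self._data = copy.deepcopy(rows)


ds = TinyDataset([[1.0, 2.0], [3.0, 4.0]])

rows = ds.data
rows[0][0] = 99.0             # edits the copy only
unchanged = ds.data[0][0]     # the stored value is still 1.0

ds.change_values(rows)        # pass the edited copy back through a method
updated = ds.data[0][0]       # the stored value is now 99.0
```

Editing the returned array in place therefore has no effect on the object; the edited values have to be handed back through one of the `change_*` methods.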

Variables
  • time_column_index (int) – index of the timestamp column in self.data

  • binary (bool) – whether the occupancy data in self is binary encoded

  • labelled (bool) – whether occupancy data is available in self

Parameters

None

Return type

core.data.dataset.Dataset

add_room(data, occupancy=None, room_name=None, header=True)[source]

Add a new room to self. self.data expands automatically to hold the new room.

Parameters
  • data (numpy.ndarray) – sensor data from the new room

  • occupancy (None or numpy.ndarray) – occupancy data from the new room. If None, it is filled with numpy.nan

  • room_name (None or str) – the name of the new room. If None, a unique index is assigned

  • header (bool) – whether the new room's data has a header in the first row

Returns

None
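The bookkeeping that add_room describes (append rows, fill missing occupancy with NaN, record the room's row range) can be sketched with plain lists. Everything here is an assumption for illustration: `append_room` is a hypothetical helper, the `(start, end)` range is treated as end-exclusive, and the fallback room name is simply the number of rooms already stored.

```python
import math


def append_room(data, occupancy, room_mapping, new_rows,
                new_occupancy=None, room_name=None):
    """Append a room's rows and record its (start, end) row range.

    Hypothetical helper; `end` is exclusive here, which is an assumption.
    """
    start = len(data)
    data.extend(list(row) for row in new_rows)
    if new_occupancy is None:
        # No labels supplied: fill the occupancy column with NaN.
        new_occupancy = [math.nan] * len(new_rows)
    occupancy.extend(new_occupancy)
    if room_name is None:
        # Assumed fallback: use the number of existing rooms as the name.
        room_name = str(len(room_mapping))
    room_mapping[room_name] = (start, len(data))
    return room_name


data, occupancy, rooms = [], [], {}
append_room(data, occupancy, rooms, [[20.1, 400], [20.3, 410]],
            new_occupancy=[0, 1], room_name="office")
auto_name = append_room(data, occupancy, rooms, [[18.9, 380]])
```

After the two calls, `rooms` holds one row range per room and the unlabelled room's occupancy entry is NaN.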

change_feature_mapping(feature_mapping)[source]

Replace the feature_mapping within self.

Parameters

feature_mapping (dict) – new feature mapping rule, given as a bidirectional dict

Returns

None

change_feature_name(old, new)[source]

Replace one feature’s name.

Parameters
  • old (str) – original name for the feature in self

  • new (str) – new name for the feature in self

Returns

None

change_occupancy(occupancy)[source]

Replace the data of self.occupancy.

Parameters

occupancy (numpy.ndarray) – new occupancy data with the same number of rows as the original occupancy data

Returns

None

change_room_mapping(room)[source]

Replace the room_mapping within self.

Parameters

room (dict) – new room mapping rule, given as a bidirectional dict

Returns

None

change_values(data)[source]

Replace the sensor data of self.data.

Parameters

data (numpy.ndarray) – new sensor data with the same shape as the original sensor data

Returns

None

copy()[source]

Make a copy of self.

Parameters

None

Return type

core.data.dataset.Dataset

Returns

An identical copy of self in which every value is stored at a different memory address

property data
Return type

numpy.ndarray

Returns

a copy of the sensor data as a numpy.ndarray

property feature_list
Return type

list(str)

Returns

a list containing all feature names

property feature_mapping
Return type

dict

Returns

a bidirectional dictionary that maps feature names to their corresponding column indices
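One plausible reading of "bidirectional dictionary" is a single dict that holds both directions of the lookup. The sketch below assumes that layout; `make_bidirectional` is a hypothetical helper, and the library may store the mapping differently.

```python
def make_bidirectional(names):
    """Map each name to its column index and each index back to its name.

    Hypothetical helper; the single-dict layout is an assumption.
    """
    mapping = {}
    for index, name in enumerate(names):
        mapping[name] = index
        mapping[index] = name
    return mapping


feature_mapping = make_bidirectional(["time", "temperature", "co2"])
column = feature_mapping["co2"]   # look up a column index by feature name
name = feature_mapping[1]         # look up a feature name by column index
```

With this layout a feature name must never be an integer, otherwise the two directions would collide.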

property occupancy
Return type

numpy.ndarray

Returns

a copy of the occupancy data as a numpy.ndarray

pop_room(room_name)[source]

Remove a room from self.

Parameters

room_name (str) – name of the room to remove

Return type

core.data.dataset.Dataset

Returns

the removed room as a Dataset
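Removing a room from a row-stacked table means cutting out its row range and shifting the ranges of the rooms stored after it. A minimal sketch under stated assumptions: `pop_rows` is a hypothetical helper, the mapping is simplified to name → `(start, end)` in one direction only, and `end` is exclusive.

```python
def pop_rows(data, room_mapping, room_name):
    """Remove one room's rows and return them.

    Hypothetical helper; the one-directional mapping with an exclusive
    `end` is a simplification, not the library's storage format.
    """
    start, end = room_mapping.pop(room_name)
    removed = data[start:end]
    del data[start:end]
    width = end - start
    # Rooms stored after the removed one move up by `width` rows.
    for name, (s, e) in list(room_mapping.items()):
        if s >= end:
            room_mapping[name] = (s - width, e - width)
    return removed


data = [[1], [2], [3], [4]]
rooms = {"a": (0, 2), "b": (2, 4)}
removed = pop_rows(data, rooms, "a")
```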

remove_feature(features, error=True)[source]

Remove one or more features from self.data.

Parameters
  • features (str or list(str)) – the name(s) of the feature(s) to remove

  • error (bool) – whether to raise an error if a feature name is not found in self

Returns

None

property room_list
Return type

list(str)

Returns

a list containing all room names

property room_mapping
Return type

dict

Returns

a bidirectional dictionary that maps room names to their corresponding row index tuples (start, end)

select_feature(features, error=True)[source]

Select one or more features from self.data and remove the rest.

Parameters
  • features (str or list(str)) – the name(s) of the feature(s) to keep

  • error (bool) – whether to raise an error if any of the given names is not found in self

Returns

None
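The selection behaviour, including the `error` switch, can be sketched on a list-of-rows table. `select_columns` is a hypothetical helper that returns new objects instead of modifying anything in place; keeping the columns in the requested order and raising `KeyError` are both assumptions, since this page does not specify either.

```python
def select_columns(rows, feature_names, wanted, error=True):
    """Keep only the wanted columns.

    Hypothetical helper; the requested-order result and the KeyError
    exception type are assumptions.
    """
    if isinstance(wanted, str):
        wanted = [wanted]          # accept a single name as well as a list
    missing = [name for name in wanted if name not in feature_names]
    if missing and error:
        raise KeyError(f"unknown feature(s): {missing}")
    # With error=False, unknown names are skipped silently.
    keep = [feature_names.index(name) for name in wanted
            if name in feature_names]
    new_rows = [[row[i] for i in keep] for row in rows]
    new_names = [feature_names[i] for i in keep]
    return new_rows, new_names


rows = [[1, 2, 3], [4, 5, 6]]
names = ["a", "b", "c"]
picked_rows, picked_names = select_columns(rows, names, ["c", "a"])
lenient_rows, lenient_names = select_columns(rows, names, ["b", "zzz"],
                                             error=False)
```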

set_feature_name(feature_list)[source]

Replace all feature names, in the given order.

Parameters

feature_list (list) – list of new feature names; its length must equal the number of columns of self.data

Returns

None

split(percentage)[source]

Separate self into two smaller core.data.dataset.Dataset objects at the given split point.

Parameters

percentage (float) – percentage of the rows placed in the first part

Returns

None
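A row-wise split at a given proportion can be sketched as below. This page does not say whether `percentage` is a fraction between 0 and 1 or a value between 0 and 100, nor whether the split is applied per room; the sketch assumes a fraction applied to the whole table, and `split_rows` is a hypothetical helper.

```python
def split_rows(rows, percentage):
    """Split rows into a first and a second part.

    Hypothetical helper; `percentage` is assumed to be a fraction in [0, 1].
    """
    if not 0.0 <= percentage <= 1.0:
        raise ValueError("percentage must be between 0 and 1")
    cut = int(len(rows) * percentage)   # number of rows in the first part
    return rows[:cut], rows[cut:]


rows = [[i] for i in range(8)]
first, second = split_rows(rows, 0.75)
```

Slicing keeps the original row order, which matters for time-series data where the second part should follow the first in time.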

core.data.import_data module

core.data.import_data.import_data(file_name, time_column_index=None, mode='csv', header=True, room_name=None, tz=0)[source]

Load raw data from the disk.

Parameters
  • file_name (str) – the name of the raw data file

  • time_column_index (int) – the column index for the timestamp in given raw data file

  • mode (str) – the format of the raw data. Currently only csv is supported

  • header (bool) – whether the raw data contains a header in the first row. If False, a unique index is assigned to each column

  • room_name (str or None) – the name of the room. If None, a unique number is assigned to the room

  • tz (int) – the time zone offset to correct for in the raw data file

Return type

core.data.dataset.Dataset

Returns

The structured data set built from one raw input file
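How the `header`, `time_column_index` and `tz` parameters might interact can be sketched with the standard `csv` module. This is not the function's implementation: `parse_csv` is a hypothetical helper that reads from a string, and it assumes numeric cells, timestamps in Unix seconds, and a `tz` offset given in hours and added to each timestamp.

```python
import csv
import io


def parse_csv(text, time_column_index=None, header=True, tz=0):
    """Parse CSV text into column names and rows of floats.

    Hypothetical helper; Unix-second timestamps and an hourly `tz`
    offset that is added to them are assumptions.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if header:
        names, rows = rows[0], rows[1:]
    else:
        # No header row: name each column by its index.
        names = [str(i) for i in range(len(rows[0]))] if rows else []
    values = [[float(cell) for cell in row] for row in rows]
    if time_column_index is not None:
        for row in values:
            row[time_column_index] += tz * 3600
    return names, values


raw = "time,temp\n0,21.5\n60,21.7\n"
names, values = parse_csv(raw, time_column_index=0, tz=1)
bare_names, bare_values = parse_csv("0,21.5\n", header=False)
```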

core.data.io module

core.data.io.read_dataset(file_name)[source]

Load a core.data.dataset.Dataset object from a binary file on local disk.

Parameters

file_name (str) – name of the binary file

Return type

core.data.dataset.Dataset

Returns

the Dataset object loaded from the binary file

core.data.io.save_dataset(dataset, file_name)[source]

Save a core.data.dataset.Dataset object to local disk as a binary file.

Parameters
  • dataset (core.data.dataset.Dataset) – the Dataset object to save to local disk as a binary file

  • file_name (str) – name of the binary file

Returns

None
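A save and load round trip through a binary file can be sketched with the standard `pickle` module. This page does not state which serialization format the library uses, so `pickle` is an assumption, and `save_object` and `read_object` are hypothetical helpers that work on a plain dict rather than a real Dataset.

```python
import os
import pickle
import tempfile


def save_object(obj, file_name):
    """Write obj to a binary file (pickle format is an assumption)."""
    with open(file_name, "wb") as handle:
        pickle.dump(obj, handle)


def read_object(file_name):
    """Read an object back from a binary file written by save_object."""
    with open(file_name, "rb") as handle:
        return pickle.load(handle)


payload = {"data": [[1.0, 2.0]], "rooms": {"office": (0, 1)}}
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demo.bin")
    save_object(payload, path)
    restored = read_object(path)
```

If the format is indeed pickle-based, files should only be loaded from trusted sources, because unpickling can execute arbitrary code.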

core.data.load_sample module

core.data.load_sample.load_sample(sample_name)[source]

Load one or more core.data.dataset.Dataset objects from the sample folder.

Parameters

sample_name (str or list(str)) – name(s) of the binary file(s) in binary_dataset

Return type

core.data.dataset.Dataset or dict(str, core.data.dataset.Dataset)

Returns

the Dataset object(s) loaded from the binary file(s). If a list of names is provided, a dictionary is returned that maps each name to its corresponding Dataset
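The "one name returns one object, a list of names returns a dictionary" convention can be sketched as a small dispatch on the argument type. `load_many` and `fake_loader` are hypothetical names; the loader here returns a placeholder string instead of reading a real sample file.

```python
def load_many(names, loader):
    """Return one object for a str, or a dict keyed by name for a list.

    Hypothetical helper illustrating the return-type convention only.
    """
    if isinstance(names, str):
        return loader(names)
    return {name: loader(name) for name in names}


def fake_loader(name):
    # Placeholder for reading one sample file from disk.
    return f"<Dataset {name}>"


single = load_many("room_a", fake_loader)
several = load_many(["room_a", "room_b"], fake_loader)
```

Callers that accept either form need to check the return type, or always pass a list so that the result is always a dictionary.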

Module contents