core.stats packageΒΆ

This package contains functions for analyzing core.data.dataset.Dataset and assessing its quality. core.stats.analysis.analysis() runs all sorts of analysis automatically.

core.stats.dropout_rate.dropout_rate(dataset, dataset_level=False)

dropout_rate() function computes the percentage of data points missing in the core.data.dataset.Dataset object.

core.stats.frequency.frequency(dataset, dataset_level=True)

frequency() function computes the average sampling frequency of all features in the core.data.dataset.Dataset object.

core.stats.gap_detect.gap_detect(dataset, threshold, sensor_level=False)

gap_detect() function returns all gaps in the core.data.dataset.Dataset object.

We say that there is a Gap when the difference between timestamps of two successive values exceeds some threshold.

core.stats.occupancy_evaluation.occupancy_distribution_evaluation(dataset, dataset_level=True)

occupancy_distribution_evaluation() function computes the distribution of different occupancy states in the core.data.dataset.Dataset object.

core.stats.uptime.uptime(dataset, threshold, gaps=None)

uptime() function computes the length of time a sensor reported values. It differs from the dropout_rate() in that it captures the length of the time interval in which data points were available in the core.data.dataset.Dataset object.

A sample code snippet for using the package is here:

>>> import core
>>> dataset = core.data.load_sample("umons-all")
>>> from pprint import pprint
>>> # ===== Test dropout rate =====
>>> pprint(core.stats.dropout_rate(dataset, dataset_level=False))
{'datatest': 0.0, 'datatest2': 0.0, 'datatraining': 0.0}
>>> pprint(core.stats.dropout_rate(dataset, dataset_level=True))
0.0
>>> # ===== Test frequency =====
>>> pprint(core.stats.frequency(dataset, dataset_level=False))
{'datatest': 60.0, 'datatest2': 60.0, 'datatraining': 60.0}
>>> pprint(core.stats.frequency(dataset, dataset_level=True))
60.0
>>> # ===== Find all gaps =====
>>> pprint(core.stats.gap_detect(dataset, 65, sensor_level=False))
{'datatest': [], 'datatest2': [], 'datatraining': []}
>>> pprint(core.stats.gap_detect(dataset, 65, sensor_level=True))
{'datatest': {'CO2': [],
              'Humidity': [],
              'HumidityRatio': [],
              'Light': [],
              'Temperature': [],
              'id': []},
 'datatest2': {'CO2': [],
               'Humidity': [],
               'HumidityRatio': [],
               'Light': [],
               'Temperature': [],
               'id': []},
 'datatraining': {'CO2': [],
                  'Humidity': [],
                  'HumidityRatio': [],
                  'Light': [],
                  'Temperature': [],
                  'id': []}}
>>> # ===== Evaluate occupancy distribution =====
>>> pprint(core.stats.occupancy_distribution_evaluation(dataset, dataset_level=False))
{'datatest': {0.0: (1693, 0.6352720450281426),
              1.0: (972, 0.3647279549718574),
              'occupied': (972, 0.3647279549718574)},
 'datatest2': {0.0: (7703, 0.7898892534864643),
               1.0: (2049, 0.21011074651353567),
               'occupied': (2049, 0.21011074651353567)},
 'datatraining': {0.0: (6414, 0.7876703917475132),
                  1.0: (1729, 0.2123296082524868),
                  'occupied': (1729, 0.2123296082524868)}}
>>> pprint(core.stats.occupancy_distribution_evaluation(dataset, dataset_level=True))
{0.0: (15810, 0.7689688715953308),
 1.0: (4750, 0.23103112840466927),
 'occupied': (4750, 0.23103112840466927)}
>>> # ===== Test uptime =====
>>> pprint(core.stats.uptime(dataset, 65))
{'datatest': {'CO2': ('1 day, 20:24:00', 159840.0, 1.0),
              'Humidity': ('1 day, 20:24:00', 159840.0, 1.0),
              'HumidityRatio': ('1 day, 20:24:00', 159840.0, 1.0),
              'Light': ('1 day, 20:24:00', 159840.0, 1.0),
              'Temperature': ('1 day, 20:24:00', 159840.0, 1.0),
              'id': ('1 day, 20:24:00', 159840.0, 1.0)},
 'datatest2': {'CO2': ('6 days, 18:31:00', 585060.0, 1.0),
               'Humidity': ('6 days, 18:31:00', 585060.0, 1.0),
               'HumidityRatio': ('6 days, 18:31:00', 585060.0, 1.0),
               'Light': ('6 days, 18:31:00', 585060.0, 1.0),
               'Temperature': ('6 days, 18:31:00', 585060.0, 1.0),
               'id': ('6 days, 18:31:00', 585060.0, 1.0)},
 'datatraining': {'CO2': ('5 days, 15:42:00', 488520.0, 1.0),
                  'Humidity': ('5 days, 15:42:00', 488520.0, 1.0),
                  'HumidityRatio': ('5 days, 15:42:00', 488520.0, 1.0),
                  'Light': ('5 days, 15:42:00', 488520.0, 1.0),
                  'Temperature': ('5 days, 15:42:00', 488520.0, 1.0),
                  'id': ('5 days, 15:42:00', 488520.0, 1.0)}}