core.stats packageΒΆ
This package contains functions for analyzing core.data.dataset.Dataset
and assessing its quality.
core.stats.analysis.analysis()
runs all sorts of analysis automatically.
-
core.stats.dropout_rate.
dropout_rate
(dataset, dataset_level=False) dropout_rate()
function computes the percentage of data points missing in thecore.data.dataset.Dataset
object.
-
core.stats.frequency.
frequency
(dataset, dataset_level=True) frequency()
function computes the average sampling frequency of all features in thecore.data.dataset.Dataset
object.
-
core.stats.gap_detect.
gap_detect
(dataset, threshold, sensor_level=False) gap_detect()
function returns all gaps in thecore.data.dataset.Dataset
object.We say that there is a Gap when the difference between timestamps of two successive values exceeds some threshold.
-
core.stats.occupancy_evaluation.
occupancy_distribution_evaluation
(dataset, dataset_level=True) occupancy_distribution_evaluation()
function computes the distribution of different occupancy states in thecore.data.dataset.Dataset
object.
-
core.stats.uptime.
uptime
(dataset, threshold, gaps=None) uptime()
function computes the length of time a sensor reported values. It differs from thedropout_rate()
in that it captures the length of the time interval in which data points were available in thecore.data.dataset.Dataset
object.
A sample code snippet for using the package is here:
>>> import core
>>> dataset = core.data.load_sample("umons-all")
>>> from pprint import pprint
>>> # ===== Test dropout rate =====
>>> pprint(core.stats.dropout_rate(dataset, dataset_level=False))
{'datatest': 0.0, 'datatest2': 0.0, 'datatraining': 0.0}
>>> pprint(core.stats.dropout_rate(dataset, dataset_level=True))
0.0
>>> # ===== Test frequency =====
>>> pprint(core.stats.frequency(dataset, dataset_level=False))
{'datatest': 60.0, 'datatest2': 60.0, 'datatraining': 60.0}
>>> pprint(core.stats.frequency(dataset, dataset_level=True))
60.0
>>> # ===== Find all gaps =====
>>> pprint(core.stats.gap_detect(dataset, 65, sensor_level=False))
{'datatest': [], 'datatest2': [], 'datatraining': []}
>>> pprint(core.stats.gap_detect(dataset, 65, sensor_level=True))
{'datatest': {'CO2': [],
'Humidity': [],
'HumidityRatio': [],
'Light': [],
'Temperature': [],
'id': []},
'datatest2': {'CO2': [],
'Humidity': [],
'HumidityRatio': [],
'Light': [],
'Temperature': [],
'id': []},
'datatraining': {'CO2': [],
'Humidity': [],
'HumidityRatio': [],
'Light': [],
'Temperature': [],
'id': []}}
>>> # ===== Evaluate occupancy distribution =====
>>> pprint(core.stats.occupancy_distribution_evaluation(dataset, dataset_level=False))
{'datatest': {0.0: (1693, 0.6352720450281426),
1.0: (972, 0.3647279549718574),
'occupied': (972, 0.3647279549718574)},
'datatest2': {0.0: (7703, 0.7898892534864643),
1.0: (2049, 0.21011074651353567),
'occupied': (2049, 0.21011074651353567)},
'datatraining': {0.0: (6414, 0.7876703917475132),
1.0: (1729, 0.2123296082524868),
'occupied': (1729, 0.2123296082524868)}}
>>> pprint(core.stats.occupancy_distribution_evaluation(dataset, dataset_level=True))
{0.0: (15810, 0.7689688715953308),
1.0: (4750, 0.23103112840466927),
'occupied': (4750, 0.23103112840466927)}
>>> # ===== Test uptime =====
>>> pprint(core.stats.uptime(dataset, 65))
{'datatest': {'CO2': ('1 day, 20:24:00', 159840.0, 1.0),
'Humidity': ('1 day, 20:24:00', 159840.0, 1.0),
'HumidityRatio': ('1 day, 20:24:00', 159840.0, 1.0),
'Light': ('1 day, 20:24:00', 159840.0, 1.0),
'Temperature': ('1 day, 20:24:00', 159840.0, 1.0),
'id': ('1 day, 20:24:00', 159840.0, 1.0)},
'datatest2': {'CO2': ('6 days, 18:31:00', 585060.0, 1.0),
'Humidity': ('6 days, 18:31:00', 585060.0, 1.0),
'HumidityRatio': ('6 days, 18:31:00', 585060.0, 1.0),
'Light': ('6 days, 18:31:00', 585060.0, 1.0),
'Temperature': ('6 days, 18:31:00', 585060.0, 1.0),
'id': ('6 days, 18:31:00', 585060.0, 1.0)},
'datatraining': {'CO2': ('5 days, 15:42:00', 488520.0, 1.0),
'Humidity': ('5 days, 15:42:00', 488520.0, 1.0),
'HumidityRatio': ('5 days, 15:42:00', 488520.0, 1.0),
'Light': ('5 days, 15:42:00', 488520.0, 1.0),
'Temperature': ('5 days, 15:42:00', 488520.0, 1.0),
'id': ('5 days, 15:42:00', 488520.0, 1.0)}}