*********************
Extending the Toolkit
*********************

This section shows how to extend ODToolkit by adding a new data set, a new occupancy estimation model, and a new evaluation metric.

Adding a new data set
=====================

We assume that a data set is a collection of comma-separated values (csv) files, where each file contains the data for a single room within a building. The columns of each file represent different features, and the rows correspond to successive timestamps. If labelled occupancy data is available, it should be the last column of the csv file.

Sample code
-----------

.. code-block:: python
    :linenos:

    import core

    # Load the raw csv file and transform it to core.data.dataset.Dataset()
    dataset = core.data.import_data("./data-room-1.csv",
                                    time_column_index=1,
                                    room_name="User room 1",
                                    tz=-2)

    # Add more rooms into the same data set
    dataset += core.data.import_data("./data-room-2.csv",
                                     time_column_index=1,
                                     room_name="User room 2",
                                     tz=-2)

    # Preprocess the data set
    core.preprocessing.auto_clean(dataset, target_frequency=60)

    # After preprocessing, put the data set into the folder reserved for data sets
    core.data.save_dataset(dataset, "./core/data/binary_dataset/user_room")

Once the data set has been added to the toolkit, you can ask the toolkit to load it as follows:

.. code-block:: python

    dataset = core.data.load_sample("user_room")

Analyzing a data set and using it to evaluate a model
-----------------------------------------------------

.. code-block:: python
    :linenos:

    import core

    # Load a user-defined data set from the package, and remove its time feature
    dataset = core.data.load_sample(["user_room"])
    room = dataset["user_room"]
    time_column_name = room.feature_mapping[room.time_column_index]
    room.remove_feature(time_column_name)

    # Plot the correlation graph
    core.plot.plot_feature_correlation(room)

    # Use the Random Forest model to perform occupancy estimation
    # Use all binary evaluation metrics to evaluate the model
    result = core.evaluation.Result()
    result.set_result(core.easy_set_experiment(dataset,
                                               models=["RandomForest"],
                                               evaluation_metrics="all")[0])

    # Plot the scores in a bar chart
    core.plot.plot_result(result, dataset="user_room", threshold="<= 1")

Adding a new model
==================

To add a new occupancy estimation model to ODToolkit, you must put its code under the folder ``./core/model/``.

Template
--------

.. code-block:: python
    :linenos:

    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-

    # If you put your model into the ODToolkit folder, use this
    from .superclass import *
    # If you put your model into your own folder, use this instead
    # from core.model.superclass import *

    # The two classes below are alternative samples; define only the one you need


    # Sample for a supervised-learning model
    class YourModelName(NormalModel):

        def __init__(self, name_for_train_dataset, name_for_test_dataset):
            # All changeable parameters
            self.name_for_train_dataset = name_for_train_dataset
            self.name_for_test_dataset = name_for_test_dataset
            # ... Define any other parameters here

        # The model must have a method called run that returns the predicted result
        def run(self):
            # Your model goes here
            # Use core.data.dataset.Dataset() as the data type
            # ...
            # The result must be a numpy.ndarray with shape (num_of_rows, 1)
            return predict_occupancy


    # Sample for a domain-adaptive semi-supervised learning model
    class YourModelName(DomainAdaptiveModel):

        def __init__(self,
                     name_for_source_dataset,
                     name_for_target_retrain_dataset,  # This could be None
                     name_for_target_test_dataset):
            # All changeable parameters
            self.name_for_source_dataset = name_for_source_dataset
            self.name_for_target_retrain_dataset = name_for_target_retrain_dataset
            self.name_for_target_test_dataset = name_for_target_test_dataset
            # ... Define any other parameters here

        # The model must have a method called run that returns the predicted result
        def run(self):
            # Your model goes here
            # Use core.data.dataset.Dataset() as the data type
            # ...
            # The result must be a numpy.ndarray with shape (num_of_rows, 1)
            return predict_occupancy

Applying and evaluating the model
---------------------------------

To run experiments quickly using the default parameter settings:

.. code-block:: python
    :linenos:

    import core

    # Load a sample data set from the package
    dataset = core.data.load_sample(["umons-all"])

    # Use the new model to perform occupancy estimation
    # Use all binary evaluation metrics to evaluate the model
    result = core.evaluation.Result()
    result.set_result(core.easy_set_experiment(dataset,
                                               models=["YourModelName"],
                                               evaluation_metrics="all")[0])

    # Plot the scores in a bar chart
    core.plot.plot_result(result, dataset="umons-all", threshold="<= 1")

To change the default parameters:

.. code-block:: python
    :linenos:

    import core

    # Load a sample data set from the package
    dataset = core.data.load_sample("umons-all")

    # Create train / test sets
    train, test = dataset.split(0.8)

    # Select the models you want to use
    model_set = core.model.NormalModel(train, test, thread_num=1)
    model_set.add_model(["YourModelName"])

    # Edit the settings of your model
    your_model = model_set.models["YourModelName"]
    your_model.alpha = 0.5

    # Run the models to predict the occupancy level
    model_set_results = model_set.run_all_model()

    # Evaluate the result
    metrics = core.evaluation.BinaryEvaluation(
        model_set_results["YourModelName"], test.occupancy)
    evaluation_score = dict()
    metrics.get_all_metrics()
    evaluation_score["YourModelName"] = metrics.run_all_metrics()

    # Collect the scores
    result = core.evaluation.Result()
    result.set_result({"umons-all": evaluation_score})

    # Plot the scores in a bar chart
    core.plot.plot_result(result, dataset="umons-all", threshold="<= 1")

Adding a new evaluation metric
==============================

The procedure for adding a new metric is the same as for adding a new model.

Template
--------

.. code-block:: python
    :linenos:

    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-

    # If you put your metric into the ODToolkit folder, use this
    from .superclass import *
    # If you put your metric into your own folder, use this instead
    # from core.evaluation.superclass import *


    # To evaluate occupancy count estimation, use OccupancyEvaluation as the superclass
    # To evaluate binary estimation, use BinaryEvaluation as the superclass
    class YourMetricName(OccupancyEvaluation):

        def __init__(self, name_for_predict_result, name_for_ground_truth):
            # All changeable parameters
            self.name_for_predict_result = name_for_predict_result
            self.name_for_ground_truth = name_for_ground_truth
            # ... Define any other parameters here

        # The metric must have a method called run that returns the score
        def run(self):
            # Your metric goes here
            # Use two (x, 1) numpy.ndarrays as the data type
            # ...
            # The result must be a single value
            return score

Using the new metric to evaluate models
---------------------------------------

To evaluate a model using the new metric:

.. code-block:: python
    :linenos:

    import core

    # Load a sample data set from the package
    dataset = core.data.load_sample(["umons-all"])

    # Use the RandomForest model to perform occupancy estimation
    # Use the new metric to evaluate the model
    result = core.evaluation.Result()
    result.set_result(core.easy_set_experiment(dataset,
                                               models=["RandomForest"],
                                               evaluation_metrics=["YourMetricName"])[0])

    # Plot the scores in a bar chart
    core.plot.plot_result(result, dataset="umons-all", threshold="<= 1")

To change the default parameters:

.. code-block:: python
    :linenos:

    import core

    # Load a sample data set from the package
    dataset = core.data.load_sample("umons-all")

    # Create train / test sets
    train, test = dataset.split(0.8)

    # Select the models you want to use
    model_set = core.model.NormalModel(train, test, thread_num=1)
    model_set.add_model(["RandomForest"])

    # Run the models to predict the occupancy level
    model_set_results = model_set.run_all_model()

    # Evaluate the result
    metrics = core.evaluation.BinaryEvaluation(
        model_set_results["RandomForest"], test.occupancy)
    evaluation_score = dict()
    metrics.add_metrics(["YourMetricName"])

    # Change the settings of your metric
    your_metric = metrics.metrics["YourMetricName"]
    your_metric.alpha = 0.5

    evaluation_score["RandomForest"] = metrics.run_all_metrics()

    # Collect the scores
    result = core.evaluation.Result()
    result.set_result({"umons-all": evaluation_score})

    # Plot the scores in a bar chart
    core.plot.plot_result(result, dataset="umons-all", threshold="<= 1")
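
The metric template leaves the body of ``run`` open. As a rough illustration of what such a body might compute, the sketch below scores a prediction by its mean absolute error against the ground truth. The class name ``MeanAbsoluteErrorSketch`` and the attribute names ``predict`` and ``truth`` are hypothetical and not part of ODToolkit. To stay runnable without the toolkit installed, the class does not inherit from ``OccupancyEvaluation``; a real metric would subclass it as the template shows. The inputs are treated as single-column sequences of rows, which is assumed to match the ``(x, 1)`` arrays mentioned in the template.

```python
# Hypothetical stand-alone metric sketch; a real ODToolkit metric would
# subclass OccupancyEvaluation (or BinaryEvaluation) as in the template.
class MeanAbsoluteErrorSketch:

    def __init__(self, predict, truth):
        # Both arguments are single-column collections of rows, for example
        # (x, 1) numpy.ndarrays or plain nested lists such as [[0], [2]].
        self.predict = predict
        self.truth = truth

    @staticmethod
    def _column(values):
        # Take the first (and only) entry of every row as a float.
        return [float(row[0]) for row in values]

    def run(self):
        predict = self._column(self.predict)
        truth = self._column(self.truth)
        if len(predict) != len(truth):
            raise ValueError("prediction and ground truth differ in length")
        if not predict:
            raise ValueError("cannot score an empty prediction")
        # Average of the absolute differences; the result is one value.
        return sum(abs(p - t) for p, t in zip(predict, truth)) / len(predict)


predicted = [[0], [2], [3], [1]]
observed = [[0], [1], [3], [3]]
score = MeanAbsoluteErrorSketch(predicted, observed).run()
print(score)  # 0.75
```

The sketch returns a single float, in line with the template's requirement that the result be one value. Whether a lower or a higher score counts as better depends on the metric, so this is worth stating in the docstring of any metric you add.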