core.preprocessing package
==========================

This package provides functions that modify the values stored in a ``core.data.dataset.Dataset`` object.
:func:`core.preprocessing.auto_clean.auto_clean` runs all of the preprocessing functions automatically.

core.preprocessing.change module
--------------------------------

.. py:function:: core.preprocessing.change.change_to_binary(dataset)
    :noindex:
.. py:function:: core.preprocessing.change.change_to_label(dataset)
    :noindex:
.. py:function:: core.preprocessing.change.change_to_one_hot(dataset)
    :noindex:

    :func:`~core.preprocessing.change.change_to_binary`, :func:`~core.preprocessing.change.change_to_label`
    and :func:`~core.preprocessing.change.change_to_one_hot` change how occupancy is encoded
    in the ``core.data.dataset.Dataset`` object. Each function modifies the dataset in place:

        >>> import core
        >>> import numpy as np
        >>> data = np.array([[1, 2], [3, 4]])
        >>> occupancy = np.array([[0], [3]])
        >>> dataset = core.data.Dataset()
        >>> dataset.add_room(data, occupancy=occupancy, room_name="test", header=["name 1", "name 2"])
        >>> dataset.occupancy
        array([[0.],
               [3.]])
        >>> core.preprocessing.change_to_one_hot(dataset)
        >>> dataset.occupancy
        array([[1., 0., 0., 0.],
               [0., 0., 0., 1.]])
        >>> core.preprocessing.change_to_label(dataset)
        >>> dataset.occupancy
        array([[0],
               [3]], dtype=int64)
        >>> core.preprocessing.change_to_binary(dataset)
        >>> dataset.occupancy
        array([[0],
               [1]], dtype=int64)
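
    The relationship between the three encodings can be sketched in plain NumPy. This is an
    illustration only, not the library's implementation: the helper names ``to_one_hot``,
    ``to_label`` and ``to_binary`` are hypothetical, and the sketch assumes occupancy labels
    are non-negative integers.

    .. code-block:: python

        import numpy as np

        def to_one_hot(labels):
            """Turn an (n, 1) column of integer labels into an (n, max_label + 1) one-hot matrix."""
            labels = np.asarray(labels).astype(int).ravel()
            one_hot = np.zeros((labels.size, labels.max() + 1))
            one_hot[np.arange(labels.size), labels] = 1.0
            return one_hot

        def to_label(one_hot):
            """Recover the (n, 1) label column: the label is the position of the 1 in each row."""
            return np.argmax(one_hot, axis=1).reshape(-1, 1)

        def to_binary(labels):
            """Collapse labels to 0 (empty) or 1 (occupied)."""
            return (np.asarray(labels) > 0).astype(int)

        occupancy = np.array([[0], [3]])
        one_hot = to_one_hot(occupancy)   # two rows, four columns (labels 0..3)
        labels = to_label(one_hot)        # back to the original labels
        binary = to_binary(labels)        # any non-zero count becomes 1

    Converting to binary discards the head count, so the original labels cannot be
    recovered from a binary column.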

core.preprocessing.fill module
------------------------------

.. py:function:: core.preprocessing.fill.fill(dataset)
    :noindex:

    :func:`~core.preprocessing.fill.fill` imputes all missing values (``nan``)
    in the ``core.data.dataset.Dataset`` object:

        >>> import core
        >>> import numpy as np
        >>> data = np.array([[1, np.nan], [100, 4], [200, np.nan], [20, np.nan]])
        >>> occupancy = np.array([[0], [3], [4], [7]])
        >>> dataset = core.data.Dataset()
        >>> dataset.add_room(data, occupancy=occupancy, room_name="test", header=["name 1", "name 2"])
        >>> dataset.data
        array([[  1.,  nan],
               [100.,   4.],
               [200.,  nan],
               [ 20.,  nan]])
        >>> core.preprocessing.fill(dataset)
        >>> dataset.data
        array([[  1.,   4.],
               [100.,   4.],
               [200.,   4.],
               [ 20.,   4.]])
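
    The output above is consistent with several imputation strategies, and this page does not
    state which one the library uses. As a minimal sketch of one of them, the hypothetical
    helper ``fill_nearest_valid`` below carries the last valid value forward in each column
    and back-fills any leading ``nan`` from the first valid value.

    .. code-block:: python

        import numpy as np

        def fill_nearest_valid(data):
            """Forward-fill NaNs column by column, then back-fill leading NaNs."""
            filled = np.array(data, dtype=float)   # work on a copy
            for col in filled.T:                   # each col is a view into filled
                valid = np.flatnonzero(~np.isnan(col))
                if valid.size == 0:
                    continue                       # no value to copy from
                # For every row, find the most recent valid row;
                # rows before the first valid row fall back to that first row.
                idx = np.searchsorted(valid, np.arange(col.size), side="right") - 1
                idx = np.clip(idx, 0, None)
                col[:] = col[valid[idx]]
            return filled

        data = np.array([[1, np.nan], [100, 4], [200, np.nan], [20, np.nan]])
        filled = fill_nearest_valid(data)

    A column that contains no valid value at all is left unchanged in this sketch.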

core.preprocessing.ontology module
----------------------------------

.. py:function:: core.preprocessing.ontology.ontology(dataset)
    :noindex:

    :func:`~core.preprocessing.ontology.ontology` replaces the feature names
    in the ``core.data.dataset.Dataset`` object with standardized names and returns the new list:

        >>> import core
        >>> import numpy as np
        >>> data = np.array([[1, 1], [2, 4]])
        >>> occupancy = np.array([[0], [3]])
        >>> dataset = core.data.Dataset()
        >>> dataset.add_room(data, occupancy=occupancy, room_name="test", header=["Tem.", "damp"])
        >>> core.preprocessing.ontology(dataset)
        ['temperature', 'damper']
        >>> dataset.feature_list
        ['temperature', 'damper']
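
    How abbreviated headers might be mapped to standard names can be sketched with simple
    prefix matching. Everything here is an assumption made for illustration: the vocabulary
    ``CANONICAL_NAMES``, the helper ``normalize_header`` and the matching rule are
    hypothetical, and the library may resolve names differently.

    .. code-block:: python

        # Hypothetical vocabulary of standard feature names.
        CANONICAL_NAMES = ["temperature", "damper", "humidity"]

        def normalize_header(header, vocabulary=CANONICAL_NAMES):
            """Map each header entry to the single vocabulary term it is a prefix of."""
            normalized = []
            for name in header:
                # Lower-case and drop punctuation, so "Tem." becomes "tem".
                key = "".join(ch for ch in name.lower() if ch.isalnum())
                matches = [term for term in vocabulary if key and term.startswith(key)]
                # Keep the original name unless exactly one term matches.
                normalized.append(matches[0] if len(matches) == 1 else name)
            return normalized

        features = normalize_header(["Tem.", "damp", "CO2"])

    Names that match no term, or more than one, are passed through unchanged, so an
    ambiguous abbreviation is never silently renamed.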

core.preprocessing.outlier module
---------------------------------

.. py:function:: core.preprocessing.outlier.remove_outlier(dataset, auto_fill=True, ratio=1.5)
    :noindex:

    :func:`~core.preprocessing.outlier.remove_outlier` removes outliers from
    the ``core.data.dataset.Dataset`` object and fills the resulting gaps:

        >>> import core
        >>> import numpy as np
        >>> data = np.array([[1, 1], [2, 4], [200, 4], [3, 7]])
        >>> occupancy = np.array([[0], [3], [4], [7]])
        >>> dataset = core.data.Dataset()
        >>> dataset.add_room(data, occupancy=occupancy, room_name="test", header=["name 1", "name 2"])
        >>> dataset.data
        array([[  1.,   1.],
               [  2.,   4.],
               [200.,   4.],
               [  3.,   7.]])
        >>> core.preprocessing.remove_outlier(dataset)
        >>> dataset.data
        array([[1., 1.],
               [2., 4.],
               [2., 4.],
               [3., 7.]])
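
    The default ``ratio=1.5`` resembles the multiplier of the common interquartile-range
    (IQR) rule, although this page does not confirm that the library uses it. Under that
    assumption, outlier detection can be sketched as follows; ``iqr_outlier_mask`` is a
    hypothetical helper, not part of the package.

    .. code-block:: python

        import numpy as np

        def iqr_outlier_mask(data, ratio=1.5):
            """Flag values more than ratio * IQR outside the quartiles of their column."""
            data = np.asarray(data, dtype=float)
            q1, q3 = np.percentile(data, [25, 75], axis=0)   # per-column quartiles
            spread = ratio * (q3 - q1)
            return (data < q1 - spread) | (data > q3 + spread)

        data = np.array([[1, 1], [2, 4], [200, 4], [3, 7]], dtype=float)
        mask = iqr_outlier_mask(data)
        # Mark the outliers as missing so that they can be imputed afterwards.
        cleaned = np.where(mask, np.nan, data)

    For the first column the quartiles are 1.75 and 52.25, so the upper fence is 128 and
    only the value 200 is flagged. Once flagged, the entry can be treated like any other
    missing value.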

core.preprocessing.upsample module
----------------------------------

.. py:function:: core.preprocessing.upsample.upsample(dataset, target_frequency, algorithm='linear')
    :noindex:
.. py:function:: core.preprocessing.downsample.downsample(dataset, target_frequency, algorithm='mean')
    :noindex:

    :func:`~core.preprocessing.upsample.upsample` and :func:`~core.preprocessing.downsample.downsample` increase and
    decrease, respectively, the number of data points (i.e., the number of rows) in the ``core.data.dataset.Dataset`` object:

        >>> import core
        >>> import numpy as np
        >>> data = np.array([[1, 2], [100, 4], [200, 6]])
        >>> occupancy = np.array([[0], [3], [4]])
        >>> dataset = core.data.Dataset()
        >>> dataset.add_room(data, occupancy=occupancy, room_name="test", header=["name 1", "name 2"])
        >>> dataset.data
        array([[  1.,   2.],
               [100.,   4.],
               [200.,   6.]])
        >>> dataset.time_column_index = 0
        >>> core.preprocessing.upsample(dataset, 10)
        >>> dataset.data
        array([[  1.        ,   2.        ],
               [ 11.        ,   2.22222222],
               [ 21.        ,   2.44444444],
               [ 31.        ,   2.66666667],
               [ 41.        ,   2.88888889],
               [ 51.        ,   3.11111111],
               [ 61.        ,   3.33333333],
               [ 71.        ,   3.55555556],
               [ 81.        ,   3.77777778],
               [ 91.        ,   4.        ],
               [101.        ,   4.2       ],
               [111.        ,   4.4       ],
               [121.        ,   4.6       ],
               [131.        ,   4.8       ],
               [141.        ,   5.        ],
               [151.        ,   5.2       ],
               [161.        ,   5.4       ],
               [171.        ,   5.6       ],
               [181.        ,   5.8       ],
               [191.        ,   6.        ]])
        >>> core.preprocessing.downsample(dataset, 60)
        >>> dataset.data
        array([[  1.        ,   2.        ],
               [ 61.        ,   3.33333333],
               [121.        ,   4.6       ],
               [181.        ,   5.8       ]])
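
    The general idea of resampling on a time column can be sketched in NumPy. This is not
    the library's algorithm: ``upsample_linear`` and ``downsample_mean`` are hypothetical
    helpers that interpolate linearly in time and average rows per time bin, so their values
    need not match the output listed above.

    .. code-block:: python

        import numpy as np

        def upsample_linear(data, step, time_col=0):
            """Resample onto a regular time grid using linear interpolation."""
            data = np.asarray(data, dtype=float)
            t = data[:, time_col]                  # assumed to be increasing
            n = int(np.floor((t[-1] - t[0]) / step)) + 1
            new_t = t[0] + step * np.arange(n)     # grid never exceeds t[-1]
            out = np.column_stack(
                [np.interp(new_t, t, data[:, j]) for j in range(data.shape[1])]
            )
            out[:, time_col] = new_t
            return out

        def downsample_mean(data, step, time_col=0):
            """Average all rows that fall into the same time bin of width step."""
            data = np.asarray(data, dtype=float)
            t = data[:, time_col]
            bins = np.floor((t - t[0]) / step).astype(int)
            rows = []
            for b in np.unique(bins):
                block = data[bins == b]
                row = block.mean(axis=0)
                row[time_col] = block[0, time_col]   # stamp the bin with its first time
                rows.append(row)
            return np.vstack(rows)

        raw = np.array([[1, 2], [100, 4], [200, 6]])
        fine = upsample_linear(raw, 10)        # timestamps 1, 11, ..., 191
        coarse = downsample_mean(fine, 60)     # timestamps 1, 61, 121, 181

    The last bin may hold fewer rows than the others when the time span is not a
    multiple of ``step``.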