====================
K-Nearest Neighbours
====================
KNN classifies a point by the majority label among its :math:`k` nearest training points, where nearness is measured by some underlying distance function :math:`d(x,x')`.
For :math:`k=1`, the label for a test point :math:`x^*` is predicted to be the same as that of its closest training point :math:`x_{i}`, i.e. :math:`y_{i}`, where

.. math::

   i=\argmin_j d(x^*, x_j).

See Chapter 14 in :cite:`barber2012bayesian` for a detailed introduction.
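
The rule above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the library's implementation: the helper names ``euclidean`` and ``knn_predict`` and the toy data are hypothetical, and ties in the vote are resolved in favour of the label seen first, i.e. the closer neighbour.

.. code-block:: python

   import math
   from collections import Counter


   def euclidean(a, b):
       """Euclidean distance between two equal-length sequences of numbers."""
       return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


   def knn_predict(train_x, train_y, query, k=1, dist=euclidean):
       """Predict the label of `query` by majority vote among its k nearest training points."""
       # Sort training indices by distance to the query and keep the k closest.
       order = sorted(range(len(train_x)), key=lambda j: dist(query, train_x[j]))
       nearest_labels = [train_y[j] for j in order[:k]]
       # Majority vote; equal counts keep the order in which labels were first seen.
       return Counter(nearest_labels).most_common(1)[0][0]


   # Hypothetical toy data: two well-separated clusters in the plane.
   train_x = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
   train_y = [0, 0, 0, 1, 1, 1]

   label_1nn = knn_predict(train_x, train_y, (0.2, 0.1), k=1)
   label_3nn = knn_predict(train_x, train_y, (5.5, 5.5), k=3)

With :math:`k=1` the vote reduces to copying the label of the single closest training point, which is the :math:`\argmin` rule stated above.
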
-------
Example
-------
Imagine we have files with training and test data. We create :sgclass:`CDenseFeatures` (here 64-bit floats, aka RealFeatures) and :sgclass:`CMulticlassLabels` as follows:
.. sgexample:: knn.sg:create_features
To run :sgclass:`CKNN`, we need to choose a distance, for example :sgclass:`CEuclideanDistance` or another subclass of :sgclass:`CDistance`. The distance is initialized with the data we want to classify.
.. sgexample:: knn.sg:choose_distance
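
The choice of distance matters because it decides which training points count as neighbours. As a rough illustration in plain Python (the helper names ``euclidean``, ``manhattan`` and ``nearest_index`` and the two points are hypothetical, and nothing here uses the library's distance classes), the same query can have a different nearest neighbour under the Euclidean and the Manhattan distance:

.. code-block:: python

   import math


   def euclidean(a, b):
       """Square root of the summed squared coordinate differences."""
       return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


   def manhattan(a, b):
       """Sum of the absolute coordinate differences."""
       return sum(abs(ai - bi) for ai, bi in zip(a, b))


   def nearest_index(points, query, dist):
       """Index of the point in `points` that minimises dist(query, point)."""
       return min(range(len(points)), key=lambda j: dist(query, points[j]))


   # Hypothetical data: one point along an axis, one on the diagonal.
   points = [(3.0, 0.0), (2.0, 2.0)]
   query = (0.0, 0.0)

   # Euclidean: 3.0 versus about 2.83, so the diagonal point is closer.
   nearest_euclidean = nearest_index(points, query, euclidean)
   # Manhattan: 3.0 versus 4.0, so the axis-aligned point is closer.
   nearest_manhattan = nearest_index(points, query, manhattan)
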
Once we have chosen a distance, we create an instance of the :sgclass:`CKNN` classifier, passing it :math:`k`.
.. sgexample:: knn.sg:create_instance
Then we train the KNN algorithm and apply it to the test data, which here gives :sgclass:`CMulticlassLabels`.
.. sgexample:: knn.sg:train_and_apply
We can evaluate test performance via e.g. :sgclass:`CMulticlassAccuracy`.
.. sgexample:: knn.sg:evaluate_accuracy
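
Multiclass accuracy is usually defined as the fraction of test points whose predicted label equals the true label. The following plain-Python sketch assumes that definition; the function name ``multiclass_accuracy`` and the label lists are hypothetical and only illustrate the computation.

.. code-block:: python

   def multiclass_accuracy(predicted, truth):
       """Fraction of positions where the predicted label equals the true label."""
       if len(predicted) != len(truth):
           raise ValueError("predicted and truth must have the same length")
       if not truth:
           raise ValueError("cannot compute accuracy on an empty label set")
       correct = sum(p == t for p, t in zip(predicted, truth))
       return correct / len(truth)


   # Hypothetical labels: four of the five predictions match.
   predicted = [0, 1, 2, 1, 0]
   truth = [0, 1, 1, 1, 0]
   accuracy = multiclass_accuracy(predicted, truth)
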
----------
References
----------
:wiki:`K-nearest_neighbors_algorithm`

.. bibliography:: ../../references.bib
    :filter: docname in docnames