kNN Algorithm Visualization

This page shows interactive visualizations of a brute-force kNN classification algorithm for a chosen set of parameters. The algorithm is developed (in Python) in this notebook. The aim is to show how the algorithm works and how the hyperparameters and dataset structure affect its performance.

The visualizations show the control flow of the algorithm for one sample: it calculates the distances, updates the neighbor list, and makes a prediction. The dataset is a subset of the iris dataset, as detailed in the notebook. Feel free to play with the options. A detailed explanation of the algorithm follows below.

Algorithm Visualization

[Interactive visualization. Adjustable options: Weights, Dataset, Metric, Neighbors.]

kNN is a lazy machine learning model: it defers its computation to the prediction phase. The training phase consists only of keeping a copy of the training data. In the prediction phase, the algorithm takes each sample's features and loops through the training set, calculating the distance from each training point to the sample according to a given metric (e.g., the Euclidean and Manhattan metrics in these visualizations). As it calculates the distances, the algorithm maintains a list of the k smallest distances seen so far (and a list of the corresponding data points). If a data point's distance to the sample is smaller than the largest distance in the list, the largest one is discarded and the new data point takes its place.
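
A minimal sketch of this loop in Python (illustrative, not the notebook's actual code; the function and variable names are assumptions):

```python
import numpy as np

def k_nearest(x, X_train, y_train, k, metric="euclidean"):
    """Return the k (distance, label) pairs closest to sample x."""
    neighbors = []  # running list of at most k (distance, label) pairs
    for point, label in zip(X_train, y_train):
        # Distance from this training point to the sample, per the chosen metric.
        if metric == "euclidean":
            dist = np.sqrt(np.sum((point - x) ** 2))
        else:  # "manhattan"
            dist = np.sum(np.abs(point - x))
        if len(neighbors) < k:
            neighbors.append((dist, label))
        else:
            # If closer than the current farthest neighbor, replace it.
            worst = max(range(k), key=lambda i: neighbors[i][0])
            if dist < neighbors[worst][0]:
                neighbors[worst] = (dist, label)
    return neighbors
```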

After the set of k neighbors has been chosen by looping through the whole training set, the prediction is made. Note that this pass over the whole training set is what makes the algorithm computationally expensive: the running time scales with the product of the number of samples and the number of features. The prediction can be made either by a simple majority vote or by a weighted vote, where each neighbor's vote is inversely proportional to its distance.
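
The voting step can be sketched like this (again illustrative; the `weights="distance"` option mirrors the weighted vote described above):

```python
from collections import Counter

def predict(neighbors, weights="uniform"):
    """Predict a label from the (distance, label) pairs returned by k_nearest."""
    votes = Counter()
    for dist, label in neighbors:
        if weights == "distance":
            # Weighted vote: closer neighbors count more; an exact match
            # (zero distance) dominates the vote entirely.
            votes[label] += 1.0 / dist if dist > 0 else float("inf")
        else:
            # Simple majority vote: every neighbor counts the same.
            votes[label] += 1.0
    return votes.most_common(1)[0][0]

# e.g. predict(k_nearest(x, X_train, y_train, k=5), weights="distance")
```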

These visualizations were built using Matplotlib animation, with the help of the Camera plugin.
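
The basic animation pattern looks roughly like the following, assuming the Camera plugin refers to the Camera class from the celluloid package (an assumption; the text above does not name the package):

```python
import matplotlib.pyplot as plt
from celluloid import Camera  # assumed source of the Camera plugin

fig = plt.figure()
camera = Camera(fig)

# One frame per algorithm step: draw the new artists, then snapshot them.
for i in range(10):
    plt.plot([i] * 10, color="tab:blue")
    camera.snap()

animation = camera.animate()  # returns a matplotlib ArtistAnimation
animation.save("animation.gif", writer="pillow")
```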