Exploring the Power and Practical Applications of K-Nearest Neighbours (KNN) in Machine Learning
DOI:
https://doi.org/10.69996/jcai.2024002
Keywords:
Machine learning, k-nearest neighbours, artificial intelligence, KNN, cutting-edge field
Abstract
Machine learning, a core component of artificial intelligence, enables systems to learn on their own and improve performance through experience, doing away with the need for explicit programming. This cutting-edge field focuses on equipping computer programs with the ability to access vast datasets and derive intelligent decisions from them. One of the cornerstone algorithms in machine learning, the K-nearest neighbours (KNN) algorithm, is known for its simplicity and effectiveness. KNN stores all available data points in its training dataset and classifies new, unlabelled cases based on their similarity to the stored examples. This proximity-based classification approach makes KNN a versatile and intuitive tool with applications spanning diverse domains. This document explores the inner workings of the K-nearest neighbours algorithm, its practical applications across various domains, and a comprehensive examination of its strengths and limitations. Additionally, it offers insights into practical considerations and best practices for the effective implementation of KNN, illuminating its significance in the continually evolving landscape of machine learning and artificial intelligence.
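The classification procedure the abstract describes — store all training points, then label a new case by the majority vote of its nearest stored neighbours — can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes Euclidean distance and a simple majority vote, and the function name, data, and `k=3` choice are illustrative.

```python
from collections import Counter
import math

def knn_classify(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Compute the Euclidean distance from the query to every stored point,
    # then sort so the closest points come first.
    dists = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Keep only the labels of the k nearest neighbours.
    nearest_labels = [label for _, label in dists[:k]]
    # Return the most common label among them (the majority vote).
    return Counter(nearest_labels).most_common(1)[0][0]

# Illustrative 2-D data: class "a" clusters near the origin, class "b" near (5, 5).
points = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_classify(points, labels, (0.5, 0.5), k=3))  # query near the "a" cluster
```

Note that there is no training phase beyond storing the data: all the work happens at query time, which is why KNN is often called a "lazy" learner and why the choice of distance metric and of k are the main practical tuning decisions.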
License
Copyright (c) 2024 Journal of Computer Allied Intelligence(JCAI)
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Fringe Global Scientific Press publishes all papers under a Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license (https://creativecommons.org/licenses/by-nc/4.0/). Authors are free to reproduce and distribute their work, and may reuse all or part of it in compilations or other publications that include their own work. Please see the licensing terms for more information on reusing the work.