A New Approach to Computationally-Successful Linear and Polynomial Regression Analytics of Large Data in Medicine

Authors

  • U. Srilakshmi, Professor, Department of CSE, Koneru Lakshmaiah Education Foundation, Bowrampet, Hyderabad, Telangana, 500043, India
  • J Manikandan, Assistant Professor, Department of CSE, CMR Institute of Technology, Kandlakoya, Hyderabad, Telangana, 501401, India
  • Thanmayee Velagapudi, Student, Department of CSE, CMR Institute of Technology, Kandlakoya, Hyderabad, Telangana, 501401, India
  • Gandla Abhinav, Student, Department of CSE, CMR Institute of Technology, Kandlakoya, Hyderabad, Telangana, 501401, India
  • Tharun Kumar, Student, Department of CSE, CMR Institute of Technology, Kandlakoya, Hyderabad, Telangana, 501401, India
  • Dogiparthy Saideep, Student, Department of CSE, CMR Institute of Technology, Kandlakoya, Hyderabad, Telangana, 501401, India

DOI:

https://doi.org/10.69996/jcai.2024009

Keywords:

Healthcare analytics, linear regression, polynomial regression, Optimization, Predictive modeling, big data

Abstract

In healthcare, predictive modeling is a key tool for anticipating patient outcomes and supporting medical decision-making. However, the machine learning algorithms that underpin these models are often not accurate enough, which leads to erroneous predictions. This study proposes a new approach to optimizing linear and polynomial regression models for healthcare analytics that addresses this problem. In contrast to earlier efforts, the method applies a scale-down data transformation to improve the performance of linear regression models. The main goal is to reduce the sum of squared errors (SSE), and thereby improve the predictive power of linear regression models, by applying a data transformation function that reduces the magnitude of all variables. In a series of experiments, we used non-Bayesian statistics in SPSS and MATLAB to generate 40 trials of linear regression models, with 1,000 observations in each trial. We used SPSS for regression analysis and Excel for data manipulation, and we evaluated the optimized models with Wilcoxon signed-rank tests and Cronbach's alpha statistics. Our findings show that the proposed scale-down transformation is effective: the sum of squared errors was significantly reduced (Wilcoxon signed-rank test, absolute Z-score = 5.511, effect size = 0.779, p < 0.001). Furthermore, inter-item reliability testing confirmed the robust internal consistency of the optimized model (Cronbach's alpha = 0.993).
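
The workflow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' SPSS/MATLAB procedure. Several details are assumptions made only for illustration: the abstract does not specify the transformation function, so a division of every variable by a constant (the hypothetical `SCALE`) is used; the data are synthetic; a single predictor stands in for the regression model; and the helper names `sse_simple_linear` and `wilcoxon_z` are invented here. The sketch fits 40 trials of 1,000 observations each, records the SSE before and after scaling, and compares the paired SSE values with the normal approximation to the Wilcoxon signed-rank test.

```python
import math
import random


def sse_simple_linear(x, y):
    """Fit y = a + b*x by ordinary least squares and return the SSE."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


def wilcoxon_z(before, after):
    """Normal-approximation Z for the Wilcoxon signed-rank test.

    No continuity correction; assumes no ties and no zero differences.
    """
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    # Rank the absolute differences from smallest (rank 1) to largest (rank n).
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    w_plus = sum(rank for rank, i in enumerate(order, start=1) if diffs[i] > 0)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (w_plus - mean_w) / sd_w


SCALE = 10.0  # hypothetical scale-down factor; not taken from the paper
N_TRIALS = 40  # as stated in the abstract
N_OBS = 1000  # as stated in the abstract

rng = random.Random(0)
sse_raw, sse_scaled = [], []
for _ in range(N_TRIALS):
    # Synthetic data: a linear trend plus Gaussian noise (illustrative only).
    x = [rng.uniform(0, 100) for _ in range(N_OBS)]
    y = [3.0 + 2.0 * xi + rng.gauss(0, 5) for xi in x]
    sse_raw.append(sse_simple_linear(x, y))
    # Scale every variable down by the same constant and refit.
    x_s = [xi / SCALE for xi in x]
    y_s = [yi / SCALE for yi in y]
    sse_scaled.append(sse_simple_linear(x_s, y_s))

# SSE is measured in squared units of the response, so under this assumed
# transformation it shrinks by a factor of SCALE**2 in every trial.
z = wilcoxon_z(sse_raw, sse_scaled)
print(f"Wilcoxon Z = {z:.3f}")
```

In this sketch the SSE falls in all 40 trials, so the signed-rank statistic takes its largest possible value for 40 pairs and the Z-score is approximately 5.511, the same magnitude as the absolute Z-score reported in the abstract. Whether the paper's transformation behaves like the constant division assumed here cannot be determined from the abstract alone.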

Published

2024-04-30

How to Cite

U. Srilakshmi, J Manikandan, Thanmayee Velagapudi, Gandla Abhinav, Tharun Kumar, & Dogiparthy Saideep. (2024). A New Approach to Computationally-Successful Linear and Polynomial Regression Analytics of Large Data in Medicine. Journal of Computer Allied Intelligence (JCAI, ISSN: 2584-2676), 2(2), 35-48. https://doi.org/10.69996/jcai.2024009