2025, Vol. 10, Issue 3, Part A
Integrating Bayesian logistic regression, principal component analysis, and random forest to model cardiovascular disease risk factors
Author(s): Ezenwaka Ifenna Johnmary and Chinwendu Alice Uzuke
Abstract: The study addresses the complexity of predicting cardiovascular disease (CVD) by employing a combination of advanced statistical and machine learning techniques. The methodology integrates Bayesian logistic regression to incorporate prior knowledge and quantify uncertainty, Principal Component Analysis (PCA) to reduce data dimensionality and highlight key patterns, and Random Forest to enhance prediction accuracy and assess the importance of various risk factors. The analysis revealed that Bayesian logistic regression effectively identified significant CVD risk factors and provided robust parameter estimates. PCA successfully reduced the dataset's complexity while retaining essential information, and Random Forest demonstrated high predictive accuracy, outperforming traditional linear models. The integrated model showed superior performance in predicting CVD risk, offering deeper insights into the interactions among risk factors. Findings indicate that age, systolic blood pressure, and smoking status were among the most critical predictors of CVD. The probabilistic nature of Bayesian logistic regression enhanced model interpretability, while the ensemble learning approach of Random Forest provided a nuanced understanding of variable importance. In conclusion, this integrated approach significantly improves CVD risk prediction and offers valuable insights into the multifactorial nature of the disease. The study recommends adopting such comprehensive modeling techniques in clinical practice to enhance patient risk stratification and inform targeted prevention strategies.
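As a rough illustration of how a prior enters a logistic regression fit, the sketch below computes a maximum a posteriori (MAP) estimate with independent zero-mean Gaussian priors on the slopes. It is not the authors' implementation: the data are synthetic, the three standardized predictors only stand in for variables such as age, systolic blood pressure, and smoking status, and the function name `fit_map_logistic` and all parameter values are assumptions made for this example. A full Bayesian analysis would characterize the whole posterior rather than a single point estimate.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_map_logistic(X, y, prior_sd=1.0, lr=0.5, steps=500):
    """MAP estimate by gradient ascent on the log-posterior.

    Slopes get independent N(0, prior_sd^2) priors; the intercept w[0]
    is left unpenalized. Hypothetical helper, for illustration only.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(steps):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = yi - sigmoid(z)  # gradient of the log-likelihood
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        for j in range(1, p + 1):
            grad[j] -= w[j] / prior_sd ** 2  # gradient of the Gaussian log-prior
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w


# Synthetic data (assumption): three standardized predictors, made-up effects.
random.seed(0)
true_intercept, true_slopes = -0.5, [1.0, 0.8, 0.6]
X, y = [], []
for _ in range(200):
    xi = [random.gauss(0.0, 1.0) for _ in range(3)]
    z = true_intercept + sum(b * v for b, v in zip(true_slopes, xi))
    X.append(xi)
    y.append(1 if random.random() < sigmoid(z) else 0)

weak = fit_map_logistic(X, y, prior_sd=10.0)   # nearly uninformative prior
tight = fit_map_logistic(X, y, prior_sd=0.1)   # strong shrinkage toward zero

print("weak prior :", [round(v, 3) for v in weak])
print("tight prior:", [round(v, 3) for v in tight])
```

Comparing the two fits shows the role of the prior: a small `prior_sd` pulls the slope estimates toward zero, while a large one leaves them close to the ordinary maximum likelihood fit.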
DOI: 10.22271/maths.2025.v10.i3a.2004
Pages: 29-38
How to cite this article:
Ezenwaka Ifenna Johnmary, Chinwendu Alice Uzuke. Integrating Bayesian logistic regression, principal component analysis, and random forest to model cardiovascular disease risk factors. Int J Stat Appl Math 2025;10(3):29-38. DOI: 10.22271/maths.2025.v10.i3a.2004