
Web of Proceedings - Francis Academic Press

Integrated Logistic Regression–XGBoost and Bayesian Network Models for Disease Prediction


DOI: 10.25236/iwmecs.2025.020

Author(s)

Yihao Zhu, Ruixin Zhang, Junfeng Li

Corresponding Author

Yihao Zhu

Abstract

Cardiovascular disease, stroke and cirrhosis pose major health threats worldwide. This study performs data-driven disease risk prediction and association analysis on three disease datasets. First, the stroke, heart disease and cirrhosis datasets were systematically preprocessed: outliers were handled based on the Kolmogorov–Smirnov (K-S) test, missing values were filled by spline interpolation, and the data were standardized and visualized. Correlation analysis and chi-square tests identified key influencing factors such as age, ST-segment depression and albumin. Second, an integrated logistic regression–XGBoost model was built to predict the probability of each of the three diseases, and its performance was evaluated with metrics including accuracy and AUC-ROC. Logistic regression reached 68% accuracy for heart disease prediction, while XGBoost's performance on the multi-class cirrhosis task still needs improvement. The results showed that the probability of heart disease–cirrhosis comorbidity reached 83.33% in the high-risk group, compared with 25.75% in the lower-risk group, indicating a strong association between the diseases. This study provides data support and a methodological reference for multi-disease risk prediction and coordinated prevention and control.
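The abstract says only that outliers were handled "based on the K-S test". One common reading, assumed here rather than taken from the paper, is to use a one-sample Kolmogorov–Smirnov statistic to decide whether a column looks normal, then apply the 3σ rule to normal-looking columns and Tukey's 1.5×IQR fences to the rest. The 1.36/√n threshold is the asymptotic 5% critical value and is only approximate when the mean and standard deviation are estimated from the same data (the Lilliefors setting). The function names are hypothetical.

```python
import math
from statistics import mean, quantiles, stdev


def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2), written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic(values):
    """One-sample K-S statistic D between the data and a normal fitted to it."""
    xs = sorted(values)
    n = len(xs)
    mu, sigma = mean(xs), stdev(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = normal_cdf(x, mu, sigma)
        # The empirical CDF jumps from (i-1)/n to i/n at x; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def looks_normal(values, coeff=1.36):
    """Assumed rule: accept normality if D <= 1.36 / sqrt(n) (approx. 5% level)."""
    return ks_statistic(values) <= coeff / math.sqrt(len(values))


def outlier_mask(values):
    """Flag outliers: 3-sigma rule for normal-looking data, else 1.5*IQR fences."""
    if looks_normal(values):
        mu, sigma = mean(values), stdev(values)
        return [abs(v - mu) > 3 * sigma for v in values]
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v < low or v > high for v in values]


if __name__ == "__main__":
    column = list(range(1, 21)) + [200]   # toy data with one extreme value
    print(looks_normal(column))           # False: D is far above 1.36/sqrt(21)
    print([v for v, bad in zip(column, outlier_mask(column)) if bad])  # [200]
```

Whether flagged values are then dropped, capped or treated as missing is not stated in the abstract, so the sketch only returns a mask.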
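For the spline interpolation of missing values and the standardization step, here is a minimal sketch assuming a natural cubic spline fitted over the row positions of the observed values in a single column (the paper does not say which spline type or which axis it interpolates along), followed by z-score standardization. `natural_cubic_spline`, `spline_impute` and `standardize` are hypothetical names.

```python
import bisect
from statistics import mean, pstdev


def natural_cubic_spline(xs, ys):
    """Return a callable natural cubic spline through knots (xs, ys).

    xs must be strictly increasing with at least two knots. The second
    derivatives m[0..n] solve a tridiagonal system with m[0] = m[n] = 0.
    """
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    m = [0.0] * (n + 1)
    if n >= 2:
        sub = [h[i - 1] for i in range(1, n)]
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        sup = [h[i] for i in range(1, n)]
        rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
               for i in range(1, n)]
        # Thomas algorithm: forward elimination, then back substitution.
        for k in range(1, n - 1):
            w = sub[k] / diag[k - 1]
            diag[k] -= w * sup[k - 1]
            rhs[k] -= w * rhs[k - 1]
        sol = [0.0] * (n - 1)
        sol[-1] = rhs[-1] / diag[-1]
        for k in range(n - 3, -1, -1):
            sol[k] = (rhs[k] - sup[k] * sol[k + 1]) / diag[k]
        m[1:n] = sol

    def evaluate(t):
        # Pick the interval [xs[i], xs[i+1]]; points outside the range use the end pieces.
        i = min(max(bisect.bisect_right(xs, t) - 1, 0), n - 1)
        width, left, right = h[i], xs[i + 1] - t, t - xs[i]
        return (m[i] * left ** 3 / (6 * width) + m[i + 1] * right ** 3 / (6 * width)
                + (ys[i] / width - m[i] * width / 6) * left
                + (ys[i + 1] / width - m[i + 1] * width / 6) * right)

    return evaluate


def spline_impute(column):
    """Replace None entries using a spline over the positions of observed values."""
    known = [(float(i), float(v)) for i, v in enumerate(column) if v is not None]
    spline = natural_cubic_spline([x for x, _ in known], [y for _, y in known])
    return [float(v) if v is not None else spline(float(i)) for i, v in enumerate(column)]


def standardize(column):
    """z-score: subtract the mean, divide by the population standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    return [(v - mu) / sigma for v in column]


if __name__ == "__main__":
    filled = spline_impute([1.0, 2.0, None, 4.0, None, 6.0])
    print(filled)               # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    print(standardize(filled))
```

Gaps at the very start or end of a column are extrapolated from the end pieces of the spline, which can be unreliable; interior gaps are the case the method suits best.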
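The chi-square screening of categorical factors can be illustrated with a 2×2 contingency table, for example a binary risk factor against disease status. This sketch computes Pearson's statistic without continuity correction and uses the identity that, for one degree of freedom, the upper-tail probability equals erfc(√(χ²/2)). Larger tables need the general chi-square distribution (for instance via SciPy's `scipy.stats.chi2_contingency`). The counts below are hypothetical.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value). No Yates continuity correction is applied.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(chi2 > stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


if __name__ == "__main__":
    # Hypothetical counts: rows = factor present/absent, columns = disease yes/no.
    stat, p = chi_square_2x2([[10, 20], [30, 40]])
    print(round(stat, 4))   # 0.7937
```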
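The abstract does not specify how logistic regression and XGBoost are "integrated". One simple option, assumed here, is soft voting: average the positive-class probabilities of the two models (in practice these could come from `predict_proba` on scikit-learn's `LogisticRegression` and xgboost's `XGBClassifier`) and score the blend with accuracy and ROC AUC. The helper names and the probability lists are invented for illustration and are not results from the paper.

```python
def soft_vote(p_lr, p_xgb, weight=0.5):
    """Blend two models' positive-class probabilities: weight*LR + (1-weight)*XGB."""
    return [weight * a + (1.0 - weight) * b for a, b in zip(p_lr, p_xgb)]


def accuracy(y_true, probs, threshold=0.5):
    """Share of cases whose thresholded prediction matches the label."""
    hits = sum(int(p >= threshold) == y for y, p in zip(y_true, probs))
    return hits / len(y_true)


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random positive outranks a random
    negative (ties count one half); O(P*N), fine for small examples."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Invented toy probabilities, not results from the paper.
    y = [0, 0, 1, 1]
    p_lr = [0.10, 0.40, 0.35, 0.80]
    p_xgb = [0.20, 0.30, 0.60, 0.90]
    blend = soft_vote(p_lr, p_xgb)
    print(roc_auc(y, p_lr), roc_auc(y, blend))   # 0.75 1.0
    print(accuracy(y, blend))                    # 0.75
```

Stacking (feeding both models' outputs into a second-level learner) is another plausible reading of "integration"; soft voting is shown only because it is the simplest to state.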

Keywords

K-S test; Spline function; Logistic regression–XGBoost integration; Bayesian network; Disease prediction; Comorbidity analysis
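The comorbidity probabilities reported in the abstract are conditional probabilities of the kind a discrete Bayesian network answers. This sketch assumes a three-node network over binary variables R (high-risk group), H (heart disease) and C (cirrhosis) with edges R→H, R→C and H→C, uses made-up conditional probability tables, and answers queries such as P(C=1 | H=1, R=1) by exhaustive enumeration. Neither the structure nor the numbers come from the paper; in practice the tables would be estimated from the cleaned data, for example by counting.

```python
from itertools import product

# Hypothetical network over binary nodes R (high-risk group), H (heart disease),
# C (cirrhosis) with assumed edges R -> H, R -> C, H -> C. Illustrative numbers only.
P_R1 = 0.3                                          # P(R=1)
P_H1_GIVEN_R = {1: 0.6, 0: 0.2}                     # P(H=1 | R=r)
P_C1_GIVEN_RH = {(1, 1): 0.5, (1, 0): 0.2,          # P(C=1 | R=r, H=h)
                 (0, 1): 0.25, (0, 0): 0.05}


def bernoulli(p_one, value):
    """Probability of a binary value given P(value=1)."""
    return p_one if value == 1 else 1.0 - p_one


def joint(r, h, c):
    """Chain-rule factorization implied by the graph: P(r) P(h|r) P(c|r,h)."""
    return (bernoulli(P_R1, r) * bernoulli(P_H1_GIVEN_R[r], h)
            * bernoulli(P_C1_GIVEN_RH[(r, h)], c))


def query(target, evidence):
    """P(target=1 | evidence) by summing the joint over all 2^3 assignments."""
    num = den = 0.0
    for r, h, c in product((0, 1), repeat=3):
        assignment = {"R": r, "H": h, "C": c}
        if any(assignment[name] != value for name, value in evidence.items()):
            continue
        p = joint(r, h, c)
        den += p
        if assignment[target] == 1:
            num += p
    return num / den


if __name__ == "__main__":
    print(round(query("C", {"H": 1, "R": 1}), 3))   # 0.5
    print(round(query("C", {"H": 1}), 3))           # 0.391
```

Enumeration is exponential in the number of variables, so real networks with many nodes would use variable elimination or a dedicated library instead; for three binary nodes it is the clearest way to show where a figure such as P(cirrhosis | heart disease, high risk) comes from.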