Infection and Drug Resistance, Volume 18
Risk Factors and Prognosis Analyses of Hospital-Acquired Pneumonia in Elderly Critically Ill Patients with Acute Ischemic Stroke Based on Machine Learning
Authors Jiao Q, Liu X, Chen H, Hu Z, Jiao S, Sun Z, Lu C, Huang L, Du W, Jiao D
Received 12 March 2025
Accepted for publication 3 October 2025
Published 14 October 2025 Volume 2025:18 Pages 5323–5342
DOI https://doi.org/10.2147/IDR.S527856
Checked for plagiarism Yes
Review by Single anonymous peer review
Peer reviewer comments 2
Editor who approved publication: Professor Chi H. Lee
Qingxin Jiao,1 Xingyu Liu,2 Huimin Chen,3,4 Ziqi Hu,5 Shengyuan Jiao,6 Zhongyang Sun,7,8 Conglan Lu,7 Limin Huang,3,9 Wenxiu Du,10 Dongsheng Jiao5
1Department of Research, Xi’an Medical University, Xi’an, Shaanxi, 710021, People’s Republic of China; 2Department of General Medicine, Central Medical Branch of PLA General Hospital, Beijing, 100120, People’s Republic of China; 3Department of Emergency Medicine, Nanjing Pukou People’s Hospital, Nanjing, Jiangsu, 211800, People’s Republic of China; 4Department of Emergency, the Second Affiliated Hospital of Nanjing University of Chinese Medicine, Nanjing, Jiangsu, 210000, People’s Republic of China; 5Department of Neurology, Air Force Hospital of Eastern Theater Command, Nanjing, Jiangsu, 210002, People’s Republic of China; 6Department of Radiation Medical Protection, School of Military Preventive Medicine, Air Force Medical University, Xi’an, Shaanxi, 710032, People’s Republic of China; 7Department of Orthopedics, Air Force Hospital of Eastern Theater Command, Nanjing, Jiangsu, 210002, People’s Republic of China; 8Department of Orthopedics, Affiliated Jinling Hospital, Medical School of Nanjing University, Nanjing, People’s Republic of China; 9Department of Emergency Medicine, Sir Run Run Hospital, Nanjing Medical University, Nanjing, Jiangsu, 211166, People’s Republic of China; 10Department of Emergency Medicine, Women’s Hospital of Nanjing Medical University (Nanjing Women and Children’s Healthcare Hospital), Nanjing, Jiangsu, 210000, People’s Republic of China
Correspondence: Wenxiu Du, Department of Emergency Medicine, Women’s Hospital of Nanjing Medical University (Nanjing Women and Children’s Healthcare Hospital), Nanjing, Jiangsu, 210000, People’s Republic of China, Email [email protected]; Dongsheng Jiao, Department of Neurology, Air Force Hospital of Eastern Theater, Nanjing, Jiangsu, 210002, People’s Republic of China, Email [email protected]
Objective: Increased post-stroke sympathetic drive is linked to hospital-acquired pneumonia (HAP). This study investigated the incidence, prognosis, and risk factors of HAP in elderly critically ill acute ischemic stroke (AIS) patients.
Methods: We analyzed HAP risk factors and prognosis in critically ill AIS patients (aged > 50, NIHSS > 15) from the First Affiliated Hospital of Xi’an Medical University (September 2023–February 2024). Nine of 19 candidate variables were selected, and 11 machine learning algorithms were compared for HAP risk prediction. Kaplan–Meier survival estimates and a Cox proportional hazards model were used for prognosis analysis; 10-fold cross-validation, the Friedman test, and the post-hoc Nemenyi test were used for algorithm selection. SHapley Additive exPlanations (SHAP) values were used to explain feature contributions.
Results: Of 785 patients, 215 (27.39%) developed HAP, 40.38% were > 80 years, and 67.01% were male, with 30.68% overall mortality. Key predictive variables included respiratory failure, hospital stays, consecutive febrile days, number of bacteria, antibiotics, CRP, immunopotentiator, blood transfusion, and ICU admission. XGBoost performed best (AUC: 0.995 [0.995–0.996] in training sets, 0.898 [0.891–0.905] in validation sets). HAP, respiratory failure, number of bacteria, and ICU admission were associated with worse survival, whereas longer hospital stays were associated with better prognosis. The top 3 features by SHAP were number of bacteria, ICU admission, and consecutive febrile days.
Conclusion: Elderly critically ill AIS patients with HAP are more prone to respiratory failure, prolonged fever, blood transfusion, ICU admission, or death. The number of bacteria-positive species and elevated CRP levels (≥ 5 mg/L) were identified as among the most significant predictors associated with the development of HAP in our model. Administration of antibiotics and immunopotentiators was significantly associated with improved prognosis in our cohort. However, further interventional studies are required to confirm a causal therapeutic benefit.
Keywords: hospital-acquired pneumonia, acute ischemic stroke, machine learning, risk factor, elderly adults
Introduction
Infection is a common complication in patients with acute ischemic stroke (AIS) and is an independent risk factor for early recurrence of stroke during hospitalization. In the Third China National Stroke Registry (CNSR-III), the incidence of infection in patients with AIS during hospitalization was 6.5%.1 Prior research has also indicated that, owing to immunosuppression, dysphagia, neural dysfunction, and other factors, the early infection rate after stroke can reach 10% to 30%.2,3 In addition, in a study in the United States spanning 2007 to 2019, 20.7% of AIS patients were aged 80 years or older, and both the National Institutes of Health Stroke Scale (NIHSS) score and the occurrence of comorbidities increased with age, especially after 50 years of age.4 Elderly patients are at greater risk of AIS, with more severe symptoms and more complications, and their prognosis is also poor. Hospital-acquired pneumonia (HAP) is the main cause of morbidity and mortality in patients with AIS. A recent study showed that 14.8% of AIS patients developed HAP, and that older age and more severe stroke on admission (higher NIHSS score) were the most important risk factors.5 However, the epidemiological and clinical characteristics of HAP in elderly AIS patients remain uncertain.
Capturing the characteristics that predispose elderly AIS patients to HAP requires continuous surveillance and reliable evaluative or predictive tools. Most studies report that traditional risk factors such as diabetes, hypertension, and invasive medical procedures remain the most common.6 Because AIS is predominantly a disease of middle-aged and elderly people, the characteristics of risk factors after adjusting for age and stroke severity differ from those in previous studies. Other risk factors specific to the elderly, including renal inadequacy, repeated hospitalizations, hypotrophic state, and immunosenescence, may contribute unexpectedly to HAP in critically ill AIS patients. Moreover, machine learning (ML) algorithms can overcome the limitations of traditional logistic regression and offer accurate and interpretable risk estimates.
Effective strategies for preventing HAP in elderly, critically ill AIS patients require comprehensive information on its possible causes. In this study, several ML algorithms were used to screen representative risk factors, and a tool for predicting HAP occurrence was constructed with the best-performing algorithm. This study therefore aimed to determine the incidence, prognosis, and risk factors of HAP in this population.
Methods
Study Population
Patients with AIS admitted to the First Affiliated Hospital of Xi’an Medical University from September 2023 to February 2024 were collected without selection (n=3147). A strict set of criteria was then applied to define a specific, high-risk sub-cohort for analysis. Inclusion criteria were as follows: 1) age ≥50 years; 2) AIS meeting the diagnostic criteria of the 2023 edition of the Chinese Guidelines for the Diagnosis and Treatment of AIS,7 with the diagnosis confirmed by head CT or MRI; 3) first-ever stroke with time from onset to treatment ≤24 hours; 4) NIHSS ≥15; 5) pneumonia neither present nor in the incubation period at the time of admission, with HAP defined as inflammation of the lung parenchyma caused by pathogens such as bacteria, fungi, mycoplasma, viruses, or protozoa occurring 48 hours after admission;8 6) complete clinical data with no missing values. Exclusion criteria were as follows: 1) unclear time of onset; 2) a first admission event other than AIS, or multiple recurrent episodes; 3) cerebral hemorrhage, subarachnoid hemorrhage, transient ischemic attack, silent cerebral infarction without symptoms or physical signs, or cerebrovascular events due to other causes (such as brain trauma, brain tumor, connective tissue disease, or artery dissection). This study was reviewed by the Biomedical Ethics Committee of Xi’an Medical University, and enrolled patients provided informed consent.
Data Acquisition and Outcomes
Patients were diagnosed with HAP based on the criteria for hospital-acquired infection (HAI) according to the discharge medical records. Baseline information included patient demographics (age, gender), medical histories (renal inadequacy, respiratory failure, surgery, blood transfusion, ICU admission, hospital stays, and consecutive febrile days), laboratory tests (number of positive bacteria, WBC, neutrophils, HB, PCT, and CRP), and therapeutic methods (antibiotics, NSAID, steroid, immunopotentiator, and ventilator). In this study, immunopotentiator refers to pharmacological agents that enhance immune function, including thymus preparations, immunoglobulin, and interleukin. These were administered per institutional protocols for immunocompromised critical patients. The “number of positive bacteria” refers to the number of distinct bacterial species identified from clinical cultures (eg, sputum, blood) taken during hospitalization before the HAP diagnosis. The primary outcome was the occurrence of HAP, defined as new-onset pneumonia or pulmonary infection caused by pathogens that occurred 48 hours after admission and was neither present nor in the incubation period at the time of admission. The prognostic outcome was all-cause death. Short-term and long-term outcomes were assessed during hospitalization and over the six-month follow-up period after stroke onset, respectively.
Statistical Analysis
Baseline Profile
Patients with missing data were removed from the cohort, and nineteen baseline clinical and demographic variables were initially incorporated. Continuous variables were described using the median and interquartile range (IQR), and categorical variables were described using percentages. The population was randomly assigned to the training or validation set in a 7:3 ratio. The chi-square test or Fisher’s exact test was used to compare the distribution of each variable between the training and validation sets.
Variable Selection
An excessive number of variables can readily cause overfitting or variable redundancy and thereby skew prediction performance. First, univariate and multivariate logistic regression analyses were used to identify independent risk factors for developing HAP in elderly patients with critically ill AIS in the overall cohort. Second, logistic regression was combined with five other variable screening methods: least absolute shrinkage and selection operator (Lasso) binary logistic regression, random forest, support vector machines (SVM), ridge regression, and elastic net regression. Third, the intersection of the variables selected by these six methods was taken as the feature set and passed to the ML models. EVenn9 was used to visualize the intersection of the variables selected by the six algorithms.
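The intersection step can be illustrated with a short Python sketch. This is an assumption-laden example rather than the study's code: the 13-variable logistic regression list and the nine shared variables come from the paper, while the selections attributed to the other five methods are hypothetical placeholders chosen only to make the example run.

```python
# Illustrative only: every per-method selection other than the logistic
# regression list is a hypothetical placeholder, not the study's output.
FINAL_NINE = {
    "respiratory failure", "hospital stays", "consecutive febrile days",
    "number of positive bacteria", "antibiotics", "CRP",
    "immunopotentiator", "blood transfusion", "ICU admission",
}

# The 13 factors the paper reports from multivariate logistic regression.
logistic = FINAL_NINE | {"renal inadequacy", "neutrophils", "PCT", "steroid"}

# Hypothetical selections for the remaining five screening methods.
selections = {
    "logistic": logistic,
    "lasso": logistic | {"age", "gender", "WBC"},
    "ridge": logistic | {"age", "gender", "WBC"},
    "svm": logistic | {"age", "WBC"},
    "random_forest": FINAL_NINE | {"age", "WBC", "HB", "PCT"},
    "elastic_net": FINAL_NINE | {"gender", "neutrophils", "HB", "steroid"},
}


def intersect_selected(selected_by_method):
    """Return the variables chosen by every screening method."""
    return set.intersection(*selected_by_method.values())


shared = intersect_selected(selections)
print(sorted(shared))
```

Requiring agreement across all methods is deliberately conservative: a variable dropped by even one method is excluded, which trades some sensitivity for a smaller, more stable feature set.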
Machine Learning Modeling and Comparison of Prediction Performance
Eleven ML algorithms were used to build models from the optimal variable set: eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), Multi-layer Perceptron, Gradient Boosting, k-nearest neighbor (KNN), SVM, Adaptive Boosting (AdaBoost), Logistic Regression, Bernoulli Naive Bayes (BernoulliNB), Gaussian Naive Bayes (GaussianNB), and Stochastic Gradient Descent. The prediction performance of the 11 algorithms was evaluated with the receiver operating characteristic (ROC) curve and area under the curve (AUC), accuracy, precision, F1-score, recall, Matthews Correlation Coefficient (MCC), Brier score, and Kappa value. Resampling, currently the most common approach for selecting the best ML algorithm, was used to build the ML prediction models. To compare the classifiers, we implemented a 7:3 random split, allocating 70% of patients to the training set and 30% to the validation set. The split was repeated 20 times, and the average of the 20 results was used to compare performance. Because AUC summarizes both the sensitivity and the specificity of a model, the classifier with the largest AUC and a statistically significant difference from the others was selected as the best model, using the Friedman test (a non-parametric repeated-measures analysis of variance) and the post-hoc Nemenyi test. For 10-fold cross-validation, 40% of participants were first randomly allocated to a hold-out testing set for final evaluation, while the remaining 60% constituted the development set. Within the development set, we implemented a 10-fold cross-validation procedure in which each fold further partitioned the data into approximately 70% training (≈42% of total data) and 30% validation (≈18% of total data).
Finally, the model was retrained on the complete development set (60% of total data) before unbiased evaluation on the untouched test set. The optimal model was trained and tested repeatedly by 10-fold cross-validation to optimize the model. Decision curve analysis (DCA) and calibration plots were used to verify the performance and clinical practicality of the top 5 models. A learning curve was used to show how performance on the training and validation sets approached its maximum for the best predictive model.
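The resampling design can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `repeated_holdout`, the seed, and the use of the standard `random` module are all choices made here, and model fitting is left as a placeholder.

```python
import random


def repeated_holdout(n_samples, n_repeats=20, train_frac=0.7, seed=0):
    """Yield (train_idx, val_idx) pairs for repeated random 7:3 splits."""
    rng = random.Random(seed)
    n_train = int(n_samples * train_frac)
    for _ in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        yield idx[:n_train], idx[n_train:]


# 785 is the cohort size reported in the paper; 20 repeats as described.
splits = list(repeated_holdout(n_samples=785))

# Placeholder: a real pipeline would fit each classifier on train_idx,
# score it on val_idx, and average the 20 scores per classifier.
for train_idx, val_idx in splits:
    pass

# Nested fractions for the hold-out plus cross-validation design:
dev_frac, test_frac = 0.60, 0.40
train_of_total = dev_frac * 0.70  # about 42% of all patients
val_of_total = dev_frac * 0.30    # about 18% of all patients
```

Averaging over repeated splits reduces the dependence of the comparison on any single random partition, which is why the 20 results per classifier can then be treated as repeated measures in the Friedman test.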
Survival Analysis
All-cause mortality was selected as the survival outcome, and risk factors were the optimal variables screened above. Univariate survival analysis was performed using Kaplan–Meier survival estimate, and multivariate survival analysis was performed using Cox proportional hazards regression model by the survival and survminer packages in R software. Log-rank P value less than 0.05 was considered statistically significant.
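For readers unfamiliar with the Kaplan–Meier estimator, the product-limit calculation can be sketched as follows. The study used the R survival and survminer packages; this Python version is only an illustration, and the toy follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up time for each patient
    events : 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival_probability) at each death time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths > 0:
            # Multiply by the conditional probability of surviving time t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after t.
        at_risk -= leaving
    return curve


# Toy data in days (hypothetical, not taken from the study cohort).
toy_times = [1, 2, 2, 3, 4]
toy_events = [1, 1, 0, 1, 0]
km_curve = kaplan_meier(toy_times, toy_events)
print(km_curve)
```

Censored patients contribute to the risk set until they leave follow-up but never trigger a drop in the curve, which is what distinguishes this estimate from a simple proportion of deaths.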
Model Interpretation and Predictive Model Construction
SHapley Additive exPlanations (SHAP) was used to interpret and visualize the ML models. SHAP is a framework based on additive feature attribution that builds on Shapley values, a concept first proposed by Lloyd Shapley.10 Intuitively, the contribution of each feature to the outcome can be explained by estimating its SHAP value: the larger the SHAP value, the more that feature raises the predicted probability of the outcome. To determine the importance of each feature to the model, we constructed a SHAP force plot, summary plot, and dependence plot based on the XGBoost model. These plots reflect the importance of each feature and its contribution to, or dependence in, the positive and negative predictions for each sample.
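The idea behind Shapley-based attribution can be shown with an exact calculation on a toy model. This sketch is not how SHAP is computed for XGBoost in practice (tree models use faster dedicated algorithms); the feature names and weights in `toy_risk` are hypothetical and were invented for this example.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value_fn):
    """Exact Shapley values for a coalition value function."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Weight of coalitions of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi


def toy_risk(present):
    """Hypothetical risk score with made-up weights (not fitted to data)."""
    score = 0.0
    if "bacteria" in present:
        score += 0.30
    if "icu" in present:
        score += 0.20
    if "antibiotics" in present:
        score -= 0.10
    if "bacteria" in present and "icu" in present:
        score += 0.10  # interaction, shared equally between the two features
    return score


phi = shapley_values(["bacteria", "icu", "antibiotics"], toy_risk)
print(phi)
```

The attributions sum to the difference between the full-model score and the empty-coalition score, which is the additivity property that lets force plots decompose an individual prediction feature by feature.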
A two-sided P value less than 0.05 was considered statistically significant. R (Version 4.3.1, http://www.r-project.org/) and Python (Version 3.8.0, Python Software Foundation, Wilmington, DE, USA, https://www.python.org/) were used for ML modeling and statistical analysis, with Scikit-learn as the primary ML package.
Results
Population Characteristics and Risk Factors Analysis
A total of 785 patients over 50 years with critically ill AIS were included in the cohort (Figure 1). The infection rate of HAP was 27.39% (n=215) and all-cause mortality rate was 30.36% (n=238). Among all participants, 40.38% (n=317) were over 80 years old and 67.01% (n=526) were male (Table 1). Moreover, 226 (28.79%) patients had renal insufficiency history, 349 (44.46%) patients developed respiratory failure and 231 (29.43%) patients transferred to the Intensive Care Unit (ICU) for treatment during hospitalization.
Table 1 Baseline Information of Hospital-Acquired Pneumonia in Patients Over 50 Years with Critically Ill Acute Ischemic Stroke in the Training and Validation Sets
Figure 1 Flowchart of patient enrollment and study design.
Univariate and multivariate logistic regression analyses suggested that renal inadequacy, respiratory failure, hospital stays, consecutive febrile days, number of positive bacteria, antibiotics, neutrophils, procalcitonin (PCT), C-reactive protein (CRP), steroid, immunopotentiator, blood transfusion, and ICU admission were independently associated with the development of HAP (Figure 2). The odds of HAP development increased 2.43-fold when the hospital stay exceeded 14 days (P=0.04) and 5.13-fold when it exceeded 28 days (P<0.001), and more than 7 consecutive febrile days increased the odds 3.3-fold (95% CI 1.56–7.01, P=0.002). When the number of bacteria detected exceeded 1, the odds of infection increased 8.1-fold (95% CI 4.46–14.72, P<0.001); when the number exceeded 2 or 3, the increase was smaller, at 3.21-fold (95% CI 1.49–6.90, P=0.003) and 2.56-fold (95% CI 1.15–5.68, P=0.021), respectively. In addition, neutrophils <50% (95% CI 1.16–11.55, P=0.027) and elevated CRP levels (5–100 mg/L, 95% CI 1.25–8.06, P=0.015; ≥100 mg/L, 95% CI 1.02–8.75, P=0.046) were associated with increased odds of HAP; the CRP finding is consistent with its role as a marker of systemic inflammation and infection. Blood transfusion (95% CI 2.10–6.37, P<0.001) and ICU admission (95% CI 2.55–7.19, P<0.001) also raised the odds of HAP in the critically ill AIS population.
In contrast, several factors were associated with lower odds of HAP: antibiotics (OR=0.27, 95% CI 0.14–0.52, P<0.001), steroid (OR=0.49, 95% CI 0.29–0.81, P=0.006), immunopotentiator (treatment with one or more of thymus preparations, immunoglobulin, and interleukin; OR=0.31, 95% CI 0.18–0.54, P<0.001), and PCT of 0.05–2 μg/L (OR=0.43, 95% CI 0.22–0.82, P=0.011) or ≥2 μg/L (OR=0.30, 95% CI 0.13–0.72, P=0.007).
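The relationship between a reported odds ratio and its confidence interval can be checked with a short calculation. The sketch below assumes the intervals are Wald intervals of the form exp(β ± z·SE), which is the usual output of logistic regression software but is not stated in the paper; the function name is hypothetical. Under that assumption the odds ratio should sit near the geometric midpoint of its interval.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_scale_summary(odds_ratio, ci_low, ci_high):
    """Recover the coefficient and its standard error from a reported
    odds ratio and 95% CI, assuming a Wald interval exp(beta +/- z*SE)."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    implied_or = math.sqrt(ci_low * ci_high)  # geometric midpoint of the CI
    return beta, se, implied_or


# Values reported in the text for antibiotics and for >1 bacterial species.
reported = {
    "antibiotics": (0.27, 0.14, 0.52),
    "bacteria > 1": (8.10, 4.46, 14.72),
}
summaries = {name: log_scale_summary(*vals) for name, vals in reported.items()}
for name, (beta, se, implied) in summaries.items():
    print(f"{name}: beta={beta:.2f}, SE={se:.2f}, implied OR={implied:.2f}")
```

An odds ratio of 0.27 means the odds of HAP were 73% lower in treated patients, not that risk was reduced "0.27 times"; the same reading applies to the other protective estimates.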
Optimal Variables Selection
Six variable screening algorithms were applied for feature filtering. Of the 19 candidate variables, lasso regression retained 19 (Figure 3A), SVM retained 17 (Figure 3B), ridge regression retained 19 (Figure 3C), and random forest (Figure 3D) and elastic net regression (Figure 3E) each retained 13. In addition, the 13 independent risk factors identified by logistic regression were included as valuable features (Figure 2). To avoid bias from over-reliance on a single variable selection method, we used these 6 complementary algorithms, covering linear (lasso regression, ridge regression, elastic net regression, logistic regression, SVM) and non-linear (random forest) frameworks, and applied a strict “intersection criterion” that retained only the variables identified as important by all of them. This stringent criterion reduced the 19 initial variables to 9 (respiratory failure, hospital stays, consecutive febrile days, number of positive bacteria, antibiotics, CRP, immunopotentiator, blood transfusion, and ICU admission), each independently identified by every method, which lowers the chance of including redundant or noisy features that could lead to model overfitting (Figure 3F and G). Each selected variable is linked to the pathophysiology of HAP in critically ill elderly AIS patients and is clinically plausible. Respiratory failure, caused by AIS-induced dysphagia, impaired cough reflex, or brain edema that disrupts airway clearance, is a direct driver of HAP.
Longer hospital stays increase exposure to nosocomial pathogens, immobilization-related atelectasis, and cross-contamination risk. Consecutive febrile days often signal underlying infection, which can progress to HAP if prolonged. More positive bacteria species reflect broader respiratory colonization, a precursor to HAP. Antibiotics and immunopotentiators are modifiable factors, targeting infections and improving AIS-induced immunosuppression, respectively. Elevated CRP indicates subclinical infection or inflammation, an early HAP signal. Blood transfusion is linked to transient immunosuppression and higher infection risk; ICU admission means more severe illness, more invasive device exposure, and longer immobilization. The 9 variables are routinely collected in clinical practice, documented in standard electronic health records within 48 hours of AIS admission without extra tests or equipment, enabling easy integration into bedside workflows for early HAP warning without burdening clinicians. Our variable selection advances prior studies by ensuring robustness via 6-algorithm intersection, prioritizing modifiable factors for both risk stratification and intervention, and suiting resource-limited settings by excluding rare or expensive biomarkers.
Machine Learning Modeling and Performance Comparison
Eleven ML classifiers were used to construct a prediction model for the risk of developing HAP in elderly subjects with critically ill AIS. After hyperparameter optimization, the 11 algorithms were fitted and evaluated in the training and validation sets, and the evaluation indexes, including AUC, accuracy, precision, F1-score, recall, MCC, Brier score, and kappa, were calculated. The algorithms were ranked by AUC, and between-group differences were analyzed with the Friedman test and the post-hoc Nemenyi test. The Friedman test showed that the differences among the AUC values of the 11 algorithms were significant (P<0.0001) in both the training and validation sets. For each pair of algorithms, the Average Rank Difference (ARD) was compared with the critical difference (CD): if ARD > CD, the two algorithms differed significantly in performance, which is shown in the plot by whether the vertical dotted line intersects the ARD. The post-hoc Nemenyi test gave a CD of 3.3758, and statistically significant pairwise differences were found in both the training (Figure 4A) and validation sets (Figure 4B). Ranked by AUC, the best-performing classifier in the training set was XGBoost, followed by LightGBM, Multi-layer Perceptron, gradient boosting, and KNN (Table 2 and Figure 4C). In the validation set, XGBoost again performed best, followed by gradient boosting, SVM, logistic regression, and LightGBM (Table 2 and Figure 4D). Based on both sets, we considered XGBoost to have the best prediction performance among all ML classifiers. Moreover, the accuracy, precision, F1-score, recall, MCC, Brier score, and kappa of the XGBoost model ranked first in the training set, and all of these indicators except recall, which ranked fourth, also ranked first in the validation set (Table 2).
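The reported critical difference can be reproduced from the standard Nemenyi formula, CD = q_α · sqrt(k(k+1) / (6N)). The sketch below assumes k = 11 classifiers, N = 20 resampling runs, and a tabulated critical value q_0.05 of about 3.219 for k = 11; the critical value is taken from published studentized-range tables and is not given in the paper.

```python
import math


def nemenyi_cd(q_alpha, n_algorithms, n_datasets):
    """Critical difference in average rank for the Nemenyi post-hoc test."""
    k, n = n_algorithms, n_datasets
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * n))


def differs_significantly(avg_rank_a, avg_rank_b, cd):
    """Two algorithms differ if their average ranks are more than CD apart."""
    return abs(avg_rank_a - avg_rank_b) > cd


# Assumed inputs: tabulated q_0.05 for k = 11, with N = 20 resampling runs.
Q_ALPHA_K11 = 3.219
cd = nemenyi_cd(Q_ALPHA_K11, n_algorithms=11, n_datasets=20)
print(round(cd, 3))
```

Under these assumptions the result agrees with the reported CD of 3.3758 to about three decimal places, which suggests the 20 resampling runs served as the repeated measures in the rank comparison.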
Table 2 Performance Comparison of Different Machine Learning Classifiers in Predicting Hospital-Acquired Pneumonia in Patients Over 50 Years with Acute Ischemic Stroke
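The threshold-based indexes compared across classifiers follow standard definitions that can be computed from a confusion matrix. The sketch below is illustrative only: the counts and probabilities are hypothetical, the function names are chosen here, and no guard is included for empty classes (which would cause division by zero).

```python
import math


def classification_metrics(tp, fp, fn, tn):
    """Standard binary metrics from confusion-matrix counts."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fp) / n) * ((tp + fn) / n) + ((tn + fn) / n) * ((tn + fp) / n)
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "accuracy": accuracy, "precision": precision, "recall": recall,
        "f1": f1, "mcc": mcc, "kappa": kappa,
    }


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical counts and probabilities, not values from Table 2.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=130)
brier = brier_score([1, 0], [0.8, 0.3])
print(metrics, brier)
```

Unlike accuracy, MCC and kappa account for class imbalance, which matters here because only about a quarter of the cohort developed HAP; a lower Brier score indicates better-calibrated probabilities.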
The predictive model was constructed with the XGBoost algorithm. Its AUC was 0.995 (95% CI 0.995–0.996) in the training set and 0.898 (95% CI 0.891–0.905) in the validation set, indicating that the model built from the 9 selected variables had good prediction accuracy (Table 2). Among the top 5 algorithms, DCA suggested that the XGBoost model was useful for predicting the risk of HAP in elderly critically ill AIS subjects (Figure 4E). The calibration curve showed that the model was well calibrated (Figure 4F).
Ultimately, the best ML model in predicting HAP risk of elderly critically ill AIS patients was XGBoost. In order to further improve the predictive performance and accuracy of the model, we used 10-fold cross-validation to train and test the XGBoost model. After 10-fold cross-validation, the AUC of XGBoost model in the training, validation, and testing set were 0.906 (95% CI 0.842–0.970), 0.893 (95% CI 0.855–0.930), and 0.848 (95% CI 0.787–0.909), respectively (Table 3). In the specific optimization process of the model, when the learning results of the training set and validation set were consistent, the prediction performance of the XGBoost model was the best (Figure 4G).
Table 3 Predictive Performance of the Optimal Machine Learning Model for Patients Over 50 Years with Acute Ischemic Stroke After 10-Fold Cross-Validation
Survival Analysis
All-cause death was taken as the prognostic outcome, and the influence of the screened variables and HAP on patient prognosis was examined. Kaplan–Meier analyses showed a higher survival rate in the non-HAP group than in the HAP group (P=0.01, Figure 5A). Moreover, patients with respiratory failure (P=0.01, Figure 5B), shorter hospital stays (P<0.0001, Figure 5C), more positive bacteria detected (P=0.0013, Figure 5D), higher CRP (P<0.0001, Figure 5E), or ICU admission (P<0.0001, Figure 5F) had worse survival. Cox regression analysis was performed to assess whether the selected variables were independent prognostic factors. After adjustment for other confounding factors, HAP (95% CI 1.269–2.350, P<0.001), respiratory failure (95% CI 3.044–5.910, P<0.001), detection of one (95% CI 1.227–2.600, P=0.002), two (95% CI 1.493–3.570, P<0.001), or three or more (95% CI 2.149–4.980, P<0.001) positive bacteria, ICU admission (95% CI 1.734–3.220, P<0.001), and hospital stays over 7 days (95% CI 0.275–0.580, P<0.001), 14 days (95% CI 0.108–0.240, P<0.001), or 28 days (95% CI 0.057–0.160, P<0.001) were independent prognostic factors for elderly patients with critically ill AIS (Figure 5G).
Interpretation of Optimal Machine Learning Model by SHAP
SHAP explains the model’s predictions by calculating the contribution of each feature to the prediction target by utilizing SHAP value. We provided two typical examples, one predicted no HAP and the other predicted the development of HAP (Figure 6A), to demonstrate the interpretability of the model. The results of summary plot indicated that the greater the SHAP value of number of positive bacteria (more), ICU admission (yes), consecutive febrile days (longer), respiratory failure (yes), hospital stays (longer), blood transfusion (yes) and CRP (higher), the greater the risk of HAP after hospitalization of elderly patients with critically ill AIS. In contrast, the greater the SHAP value of antibiotics (yes) and immunopotentiator (yes), the lower the risk of HAP after hospitalization (Figure 6B).
To observe the interaction of selected variables on the risk of HAP, SHAP dependence plot was deployed. It’s noteworthy that the longer the patient’s hospital stay (Figure 6C) or the continuous fever days (Figure 6D), the higher the likelihood of multiple bacterial infections, leading to a greater risk of developing HAP. Furthermore, elderly critically ill AIS patients who developed respiratory failure (Figure 6E) or were transferred to the ICU (Figure 6F) also had a higher number of positive bacteria, and thus a higher probability of HAP. Namely, whether or not bacteria were detected and the number of positive bacteria were the primary important risk factors in the diagnosis of HAP among all the screened optimal variables.
Discussion
To our knowledge, this is the first machine learning-based study investigating the associations between risk factors, incidence of HAP, and all-cause mortality in critically ill elderly AIS patients. In this retrospective study, we compared 11 ML algorithms for comprehensive HAP prediction in adults over 50 years with critically ill AIS, and identified a clinically feasible HAP risk-prediction model. We selected 11 ML algorithms covering four major classes to address our dataset’s characteristics and ensure robust performance comparison. Tree-based ensemble methods (XGBoost, LightGBM, Gradient Boosting) excel at capturing non-linear interactions and handling mixed data types. XGBoost was prioritized for its built-in regularization, which mitigates overfitting. The linear models (Logistic Regression, SVM) served as baseline benchmarks, and their simplicity and interpretability helped verify that complex algorithms like XGBoost delivered meaningful performance gains rather than overfitting to noise. An instance-based method (KNN) was selected to test performance on “local” data patterns, as its distance-based logic can identify subtle similarities between high-risk cases. Neural networks (Multi-layer Perceptron, MLP) can capture high-dimensional feature interactions, although MLP requires more data. This algorithm diversity prevented bias toward a single class and enabled us to identify the model best suited for clinical use, where both predictive accuracy and stability are essential. Among the tested models, XGBoost exhibited the best performance. To avoid bias from over-reliance on a single variable selection method, we combined 6 complementary approaches (multivariate logistic regression, LASSO regression, random forest, SVM, ridge regression, and elastic net regression) and retained the 9 variables identified by all methods.
This strategy balanced model complexity and generalizability by limiting the number of variables and ensured clinical utility. The 9 features are routinely collected within 48 hours of admission, enabling bedside early warning. SHAP analysis quantified the impact of key features (respiratory failure, hospital stays, consecutive febrile days, number of positive bacteria, antibiotics, CRP, immunopotentiator, blood transfusion, and ICU admission). While XGBoost outperformed other algorithms, its “black box” nature may limit clinical adoption, as clinicians need transparency on why a patient is flagged as high-risk. SHAP addressed this in two ways. It provided global feature importance, confirming “number of bacteria”, “ICU admission”, and “consecutive febrile days” as top predictors, which aligns with the clinical intuition that infections and critical care settings drive HAP. It also enabled local interpretability: for individual patients, SHAP values help clinicians target interventions. This interpretability also supported the model’s biological plausibility, as SHAP’s feature ranking matches known HAP pathophysiology, reducing the risk that spurious correlations drive performance. We also demonstrated the effect of the screened variables on survival in this population. Current studies have focused on HAP prediction in AIS patients,11–13 but a single analytical algorithm introduces bias in risk prediction for critically ill elderly AIS patients because of HAP heterogeneity.14 Comparing multiple ML algorithms reduces overfitting risk, adapts to diverse data distributions, and enhances interpretability relative to a single algorithm. Among the 11 algorithms, XGBoost achieved the highest AUC in both training and validation sets, with superior accuracy, precision, and clinical net benefit. To balance model generalizability, complexity, and computational cost,15 we selected 9 key variables (from 19 initial variables) via the intersection of 6 methods.
The 9 selected demographic and clinical features are routinely collected during care, enabling early warning. This supports the model’s utility for early HAP identification in critically ill elderly AIS patients, even those with mild or atypical symptoms not meeting pneumonia criteria. SHAP further assisted in quantifying feature contributions in the optimal model.
Healthcare-associated infections (HAI) are major causes of morbidity and mortality in AIS patients.4 The pathogenesis of HAP in AIS patients is complex: post-stroke impaired consciousness and dysphagia increase aspiration risk,16 AIS reduces immune function,17 and stroke disrupts cough reflex innervation, impairing respiratory secretion clearance and facilitating bacterial growth.18 Stroke severity and age are important confounding factors. Age-related changes in AIS patients, such as stroke‑induced innate immune responses, immunosenescence, vascular aging, and co‑morbidities,19 impair tissue function via microcirculation and immune dysfunction, increasing infection susceptibility. We thus focused on HAP risk factors in critically ill AIS patients aged over 50 years to mitigate these confounders. In this study, HAP risk in patients aged 50–80 years was 0.81-fold that in patients aged >80 years (95% CI: 0.59–1.12, P=0.202), a difference that was not statistically significant, and age had no statistically significant impact on AIS prognosis.
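As a reading aid, the age comparison reported above (0.81; 95% CI 0.59–1.12; P=0.202) can be checked for internal consistency. The sketch below assumes the estimate is a ratio measure with a Wald-type interval that is symmetric on the log scale. That assumption is not stated in the text, and the helper name `wald_check` is introduced here for illustration.

```python
import math


def wald_check(estimate, lower, upper, z_crit=1.959964):
    """Recover the log-scale SE, z statistic and two-sided p from a ratio and its 95% CI.

    Assumes a Wald interval: log(estimate) +/- z_crit * SE.
    """
    log_se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(estimate) / log_se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    # Under the same assumption the point estimate is the geometric mean of the bounds.
    implied_estimate = math.sqrt(lower * upper)
    return {
        "log_se": log_se,
        "z": z,
        "p": p_two_sided,
        "implied_estimate": implied_estimate,
    }


result = wald_check(0.81, 0.59, 1.12)
print({k: round(v, 3) for k, v in result.items()})
```

Under these assumptions the recovered p-value is close to 0.20 and the geometric mean of the bounds is close to 0.81, so the reported estimate, interval, and p-value agree with one another. An interval that spans 1 corresponds to a non-significant difference.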
While AIS is known to increase HAP risk, HAP susceptibility factors and mechanisms in AIS remain inconclusive. Patient-specific factors such as elderly age,4 multiple comorbidities,20 and consciousness disorders21 are linked to nosocomial infections, of which HAP is the most common. Our research identified renal inadequacy (95% CI 1.60–4.60, P<0.001) and respiratory failure (95% CI 1.27–3.51, P=0.004), both related to baseline status, as HAP risk factors in AIS patients, which is consistent with prior studies.22,23 Kidney disease-associated immune weakness impairs infection resistance. Fluid overload and uremia cause respiratory complications (eg, pulmonary edema, pleural effusion) predisposing to pneumonia.24 Dialysis patients face higher risk via repeated hospital environment exposure and invasive procedures.25 Stroke-induced dysphagia, immobility, and breathing pattern disruption cause respiratory failure, which is further exacerbated by pneumonia.26 Additionally, HAP (95% CI: 1.269–2.35, P<0.001) and respiratory failure (95% CI: 3.044–5.91, P<0.001) significantly worsened prognosis in AIS patients. Early HAP detection and prevention, combined with aggressive management of stroke and its complications, reduces hospital stay duration and improves outcomes. Notably, hospital stays >2 weeks definitively increased HAP risk in critically ill AIS patients, extending the focus of intensive care beyond the 1-week window emphasized in prior studies.27 Consecutive febrile days >1 week also strongly predicted HAP and poor outcomes: fever in AIS patients is often misattributed to stroke or medications, delaying pneumonia diagnosis, whereas prolonged fever indicates underlying infection, with a rising risk of pulmonary spread.28 Early recognition and management of these factors are critical for HAP prevention and outcome improvement.
Recent studies29,30 explored the practicality and feasibility of laboratory tests, such as WBC, neutrophils, and complete blood count, in predicting pneumonia, but their association with HAP in critically ill elderly AIS patients is rarely reported. While PCT and neutrophil levels typically rise in infections, their inverse association with HAP in our cohort may reflect unique immune dysregulation in elderly stroke patients, comorbidity-related biomarker masking, or critical illness threshold effects; these findings require validation in broader populations. Limitations of this analysis include our cohort’s unique pathophysiology (so the findings require validation in broader populations), incomplete nutritional status assessment (no BMI, serum albumin, dietary intake, or clinical signs), and oversimplified hemoglobin (HB) evaluation. While positive bacterial culture is the HAI diagnostic gold standard, the link between bacterial species count and HAP in critically ill elderly AIS patients is unclear. Our data showed that HAP risk rose sharply when the bacterial count exceeded 1 but plateaued above 3, highlighting the need for close monitoring of patients with 1–3 bacterial species to prevent progression from HAI to HAP and impaired prognosis. These factors collectively suggest that underlying inflammation and immune dysregulation contribute to HAP in this population. AIS treatment also impacts HAP risk. For example, excessive use of dehydrating agents for cerebral edema reduction causes fluid depletion or respiratory mucosal dryness, impairing respiratory defense and increasing HAP risk.31 Appropriate antibiotic use treated existing infections and prevented HAP; steroids and immunopotentiators (eg, thymosin, immunoglobulin) also reduced HAP risk, although immunopotentiator efficacy varied by patient characteristics and product type. Prophylactic antibiotics, steroids, and immunopotentiators should be cautiously considered in high-risk critically ill elderly AIS patients, weighing benefits against risks.
While mechanical ventilation (often required in severe cases) increases HAP risk via tubing colonization/natural defense bypass,32 it was not a HAP risk factor in our study—emphasizing strict infection control and ventilation management to minimize ventilator-associated pneumonia.33
Overall, we developed a reliable model for HAP risk factor identification and prognosis prediction in critically ill elderly AIS patients by comparing 11 classical ML algorithms and selecting 9 representative features, improving model accuracy, reliability, and medical decision-making quality. We prioritized multi-metric evaluation (AUC, accuracy, precision, F1-score, recall, MCC, Brier score, and kappa) because any single metric can be misleading in HAP prediction. AUC measured overall discriminative ability to distinguish high- vs low-risk patients. Recall captured false negatives, which must be minimized to avoid delayed treatment. Brier score assessed probabilistic accuracy for clinical decisions. For validation, 10-fold cross-validation made the most of the limited sample size and assessed stability, with consistent AUC across folds indicating less overfitting to single data subsets. Nonetheless, several limitations should be acknowledged. First, generalizability is restricted to critically ill elderly AIS patients and does not extend to younger AIS patients or other stroke subtypes. Second, the study had a single-center design with no external validation; although unified diagnosis and treatment protocols minimized measurement bias, regional differences in epidemiology and medical resources may limit extrapolation, and future multi-center large-cohort studies are planned. Third, there may be unmeasured factors or complex interactions that influence the development of HAP and prognosis (eg, stroke-induced immunosuppression mechanisms underlying HAP susceptibility remain unclear). Fourth, hospital environmental factors and clinician practices, such as ventilation quality, multidrug-resistant bacteria, and ward crowding, were not assessed. Importantly, the follow-up period of the study was 6 months.
Although it covered the observation window of the primary outcome, a complete data chain has not yet been formed for the long-term prognosis of chronic diseases (such as the incidence of pneumonia complications and long-term quality of life). Short-term data may not fully reflect the sustained effects or delayed adverse reactions of intervention measures, especially for diseases with a longer course. This study’s 6-month data collection period may limit generalizability due to potential seasonal variations in HAP incidence and the inability to capture long-term trends, though it is aligned with similar ML studies.34,35 In addition, the findings of this study are consistent with existing studies.36 It is recommended that the application of the model be limited to acute phase management, such as decision-making within 3 months after admission. Lastly, potential overfitting, prediction consistency deviation, and limited calibration curve efficacy cannot be ignored. The AUC difference between the training set and the testing set in 10-fold cross-validation may be due to temporal or spatial sampling bias arising from random selection of the testing set and to the generalization limitations of the model. In addition, non-linear interactions within high-dimensional features may not have been fully captured by the calibration curve (Supplementary Figure 1).
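The multi-metric evaluation discussed in this section can be made concrete with a small, dependency-free sketch. The labels and predicted probabilities below are invented toy values, not study data, and the function names and the 0.5 decision threshold are assumptions made for illustration.

```python
import math


def auc_score(y_true, y_prob):
    """AUC as the probability that a positive case is ranked above a negative one."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared error between predicted probability and observed outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


def evaluate(y_true, y_prob, threshold=0.5):
    """Compute threshold-based and probabilistic metrics for a binary classifier."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, q in zip(y_true, y_pred) if t == 1 and q == 1)
    tn = sum(1 for t, q in zip(y_true, y_pred) if t == 0 and q == 0)
    fp = sum(1 for t, q in zip(y_true, y_pred) if t == 0 and q == 1)
    fn = sum(1 for t, q in zip(y_true, y_pred) if t == 1 and q == 0)
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0
    return {
        "auc": auc_score(y_true, y_prob),
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
        "kappa": kappa,
        "brier": brier_score(y_true, y_prob),
    }


# Toy example (hypothetical labels and predicted HAP probabilities).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
metrics = evaluate(y_true, y_prob)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this toy case the AUC is higher than the chance-corrected agreement measures (MCC and kappa), which illustrates why reporting discrimination alone can give an overly favorable picture of a classifier at a fixed decision threshold.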
Conclusion
This study is the first machine learning-based study on HAP in elderly patients with AIS, in which 11 machine learning algorithms were compared to construct a HAP prediction model. The XGBoost algorithm performed best, and 9 key clinical variables, including respiratory failure, length of hospitalization, number of consecutive fever days, and positive bacteria, were identified and their impact on patient survival was clarified. However, this study has some limitations. First, the single-center design lacks external validation, which limits the generalizability of the results. Second, factors such as the hospital environment and medical staff practices were not taken into account. Third, the 6-month follow-up was relatively short, and long-term prognostic data were missing. Lastly, the model shows calibration deviations, and some high-dimensional feature interactions have not been fully captured. In conclusion, this study provides a practical tool for the early identification, risk stratification, and precise intervention of HAP in elderly patients with severe AIS, and helps optimize clinical management strategies.
Abbreviations
HAP, Hospital-acquired pneumonia; AIS, Acute ischemic stroke; NIHSS, National Institutes of Health Stroke Scale; ML, Machine learning; SHAP, SHapley Additive exPlanations; CNSR-III, Third China National Stroke Registry; HAI, Hospital-acquired infection; IQR, Interquartile range; LASSO, Least absolute shrinkage and selection operator; SVM, Support vector machine; XGBoost, eXtreme Gradient Boosting; LightGBM, Light Gradient Boosting Machine; KNN, k-nearest neighbors; AdaBoost, Adaptive Boosting; BernoulliNB, Bernoulli Naive Bayes; GaussianNB, Gaussian Naive Bayes; ROC, Receiver operating characteristic; AUC, Area under the ROC curve; MCC, Matthews correlation coefficient; Friedman test, Friedman non-parametric repeated measures analysis of variance test; DCA, Decision curve analysis; WBC, White blood cell; HB, Hemoglobin; PCT, Procalcitonin; CRP, C-reactive protein; NSAID, Nonsteroidal anti-inflammatory drug; ICU, Intensive care unit; CI, Confidence interval; SD, Standard deviation; ARD, Average rank difference; CD, Critical difference.
Data Sharing Statement
Data, code, and scripts to reproduce the results or replicate the procedures of this study are available from the corresponding author upon request.
Ethics Approval and Consent to Participate
We carried out this study according to the revised Declaration of Helsinki, and the ethics committee of the First Affiliated Hospital of Xi’an Medical University approved the study with informed consent. Informed consent was obtained from all participants in this study. Clinical trial number: not applicable.
Acknowledgments
Qingxin Jiao, Xingyu Liu, Huimin Chen, and Ziqi Hu are co-first authors. We would like to acknowledge the reviewers for their helpful comments on this paper.
Funding
This work was supported by National Natural Science Foundation of China (81600694), Special support from the 16th National Postdoctoral Science Foundation of China (2023T160783), and the 73rd batch of postdoctoral Science Foundation in China (2023M734293).
Disclosure
The authors declare that they have no competing interests.
References
1. Xu J, Yalkun G, Wang M, et al. Impact of infection on the risk of recurrent stroke among patients with acute ischemic stroke. Stroke. 2020;51(8):2395–2403. doi:10.1161/STROKEAHA.120.029898
2. Emsley HC, Hopkins SJ. Acute ischaemic stroke and infection: recent and emerging concepts. Lancet Neurol. 2008;7(4):341–353.
3. Kumar S, Selim MH, Caplan LR. Medical complications after stroke. Lancet Neurol. 2010;9(1):105–118. doi:10.1016/S1474-4422(09)70266-2
4. Ahmed R, Mhina C, Philip K, et al. Age- and sex-specific trends in medical complications after acute ischemic stroke in the United States. Neurology. 2023;100(12):e1282–e1295. doi:10.1212/WNL.0000000000206749
5. Friedant AJ, Gouse BM, Boehme AK, et al. A simple prediction score for developing a hospital-acquired infection after acute ischemic stroke. J Stroke Cerebrovascular Dis. 2015;24(3):680–686. doi:10.1016/j.jstrokecerebrovasdis.2014.11.014
6. Namaganda P, Nakibuuka J, Kaddumukasa M, Katabira E. Stroke in young adults, stroke types and risk factors: a case control study. BMC Neurol. 2022;22(1):335. doi:10.1186/s12883-022-02853-5
7. CMABo N. Group of cerebrovascular diseases BoN, Chinese Medical Association: Chinese guidelines for diagnosis and treatment of acute ischemic stroke 2023. Chin J Neurol. 2024;57(06):523–559.
8. Infectious diseases group CSoRM: 2018 Chinese guidelines for the diagnosis and treatment of adults with hospital-acquired and ventilator associated pneumonia. Chin J Tuberculosis Respir. 2018;41(4):255–280.
9. Yang M, Chen T, Liu YX, Huang L. Visualizing set relationships: eVenn’s comprehensive approach to Venn diagrams. iMeta. 2024;3(3):e184. doi:10.1002/imt2.184
10. Lundberg S, Lee SI. A unified approach to interpreting model predictions. Adv Neural Inf Process Syst. 2017;30.
11. Eto F, Nezu T, Nishi H, et al. Oral condition at admission predicts functional outcomes and hospital-acquired pneumonia development among acute ischemic stroke patients. Clin Oral Invest. 2024;28(8):434. doi:10.1007/s00784-024-05833-w
12. Yu Y, Zhu C, Liu C, Gao Y. Effect of prior atorvastatin treatment on the frequency of hospital acquired pneumonia and evolution of biomarkers in patients with acute ischemic stroke: a multicenter prospective study. Biomed Res. Int. 2017;2017:5642704. doi:10.1155/2017/5642704
13. Busl KM. Nosocomial infections in the neurointensive care unit. Neurol Clin. 2017;35(4):785–807. doi:10.1016/j.ncl.2017.06.012
14. Lu Y, Ma X, Tazmini K, Yang M, Zhou X, Wang Y. Admission serum calcium level and short-term mortality after acute ischemic stroke: a secondary analysis based on a norwegian retrospective cohort. Front Neurol. 2022;13:889518. doi:10.3389/fneur.2022.889518
15. Chowdhury MZI, Turin TC. Variable selection strategies and its importance in clinical prediction modelling. Fam Med Community Health. 2020;8(1):e000262. doi:10.1136/fmch-2019-000262
16. Sole ML, Talbert S, Yan X, et al. Nursing oral suction intervention to reduce aspiration and ventilator events (NO-ASPIRATE): a randomized clinical trial. J Adv Nurs. 2019;75(5):1108–1118. doi:10.1111/jan.13920
17. Westendorp WF, Dames C, Nederkoorn PJ, Meisel A. Immunodepression, infections, and functional outcome in ischemic stroke. Stroke. 2022;53(5):1438–1448. doi:10.1161/STROKEAHA.122.038867
18. Taylor-Clark TE, Undem BJ. Neural control of the lower airways: role in cough and airway inflammatory disease. Handbook Clin Neurol. 2022;188:373–391.
19. Gallizioli M, Arbaizar-Rovirosa M, Brea D, Planas AM. Differences in the post-stroke innate immune response between young and old. Semin Immunopathol. 2023;45(3):367–376. doi:10.1007/s00281-023-00990-8
20. Wei M, Huang Q, Yu F, et al. Stroke-associated infection in patients with co-morbid diabetes mellitus is associated with in-hospital mortality. Front Aging Neurosci. 2022;14:1024496. doi:10.3389/fnagi.2022.1024496
21. Mélotte E, Maudoux A, Panda R, et al. Links between swallowing and consciousness: a narrative review. Dysphagia. 2023;38(1):42–64. doi:10.1007/s00455-022-10452-2
22. Yuan M, Li Q, Zhang R, et al. Risk factors for and impact of poststroke pneumonia in patients with acute ischemic stroke. Medicine. 2021;100(12):e25213. doi:10.1097/MD.0000000000025213
23. Jiang X, Hu Y, Wang J, et al. Outcomes and risk factors for infection after endovascular treatment in patients with acute ischemic stroke. CNS Neurosci Ther. 2024;30(5):e14753. doi:10.1111/cns.14753
24. Dean NC, Griffith PP, Sorensen JS, McCauley L, Jones BE, Lee YC. Pleural effusions at first ED encounter predict worse clinical outcomes in patients with pneumonia. Chest. 2016;149(6):1509–1515. doi:10.1016/j.chest.2015.12.027
25. Roberts JA, Nicolau DP, Martin-Loeches I, et al. Imipenem/cilastatin/relebactam efficacy, safety and probability of target attainment in adults with hospital-acquired or ventilator-associated bacterial pneumonia among patients with baseline renal impairment, normal renal function, and augmented renal clearance. JAC-Antimicrob Resist. 2023;5(2):dlad011. doi:10.1093/jacamr/dlad011
26. Schrock JW, Lou L, Ball BAW, Van Etten J. The use of an emergency department dysphagia screen is associated with decreased pneumonia in acute strokes. Am J Emergency Med. 2018;36(12):2152–2154. doi:10.1016/j.ajem.2018.03.046
27. Fluck D, Fry CH, Robin J, et al. Impact of healthcare-associated infections within 7-days of acute stroke on health outcomes and risk of care-dependency: a multi-centre registry-based cohort study. Int Emerg Med. 2024;19(4):919–929. doi:10.1007/s11739-024-03543-5
28. Mezuki S, Matsuo R, Irie F, et al. Body temperature in the acute phase and clinical outcomes after acute ischemic stroke. PLoS One. 2024;19(1):e0296639. doi:10.1371/journal.pone.0296639
29. Ashour W, Al-Anwar AD, Kamel AE, Aidaros MA. Predictors of early infection in cerebral ischemic stroke. J Med Life. 2016;9(2):163–169.
30. Kaşikçi H, Üçgül Erçin E, Karaci R, Ülker M, Domaç Mayda F. Predictors of pneumonia in stroke patients with dysphagia: a Turkish study. Ideggyogyaszati szemle. 2024;77(9–10):341–348.
31. Yang JX, Han YJ, Yang MM, Gao CH, Cao J. Risk factors and predictors of acute gastrointestinal injury in stroke patients. Clin Neurol Neurosurg. 2023;225(107566):107566. doi:10.1016/j.clineuro.2022.107566
32. Ren Y, Liang J, Li X, et al. Association between oral microbial dysbiosis and poor functional outcomes in stroke-associated pneumonia patients. BMC Microbiol. 2023;23(1):305. doi:10.1186/s12866-023-03057-8
33. De montmollin E, Ruckly S, Schwebel C, et al. Pneumonia in acute ischemic stroke patients requiring invasive ventilation: impact on short and long-term outcomes. J Infect. 2019;79(3):220–227. doi:10.1016/j.jinf.2019.06.012
34. Xiang B, Liu Y, Jiao S, Zhang W, Wang S, Yi M. Development and validation of interpretable machine learning models for postoperative pneumonia prediction. Front Public Health. 2024;12(1468504). doi:10.3389/fpubh.2024.1468504
35. Inoue Y, Cooray U, Ishimaru M, et al. Oral self-care, pneumococcal vaccination, and pneumonia among japanese older people, assessed with machine learning. J Gerontol a Biol Sci Med Sci. 2023;78(11):2170–2175. doi:10.1093/gerona/glad161
36. Abujaber A, Fadlalla A, Gammoh D, Al-Thani H, El-Menyar A. Machine learning model to predict ventilator associated pneumonia in patients with traumatic brain injury: the c.5 decision tree approach. Brain Injury. 2021;35(9):1095–1102. doi:10.1080/02699052.2021.1959060
© 2025 The Author(s). This work is published and licensed by Dove Medical Press Limited. The
full terms of this license are available at https://www.dovepress.com/terms
and incorporate the Creative Commons Attribution
- Non Commercial (unported, 4.0) License.
By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted
without any further permission from Dove Medical Press Limited, provided the work is properly
attributed. For permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms.
