Development and Validation of a Nomogram Model to Predict the Risk of Severe Pneumonia in Children with Pneumococcal Infection

Duoduo Li; Xixia Guo; Xiaolu Zhao; Li Wang; Xinyan Jia; Lingchao Wang; Weihong Lu; Xiangtao Wu; Fenglian Zhu

doi:10.2147/IDR.S594131

Infection and Drug Resistance, Volume 19

Original Research

Received 6 January 2026

Accepted for publication 2 April 2026

Published 16 April 2026 Volume 2026:19 594131

Checked for plagiarism Yes

Review by Single anonymous peer review

Peer reviewer comments 2

Editor who approved publication: Dr Hazrat Bilal

Duoduo Li,¹ Xixia Guo,¹ Xiaolu Zhao,² Li Wang,¹ Xinyan Jia,² Lingchao Wang,¹ Weihong Lu,¹ Xiangtao Wu,¹ Fenglian Zhu¹

¹Department of Pediatrics, the First Affiliated Hospital of Henan Medical University, Xinxiang, Henan Province, 453100, People’s Republic of China; ²Department of Nephrology, the First Affiliated Hospital of Henan Medical University, Xinxiang, Henan Province, 453100, People’s Republic of China

Correspondence: Duoduo Li, Department of Pediatrics, the First Affiliated Hospital of Henan Medical University, No. 88 of Jiankang Road, Weihui, Xinxiang, Henan Province, 453100, People’s Republic of China, Tel +86 0373-4403483, Fax +86 0373-4402573, Email [email protected]; Fenglian Zhu, Department of Pediatrics, the First Affiliated Hospital of Henan Medical University, No. 88 of Jiankang Road, Weihui, Xinxiang, Henan Province, 453100, People’s Republic of China, Tel +86 0373-4403483, Fax +86 0373-4402573, Email [email protected]

Background: Streptococcus pneumoniae is a leading cause of bacterial pneumonia in children, with severe cases rapidly progressing to respiratory failure and multiple organ dysfunction. Accurate early risk stratification tools are urgently needed. This study aimed to develop and validate a nomogram model to predict the risk of PICU admission in children with pneumococcal pneumonia.
Methods: A retrospective cohort of 485 children diagnosed with pneumococcal pneumonia (August 2018–August 2023) was randomly divided into a training set (n=339) and a validation set (n=146) in a 7:3 ratio. Independent predictors of PICU admission were identified using univariate and multivariate logistic regression. A nomogram was constructed based on the training set and evaluated using ROC curves, calibration curves, Hosmer-Lemeshow tests, decision curve analysis (DCA), and SHAP analysis.
Results: Multivariate analysis identified nine independent predictors: cardiovascular abnormalities, electrolyte disturbances, elevated neutrophil percentage, prolonged wheezing duration, decreased albumin, decreased hemoglobin, and elevated CT score were risk factors, while prolonged fever and cough duration were protective factors. The nomogram achieved an AUC of 0.92 (95% CI: 0.89–0.95) in the training set and 0.87 (95% CI: 0.81–0.93) in the validation set. Calibration was satisfactory on the Hosmer-Lemeshow test, and DCA demonstrated net clinical benefit across a 5%–95% threshold probability range. SHAP analysis identified cough duration, albumin, and cardiovascular abnormalities as the top contributing features.
Conclusion: This nine-variable nomogram demonstrates high accuracy, good calibration, and strong interpretability, providing clinicians with a practical tool for early identification of children at high risk for PICU admission, supporting risk stratification and treatment decisions.

Keywords: Streptococcus pneumoniae, children, severe pneumonia, nomogram, predictive model

Introduction

Community-acquired pneumonia (CAP) causes a heavy disease burden in children worldwide^1,2. According to the World Health Organization, pneumonia is the leading cause of death in children under 5 years of age, with Streptococcus pneumoniae infection accounting for a significant proportion^1. In low- and middle-income countries, Streptococcus pneumoniae is highly prevalent, with a prevalence as high as 65% reported in some healthy children³, and it is a particular concern among high-risk groups such as those with malnutrition, HIV infection, or underlying diseases^4,5. Although the widespread use of vaccines has reduced the incidence to some extent, severe Streptococcus pneumoniae pneumonia remains a serious challenge for pediatric critical care medicine due to serotype replacement, antibiotic resistance, and insufficient vaccine coverage in some areas. The condition of critically ill children often deteriorates rapidly and can progress to respiratory failure, sepsis, or even multiple organ dysfunction syndrome (MODS), seriously threatening their lives⁶. Therefore, accurately identifying children at high risk of severe illness early in the disease course, so that intensive intervention can be provided promptly and resources allocated rationally, is key to improving prognosis.

Multivariable prediction models (built with logistic regression and with machine learning methods such as XGBoost) have been used to predict the risk of death, treatment failure, and adverse outcomes in children with severe pneumonia. Some models have an AUC of 0.87–0.94, demonstrating high discriminative power^7,8. Nomograms have been used to visualize risk in settings such as severe community-acquired pneumonia (SCAP) and neonatal severe pneumonia. Commonly used variables include comorbidities, mechanical ventilation, low albumin, anemia, pleural effusion, and abnormal vital signs^9,10. Some studies have attempted to combine laboratory indicators (such as CRP, lactate, white blood cells, uric acid, and hemoglobin) with clinical characteristics to improve the predictive ability of the models^10,11. However, these existing models share several critical limitations. First, widely used clinical scoring systems such as PSI and CURB-65 were originally developed and validated in adult populations and have not been validated in children, potentially leading to inaccurate risk stratification due to fundamental differences in immune maturity, physiological parameters, and disease presentation^12,13. Second, most models rely on single-domain indicators without integrating multidimensional diagnostic information; in particular, quantitative imaging features such as CT-based lung involvement scoring and bronchoscopic findings, which are highly informative in pneumococcal pneumonia, have not been incorporated into any existing pediatric prediction model. Third, existing models are largely derived from general pediatric pneumonia cohorts and have not been specifically developed or validated for pneumococcal pneumonia in children, a distinct clinical entity with a high propensity for lobar consolidation, mucus plug formation, and extrapulmonary complications that requires a tailored predictive approach.
Fourth, these models generally have small sample sizes and lack independent validation sets, raising concerns about generalizability. Furthermore, many prior studies relied on clinical or radiological diagnosis alone without microbiological confirmation of the causative pathogen, which introduces diagnostic heterogeneity and limits the etiological specificity of the study cohort. In contrast, the present study required microbiological confirmation of pneumococcal infection via bronchoalveolar lavage fluid (BALF) culture or PCR as a mandatory inclusion criterion, ensuring a pathogen-specific cohort and enhancing the diagnostic specificity of the study population. Therefore, developing a risk prediction model for severe pneumococcal pneumonia in children based on clinically readily available indicators, multi-dimensional integration, strong interpretability, and rigorous validation is of great significance for assisting clinicians in risk stratification, optimizing monitoring strategies, rationally allocating medical resources, and guiding ICU admission decisions.

This study aims to develop a nomogram model to predict the risk of severe pneumococcal pneumonia in children through systematically collecting relevant clinical data from a large-sample, single-center retrospective cohort study, screening independent risk factors using multivariate logistic regression analysis, and conducting rigorous internal validation. We expect this model to provide clinicians with an accurate, intuitive, and easy-to-use risk assessment tool in the early stages of pediatric admission, thereby enabling early risk stratification, optimized treatment decisions, and ICU resource allocation, ultimately improving pediatric outcomes.

Methods

Study Design and Participants

This study was a single-center, retrospective observational cohort study aimed at developing and validating a nomogram model for predicting the risk of severe pneumococcal pneumonia in children. The study included children hospitalized in the Department of Pediatrics at the First Affiliated Hospital of Xinxiang Medical University and diagnosed with pneumococcal pneumonia between August 2018 and August 2023.

Inclusion Criteria

(1) Age ≤ 14 years; (2) Meets the diagnostic criteria for pneumonia^14; (3) Bronchoalveolar lavage fluid culture or PCR test confirmed as Streptococcus pneumoniae; (4) Complete key clinical data (including predictive and outcome variables).

Exclusion Criteria

(1) Children infected with other pathogens; (2) children with severe congenital diseases, immunodeficiency diseases, or dysfunction of vital organs such as the heart, liver, and kidneys; (3) children who had received potent antibiotics or immunosuppressants before admission. These exclusion criteria were applied to remove children with pre-existing chronic conditions that could independently confound the assessment of pneumonia-related severity, such as congenital immunodeficiency, congenital heart disease, or chronic organ dysfunction. Such conditions are distinct from acute complications that developed during the hospitalization course, which were retained as predictor variables.

A total of 485 children with pneumococcal pneumonia were ultimately included. Simple random sampling (the `sample` function in R, with the seed set to 2024) was used to divide patients into a training set (n=339, used for model development) and a validation set (n=146, used for model validation) in a 7:3 ratio. The sampling process was independent of admission date or any temporal ordering, ensuring that group assignment was purely random.
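The 7:3 random split described above can be illustrated with a short sketch. It is written in Python rather than the R `sample` call the study used, so the seed 2024 will not reproduce the authors' actual assignment; the function name `split_train_validation` and the sequential patient IDs are hypothetical.

```python
import random


def split_train_validation(patient_ids, train_tenths=7, seed=2024):
    """Randomly split IDs into training and validation sets (default 7:3)."""
    ids = list(patient_ids)
    rng = random.Random(seed)   # fixed seed makes the split reproducible
    rng.shuffle(ids)            # random order, unrelated to admission date
    # Integer arithmetic avoids floating-point rounding: 485 * 7 // 10 = 339
    n_train = len(ids) * train_tenths // 10
    return ids[:n_train], ids[n_train:]


# Hypothetical IDs 1..485 stand in for the anonymized patient records.
train_ids, valid_ids = split_train_validation(range(1, 486))
print(len(train_ids), len(valid_ids))  # 339 146
```

Shuffling once and slicing guarantees that the two sets are disjoint and together cover every patient.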

This study protocol has been approved by the Ethics Committee of the First Affiliated Hospital of Henan University of Medicine (Ethics No. EC-2025-712). Given the retrospective nature of the study, the Ethics Committee waived the requirement for informed consent from patients; however, all patient data was anonymized to protect privacy.

Data Collection and Definition

Clinical data of the children was collected by reviewing the electronic medical record system, including:

Demographic data: age, sex, weight.

Clinical symptoms and signs: fever (body temperature ≥38.5°C), cough, wheezing, cyanosis, three-recession sign (retractions of the suprasternal notch, supraclavicular fossae, and intercostal spaces), rales, etc., and their duration (from the onset of symptoms to admission, unit: days).

Laboratory test indicators: Fasting venous blood was collected from each child within 24 hours of admission to test the following indicators: white blood cell count (WBC), neutrophil percentage, lymphocyte percentage, C-reactive protein (CRP, immunoturbidimetric assay, instrument: Beckman Coulter AU5800), procalcitonin (PCT, chemiluminescence assay, instrument: Roche Cobas e601), lactate dehydrogenase (LDH, rate method), interleukin-6 (IL-6, enzyme-linked immunosorbent assay, kit: R&D Systems), albumin (bromocresol green method), alanine aminotransferase (ALT, rate method), creatinine (picric acid method), hemoglobin (cyanmethemoglobin method), and D-dimer (immunoturbidimetric assay). All laboratory tests were performed by the hospital’s laboratory department according to standard operating procedures, and quality control met the requirements of ISO 15189 laboratory accreditation.

Bronchoscopy:^14,15 Fiberoptic bronchoscopy was performed in children with moderate-to-severe airway obstruction, suspected mucus plug formation, or imaging findings suggestive of lobar/segmental atelectasis, in accordance with the “Guidelines for the Diagnosis and Treatment of Bronchoscopy in Children (2021 Edition).” Bronchoalveolar lavage fluid (BALF) was collected during the procedure for bacterial culture, Gram staining, and PCR testing to confirm Streptococcus pneumoniae infection. Mucus plugs were identified based on established bronchoscopic diagnostic criteria.¹⁵

CT scoring method¹⁶: Chest CT was assessed semi-quantitatively using a total CT score based on lobar grading. Each of the five lung lobes was scored separately according to the proportion of the lobe showing consolidation or ground-glass opacity (GGO): 0 points (no involvement), 1 point (<5%), 2 points (5%–25%), 3 points (26%–49%), 4 points (50%–75%), and 5 points (>75%). If the lobe showed a crazy-paving pattern, 1 point was added to this baseline score; if it showed consolidation, 2 points were added. The scores of the five lobes were summed to obtain the total CT score (0–35 points). All images were scored independently by two radiologists with more than 5 years of pediatric experience who were blinded to the clinical grouping. Inter-rater consistency was assessed using the intraclass correlation coefficient, and disagreements were resolved by consensus.
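The scoring rule can be written out as a small function, shown here as a sketch rather than the radiologists' actual procedure. Two points are assumptions not stated in the text: the pattern bonuses are treated as mutually exclusive (consolidation takes precedence), which is what the stated 0–35 range implies (5 lobes × 7 points), and percentages falling between the listed bands (for example 25.5%) are assigned to the higher band. The function and variable names are hypothetical.

```python
def lobe_score(involvement_pct, crazy_paving=False, consolidation=False):
    """Score one lobe from the percentage of the lobe that is involved."""
    if not 0 <= involvement_pct <= 100:
        raise ValueError("involvement_pct must be between 0 and 100")
    if involvement_pct == 0:
        return 0                      # no involvement, so no pattern bonus
    if involvement_pct < 5:
        base = 1
    elif involvement_pct <= 25:
        base = 2
    elif involvement_pct < 50:
        base = 3
    elif involvement_pct <= 75:
        base = 4
    else:
        base = 5
    # Assumption: bonuses do not stack, so each lobe scores at most 5 + 2 = 7.
    bonus = 2 if consolidation else (1 if crazy_paving else 0)
    return base + bonus


def total_ct_score(lobes):
    """Sum the scores of exactly five lobes (range 0-35)."""
    if len(lobes) != 5:
        raise ValueError("expected five lobes")
    return sum(lobe_score(*lobe) for lobe in lobes)


# Illustrative values only: (involvement %, crazy paving, consolidation)
example_lobes = [
    (0, False, False),
    (3, False, False),
    (20, True, False),
    (40, False, True),
    (80, False, True),
]
example_total = total_ct_score(example_lobes)
print(example_total)  # 0 + 1 + 3 + 5 + 7 = 16
```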

Complications: The following complications refer exclusively to conditions newly identified or acutely developed during the current hospitalization, representing the host’s acute response to pneumococcal infection. These are distinct from the pre-existing chronic conditions listed in the exclusion criteria and were treated as candidate predictor variables in the analysis. We recorded the presence of extrapulmonary complications (such as meningitis, pericarditis, and sepsis), pleural effusion (confirmed by CT or ultrasound), cardiovascular abnormalities (such as arrhythmia, heart failure, and blood pressure abnormalities, confirmed by electrocardiogram, echocardiography, or hemodynamic monitoring), electrolyte disturbances (such as hyponatremia, hypokalemia, hyperkalemia, and metabolic acidosis), liver injury (ALT > 80 U/L), systemic inflammatory response syndrome (SIRS, meeting the diagnostic criteria of the 2005 International Pediatric Sepsis Consensus Conference), coagulation dysfunction (prothrombin time prolonged by > 3 seconds or activated partial thromboplastin time prolonged by > 10 seconds), urinary tract injury (creatinine > 1.5 times the upper limit of the normal reference value), and underlying diseases (such as asthma, rickets, and anemia).

Outcome measures: The primary outcome was transfer to or direct admission to the pediatric intensive care unit (PICU) during hospitalization, which served as the definition of “severe illness” in this study; patients were divided accordingly into an ICU group and a non-ICU group. PICU transfer or direct admission was determined according to our institutional protocol, based on the presence of any of the following criteria: (1) requirement for mechanical ventilation or non-invasive positive pressure ventilation; (2) hemodynamic instability requiring vasoactive drug support; (3) oxygen saturation < 92% despite high-flow oxygen supplementation; (4) altered consciousness or seizures; or (5) two or more concurrent organ dysfunctions. These criteria are consistent with national pediatric critical care guidelines in China.

Statistical Analysis

Data processing and analysis were performed using R version 4.3.3 (2024-02-29) software, combined with Zstats 1.0 (www.zstats.net) and R packages including pROC, rms, shapviz, and rmda, with each package applied to its respective analytical task as described below. The significance level was set at α=0.05 (two-sided). Missing data were handled by complete-case exclusion: patients with missing values in any key predictor or outcome variable were excluded prior to analysis, as specified in inclusion criterion (4). Among the 485 patients ultimately included in the study, no significant missing data were present in the final analytical dataset, and no imputation methods were applied.

Baseline data description and comparison: The Kolmogorov–Smirnov test was used to test normality. Normally distributed continuous variables were expressed as mean ± standard deviation, and t-tests were used for inter-group comparisons. Non-normally distributed continuous variables were expressed as median (M) and lower and upper quartiles (Q1, Q3), and Mann–Whitney U-tests were used for inter-group comparisons. Categorical variables were expressed as frequency (percentage), and chi-square tests or Fisher’s exact tests were used for inter-group comparisons. First, the baseline characteristics of the training and validation sets were compared to ensure comparability between the two sets of data.

Predictor selection and model construction: In the training set, all variables were first subjected to univariate logistic regression analysis. Variables with P < 0.05 were then included in multivariate logistic regression analysis to screen for predictors independently associated with severe pneumococcal pneumonia. Based on the results of the multivariate regression analysis, the regression coefficients (β values) of each factor were calculated, and a nomogram model for individualized prediction of the risk of severe pneumococcal pneumonia was constructed accordingly using the rms package. Prior to model construction, the assumption of linearity in the logit was assessed for all continuous predictors using restricted cubic splines (implemented via the rms package). For CT score and albumin, spline plots confirmed approximate linearity across the observed range of values. For neutrophil percentage, hemoglobin, fever duration, cough duration, and wheezing duration, no significant non-linear relationships were detected (all P-for-nonlinearity > 0.05). These results supported the validity of entering all continuous variables as linear terms in the logistic regression model.

Model Performance Evaluation

Discriminative Ability

The model’s discriminative ability was assessed using the receiver operating characteristic (ROC) curve, area under the curve (AUC), and C-index using the pROC package.
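As a reminder of what the AUC (equivalently, the C-index for a binary outcome) measures, the sketch below computes it directly as the proportion of case/non-case pairs in which the case receives the higher predicted probability, with ties counted as one half. This is a plain-Python illustration on made-up values, not the pROC routine used in the study; the function name `auc_by_pairs` is hypothetical.

```python
def auc_by_pairs(labels, scores):
    """AUC as the fraction of (case, non-case) pairs ranked correctly."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0      # case ranked above non-case
            elif c == n:
                wins += 0.5      # ties count as half
    return wins / (len(cases) * len(controls))


# Made-up predicted probabilities: 1 = PICU admission, 0 = no PICU admission
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.5, 0.3, 0.2]
toy_auc = auc_by_pairs(toy_labels, toy_scores)
print(round(toy_auc, 3))  # 8 of 9 pairs are ordered correctly -> 0.889
```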

Calibration

The Hosmer-Lemeshow goodness-of-fit test and calibration curves were used to evaluate the consistency between the model’s predicted probabilities and the actual probabilities of occurrence using the rms package.

Clinical Efficacy

Decision curve analysis (DCA) was used to evaluate the clinical net benefit of the model at different threshold probabilities using the rmda package.
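For readers unfamiliar with DCA, the net benefit at a threshold probability p_t is conventionally defined as TP/n − (FP/n) × p_t/(1 − p_t), where TP and FP are the true and false positives when everyone with predicted risk at or above p_t is treated as high-risk. The sketch below applies this standard formula to made-up data; it is not the rmda implementation, and the function names are hypothetical.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of acting on everyone with predicted risk >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    weight = threshold / (1 - threshold)   # harm of one false positive
    return tp / n - (fp / n) * weight


def net_benefit_treat_all(labels, threshold):
    """Reference strategy: treat every patient as high-risk."""
    return net_benefit(labels, [1.0] * len(labels), threshold)


# Made-up data: 3 events among 10 patients
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
toy_probs = [0.9, 0.6, 0.3, 0.5, 0.2, 0.1, 0.1, 0.3, 0.2, 0.4]
nb_model = net_benefit(toy_labels, toy_probs, 0.42)
nb_all = net_benefit_treat_all(toy_labels, 0.42)
print(nb_model > nb_all, nb_model > 0)  # True True
```

The "treat-none" strategy has a net benefit of zero by definition, so a model is useful at a given threshold only if its net benefit exceeds both zero and the treat-all value.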

Model Interpretability

SHAP (Shapley Additive exPlanations) value analysis was performed using the shapviz package to explain the contribution and direction of each feature to the model’s prediction results.

Model Validation

The final nomogram model was validated on an independent validation set to examine its generalization ability.

Results

Basic Characteristics and Differences Between Training and Validation Sets

This study included 485 participants, with 339 (69.90%) in the training set and 146 (30.10%) in the validation set. No statistically significant differences were observed between the two sets across all demographic, clinical, laboratory, imaging, and outcome variables (all P > 0.05), confirming the comparability of the two cohorts (Table 1).

Table 1 Basic Characteristics and Differences Between the Training and Validation Sets

Basic Clinical Characteristics and Differential Analysis of the Training Set

Within the training set, ICU patients were significantly younger (1.46 vs. 3.25 years, P<0.001), weighed less (11.00 vs. 15.00 kg, P=0.002), and had a shorter pre-hospital history (5.00 vs. 11.00 days, P<0.001) compared to non-ICU patients. Notably, the ICU group had shorter cough duration (3.00 vs. 8.00 days, P<0.001) but longer wheezing duration (P<0.001).

In terms of laboratory findings, ICU patients showed evidence of more severe inflammatory response and organ dysfunction, with significantly higher neutrophil percentage (61.15% vs. 50.20%), CRP (7.23 vs. 4.30 mg/L), LDH (363.50 vs. 302.00 U/L), IL-6 (14.53 vs. 8.98 pg/mL), ALT (33.50 vs. 26.00 U/L), and D-dimer (0.95 vs. 0.70 mg/L), and significantly lower albumin (35.80 vs. 42.00 g/L) and hemoglobin (103.00 vs. 117.00 g/L) (all P<0.05). CT scores were also significantly higher in the ICU group (13.00 vs. 9.00, P<0.001).

Regarding clinical signs and complications, the ICU group had significantly higher rates of wheezing (43.83% vs. 14.69%), cyanosis (21.60% vs. 3.95%), three-recession sign (42.59% vs. 20.34%), rales (58.02% vs. 39.55%), and mucus plug formation (28.40% vs. 9.60%). The most striking between-group differences were observed in cardiovascular abnormalities (45.06% vs. 6.21%) and extrapulmonary complications (67.90% vs. 23.16%), both P<0.001. Electrolyte disturbances, pleural effusion, liver injury, urinary tract injury, and underlying diseases were also significantly more prevalent in the ICU group (all P<0.05). Gender, fever, and systemic inflammatory response syndrome did not differ significantly between groups (all P>0.05). Detailed comparisons are presented in Table 2. In addition, the inter-rater reliability of CT scoring between the two radiologists was excellent, with an intraclass correlation coefficient (ICC) of 0.997 (95% CI: 0.997–0.998) (see Supplementary Table S1).

Table 2 Basic Characteristics and Differences of the Training Set

Univariate and Multivariate Logistic Regression Analysis of Each Variable in the Training Set

This study used univariate and multivariate logistic regression analyses to identify independent predictors significantly associated with ICU admission. In the univariate analysis, the following variables reached statistical significance (P < 0.05) and were therefore entered into the multivariate logistic regression model: cardiovascular abnormalities, electrolyte disturbances, wheezing duration, albumin, hemoglobin, neutrophil percentage, CT score, fever duration, cough duration, weight, cyanosis, three-recession sign, rales, mucus plug formation, pleural effusion, extrapulmonary complications, liver injury, urinary tract injury, underlying diseases, CRP, lymphocyte percentage, lactate dehydrogenase, and D-dimer (Table 3).

Table 3 Results of Univariate and Multivariate Logistic Regression

After mutual adjustment in the multivariate model, seven variables were identified as independent risk factors for ICU admission: cardiovascular abnormalities (OR=7.23, 95% CI: 2.98–17.57, P<0.001), electrolyte disturbances (OR=2.62, 95% CI: 1.03–6.69, P=0.043), prolonged wheezing duration (OR=1.65, 95% CI: 1.23–2.21, P<0.001), increased neutrophil percentage (OR=1.02, 95% CI: 1.01–1.04, P=0.014), increased CT score (OR=1.13, 95% CI: 1.05–1.21, P=0.001), decreased albumin (OR=0.87, 95% CI: 0.82–0.92, P<0.001), and decreased hemoglobin (OR=0.98, 95% CI: 0.96–0.99, P=0.049). In contrast, prolonged fever duration (OR=0.88, 95% CI: 0.79–0.98, P=0.025) and prolonged cough duration (OR=0.81, 95% CI: 0.74–0.87, P<0.001) were identified as protective factors, with OR values below 1 indicating an inverse association with ICU admission risk.
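Because each odds ratio for a continuous predictor is expressed per one-unit increase, the effect of a larger change follows from the regression coefficient β = ln(OR): a change of Δ units multiplies the odds by exp(β·Δ) = OR^Δ. The sketch below applies this identity to the rounded ORs reported above; the Δ values are illustrative choices, and results computed from rounded ORs are approximate.

```python
import math


def odds_ratio_for_change(or_per_unit, delta):
    """Odds ratio for a change of `delta` units, given the per-unit OR."""
    beta = math.log(or_per_unit)      # logistic regression coefficient
    return math.exp(beta * delta)     # identical to or_per_unit ** delta


# Albumin: OR = 0.87 per 1 g/L increase, evaluated for a 5 g/L decrease
or_albumin_minus5 = odds_ratio_for_change(0.87, -5)
# Cough duration: OR = 0.81 per additional day, evaluated for 3 more days
or_cough_plus3 = odds_ratio_for_change(0.81, 3)
# Wheezing duration: OR = 1.65 per additional day, evaluated for 2 more days
or_wheeze_plus2 = odds_ratio_for_change(1.65, 2)

print(f"{or_albumin_minus5:.2f}")  # 2.01
print(f"{or_cough_plus3:.2f}")     # 0.53
print(f"{or_wheeze_plus2:.2f}")    # 2.72
```

Read this way, an albumin level 5 g/L lower corresponds to roughly double the odds of ICU admission, holding the other predictors fixed.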

Variables that were significant in univariate analysis but did not retain independent predictive value in the multivariate model—including cyanosis, three-recession sign, rales, mucus plug formation, pleural effusion, extrapulmonary complications, liver injury, urinary tract injury, underlying diseases, CRP, lymphocyte percentage, lactate dehydrogenase, and D-dimer—were likely excluded due to multicollinearity with the retained predictors or loss of significance after mutual adjustment. Full regression results are presented in Table 3.

Establishment of the Nomogram Model

Based on multivariate logistic regression results, a nomogram model was established incorporating nine independent predictors: cardiovascular abnormalities, electrolyte disturbances, prolonged wheezing duration, decreased serum albumin, decreased hemoglobin, increased neutrophil percentage, increased CT score, fever duration, and cough duration. The likelihood ratio chi-square test, which compares the fit of the full model (containing all nine predictors) against a null model (containing no predictors), yielded χ² = 226.99 (P < 0.001), indicating that the inclusion of these predictors significantly improved model fit and that the overall model was statistically significant. The C-index was 0.917 (95% CI: 0.889–0.946). The Hosmer-Lemeshow goodness-of-fit test showed no significant deviation between predicted and observed probabilities (χ² = 9.98, df = 8, P = 0.266), confirming satisfactory model calibration. The AUC was 0.917 (95% CI: 0.889–0.945), demonstrating excellent discriminative ability. The nomogram is presented in Figure 1.
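The reported Hosmer-Lemeshow P values can be checked from the χ² statistics alone. For an even number of degrees of freedom the upper-tail probability of the chi-square distribution has a closed form, exp(−x/2) · Σ (x/2)^k / k! for k = 0 … df/2 − 1, which the sketch below evaluates for the training-set statistic (χ² = 9.98, df = 8) and the validation-set statistic reported in this paper (χ² = 14.545, df = 8). The function name is hypothetical, and the shortcut applies only to even df.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(X >= x) for a chi-square variable with an even number of df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term = 1.0          # (x/2)^0 / 0!
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k            # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total


p_training = chi2_upper_tail_even_df(9.98, 8)
p_validation = chi2_upper_tail_even_df(14.545, 8)
print(round(p_training, 3), round(p_validation, 3))  # 0.266 0.069
```

Both values exceed 0.05, matching the conclusion that the test found no significant deviation between predicted and observed probabilities in either set.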

Figure 1 Nomogram model.

Nomogram Model Validation and Clinical Utility Assessment

The predictive model demonstrated strong discriminative ability and clinical applicability. In the receiver operating characteristic (ROC) analysis, the model achieved an area under the curve (AUC) of 0.92 (95% confidence interval: 0.89–0.95) in the training set, indicating excellent discrimination. The Hosmer-Lemeshow goodness-of-fit test showed no significant deviation between predicted and observed results (χ² = 9.98, degrees of freedom = 8, P = 0.266). Discriminative ability was largely maintained in the validation set, with an AUC of 0.87 (95% confidence interval: 0.81–0.93), and the Hosmer-Lemeshow goodness-of-fit test again showed no significant deviation (χ² = 14.545, degrees of freedom = 8, P = 0.069), supporting the generalizability of the model (Figure 2).

Figure 2 Nomogram model validation and clinical utility assessment. (A) ROC curve of the training set; (B) calibration curve of the training set; (C) DCA curve of the training set; (D) ROC curve of the validation set; (E) calibration curve of the validation set; (F) DCA curve of the validation set.

The optimal probability cutoff values for the training and validation sets (Table 4) were 0.418 and 0.421, respectively, as determined by the Youden index. For practical clinical use, a predicted probability of approximately 0.42 may serve as a decision threshold. At the optimal cutoff in the validation set, the model showed an accuracy of 82% (95% confidence interval [CI]: 75–88), sensitivity of 86% (95% CI: 78–94), specificity of 78% (95% CI: 68–88), positive predictive value of 82% (95% CI: 74–90), and negative predictive value of 83% (95% CI: 73–92). In other words, 86% of children who ultimately required PICU admission were correctly identified, and 78% of children who did not require PICU admission were correctly classified as low-risk. This threshold represents a clinically pragmatic balance between sensitivity and specificity, limiting the risk of missing high-risk children while maintaining an acceptable false-positive rate. Decision curve analysis (DCA) was used to assess clinical utility. The model’s net benefit exceeded that of both the “treat-all” and “treat-none” strategies across a broad range of clinically relevant threshold probabilities (approximately 5%–95%). This finding supports the model’s potential to inform clinical decision-making through risk stratification and early escalation of care.
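The Youden index used to pick the cutoff is J = sensitivity + specificity − 1, evaluated at each candidate threshold; the threshold with the largest J is taken as optimal. The following sketch shows the selection on made-up predicted probabilities. It does not use the study data, and `youden_cutoff` is a hypothetical helper name.

```python
def youden_cutoff(labels, probs):
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both outcome classes are required")
    best = None
    for t in sorted(set(probs)):           # each observed probability is a candidate
        tp = sum(1 for y, p in zip(labels, probs) if y == 1 and p >= t)
        tn = sum(1 for y, p in zip(labels, probs) if y == 0 and p < t)
        sens = tp / n_pos
        spec = tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:    # keeps the lowest cutoff on ties
            best = (j, t, sens, spec)
    return best[1], best[2], best[3]


# Made-up data: 1 = PICU admission, 0 = no PICU admission
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_probs = [0.9, 0.8, 0.45, 0.3, 0.6, 0.4, 0.35, 0.2, 0.1, 0.05]
cutoff, sens, spec = youden_cutoff(toy_labels, toy_probs)
print(cutoff, sens, round(spec, 3))  # 0.45 0.75 0.833
```

Youden's J weights sensitivity and specificity equally; a setting where missed high-risk children are costlier than false alarms might justify a lower cutoff than the one J selects.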

Table 4 Confusion Matrix: Training Set + Validation Set

The model has high discrimination accuracy, satisfactory model calibration, and has proven clinical applicability within relevant risk thresholds, all of which support its potential application in clinical practice for risk stratification and individualized decision-making.

SHAP Analysis of the Nomogram Model

The SHAP analysis results illustrate the contribution, direction of influence, and mode of action of each feature in the predictive model from three perspectives (Figure 3). Part A of Figure 3 shows, through feature importance ranking, that “cough duration” has the highest mean SHAP value, making it the most important predictor in the model, followed by “albumin” and “cardiovascular system abnormalities.” Part B of Figure 3 demonstrates the direction in which key features contribute to the prediction through a single-sample waterfall plot, where yellow bars indicate a contribution that increases the predicted probability of PICU admission (positive SHAP value), and orange bars indicate a contribution that decreases it (negative SHAP value). In this representative sample, CT score and cardiovascular abnormalities show orange bars (negative SHAP values), while cough duration and fever duration show yellow bars (positive SHAP values). Importantly, this reflects the specific feature values of this individual patient (cough duration = 2 days, fever duration = 1 day): because both predictors have OR < 1, short symptom durations push the prediction toward higher risk. This is fully consistent with the multivariate regression results, in which longer fever and cough durations are associated with reduced ICU admission risk. The dependency scatter plot in Part C of Figure 3 further reveals the continuous relationship between feature values and SHAP values. “Albumin” level is negatively correlated with SHAP value (i.e., the lower the value, the higher the predicted risk), while “CT score” shows a clear positive correlation with SHAP value (i.e., the higher the score, the greater the predicted risk). These three parts collectively support the rationality of the model’s decision-making logic and its consistency with clinical pathological mechanisms.

Figure 3 SHAP analysis of the nomogram model. (A) Variable importance plot. (B) Waterfall plot for a representative individual case; yellow bars indicate features increasing the predicted probability for that specific patient, whereas orange bars indicate features decreasing the predicted probability. (C) SHAP summary plot.
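The sign pattern described for the waterfall plot can be reproduced with a simplified calculation. For a logistic regression with features treated as independent, the SHAP value of feature j on the log-odds scale reduces to β_j × (x_j − mean_j), so a protective predictor (β < 0) contributes positively whenever the patient's value is below the cohort average. The sketch below uses β = ln(OR) from the rounded ORs reported in this paper and the patient's cough and fever durations given in the text; the cohort means, the patient's albumin and CT score, and the function name are hypothetical, and the authors' shapviz settings may differ (for example, by working on the probability scale).

```python
import math


def linear_shap_log_odds(odds_ratios, patient, reference_means):
    """Per-feature SHAP values (log-odds scale) for a linear logit model."""
    return {
        name: math.log(odds_ratios[name]) * (patient[name] - reference_means[name])
        for name in odds_ratios
    }


# Rounded ORs taken from the multivariate results
odds_ratios = {"albumin": 0.87, "cough_days": 0.81, "fever_days": 0.88, "ct_score": 1.13}
# Hypothetical cohort means, chosen only for illustration
reference_means = {"albumin": 40.0, "cough_days": 7.0, "fever_days": 4.0, "ct_score": 11.0}
# Cough and fever durations follow the text; albumin and CT score are hypothetical
patient = {"albumin": 38.0, "cough_days": 2.0, "fever_days": 1.0, "ct_score": 8.0}

contributions = linear_shap_log_odds(odds_ratios, patient, reference_means)
for name, value in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    direction = "raises" if value > 0 else "lowers"
    print(f"{name}: {value:+.2f} ({direction} predicted risk)")
```

With these inputs the short cough and fever durations raise the predicted risk while the below-average CT score lowers it, the same directions reported for the representative case.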

Discussion

This study developed and validated a nomogram model using retrospective cohort data from 485 children with pneumococcal pneumonia. The model demonstrated excellent discriminative ability in both the training set (AUC=0.92) and the validation set (AUC=0.87), with good calibration and net clinical benefit. The results suggest that this model can be used for early identification of severe cases and has high application value.

This study found no significant difference in baseline characteristics between the training and validation sets (P>0.05), supporting the reliability of model construction. Within the training set, ICU patients were younger, weighed less, and had a shorter pre-hospital history but a longer duration of wheezing, accompanied by elevated inflammatory markers (such as neutrophil percentage and CRP) and decreased organ function indicators (such as albumin and hemoglobin). These results are consistent with the pathophysiological mechanisms of critical pediatric diseases^17–19: younger and lower-weight children often have immature immune systems and are more prone to developing severe illness^17,18, while acute inflammatory response and organ damage are common drivers of ICU admission¹⁹. Multivariate logistic regression further identified independent predictors, including cardiovascular abnormalities (OR=7.23), electrolyte disturbances (OR=2.62), and increased CT scores (OR=1.13). Among them, cardiovascular abnormalities were the strongest predictor, which may be related to the circulatory instability common in children with severe illness^20–23 and is consistent with previous studies emphasizing the importance of circulatory support in the management of severe illness^24,25.

Electrolyte disturbances, such as hyponatremia, hypokalemia, and acid-base imbalance, significantly increase the risk of severe illness and require dynamic monitoring and timely correction²⁶. A higher CT imaging score indicates more severe lung damage and is an independent predictor of progression to severe illness and poor prognosis¹⁶. Quantitative CT scores are closely related to respiratory failure, ICU admission, mechanical ventilation requirements, and death, and can reflect the degree of lung damage early and objectively, outperforming some traditional clinical scores (such as A-DROP and CURB-65)²⁷.

Regarding albumin (OR=0.87), the odds ratio below 1 is interpreted here as the change in odds per 1 g/L increase in serum albumin: each 1 g/L increase is associated with a 13% reduction in the odds of ICU admission, so each 1 g/L decrease corresponds to an approximately 15% increase in the odds (1/0.87 ≈ 1.15). A clinically meaningful 5 g/L decrease therefore corresponds to roughly a doubling of the odds of ICU admission (0.87⁻⁵ ≈ 2.01), highlighting the importance of monitoring nutritional and inflammatory status. Low albumin reflects not only nutritional depletion but also the systemic inflammatory response and hepatic dysfunction that frequently accompany severe pneumococcal disease, making it a readily available yet highly informative bedside indicator.

A notable finding of this study is that prolonged fever duration (OR=0.88) and prolonged cough duration (OR=0.81) were identified as independent protective factors against ICU admission, with both odds ratios below 1. While this may appear counterintuitive, several mechanistic explanations may account for it. First, sustained fever prior to admission may reflect a more organized and effective innate immune response, suggesting that the host is mounting a functional inflammatory defense against the pathogen rather than rapidly progressing toward immune exhaustion or overwhelming sepsis. Second, prolonged cough duration may indicate that airway clearance mechanisms remain partially intact, reducing the likelihood of severe mucus retention, lobar consolidation, and associated respiratory failure. Third, and perhaps most importantly, children with longer symptom duration before hospital presentation may represent a subgroup with a less fulminant disease trajectory: rapid clinical deterioration, which is characteristic of severe pneumococcal disease, would more likely prompt earlier care-seeking, resulting in shorter pre-admission symptom duration in the ICU group. This interpretation is consistent with the significantly shorter pre-hospital history observed in the ICU group compared with the non-ICU group (5.00 vs. 11.00 days, P<0.001). These interpretations are exploratory and warrant confirmation in prospective studies. Nonetheless, this finding highlights the importance of integrating multiple clinical variables in risk assessment, rather than relying on symptom duration alone as an indicator of disease severity.

The model integrates these variables and shows high discriminative power (training set AUC=0.917, validation set AUC=0.87). At the optimal cutoff value (probability 0.42), the model balances sensitivity (86%) and specificity (78%), which compares favorably with traditional scoring systems (such as the Pediatric Mortality Risk Score)^28,29, providing a practical tool for early identification of high-risk patients. In clinical practice, a predicted probability ≥ 0.42 at the time of admission should serve as a trigger for intensified monitoring, early senior consultation, and proactive preparation for potential ICU transfer. Clinicians can apply this threshold directly as a decision-support criterion: children whose nomogram-derived probability reaches or exceeds 42% warrant escalation of care regardless of initial clinical appearance, while those below this threshold may be managed with standard protocols under close observation.
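The threshold rule described above can be sketched in a few lines. The example below applies the reported cutoff of 0.42 to a predicted probability and shows how sensitivity and specificity at a given cutoff are computed from observed outcomes. The pathway labels, function names, and the small data set are hypothetical and serve only to illustrate the calculation; they are not the study data.

```python
from typing import Sequence, Tuple

CUTOFF = 0.42  # optimal probability cutoff reported in the text


def triage(probability: float, cutoff: float = CUTOFF) -> str:
    """Map a nomogram-derived probability to a hypothetical care pathway label."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return "escalate" if probability >= cutoff else "standard"


def sensitivity_specificity(
    outcomes: Sequence[int],
    probabilities: Sequence[float],
    cutoff: float = CUTOFF,
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at `cutoff`; outcome 1 = ICU admission.

    Requires at least one admitted and one non-admitted patient.
    """
    tp = fn = tn = fp = 0
    for outcome, prob in zip(outcomes, probabilities):
        flagged = prob >= cutoff
        if outcome == 1:
            if flagged:
                tp += 1
            else:
                fn += 1
        else:
            if flagged:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy cohort (not study data): 4 ICU and 4 non-ICU patients.
toy_outcomes = [1, 1, 1, 0, 0, 0, 0, 1]
toy_probabilities = [0.90, 0.50, 0.42, 0.10, 0.41, 0.60, 0.20, 0.30]
sens, spec = sensitivity_specificity(toy_outcomes, toy_probabilities)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Because the comparison is inclusive, a probability of exactly 0.42 is flagged for escalation, matching the "reaches or exceeds 42%" wording. Lowering the cutoff trades specificity for sensitivity, so any local adaptation of the threshold should be checked against local outcome data.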

This study ensured the robustness of the model through a rigorous validation framework. In the training set, the Hosmer-Lemeshow test showed a good fit (P=0.266), while the AUC in the validation set remained at 0.87, indicating that the model generalizes well. Compared with some existing predictive tools (such as the SOFA score), this model places more emphasis on integrating clinical symptoms (such as cough duration and wheezing duration) with imaging features (CT score), which is more in line with the needs of pediatric bedside assessment^16,27. SHAP analysis further enhanced the transparency of the model: cough duration, albumin, and CT score were the top three contributing factors. It should be noted that cough duration carries an OR < 1 in the regression model, indicating that longer cough duration is protective against ICU admission. This is fully consistent with the SHAP analysis: in the beeswarm plot (Figure 3C), high feature values of cough duration (orange/red points) correspond to negative SHAP values (left side), confirming an inverse association with predicted risk. The waterfall plot (Figure 3B) represents a single patient with short cough duration, whose low feature value contributes positively to predicted risk, again consistent with OR < 1. The directions for albumin and CT score likewise match the regression model, in which decreased albumin and increased CT score are each associated with higher ICU admission risk. In this study, elevated neutrophil percentage and decreased albumin were both independent risk factors, consistent with the prognostic value of the neutrophil percentage-to-albumin ratio (NPAR) for 28-day mortality in patients with sepsis³⁰. This interpretable design helps clinicians understand the model’s decision-making logic and increases trust.

In the validation set, the calibration curve showed that predicted probabilities agreed closely with observed outcomes, and the DCA curve indicated a net clinical benefit, supporting the model’s applicability beyond the training set. In addition, the feature contribution ranking provided by SHAP analysis is consistent with the recent emphasis on interpretable AI in medical decision-making, addressing the shortcomings of traditional “black box” models^31,32. This model achieves higher-precision risk stratification through multivariate integration of clinically available indicators and, to our knowledge, is the first to systematically incorporate SHAP analysis into a pediatric pneumococcal pneumonia prediction model, improving the clinical acceptability of the model.
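The agreement between an OR below 1 and the sign of the SHAP values can be illustrated with the closed-form SHAP values of a linear model on the log-odds scale, where each feature contributes its coefficient times its deviation from the cohort mean, assuming independent features. This is a minimal sketch and not the authors' pipeline: how SHAP values were computed is not described in this section, the intercept and background cohort below are hypothetical, and both ORs are assumed to be per one-unit increase.

```python
import math
from statistics import mean

# Coefficients on the log-odds scale, derived from ORs quoted in the text.
# Assumption: both ORs are per one-unit increase (1 day of cough, 1 CT point).
BETAS = {"cough_days": math.log(0.81), "ct_score": math.log(1.13)}
INTERCEPT = -1.0  # hypothetical; the fitted intercept is not reported here

# Hypothetical background cohort, used only to define the feature means.
BACKGROUND = [
    {"cough_days": 3, "ct_score": 4},
    {"cough_days": 7, "ct_score": 6},
    {"cough_days": 10, "ct_score": 8},
    {"cough_days": 20, "ct_score": 10},
]


def log_odds(patient):
    """Linear predictor of the two-feature logistic model."""
    return INTERCEPT + sum(BETAS[k] * patient[k] for k in BETAS)


def linear_shap(patient):
    """SHAP values for a linear model with independent features:
    phi_i = beta_i * (x_i - mean(x_i))."""
    means = {k: mean(row[k] for row in BACKGROUND) for k in BETAS}
    return {k: BETAS[k] * (patient[k] - means[k]) for k in BETAS}


long_cough = {"cough_days": 20, "ct_score": 7}
short_cough = {"cough_days": 3, "ct_score": 7}
print("long cough :", linear_shap(long_cough))
print("short cough:", linear_shap(short_cough))
```

With a negative coefficient (OR < 1), an above-average cough duration yields a negative SHAP value and a below-average duration yields a positive one, which is the pattern described for the beeswarm and waterfall plots. The SHAP values of a patient also sum to that patient's log-odds minus the mean log-odds of the background cohort.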

Compared with previous similar studies, this study has the following advantages. First, the sample size is sufficient, and the training and validation sets were split in a 7:3 ratio with a standardized validation process, which increases the reliability of the results. Second, the model integrates multi-dimensional information, including clinical symptoms, laboratory indicators, and imaging characteristics, avoiding the limitations of single-indicator prediction. Third, it identifies protective factors and discusses their potential mechanisms, enriching our understanding of the risk of severe pneumococcal pneumonia in children. Fourth, the indicators included in the model are all routine clinical tests that incur no additional testing costs, making the model suitable for promotion and application in primary hospitals.

Limitations and Future Directions

This study has several limitations. First, the sample was drawn from a single center, which may be influenced by regional epidemiological characteristics and homogenization of treatment protocols, leading to potential selection bias. Second, the retrospective design may introduce bias due to incomplete data recording, and future prospective studies are warranted for verification. Third, pneumococcal confirmation relied on BALF culture or PCR, and bronchoscopy with BALF sampling was performed based on clinical indications—including moderate-to-severe airway obstruction and imaging findings suggestive of atelectasis—which are more prevalent in severe cases. This indication-based sampling approach may have preferentially enriched the cohort with clinically severe patients, potentially inflating the observed associations between imaging variables (such as CT score), bronchoscopic findings (such as mucus plug formation), and ICU admission; the cohort may therefore not fully represent the complete spectrum of pediatric pneumococcal disease. Fourth, ICU admission as the primary outcome reflects not only biological disease severity but also institutional triage protocols, bed availability, and clinician judgment; the identified predictors should therefore be interpreted as factors associated with ICU admission risk in our institutional context rather than as universal biological markers of severity, and generalizability to settings with different admission thresholds requires caution. Fifth, the model did not incorporate dynamic monitoring indicators (such as 24-hour CRP trend and real-time vital signs) or variables such as bacterial serotype and antibiotic use, which may limit its timeliness and predictive accuracy. 
Sixth, the inclusion of CT score as a predictor assumes that chest CT is routinely performed early in the clinical course, which may not reflect standard practice in all pediatric settings—particularly resource-limited environments or primary care hospitals where CT is typically reserved for complicated or refractory cases. The model’s applicability may therefore be reduced in such contexts, and future studies should explore whether CT score can be replaced by clinical or radiographic surrogates to broaden the model’s generalizability. Finally, the outcome was defined solely as ICU admission and does not encompass broader clinical endpoints such as mortality, length of hospital stay, or mechanical ventilation requirements.

Future work should explore multi-center prospective validation, incorporate time-series and dynamic clinical features, expand outcome measures to include mortality and length of hospital stay, and conduct interventional trials to assess the model’s actual impact on clinical decision-making and patient outcomes.

Conclusion

This study constructed a nomogram model for predicting the risk of severe pneumococcal pneumonia in children based on real-world clinical data, integrating nine key predictors: cough duration, albumin level, CT score, cardiovascular abnormalities, electrolyte disturbances, neutrophil percentage, hemoglobin, wheezing duration, and fever duration. The model offers strong interpretability, high accuracy, and ease of use. It is expected to support early clinical identification of high-risk children, assist physicians in adjusting treatment strategies and making ICU admission decisions, and provide an objective reference for the allocation of clinical resources, which may help improve pediatric prognosis and reduce the medical burden. Future studies should consider expanding outcome measures beyond ICU admission to include mortality, length of hospital stay, and mechanical ventilation requirements, which would provide a more comprehensive assessment of disease severity. Incorporating dynamic clinical variables, such as serial trends in inflammatory markers and continuous vital sign monitoring, may further improve the model’s predictive timeliness and accuracy. Additionally, multi-center prospective validation across diverse clinical settings is warranted to establish the generalizability of this model and facilitate its broader clinical implementation.

Ethical Statement

This study was conducted in accordance with the Declaration of Helsinki and was approved by the Ethics Committee of the First Affiliated Hospital of Xinxiang Medical University (the First Affiliated Hospital of Henan Medical University) (Approval No. EC-2025-712). Given the retrospective nature of the study and the use of anonymized data, the requirement for informed consent was waived by the Ethics Committee.

Author Contributions

All authors made a significant contribution to the work reported, whether in the conception, study design, execution, acquisition of data, analysis and interpretation, or in all these areas; took part in drafting, revising, or critically reviewing the article; gave final approval of the version to be published; agreed on the journal to which the article has been submitted; and agree to be accountable for all aspects of the work.

Funding

This work was supported by the Joint Project of the Henan Medical Science and Technology Research Program (Grant No. LHGJ20230509) and the Henan Province Science and Technology Research and Development Program Joint Fund (Grant No. 252103810328).

Disclosure

The authors declare that they have no competing interests.

References

1. Bastug KA, Thielen BK, Moschovis PP, Sam-Agudu NA. Air pollution and global pediatric pneumococcal disease. Open Forum Infect Dis. 2025;12(10):ofaf234. doi:10.1093/ofid/ofaf234

2. Wang Y, Han R, Ding X, et al. A 32-year trend analysis of lower respiratory infections in children under 5: insights from the global burden of disease study 2021. Front Public Health. 2025;13:1483179. doi:10.3389/fpubh.2025.1483179

3. Tvedskov ESF, Hovmand N, Benfield T, Tinggaard M. Pneumococcal carriage among children in low and lower-middle-income countries: a systematic review. Int J Infect Dis. 2022;115:1–16. doi:10.1016/j.ijid.2021.11.021

4. Mekuria S, Tolossa D, Abebe T, Nour TY, Tesfaye A, Roble AK. Prevalence, antimicrobial drug resistance and associated risk factors of Streptococcus pneumoniae bacteria infection among under-five children with acute lower respiratory tract infection attending Sheik Hassan Yebere referral hospital, Jig-Jiga, Ethiopia. Infect Drug Resist. 2023;16:3511–3523. doi:10.2147/IDR.S409919

5. von Mollendorf C, Berger D, Gwee A, et al. ARI review group. Aetiology of childhood pneumonia in low- and middle-income countries in the era of vaccination: a systematic review. J Glob Health. 2022;12:10009. doi:10.7189/jogh.12.10009

6. Lu W, Guo X, Ren Y, et al. Time trends in the burden of non-COVID-19 lower respiratory tract infections among children aged 0 to 14 years. Front Cell Infect Microbiol. 2025;15:1582159. doi:10.3389/fcimb.2025.1582159

7. Cao S, Liu L, Yang L, et al. Assessing severe pneumonia risk in children via clinical prognostic model based on laboratory markers. Int Immunopharmacol. 2025;151:114317. doi:10.1016/j.intimp.2025.114317

8. Xu C, Tao X, Zhu J, et al. Clinical features and risk factors analysis for poor outcomes of severe community-acquired pneumonia in children: a nomogram prediction model. Front Pediatr. 2023;11:1194186. doi:10.3389/fped.2023.1194186

9. Agimas MC, Tesfie TK, Derseh NM, Kassaw A, Tu W-J. Derivation and validation of a model to predict treatment failure among under five children with severe community acquired pneumonia who are admitted at Debre Tabor specialized comprehensive hospital. PLoS One. 2025;20(3):e0320448. doi:10.1371/journal.pone.0320448

10. Gong W, Gao K, Ni J, et al. Construction and verification of a risk factor prediction model for neonatal severe pneumonia. Front Med (Lausanne). 2025;12:1536705. doi:10.3389/fmed.2025.1536705

11. Ari M, Ari HF, Cengiz H. Advanced biomarkers for prognostic evaluation of pneumonia severity in pediatric intensive care: focus on novel inflammatory and hematological ratios. Ital J Pediatr. 2025;51(1):168. doi:10.1186/s13052-025-01989-7

12. Fine MJ, Auble TE, Yealy DM, et al. A prediction rule to identify low-risk patients with community-acquired pneumonia. N Engl J Med. 1997;336(4):243–250. doi:10.1056/NEJM199701233360402

13. Lim WS. Defining community acquired pneumonia severity on presentation to hospital: an international derivation and validation study. Thorax. 2003;58(5):377–382. doi:10.1136/thorax.58.5.377

14. Wu X, Lu W, Sang X, et al. Timing of bronchoscopy and application of scoring tools in children with severe pneumonia. Ital J Pediatr. 2023;49(1):44. doi:10.1186/s13052-023-01446-3

15. Guo X-X, Xu Y-L, Ren Y-S, et al. Analysis of risk factors for plastic bronchitis induced by Streptococcus pneumoniae in children. BMC Infect Dis. 2025;25(1):1032. doi:10.1186/s12879-025-11391-7

16. Lu W, Wu X, Xu Y, et al. Predictive value of bronchoscopy combined with CT score for refractory Mycoplasma pneumoniae pneumonia in children. BMC Pulm Med. 2024;24(1):251. doi:10.1186/s12890-024-02996-w

17. Fisler G, Brewer MR, Yaipen O, Deutschman CS, Taylor MD. Age influences the circulating immune profile in pediatric sepsis. Front Immunol. 2025;16:1527142. doi:10.3389/fimmu.2025.1527142

18. Lee YH, Choe YJ, Yoon YS, et al. Predicting ICU admission risk in children with respiratory syncytial virus. Infect Dis Ther. 2025;14(6):1277–1286. doi:10.1007/s40121-025-01155-w

19. Weiss SL, Peters MJ, Alhazzani W, et al. Surviving sepsis campaign international guidelines for the management of septic shock and sepsis-associated organ dysfunction in children. Pediatr Crit Care Med. 2020;21(2):e52–e106. doi:10.1097/PCC.0000000000002198

20. Bhat TI, Bashir A, Jan M, Ali I. Outcomes of pediatrics cardiac patients in the PICU: an analysis of clinical and echocardiographic risk factors. Int J Contemp Pediatr. 2025;12(2):204–208. doi:10.18203/2349-3291.ijcp20250085

21. Di Nardo M, MacLaren G, Marano M, Cecchetti C, Bernaschi P, Amodeo A. ECLS in pediatric cardiac patients. Front Pediatr. 2016;4:109. doi:10.3389/fped.2016.00109

22. Liu Z, Zhu L, Li X, Zhai Q. A novel nomogram incorporating time-to-event modeling for predicting postoperative delirium in cardiac surgery patients. Gen Hosp Psychiatry. 2025;96:253–263. doi:10.1016/j.genhosppsych.2025.08.005

23. Miron A, Lafreniere-Roula M, Steve Fan C-P, et al. A validated model for sudden cardiac death risk prediction in pediatric hypertrophic cardiomyopathy. Circulation. 2020;142(3):217–229. doi:10.1161/CIRCULATIONAHA.120.047235

24. Reveco S, Barbagelata S, Cruces P, et al. Functional echocardiography identifies association between early ventricular dysfunction and outcome in pediatric sepsis. Front Pediatr. 2025;13:1570519. doi:10.3389/fped.2025.1570519

25. Stein DF, Carter MJ, Booth J, et al. Predicting cardiovascular deterioration in a paediatric intensive care unit (PicEWS): a machine learning modelling study of routinely collected health-care data. EClinicalMedicine. 2025;85:103255. doi:10.1016/j.eclinm.2025.103255

26. Yuan J, Li L, Li F, et al. Analysis of risk factors affecting prognosis of fulminant myocarditis in children: a ten-year single-center study. BMC Pediatr. 2025;25(1):209. doi:10.1186/s12887-025-05530-x

27. Wu X, Lu W, Wang T, et al. Optimization strategy for the early timing of bronchoalveolar lavage treatment for children with severe Mycoplasma pneumoniae pneumonia. BMC Infect Dis. 2023;23(1):661. doi:10.1186/s12879-023-08619-9

28. Kim SY, Kim S, Cho J, et al. A deep learning model for real-time mortality prediction in critically ill children. Crit Care. 2019;23(1):279. doi:10.1186/s13054-019-2561-z

29. Ding M, Yang C, Li Y. Development and validation of a risk prediction model for unplanned 7-day readmission to PICU. Sci Rep. 2025;15(1):21164. doi:10.1038/s41598-025-08169-x

30. Hu C, He Y, Li J, et al. Association between neutrophil percentage-to-albumin ratio and 28-day mortality in Chinese patients with sepsis. J Int Med Res. 2023;51(6):3000605231178512. doi:10.1177/03000605231178512

31. Sarıdaş A, Aydin ÖF. SHAP analysis and comparative performance of the HEART, HET, and SVEAT scores in 30-day MACE prediction. Am J Emerg Med. 2025;95:1–6. doi:10.1016/j.ajem.2025.05.007

32. Ye Y, Gao Z, Zhang Z, Chen J, Chu C, Zhou W. A machine learning model for predicting severe Mycoplasma pneumoniae pneumonia in school-aged children. BMC Infect Dis. 2025;25(1):570. doi:10.1186/s12879-025-10958-8

Creative Commons License © 2026 The Author(s). This work is published and licensed by Dove Medical Press Limited. The full terms of this license are available at https://www.dovepress.com/terms and incorporate the Creative Commons Attribution - Non Commercial (unported, 4.0) License. By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed. For permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms.