Machine Learning Model Predicts New-Onset Lower Extremity Deep Vein Thrombosis After Pelvic Fracture Surgery and Targeted Diagnosis

Haoyuan Fu; Qi Dong; Guoqiang Li; Kuo Zhao; Zhiyong Hou

doi:10.2147/CLEP.S590414

Journal: Clinical Epidemiology, Volume 18

Original Research

Authors Fu H, Dong Q, Li G, Zhao K, Hou Z

Received 8 January 2026

Accepted for publication 24 April 2026

Published 7 May 2026 Volume 2026:18 590414

DOI https://doi.org/10.2147/CLEP.S590414

Checked for plagiarism Yes

Review by Single anonymous peer review

Peer reviewer comments 2

Editor who approved publication: Dr Erzsébet Horváth-Puhó

Haoyuan Fu,^1,2,* Qi Dong,^1,2,* Guoqiang Li,^1,2,* Kuo Zhao,^1,2 Zhiyong Hou^1,2

¹Department of Orthopedic Surgery, Third Hospital of Hebei Medical University, Shijiazhuang, Hebei, People’s Republic of China; ²Orthopaedic Research Institute of Hebei Province, Third Hospital of Hebei Medical University, Shijiazhuang, Hebei, People’s Republic of China

*These authors contributed equally to this work

Correspondence: Zhiyong Hou, Department of Orthopedic Surgery, Third Hospital of Hebei Medical University, Shijiazhuang, 050051, People’s Republic of China, Email [email protected]; Kuo Zhao, Department of Orthopedic Surgery, Third Hospital of Hebei Medical University, Shijiazhuang, 050051, People’s Republic of China, Email [email protected]

Background: Postoperative new-onset deep vein thrombosis (PNO-DVT) is a common and serious complication after pelvic fracture surgery, significantly affecting recovery and quality of life. Traditional risk assessment tools lack precision, whereas machine learning offers improved predictive capability. This study evaluated machine learning models for predicting PNO-DVT following pelvic fracture surgery.
Methods: Clinical data from 745 patients treated between 2016 and 2019 were analyzed. Demographic characteristics, laboratory indicators, surgical variables, and scoring systems were collected. Univariate logistic regression, LASSO regression, and multivariate logistic regression were used to identify independent risk factors, and 12 features were used for model construction. The dataset was split 7:3 into training and test sets. Six machine learning models (logistic regression, SVM, random forest, XGBoost, LightGBM, and AdaBoost) were constructed.
Results: Univariate analysis identified GGT, HDL-C, ApoB, TCO₂, GLU, GAP, EOS, and MPV as potential predictors. Multivariate analysis confirmed age, nephropathy, BMI, fracture reduction, intraoperative blood loss, red blood cell transfusion, HDL-C, ApoB, GAP, and MPV as independent risk factors. Among all models, XGBoost achieved the highest AUC (0.8633), demonstrating superior predictive performance. Model accuracy ranged from 0.6502 to 0.9283.
Conclusion: Machine learning (particularly XGBoost) provides effective prediction of PNO-DVT after pelvic fracture surgery. Key predictors such as age, intraoperative blood loss, and BMI enable accurate risk stratification and support early preventive intervention.

Keywords: DVT risk prediction, machine learning, model prediction, postoperative complications

Background

Pelvic fractures represent a severe form of trauma associated with a broad spectrum of complex and diverse postoperative complications. Notably, the development of deep vein thrombosis (DVT) in the lower extremities is a critical factor contributing to elevated mortality and disability rates among these patients. Over the years, numerous studies have demonstrated that individuals with pelvic fractures are predisposed to a hypercoagulable state, attributed to factors such as prolonged immobilization, tissue damage, and postoperative inflammatory responses, thereby elevating the risk of DVT.^1,2 The incidence of lower limb DVT in patients with pelvic fractures has been reported to reach 20% or higher, with most cases occurring postoperatively, thereby significantly impacting patient rehabilitation outcomes and quality of life.^3,4

Traditional risk assessment for DVT predominantly relies on clinical experience and isolated clinical indicators, such as D-dimer levels, patient age, and body mass index, in conjunction with straightforward assessment scales, including the Wells score, Autar score, and Padua score. However, these methodologies exhibit notable limitations in practical application, characterized by inadequate predictive accuracy and specificity, impairing the ability to achieve early and precise risk stratification in clinical settings. For instance, while the Autar and Padua scores demonstrate high sensitivity in intensive care unit (ICU) populations, their low specificity contributes to a significant rate of misdiagnosis, thereby complicating the implementation of tailored preventive strategies.^5,6 Furthermore, there is a dearth of specialized risk assessment instruments for patients with pelvic fractures, as traditional models are predominantly grounded in single disease contexts, failing to account for the intricate pathological conditions associated with pelvic fractures.²

With the rapid advancement of artificial intelligence and machine learning, risk prediction models leveraging big data have demonstrated significant potential in the medical domain. Machine learning is adept at processing extensive, multidimensional clinical datasets, enabling automatic learning and identification of complex, nonlinear relationships, thereby enhancing the accuracy and stability of risk assessments. Recent studies have developed machine learning models to predict postoperative DVT risks in patients undergoing brain trauma, tumor surgeries, and multiple injuries, achieving commendable predictive performance. For instance, models utilizing algorithms such as Gradient Boosting Machine (GBM) and LightGBM have reported area under the curve (AUC) values of 0.85 or higher in predicting DVT following brain trauma and brain tumor surgeries, markedly outperforming traditional statistical models.^7–9 Furthermore, integrating feature selection via Support Vector Machine (SVM) with reinforcement learning algorithms has achieved an accuracy of 95.9% for DVT risk prediction, underscoring the distinct advantages of machine learning in extracting critical risk factors and optimizing prediction models.¹⁰

Previous studies on pelvic fracture patients have sought to develop risk prediction models based on clinical variables. These include multifactorial regression models that integrate factors such as emergency abdominal surgery, the Injury Severity Score (ISS), and indicators of renal and liver function. The resulting risk assessment nomogram has demonstrated high predictive accuracy, with an AUC of approximately 0.88, in both training and validation cohorts, thereby serving as an effective tool for preoperative risk stratification of DVT.² Nonetheless, the comprehensive application of machine learning techniques to predict the risk of new-onset lower limb DVT following pelvic fracture surgery remains nascent. Accordingly, there is a pressing need to systematically incorporate large-scale, multicenter datasets and integrate multimodal clinical information to develop a more precise, clinically valuable prediction model with substantial clinical application potential.

In the present study, traditional logistic regression was therefore combined with machine learning algorithms to develop a predictive model for new-onset lower limb DVT after pelvic fracture surgery, with the aim of supporting clinical decision-making.

Methods

Data Collection and Processing

Data were collected from 857 patients diagnosed with pelvic fractures at the Third Hospital of Hebei Medical University between January 2016 and December 2019. DVT was confirmed by compression Doppler ultrasonography and was defined primarily by venous non-compressibility together with intraluminal echogenic material and absent flow variation. To minimize detection bias and establish the temporal sequence of events, a standardized screening protocol was applied to all eligible patients, with assessments at specified preoperative and postoperative time points (days 3, 7, and 14). Blood indicators were therefore measured both before and after the potential onset of thrombosis, which allowed longitudinal analysis of biomarker changes and the inclusion of both asymptomatic and symptomatic cases, so that the observed incidence reflects the true incidence within the cohort. Because samples were collected at multiple time points, characteristics present before thrombosis could be distinguished from those that emerged afterwards. Characteristics measured after thrombosis were considered likely to reflect the pathophysiological state of the disease rather than to have purely predictive value. The analysis therefore evaluated whether the selected characteristics, especially those measured before thrombosis, can serve as early predictors rather than merely correlating with established disease. After exclusion of 112 patients with preoperative thrombosis, 745 patients were included in the final analysis. Postoperative new-onset deep vein thrombosis (PNO-DVT) occurred in 64 of these patients (8.6%, n=64/745), while 681 remained free of thrombosis, and most patients were male (74.9%, n=558/745).

Identification of Risk Factors for PNO-DVT

Independent risk factors were identified in three sequential steps. First, univariate logistic regression was used to select factors with a P-value ≤ 0.05. Next, least absolute shrinkage and selection operator (LASSO) regression was applied to refine the set of factors influencing postoperative thrombosis and to address potential collinearity. In the LASSO model, increasing λ strengthens the penalty term and drives the coefficients of the independent variables towards zero, which mitigates overfitting. Finally, the 13 variables retained after univariate analysis and LASSO regression were entered into a multivariate logistic regression to identify independent predictors of postoperative thrombosis.
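The shrinkage behaviour described above can be illustrated with the soft-thresholding operator that underlies LASSO coordinate updates. The sketch below is in Python rather than the R used in the study, and it assumes an orthonormal design, in which case the LASSO estimate is simply the soft-thresholded unpenalized coefficient. The coefficient values and the λ grid are hypothetical and are not estimates from this cohort.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

# Hypothetical unpenalized coefficients (illustrative only, not study estimates).
coefs = {"age": 0.90, "bmi": 0.40, "operation_time": 0.05, "hdl_c": -0.30}

def shrink(coefs, lam):
    """Apply the same penalty lam to every coefficient."""
    return {name: soft_threshold(b, lam) for name, b in coefs.items()}

def n_nonzero(coefs, lam):
    """Count the variables that survive (non-zero coefficient) at penalty lam."""
    return sum(1 for b in shrink(coefs, lam).values() if b != 0.0)

# A larger lam leaves fewer variables in the model.
for lam in (0.0, 0.1, 0.35, 1.0):
    print(lam, n_nonzero(coefs, lam))
```

As λ grows, weak predictors drop out first, which is why the penalty acts as a variable selector as well as a guard against overfitting.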

Model Establishment

The dataset used for model development comprised the 12 significant features identified in the preceding analyses. It was randomly partitioned into a training set (70%) and a test set (30%). The ratio of positive to negative samples was approximately 1:10, a notable class imbalance. To limit bias towards the negative class, which would lower recall, resampling was applied to the training set only, and the test set was left unaltered. To assess the robustness and stability of the performance estimates, five-fold cross-validation was performed on the training set. Six machine learning algorithms (logistic regression, SVM, random forest, XGBoost, LightGBM, and AdaBoost) were used to develop six models. Hyperparameters for each algorithm were tuned by grid search on the training set. All statistical analyses and data visualizations were performed in R version 4.5.1. The “xgboost”, “e1071”, “randomForest”, “rpart”, and “caret” packages were used for machine learning modeling, and the “tidyverse”, “pROC”, “ggplot2”, “gridExtra”, “tibble”, and “shapviz” packages were used for data organization and visualization.
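The split-then-resample order described above can be sketched as follows. This is a Python illustration, not the authors' R code. It assumes that the reported 64/681 figure denotes 64 positive and 681 negative patients, and it uses simple random oversampling of the minority class, which is an assumption because the text does not name the sampling technique. The function names are hypothetical.

```python
import random

def stratified_split(labels, test_frac=0.3, seed=0):
    """Return (train_idx, test_idx), keeping the class ratio similar in both parts."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return train, test

def oversample_minority(train_idx, labels, seed=0):
    """Randomly duplicate minority-class rows until both classes are the same size."""
    rng = random.Random(seed)
    pos = [i for i in train_idx if labels[i] == 1]
    neg = [i for i in train_idx if labels[i] == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return train_idx + extra

# Assumed class counts: 64 positive and 681 negative patients.
labels = [1] * 64 + [0] * 681
train_idx, test_idx = stratified_split(labels)

# Resampling touches the training indices only; test_idx is never modified.
balanced_train = oversample_minority(train_idx, labels)
```

Resampling after the split keeps duplicated positive cases out of the test set, so test-set metrics still reflect the original class ratio.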

Results

Results of Univariate Logistic Regression Analysis

According to the univariate analysis of baseline characteristics in Table 1, age, hypertension, kidney disease, rheumatoid disease, peripheral vascular disease, and body mass index (BMI) were identified as potential risk factors for the development of postoperative thrombosis.

Table 1 Univariate Analysis of General Information

The univariate logistic regression analysis of surgical and blood transfusion data (Table 2) identified several potential risk factors for postoperative thrombosis. These factors included surgical approach (fracture reduction), type of bone graft, duration of the operation, intraoperative blood loss, intraoperative blood transfusion, method of blood transfusion, and the administration of suspended red blood cells, plasma, and cryoprecipitate coagulation factors.

Table 2 Univariate Analysis of Surgical and Blood Transfusion Information

Univariate logistic regression analysis of the preoperative laboratory data (Table 3) indicated that gamma-glutamyl transferase (GGT), high-density lipoprotein cholesterol (HDL-C), apolipoprotein B (ApoB), total carbon dioxide (TCO₂), glucose (GLU), Glasgow Admission Prognostic Score (GAP), eosinophil count (EOS), and mean platelet volume (MPV) were potential risk factors for postoperative thrombosis.

Table 3 Univariate Analysis of Laboratory Information

Further analysis evaluated systemic inflammatory markers, including the Systemic Immune-Inflammation Index (SII), Neutrophil-to-Lymphocyte Ratio (NLR), and Platelet-to-Lymphocyte Ratio (PLR). However, none of these indicators demonstrated significant differences, as presented in Table 4.

Table 4 Univariate Analysis of Validation Index Scores

LASSO Regression for Feature Selection

LASSO regression was applied to the variables with P-values < 0.05 in the univariate analysis to further identify factors influencing postoperative thrombosis while mitigating collinearity. As λ increases, the penalty increases and the regression coefficients of the independent variables shrink gradually towards zero, which reduces overfitting. Supplementary Figures 1 and 2 illustrate the variable selection process and the choice of λ by cross-validation. In Supplementary Figure 1, each curve traces the coefficient of one variable, with the ordinate showing the coefficient value and the abscissa showing λ, which controls the strength of regularization. As λ increases, the coefficients progressively approach zero, and only variables with non-zero coefficients are retained. Supplementary Figure 2 shows the determination of the optimal λ by 10-fold cross-validation, with the candidate values marked by vertical dotted lines. The left dotted line marks Lambda.min, at which 18 variables were selected, and the right dotted line marks Lambda.1se, at which 13 variables were selected. In this study, Lambda.1se (ie, λ = 0.01641399) was chosen as the penalty coefficient for the model. The variables with non-zero coefficients after LASSO regression were age, nephropathy, peripheral vascular disease, BMI, fracture reduction, operation time, intraoperative blood loss, red blood cell suspension transfusion, HDL-C group, ApoB group, GLU group, GAP group, and MPV group (Supplementary File S2).
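The difference between Lambda.min and Lambda.1se can be made concrete with a short sketch of the one-standard-error rule. The Python below follows the usual definition, in which Lambda.1se is the largest λ whose mean cross-validated error lies within one standard error of the minimum. The error curve is hypothetical and is not taken from Supplementary Figure 2, and the function name is illustrative.

```python
def select_lambdas(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from cross-validated error curves.

    lambda_min minimizes the mean CV error; lambda_1se is the largest lambda
    whose mean error is within one standard error of that minimum.
    """
    best = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[best] + cv_se[best]
    eligible = [lam for lam, err in zip(lambdas, cv_mean) if err <= threshold]
    return lambdas[best], max(eligible)

# Hypothetical CV results (illustrative only; not values from this study).
lambdas = [0.001, 0.005, 0.010, 0.020, 0.050]
cv_mean = [0.560, 0.548, 0.550, 0.555, 0.600]
cv_se = [0.010, 0.009, 0.009, 0.010, 0.012]

lam_min, lam_1se = select_lambdas(lambdas, cv_mean, cv_se)
print(lam_min, lam_1se)
```

Because Lambda.1se is never smaller than Lambda.min, it applies a stronger penalty and yields a sparser model. This is consistent with 13 rather than 18 variables being retained.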

Multivariate Logistic Regression Analysis

The 13 variables retained after univariate analysis and LASSO regression were entered into a multivariate logistic regression to identify independent predictors of postoperative thrombosis. The coding of the independent variables is presented in the table below. Age, nephropathy, BMI, fracture reduction, intraoperative blood loss, administration of red blood cell suspension, HDL-C, ApoB, GAP, and MPV were identified as independent risk factors for postoperative thrombosis. The detailed results of the regression analysis are presented in Table 5.
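For reference, a multivariate logistic regression of this kind takes the standard form below, in which each coefficient corresponds to an adjusted odds ratio. The notation (p, β_j, x_j, k) is introduced here for illustration and does not reproduce the variable coding of Table 5. The confidence interval shown is the usual Wald interval, given as an assumption about how such intervals are typically reported.

```latex
\log\frac{p}{1-p} = \beta_0 + \sum_{j=1}^{k} \beta_j x_j,
\qquad
\mathrm{OR}_j = e^{\beta_j},
\qquad
95\%\ \mathrm{CI}_j = \exp\!\bigl(\hat\beta_j \pm 1.96\,\mathrm{SE}(\hat\beta_j)\bigr)
```

Here p is the probability of PNO-DVT. An odds ratio above 1 indicates higher odds of thrombosis per unit increase in x_j (or relative to the reference category) with the other variables held fixed.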

Table 5 Independent Predictors of Postoperative Thrombosis in Multivariate Logistic Regression Analysis

Establishment and Analysis of Postoperative PNO-DVT Prediction Model Based on Machine Learning Algorithms

Establishment of the Prediction Model and Performance Evaluation

The twelve significant features identified from the preceding univariate analysis were selected as the dataset for the final model construction. The dataset was randomly split into a training set (70%) and a test set (30%). Given that the ratio of positive to negative samples was approximately 1:10, there was a notable imbalance. To mitigate the risk of the model being biased towards the negative samples during prediction, which could result in an unacceptably low recall rate, sampling was conducted on the training set; however, no processing was applied to the test set. This approach ensured that the model evaluation more accurately reflected the model’s predictive capabilities. Six machine learning algorithms, including logistic regression, SVM, random forest, XGBoost, LightGBM, and AdaBoost, were employed to construct six distinct models. Among these, the XGBoost algorithm achieved the highest AUC value of 0.8633, indicating superior predictive performance for postoperative thrombosis within this dataset (Table 6). Consequently, XGBoost was determined to be the optimal model (Figure 1). Supplementary Figure 3 illustrates the feature importance rankings for the random forest, XGBoost, LightGBM, and AdaBoost models. To ensure the robustness and generalizability of model performance, we applied five‑fold cross‑validation within the training set, whereby the data were randomly partitioned into five equal subsets; in each iteration, four subsets were used for model training and the remaining subset for validation, and the final performance was obtained by averaging the results across all five folds (Supplementary File S1).
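The five-fold procedure described above can be sketched in a few lines. The Python below is illustrative only. The training-set size of 522 is a hypothetical value (roughly 70% of 745), and the scoring callback is a placeholder for "fit a model on the training folds and return a validation metric such as AUC".

```python
import random

def k_fold_indices(n, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation over n rows."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds of near-equal size
    for i in range(k):
        val = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, val

def cross_validate(n, fit_and_score, k=5, seed=0):
    """Average a user-supplied score over the k validation folds."""
    scores = [fit_and_score(tr, va) for tr, va in k_fold_indices(n, k, seed)]
    return sum(scores) / len(scores)

# Placeholder scorer (hypothetical): returns the validation fraction of each fold.
mean_score = cross_validate(522, lambda tr, va: len(va) / (len(tr) + len(va)))
```

Each row is used for validation exactly once, so the averaged score draws on the whole training set without reusing any row for both fitting and validation within a fold.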

Table 6 Performance Evaluation of Each Prediction Model

Figure 1 Comparison of ROC curves for each model. The plot illustrates the diagnostic ability of six different classifiers by plotting the True Positive Rate (Sensitivity) against the False Positive Rate (1 - Specificity). The diagonal dashed line represents the performance of a random classifier. The legend details the Area Under the Curve (AUC) values and 95% confidence intervals for each model: Logistic Regression (AUC = 0.75), Random Forest (AUC = 0.805), XGBoost (AUC = 0.863), LightGBM (AUC = 0.835), Support Vector Machine (SVM) (AUC = 0.751), and AdaBoost (AUC = 0.818). XGBoost demonstrated the highest discriminative power among the evaluated models.
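The AUC values compared in Figure 1 can be read as the probability that a randomly chosen patient with DVT receives a higher predicted risk than a randomly chosen patient without DVT. The following Python sketch computes the AUC directly from that definition. The labels and scores are hypothetical and are not study data.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win, which
    matches the Mann-Whitney U statistic divided by n_pos * n_neg.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predicted risks for six patients (not study data).
y_true = [1, 1, 0, 0, 0, 0]
scores = [0.80, 0.30, 0.40, 0.20, 0.10, 0.30]
auc = roc_auc(y_true, scores)
print(auc)  # 0.8125
```

An AUC of 0.5 corresponds to the diagonal dashed line of a random classifier, and 1.0 corresponds to perfect ranking.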

Explanatory Analysis Based on SHAP

Figures 2 and 3 provide visual representations of the feature importance rankings derived from SHAP values in the XGBoost model. Figure 2 presents a swarm (summary) plot, and Figure 3 displays a feature importance plot; both illustrate the relative importance of each feature within the model. Age was identified as the most important predictor, followed by intraoperative blood loss, red blood cell suspension, BMI, GLU group, GAP group, reduction method group, MPV group, HDL-C group, ApoB group, kidney disease, and peripheral vascular disease (Figure 3).

Figure 2 Summary plot of SHAP (SHapley Additive exPlanations) values, illustrating the impact of the top features on the model’s output.

Figure 3 Feature Importance Ranking Chart Based on SHAP Values.
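A ranking of this kind is commonly obtained by averaging the absolute SHAP value of each feature across patients. The Python sketch below assumes that convention, since the text does not state the exact aggregation used for Figure 3. The SHAP values, feature names, and function name are hypothetical.

```python
def mean_abs_shap_ranking(shap_rows, feature_names):
    """Rank features by mean absolute SHAP value across patients (descending)."""
    n = len(shap_rows)
    importance = {
        name: sum(abs(row[j]) for row in shap_rows) / n
        for j, name in enumerate(feature_names)
    }
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical per-patient SHAP values (rows = patients); not study output.
features = ["age", "blood_loss", "bmi"]
shap_rows = [
    [0.30, -0.10, 0.05],
    [-0.50, 0.20, -0.05],
    [0.10, 0.30, 0.02],
]
ranking = mean_abs_shap_ranking(shap_rows, features)
```

Taking absolute values means that a feature that strongly lowers risk in some patients and strongly raises it in others still ranks as important.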

Analysis Based on SHAP Waterfall and Force Plots

Figure 4 demonstrates the utility of SHAP force plots for deconstructing individual predictions. These visualizations elucidate the contribution of each feature to a single sample’s outcome, where the color spectrum represents the direction of influence: features in yellow elevate the prediction score, while those in purple diminish it. The vertical axis enumerates the specific variables and their respective values for the given case. To illustrate this interpretability, force plots for representative positive and negative cases were generated. Analysis of a high-risk (positive) patient revealed that the model’s prediction was predominantly driven by several key factors. The patient’s age of 43 years was the most substantial contributor, increasing the prediction score by +0.269. This was closely followed by the administration of 2400 mL of packed red blood cells (+0.257) and an intraoperative blood loss of 700 mL (+0.16). Membership in GLU Group 1 provided a further, though smaller, positive influence (+0.103). Conversely, certain features moderated the overall risk. Membership in the GAP Group 0 exerted a slight protective effect (negative impact), partially offsetting the risk-enhancing factors. Other variables, such as HDL-C Group 0, MPV Group 0, reduction method 2, and ApoB Group 2, demonstrated minimal influence on the final prediction, despite their positive directional effect. The collective impact of the remaining features was negligible.

Figure 4 (a) SHAP force plot for a negative case. This plot shows how individual features collectively push the prediction toward a low DVT risk. Younger age (35 years), lower transfusion volume (400 mL), and moderate BMI exert the strongest negative contributions, while only GLU Group 1 provides a small positive effect. (b) SHAP force plot for a positive case. This plot illustrates the dominant factors driving a high predicted DVT risk. Older age (43 years), large-volume transfusion (2400 mL), and substantial intraoperative blood loss (700 mL) markedly increase the prediction score, whereas GAP Group 0 provides a minor protective influence.

The feature contributions for the negative-case patient were as follows. An age of 35 years had the largest negative impact on the prediction value, decreasing it by 0.766. Administration of 400 mL of red blood cell suspension reduced the prediction value by 0.21, and a BMI of 25.0 reduced it by a further 0.159. An MPV group value of 1 lowered the prediction value by 0.0616. Conversely, a GLU group value of 1, highlighted in yellow, increased the prediction value, although only slightly. A GAP group value of 0 and an intraoperative blood loss of 600 mL, both marked in purple, decreased the prediction value, but their influence was relatively minor. An HDL-C group value of 1 and a reduction method value of 2 also had a small but discernible effect on the prediction value. The remaining three features had a negligible impact on the prediction value.
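The per-feature contributions in a force plot are additive: together they account for the difference between the prediction for the individual patient and a reference prediction. The Python sketch below demonstrates this property by computing exact Shapley values for a toy two-feature risk score. The toy model, its coefficients, and the single-point baseline are hypothetical simplifications. The study's XGBoost model and the background data used by SHAP are not reproduced here.

```python
from itertools import permutations

def shapley_values(model, x, baseline):
    """Exact Shapley values for one case, averaged over all feature orderings.

    Features not yet revealed in an ordering are held at their baseline value.
    """
    names = list(x)
    phi = dict.fromkeys(names, 0.0)
    orders = list(permutations(names))
    for order in orders:
        current = dict(baseline)
        prev = model(current)
        for name in order:
            current[name] = x[name]
            new = model(current)
            phi[name] += new - prev  # marginal contribution in this ordering
            prev = new
    return {name: total / len(orders) for name, total in phi.items()}

# Toy risk score with an interaction term (hypothetical; not the study's model).
def toy_model(f):
    return (0.01 * f["age"]
            + 0.0002 * f["blood_loss_ml"]
            + 0.00001 * f["age"] * f["blood_loss_ml"])

baseline = {"age": 40, "blood_loss_ml": 400}   # hypothetical reference patient
patient = {"age": 43, "blood_loss_ml": 700}    # values echo the positive case
phi = shapley_values(toy_model, patient, baseline)

# Additivity: contributions sum to the gap between patient and baseline output.
gap = toy_model(patient) - toy_model(baseline)
```

Exhaustive enumeration of orderings is only feasible for a handful of features. Tree-based SHAP implementations obtain the same kind of decomposition far more efficiently.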

Discussion

In recent years, the application of machine learning technology in the medical domain has notably expanded, particularly in thrombosis risk prediction, where it has demonstrated considerable benefits. Machine learning models can predict the likelihood of DVT after fractures or surgery, creating novel opportunities for tailored prevention. Empirical studies have shown that machine learning algorithms are proficient at handling substantial volumes of high-dimensional data, thereby facilitating the identification of novel risk factors and enhancing patient risk stratification.¹¹

Machine learning models have demonstrated significant efficacy in predicting DVT across diverse surgical contexts. In tibial fracture surgery, Random Forest and SVM models achieved exceptional accuracy (AUC 0.99) and calibration.¹² Similarly, ML approaches have proven superior to traditional risk scores like Caprini in laparoscopic surgery¹³ and have emerged as effective non-invasive tools for early DVT detection in gastric cancer patients.¹⁴ The use of machine learning to predict DVT extends beyond the postoperative context. In patients experiencing spontaneous intracerebral hemorrhage, machine learning algorithms have been employed to forecast the risk of lower extremity DVT during hospitalization. The Light Gradient Boosting Machine (LGBM) model was found to exhibit a notable advantage in DVT prediction.⁹ Furthermore, in oncology patients undergoing chemotherapy, machine learning models are utilized to evaluate the risk of catheter-related thrombosis. Despite the clinical application being constrained by complexity, the Bayesian learning model offers a simplified and robust alternative.¹⁵

As the utilization of machine learning algorithms in the medical field continues to expand, several researchers have endeavored to develop predictive models for DVT following fracture surgery using these advanced techniques. In the context of predicting DVT after various fractures or surgical procedures, numerous studies have demonstrated that machine learning models exhibit good performance in identifying significant risk factors and facilitating risk stratification. For instance, researchers have employed a range of algorithms for predicting DVT after knee and hip replacement surgeries, including XGBoost, RF, and SVM, integrating patient electronic health records (EHRs) to construct a highly accurate predictive model, achieving an AUC exceeding 0.92, which markedly surpasses the traditional Caprini score.¹⁶ Similarly, in the prediction of DVT following spinal surgery, the combination of Boruta and LASSO algorithms for feature selection enabled the random forest model to perform exceptionally well, accurately identifying patients at high risk.¹⁷ Furthermore, a study investigating DVT following surgical intervention for lower limb fractures compared various machine learning models, including XGBoost, Random Forest, and Logistic Regression. The results revealed that XGBoost exhibited superior performance with an AUC of 0.979, indicating robust predictive capability.¹⁸

Machine learning models offer substantial advantages over logistic regression for large datasets, a view supported by numerous studies. In particular, machine learning techniques handle high-dimensional data and missing values well. Contemporary approaches, including penalized regression, tree-based models, and deep learning, perform well with missing data in high-dimensional settings, mitigate non-response bias, and have shown superiority in both simulation studies and practical applications.¹⁹ Furthermore, in the analysis of large-scale neuroimaging datasets, combining the predictions of multiple classifiers mitigates overfitting and thereby improves the model’s generalization.²⁰

Machine learning models have repeatedly outperformed traditional logistic regression in medical prediction, showing higher discriminatory accuracy (higher AUC) for outcomes ranging from post-surgical mortality to disease severity.^21,22 Their capacity to process large datasets efficiently and to capture complex, non-linear variable interactions enhances model robustness and predictive reliability.²³ Consequently, these algorithms improve clinical risk stratification by identifying high-risk patients more accurately and forecasting post-surgical complications.²⁴ In this study, we addressed a crucial clinical question: whether machine learning algorithms can improve the early prediction of postoperative new-onset deep vein thrombosis (PNO-DVT) in patients undergoing pelvic fracture surgery compared with traditional statistical methods. The traditional logistic regression model yielded a moderate AUC of 0.7503, the lowest of the six models, whereas the XGBoost model showed the best discriminative performance (AUC 0.8633). This finding aligns with a growing body of literature indicating that ensemble tree-based models outperform logistic regression in complex clinical prediction tasks.^21,22 For instance, machine learning frameworks have achieved remarkable accuracy (with an AUC as high as 0.99) in tibial fracture cohorts¹² and have surpassed established clinical risk scores such as the Caprini model in laparoscopic surgical settings.¹³ Although our model did not reach the near-perfect metrics reported in certain highly homogeneous fracture populations, its robust discrimination in the clinically heterogeneous, high-risk setting of pelvic fractures underscores its practical relevance.

The superiority of XGBoost may stem from its ability to model nonlinear relationships and higher-order interactions among predictors such as intraoperative blood loss, BMI, blood glucose, and MPV, which a logistic regression model does not capture unless interaction or nonlinear terms are specified explicitly. Clinically, these results suggest that integrating advanced ML algorithms into postoperative workflows can refine risk stratification, enabling targeted thrombosis prevention for high-risk patients while minimizing overtreatment of low-risk patients. Ultimately, this supports a shift in orthopedic trauma care from standardized DVT prevention protocols to personalized, data-driven clinical decision-making.

In addition, in most studies of DVT risk prediction, the XGBoost model has shown better prediction accuracy and discriminatory ability than traditional tools. The classic Caprini scoring system is widely used for preoperative VTE risk stratification, but it relies on manually weighted clinical risk factors and does not capture complex variable interactions. By contrast, XGBoost learns nonlinear relationships automatically from large-scale clinical data and has performed better in multiple independent studies. In a study of orthopedic inpatients, its AUC reached 0.931, significantly higher than that of the Caprini model, with improved accuracy and specificity; in another study of postoperative patients with digestive system tumors, it outperformed traditional statistical methods in feature selection and predictive performance. Moreover, XGBoost is supported by interpretability methods such as SHAP value analysis, which can identify key predictors such as D-dimer, fibrinogen, and age for individualized clinical risk assessment, whereas traditional tools often struggle to integrate laboratory indicators dynamically with real-time changes in patient condition. However, XGBoost still requires multicenter external validation to confirm its generalizability, and tools such as the Caprini score remain widely used in primary-level hospitals because they are simple to apply and require no data platform.

In patients with pelvic fractures, the severity of vascular injury is significantly correlated with prognosis. Research indicates that, even in cases of relatively isolated pelvic injuries, the severity of vascular injury is more strongly associated with clinical outcomes than the anatomical complexity of the fracture itself.²⁵ Such vascular injuries not only elevate the risk of hemorrhage but may also initiate a series of inflammatory responses, thereby exacerbating the hypercoagulable state of the blood. In clinical practice, the management of trauma stress and inflammatory responses is crucial for improving the prognosis of patients with pelvic fractures. Research has shown that chronic psychological and social stress can affect the immune response through the β-adrenergic receptor signaling pathway, thereby influencing fracture healing.²⁵ Therefore, interventions targeting traumatic stress and inflammatory responses may alleviate hypercoagulability and improve overall patient prognosis. Accordingly, systemic inflammatory indicators SII, NLR, and PLR were incorporated into our analysis. Ultimately, none of these indicators were identified as significant predictors in this patient population.

SHAP values were then used to explain the probability of PNO-DVT predicted by the XGBoost model. The analysis showed that age was the most important factor, followed by intraoperative blood loss, red blood cell suspension, BMI, GLU group, GAP group, reduction method group, MPV group, HDL-C group, ApoB group, kidney disease, and peripheral vascular disease. Age and BMI, recognized as common cardiovascular risk factors, have been consistently associated with thrombosis in numerous studies. Current evidence indicates that older patients are at increased risk of thrombotic events, potentially because of decreased vascular elasticity and the presence of atherosclerosis.²⁶ Furthermore, an elevated BMI has been linked to metabolic syndrome and inflammatory responses, both of which may contribute to the development of thrombosis.²⁷

Intraoperative blood loss and the volume of suspended red blood cells transfused postoperatively are critical determinants influencing postoperative thrombosis. Significant intraoperative blood loss can result in hemodynamic instability, whereas excessive postoperative transfusion of suspended red blood cells may elevate blood viscosity, thereby augmenting the risk of thrombosis.²⁸ These factors necessitate careful consideration in postoperative management to mitigate the risk of thrombosis.

Among laboratory test indicators, MPV is recognized as a marker of platelet activation and is closely associated with thrombosis. Empirical evidence suggests that patients with elevated MPV are more susceptible to venous thromboembolism (VTE), potentially due to the increased platelet volume, which enhances platelet reactivity and thus promotes thrombosis.²⁹,³⁰ Furthermore, integrating MPV with other hematological indicators can enhance the accuracy of thrombosis risk prediction.³¹

Elevated glucose levels are significantly correlated with the incidence of DVT. Research indicates that hyperglycemia elevates the risk of thrombosis, potentially due to increased blood viscosity and vascular endothelial dysfunction associated with high glucose levels.³² Furthermore, individuals with diabetes are predisposed to thrombosis as a result of prolonged hyperglycemia, substantiating that glucose represents a risk factor for DVT.³³

Triglycerides and HDL-C are integral to lipid metabolism. Elevated triglyceride levels and reduced HDL-C levels have been linked to the incidence of DVT. Dyslipidemia can damage the vascular endothelium and trigger inflammatory responses, thereby elevating the risk of thrombosis.³⁴,³⁵ Furthermore, the low-density lipoprotein cholesterol-to-lymphocyte count ratio (LLR) has been documented as a predictor of DVT, underscoring the role of lipid metabolism disorders in its pathogenesis.³⁵

Apolipoprotein B (ApoB), the primary protein component of low-density lipoprotein, is significantly associated with atherosclerosis and thrombosis. Elevated ApoB levels may increase the risk of DVT by promoting lipid deposition and vascular endothelial damage.³⁶

In summary, our machine learning model identified age, intraoperative blood loss, intraoperative and postoperative blood transfusion, body mass index (BMI), glucose (GLU), and mean platelet volume (MPV) as the most critical predictors of postoperative new-onset deep vein thrombosis (PNO-DVT) in patients with pelvic fractures across the entire dataset. Collecting these indicators facilitates the prediction of PNO-DVT risk, thereby informing clinical practice. Furthermore, we employed SHapley Additive exPlanations (SHAP) to enhance the model’s interpretability. SHAP values can help clinicians tailor interventions to the individual risk factors of patients at high risk of DVT rather than applying a standardized approach to all patients. This personalized approach promotes the efficient allocation of medical resources.

This study has several limitations. First, its retrospective design introduces the possibility of selection bias and limits our ability to establish causality. Methodologically, the model was not compared with existing clinical risk assessment tools such as the Caprini or Autar scores, so it remains unclear whether it offers a meaningful advantage over standard clinical practice. In addition, validation relied on a single training-test split without external validation. A single split may bias the estimate of model performance, cannot fully reflect the model’s stability and generalization under different data distributions, and may not adequately reveal overfitting. The model’s applicability to other clinical settings or populations therefore requires further verification, and its reliability in real-world practice needs to be confirmed in larger, multicenter prospective studies. Second, although we identified important predictive factors, residual confounding remains a distinct possibility: unmeasured perioperative variables, such as specific activity protocols, variations in surgeon experience, and adherence to anticoagulation regimens, were not captured in our dataset and may have independently influenced clinical outcomes. Furthermore, the diagnosis of DVT depended on clinical suspicion at the time and on available imaging resources, which increases the potential for information bias.
Lastly, regarding clinical application and generalizability, the results are based on a single-center dataset, and the practical value and workflow integration of the model will remain unclear until it is validated in a prospective clinical setting. The predictive model therefore needs external validation in multicenter cohorts before it can be generalized to a broader patient population.
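One internal-validation alternative to a single split is stratified k-fold cross-validation, which reports the spread of performance across folds rather than one point estimate. The sketch below is a minimal illustration under stated assumptions and is not something done in this study: the cohort is synthetic, the one-feature "model" (`fit_direction`) is a deliberately trivial placeholder, and all function names are hypothetical. It does not replace external validation.

```python
import random


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs with a similar event rate in each fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal each class round-robin across folds
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def auc(labels, scores):
    """AUC as the chance that a random case outscores a random non-case (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def fit_direction(x, y, train_idx):
    """Placeholder 'model': learn only the sign of the association on the training fold."""
    cases = [x[i] for i in train_idx if y[i] == 1]
    controls = [x[i] for i in train_idx if y[i] == 0]
    return 1.0 if sum(cases) / len(cases) >= sum(controls) / len(controls) else -1.0


# Synthetic cohort (not study data): 200 patients, 25% events, one noisy predictor.
rng = random.Random(42)
y = [1 if i % 4 == 0 else 0 for i in range(200)]
x = [label + rng.gauss(0, 1) for label in y]

fold_aucs, all_test = [], []
for train_idx, test_idx in stratified_kfold(y, k=5, seed=1):
    sign = fit_direction(x, y, train_idx)  # fit on the training folds only
    fold_aucs.append(auc([y[i] for i in test_idx], [sign * x[i] for i in test_idx]))
    all_test.extend(test_idx)

mean_auc = sum(fold_aucs) / len(fold_aucs)
print([round(a, 3) for a in fold_aucs], round(mean_auc, 3))
```

Every patient is used for testing exactly once, and the fold-to-fold variation in AUC gives a rough indication of how sensitive the performance estimate is to the particular split.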

Conclusion

In conclusion, this study demonstrates the utility of machine learning algorithms, particularly XGBoost, in predicting new-onset deep vein thrombosis (DVT) following pelvic fracture surgery. The analysis identified critical clinical predictors, including age, intraoperative blood loss, and body mass index, which consistently ranked high in importance across various models. The integration of these predictive tools into clinical practice offers a promising avenue for enhancing patient management. By enabling precise risk stratification, these models can assist clinicians in identifying high-risk patients early, facilitating timely preventative interventions, and ultimately improving post-operative outcomes.

Abbreviations

ASA, American Society of Anesthesiologists Physical Status Classification System; BMI, Body Mass Index; COPD, Chronic obstructive pulmonary disease; CV, Continuous variable; DVT, Deep venous thrombosis; Inf, Infinite (in statistics); ML, Machine learning; NLR, Neutrophil to Lymphocyte Ratio; PNO-DVT, Postoperative new-onset deep vein thrombosis; PLR, Platelet to Lymphocyte Ratio; Ref, Reference group; SII, Systemic Immune-Inflammation Index; VTE, Venous thromboembolism.

Data Sharing Statement

Detailed data and R code for this article can be obtained from Professor Kuo Zhao.

Ethics Approval and Consent to Participate

This study was conducted in accordance with the Declaration of Helsinki and was approved by the Ethics Committee of the Third Hospital of Hebei Medical University (approval number: KEO-2025-397-1). All data were anonymized before analysis, and because of the retrospective study design, obtaining patient consent was not required.

Consent for Publication

All the authors have agreed to publish.

Acknowledgments

This paper has been uploaded to medRxiv as a preprint: https://www.medrxiv.org/content/10.64898/2025.12.01.25341405v1.full

Funding

This study was supported by the Key R&D Program of the China Ministry of Science and Technology (2024YFC2510600) and the National Natural Science Foundation of China (8220090241).

Disclosure

The authors declare that they have no competing interests in this work.

References

1. Chen H, Sun L, Kong X. Risk assessment scales to predict risk of lower extremity deep vein thrombosis among multiple trauma patients: a prospective cohort study. BMC Emerg Med. 2023;23(1):144. PMID: 38053029; PMCID: PMC10696745. doi:10.1186/s12873-023-00914-7

2. Chen Y, He J, Pan X. Prediction of risk factors for preoperative deep vein thrombosis in patients with pelvic fracture. Front Surg. 2025;12:1585460. PMID: 40356946; PMCID: PMC12066781. doi:10.3389/fsurg.2025.1585460

3. Wang H, Wu G, Chen CY, Qiu YY, Xie Y. Percutaneous screw fixation assisted by hollow pedicle finder for superior pubic ramus fractures. BMC Surg. 2022;22(1):216. PMID: 35658934; PMCID: PMC9166495. doi:10.1186/s12893-022-01659-z

4. Yang CS, Tan Z. Construction and validation of a predictive model for preoperative lower extremity deep vein thrombosis risk in elderly hip fracture patients: an observational study. Medicine. 2024;103(38):e39825. PMID: 39312315; PMCID: PMC11419451. doi:10.1097/MD.0000000000039825

5. Orak F, Saadat M, Saki Malehi A, Behdarvandan A, Esfandiarpour F. Comparison of the Padua and the Autar DVT risk assessment scales in prediction of venous thromboembolism in ICU patients. Med J Islam Repub Iran. 2024;38:48. PMID: 39399622; PMCID: PMC11469705. doi:10.47176/mjiri.38.48

6. Shekarchian S, Notten P, Barbati ME, et al. Development of a prediction model for deep vein thrombosis in a retrospective cohort of patients with suspected deep vein thrombosis in primary care. J Vasc Surg Venous Lymphat Disord. 2022;10(5):1028–1036.e3. PMID: 35644336. doi:10.1016/j.jvsv.2022.04.009

7. Tang Z, Li N, Tian Y. A nomogram for predicting risk factors for lower limb deep venous thrombosis in elderly postoperative patients with severe traumatic brain injury in the intensive care unit. Phlebology. 2025;40(6):446–19. PMID: 40205921. doi:10.1177/02683555251332988

8. Wu L, Zhao Y, Yao G, Li X, Zhao X. Prediction and analysis of risk factors for lower extremity deep vein thrombosis after craniotomy in patients with primary brain tumors: a machine learning approach. Turk Neurosurg. 2025;35(4):636–643. PMID: 40577511. doi:10.5137/1019-5149.JTN.47938-24.3

9. Qiu W, Cui P, Li S, et al. Machine learning models predict risk of lower extremity deep vein thrombosis in hospitalized patients with spontaneous intracerebral hemorrhage. Sci Rep. 2025;15(1):24932. PMID: 40640445; PMCID: PMC12246041. doi:10.1038/s41598-025-10905-2

10. Li R, Chen S, Xia J, et al. Predictive modeling of deep vein thrombosis risk in hospitalized patients: a Q-learning enhanced feature selection model. Comput Biol Med. 2024;175:108447. PMID: 38691912. doi:10.1016/j.compbiomed.2024.108447

11. Chrysafi P, Lam B, Carton S, Patell R. From code to clots: applying machine learning to clinical aspects of venous thromboembolism prevention, diagnosis, and management. Hamostaseologie. 2024;44(6):429–445. PMID: 39657652. doi:10.1055/a-2415-8408

12. Baki H, Özçelik İB. Machine learning-based prediction of postoperative deep vein thrombosis following tibial fracture surgery. Diagnostics. 2025;15(14):1787. PMID: 40722536; PMCID: PMC12293441. doi:10.3390/diagnostics15141787

13. Yang SZ, Peng MH, Lin Q, Guan SW, Zhang KL, Yu HB. A machine learning-based predictive model for the occurrence of lower extremity deep vein thrombosis after laparoscopic surgery in abdominal surgery. Front Surg. 2025;12:1502944. PMID: 40520687; PMCID: PMC12162540. doi:10.3389/fsurg.2025.1502944

14. Zeng Y, Chen Y, Zhu D, et al. Machine learning assisted radiomics in predicting postoperative occurrence of deep venous thrombosis in patients with gastric cancer. BMC Cancer. 2025;25(1):220. PMID: 39920636; PMCID: PMC11806839. doi:10.1186/s12885-025-13630-1

15. An T, Han H, Xie J, et al. Enhancing prediction and stratifying risk: machine learning and Bayesian-learning models for catheter-related thrombosis in chemotherapy patients. BMC Cancer. 2025;25(1):552. PMID: 40148861; PMCID: PMC11948715. doi:10.1186/s12885-025-13946-y

16. Wang X, Xi H, Geng X, et al. Artificial intelligence-based prediction of lower extremity deep vein thrombosis risk after knee/hip arthroplasty. Clin Appl Thromb Hemost. 2023;29:10760296221139263. PMID: 36596268; PMCID: PMC9830569. doi:10.1177/10760296221139263

17. Wu X, Wang Z, Zheng L, et al. Construction and verification of a machine learning-based prediction model of deep vein thrombosis formation after spinal surgery. Int J Med Inform. 2024;192:105609. PMID: 39260049. doi:10.1016/j.ijmedinf.2024.105609

18. Wei C, Wang J, Yu P, et al. Comparison of different machine learning classification models for predicting deep vein thrombosis in lower extremity fractures. Sci Rep. 2024;14(1):6901. PMID: 38519523; PMCID: PMC10960026. doi:10.1038/s41598-024-57711-w

19. Chen S, Xu C. Handling high-dimensional data with missing values by modern machine learning techniques. J Appl Stat. 2022;50(3):786–804. PMID: 36819079; PMCID: PMC9930810. doi:10.1080/02664763.2022.2068514

20. Lanka P, Rangaprakash D, Dretsch MN, Katz JS, Denney TS, Deshpande G. Supervised machine learning for diagnostic classification from large-scale neuroimaging datasets. Brain Imag Behav. 2020;14(6):2378–2416. PMID: 31691160; PMCID: PMC7198352. doi:10.1007/s11682-019-00191-8

21. Leonard G, South C, Balentine C, et al. Machine learning improves prediction over logistic regression on resected colon cancer patients. J Surg Res. 2022;275:181–193. PMID: 35287027. doi:10.1016/j.jss.2022.01.012

22. Ye J, Hua M, Zhu F. Machine learning algorithms are superior to conventional regression models in predicting risk stratification of COVID-19 patients. Risk Manag Healthc Policy. 2021;14:3159–3166. PMID: 34349576; PMCID: PMC8328384. doi:10.2147/RMHP.S318265

23. Bailly A, Blanc C, Francis É, et al. Effects of dataset size and interactions on the prediction performance of logistic regression and deep learning models. Comput Methods Progr Biomed. 2022;213:106504. PMID: 34798408. doi:10.1016/j.cmpb.2021.106504

24. Michelsen C, Jørgensen CC, Heltberg M, et al. Machine-learning vs. logistic regression for preoperative prediction of medical morbidity after fast-track hip and knee arthroplasty-a comparative study. BMC Anesthesiol. 2023;23(1):391. PMID: 38030979; PMCID: PMC10685559. doi:10.1186/s12871-023-02354-z

25. Wu YT, Cheng CT, Tee YS, Fu CY, Liao CH, Hsieh CH. Pelvic injury prognosis is more closely related to vascular injury severity than anatomical fracture complexity: the WSES classification for pelvic trauma makes sense. World J Emerg Surg. 2020;15(1):48. PMID: 32807185; PMCID: PMC7433075. doi:10.1186/s13017-020-00328-x

26. Rong X, Jiang L, Qu M, Yang S, Wang K, Jiang L. Risk factors and characteristics of ischemic stroke in patients with immune thrombocytopenia: a retrospective cohort study. J Stroke Cerebrovasc Dis. 2022;31(10):106693. PMID: 36054971. doi:10.1016/j.jstrokecerebrovasdis.2022.106693

27. Hansen ES, Edvardsen MS, Aukrust P, et al. Combined effect of high factor VIII levels and high mean platelet volume on the risk of future incident venous thromboembolism. J Thromb Haemost. 2023;21(10):2844–2853. PMID: 37393000. doi:10.1016/j.jtha.2023.06.022

28. Zhang YM, Chen W, Wei HL, Tang XH, Xie FH, Wang RX. Analysis of predictive factors of thrombosis in autogenous arteriovenous fistula. J Vasc Access. 2024;25(4):1134–1139. PMID: 36707987. doi:10.1177/11297298221151135

29. Edvardsen MS, Hansen E-S, Hindberg K, et al. Combined effects of plasma von Willebrand factor and platelet measures on the risk of incident venous thromboembolism. Blood. 2021;138(22):2269–2277. PMID: 34161566. doi:10.1182/blood.2021011494

30. Wang Z, Chen X, Wu J, et al. Low mean platelet volume is associated with deep vein thrombosis in older patients with hip fracture. Clin Appl Thromb Hemost. 2022;28:10760296221078837. PMID: 35157546; PMCID: PMC8848069. doi:10.1177/10760296221078837

31. Jakobsen L, Frischmuth T, Brækkan SK, Hansen JB, Morelli VM. Joint effect of multiple prothrombotic genotypes and mean platelet volume on the risk of incident venous thromboembolism. Thromb Haemost. 2022;122(11):1911–1920. PMID: 35617954. doi:10.1055/a-1863-2052

32. Liu X, Li T, Xu H, et al. Hyperglycemia may increase deep vein thrombosis in trauma patients with lower limb fracture. Front Cardiovasc Med. 2022;9:944506. PMID: 36158801; PMCID: PMC9498976. doi:10.3389/fcvm.2022.944506

33. Hang L, Haibier A, Kayierhan A, Abudurexiti T. Risk factors for deep vein thrombosis of the lower extremity after total hip arthroplasty. BMC Surg. 2024;24(1):256. PMID: 39261801; PMCID: PMC11389418. doi:10.1186/s12893-024-02561-6

34. Abdelmalik BHA, Leslom MMA, Gameraddin M, et al. Assessment of lower limb deep vein thrombosis: characterization and associated risk factors using triplex doppler imaging. Vasc Health Risk Manag. 2023;19:279–287. PMID: 37168880; PMCID: PMC10166097. doi:10.2147/VHRM.S409253

35. Guo H, Li C, Wu H, et al. Low-density lipoprotein cholesterol-to-lymphocyte count ratio (LLR) is a promising novel predictor of postoperative new-onset deep vein thrombosis following open wedge high tibial osteotomy: a propensity score-matched analysis. Thromb J. 2024;22(1):64. PMID: 39014396; PMCID: PMC11250942. doi:10.1186/s12959-024-00635-2

36. Gu H, Yang F, Xie H, Li M, Wang Z, Sheng L. Serum VEGF, P-selectin, HDL-C, platelet index, and coagulation function index in deep vein thrombosis after traumatic fracture. Clin Lab. 2024;70(2). PMID: 38345981. doi:10.7754/Clin.Lab.2023.230425

Creative Commons License © 2026 The Author(s). This work is published and licensed by Dove Medical Press Limited. The full terms of this license are available at https://www.dovepress.com/terms and incorporate the Creative Commons Attribution - Non Commercial (unported, 4.0) License. By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed. For permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms.
