Journal of Hepatocellular Carcinoma, Volume 13

Development and Validation of Predictive Machine Learning Models for Postoperative Recurrence and Microvascular Invasion in Hepatocellular Carcinoma Using Nuclear Magnetic Resonance Metabolomics

Authors Tan H, Xu Y, Liu W, Wen Y, Zhang C, Wang C, Chi L, Liao H, Fu S, Cai L, Guo H, Pan M

Received 15 December 2025

Accepted for publication 6 March 2026

Published 13 April 2026 Volume 2026:13 589098

DOI https://doi.org/10.2147/JHC.S589098

Checked for plagiarism Yes

Review by Single anonymous peer review


Editor who approved publication: Dr Ahmed Kaseb



Hongkun Tan,1,* Yuyan Xu,1,* Wenxuan Liu,1,* Yaohong Wen,1 Cheng Zhang,1 Chunming Wang,1 Luhao Chi,1 Hangyu Liao,1 Shunjun Fu,1 Lei Cai,1 Hongbo Guo,2 Mingxin Pan1

1Department of Hepatobiliary Surgery II, General Surgery Center, Zhujiang Hospital, Southern Medical University, Guangzhou, People’s Republic of China; 2Neurosurgery Center, Zhujiang Hospital, Southern Medical University, Guangzhou, People’s Republic of China

*These authors contributed equally to this work

Correspondence: Hongbo Guo, Email [email protected]; Mingxin Pan, Email [email protected]

Background: Hepatocellular carcinoma (HCC) has a poor prognosis, necessitating better diagnostic tools. Nuclear magnetic resonance (NMR)-based metabolomics has emerged as a powerful tool for cancer biomarker discovery, yet its application in HCC prognosis remains underexplored. This study aimed to identify plasma metabolic biomarkers for the diagnosis and prognosis of HCC, and to develop predictive models for postoperative recurrence and microvascular invasion (MVI) to enhance clinical management.
Methods: We performed untargeted NMR metabolomic profiling of plasma from 92 HCC patients and 92 matched healthy controls. Differential metabolites were identified, and their diagnostic performance was assessed using receiver operating characteristic curves. Predictive models for postoperative recurrence and MVI were developed and validated using multiple machine learning algorithms, such as random forest, support vector machine, and gradient boosting machine models.
Results: Significant metabolic differences were identified, with 67 metabolites and blood-lipid indicators showing marked alterations. Acetic acid, dimethylsulfone, glycerol, glycine, and low-density lipoprotein (LDL)-3 cholesterol exhibited the highest discriminatory power (area under the curve [AUC] ≥ 0.954). Regarding HCC recurrence prediction, the StepCox[forward] + random survival forest model achieved a 2-year AUC of 0.811, and its risk score was an independent prognostic indicator (multivariate Cox HR = 1.20, 95% CI: 1.11–1.30, P < 0.001). Regarding MVI prediction, the support vector machine model demonstrated superior performance (AUC = 0.957). Calibration curve, decision curve, and SHapley Additive exPlanations (SHAP) analyses supported model robustness and clinical utility. Two online platforms were developed for clinical implementation.
Conclusion: This study developed and validated NMR-based prognostic and MVI prediction models for HCC, offering valuable tools for precision management. Their clinical value warrants further validation in larger prospective cohorts.

Keywords: hepatocellular carcinoma, metabolomics, metabolites, microvascular invasion, NMR

Introduction

Liver cancer remains a major global health burden, with the Global Cancer Observatory reporting 865,000 new cases and 757,948 deaths worldwide in 2022.1 Hepatocellular carcinoma (HCC) is the main form of liver cancer, accounting for approximately 75%–85% of all cases.1 While surgical resection, liver transplantation, and local ablation are recommended for early-stage HCC (Barcelona Clinic Liver Cancer [BCLC] stage 0–A), postoperative recurrence remains high (30–50% at 2 years), critically affecting patient prognosis.2,3 Thus, the development of accurate prognostic prediction tools is essential to identify high-risk patients and guide personalized treatment.

Metabolomics captures systemic biochemical alterations that provide insights into tumor biology.4 Tumor cells undergo profound reprogramming of energy production and biosynthetic pathways, yielding molecular signatures detectable in biological fluids.5,6 Previous studies have demonstrated metabolomic differences between patients with HCC and those with cirrhosis or unaffected individuals, with specific metabolites emerging as independent prognostic markers.7,8 These findings underscore the potential of metabolomic approaches for HCC diagnosis and prognosis.

Among metabolomics platforms, nuclear magnetic resonance (NMR) spectroscopy offers advantages of high reproducibility, inherent quantitative capability, and procedural standardization, making it particularly suitable for clinical translation.9–11 However, its application in HCC prognosis remains underexplored. Herein, we applied NMR-based metabolomics to compare the plasma metabolic profiles of patients with HCC and healthy controls to identify diagnostic and prognostic biomarkers and construct predictive models for HCC recurrence and microvascular invasion (MVI), thereby facilitating improved clinical management.

Methods

Data Collection and Sample Preparation

A total of 184 plasma samples were obtained from 92 healthy individuals and 92 patients with pathologically confirmed HCC. Immediately after collection, the samples were centrifuged at 1500–2000 ×g for 10 minutes at 2–8°C. The plasma supernatant was then aspirated and aliquoted into cryovials for storage at −80°C. Clinical data for the HCC patients were retrospectively collected, encompassing age, gender, cirrhotic status, alpha-fetoprotein (AFP) levels, protein induced by vitamin K absence or antagonist-II (PIVKA-II) levels, tumor size, tumor multiplicity, vascular invasion status, and tumor stage.

NMR Metabolomics Analysis

Plasma samples were analyzed using the Nightingale proton nuclear magnetic resonance (1H-NMR) platform (ProteinT, Tianjin, China). This platform is widely used to detect the metabolites associated with other diseases.12–14 After thawing, 340 μL of plasma was mixed with an equal volume of NMR lipid buffer (Bruker Plasma Buffer), and 600 μL of the mixture was transferred to a 5-mm NMR tube for automated analysis.

Metabolite Qualification and Quantification

Spectral data were processed using Bruker’s Amix software, with chemical shift correction performed using Speaq. Metabolites were identified and quantified using Bruker’s proprietary NMR database, covering lipid metabolites (eg, very low-density lipoprotein [VLDL] cholesterol) and low-molecular-weight metabolites (eg, glutamine and creatinine).

Comparative Metabolome Analysis Between Healthy Individuals and Patients with HCC

Principal component analysis (PCA) was used to visualize group separation. Subsequently, partial least squares-discriminant analysis (PLS-DA) was used to construct a predictive model linking metabolite expression to sample categories. Differential expression was defined as metabolites with variable importance in projection (VIP) values >1.0, fold-change >1.2 or <0.833, and P <0.05. Enrichment analysis for metabolites and lipids was performed using MetaboAnalyst 5.0 and the proprietary ProteinT database, respectively. The discriminatory power of the selected molecules was assessed using the area under the receiver operating characteristic (ROC) curve (AUC).
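The three-part selection rule above can be sketched as a small filter. This is a minimal illustration in Python (the study itself used R), and the feature names, values, and the assumption that fold-change means HCC relative to controls are all hypothetical. Note that the lower bound 0.833 is approximately 1/1.2, so the rule is symmetric for increases and decreases.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str           # hypothetical feature label
    vip: float          # variable importance in projection from PLS-DA
    fold_change: float  # assumed here to be HCC mean / control mean
    p_value: float


def is_differential(f, vip_min=1.0, fc_up=1.2, fc_down=0.833, alpha=0.05):
    """Apply the stated thresholds: VIP >1.0, fold-change >1.2 or <0.833, P <0.05."""
    changed = f.fold_change > fc_up or f.fold_change < fc_down
    return f.vip > vip_min and changed and f.p_value < alpha


# Toy values, not study data
features = [
    Feature("metabolite_A", 1.8, 2.10, 0.001),  # increased, passes all criteria
    Feature("metabolite_B", 1.3, 0.60, 0.010),  # decreased, passes all criteria
    Feature("metabolite_C", 1.5, 1.10, 0.001),  # fold-change too small
    Feature("metabolite_D", 0.7, 3.00, 0.001),  # VIP too low
    Feature("metabolite_E", 1.4, 0.50, 0.200),  # not statistically significant
]

selected = [f.name for f in features if is_differential(f)]
print(selected)
```

All three conditions must hold at once, which is why features C, D, and E are dropped even though each satisfies two of the three criteria.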

Development and Validation of an HCC Recurrence Prediction Model

Patients from the study cohort who underwent radical resection were randomly divided into training and test cohorts in a 6:4 ratio. Clinical characteristics were compared between cohorts using t-tests or Mann–Whitney U-tests for continuous variables and chi-square or Fisher’s exact tests for categorical variables. A prognostic model was built using the Mime1 R package, which executes 117 ensemble machine learning algorithms.15 Significant metabolites were preselected via univariate Cox regression (P <0.05) in the training cohort, with parameters optimized through 10-fold cross-validation. Model performance was evaluated in the test cohort using the concordance index (C-index). The risk score (RS) derived from the optimal model was assessed as an independent prognostic factor using univariate and multivariate Cox regression analyses adjusted for clinicopathological variables. X-tile software was applied to determine the optimal RS cutoff to stratify patients into high- and low-risk groups, and survival differences were analyzed using Kaplan–Meier curves and log-rank tests. Finally, a web-based application was developed for clinical use.
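For readers unfamiliar with the C-index used to rank the survival models, the following is a minimal Python sketch of Harrell's concordance index for right-censored data. It is not the Mime1 implementation; the function name and the toy follow-up data are assumptions made for illustration only.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index.

    A pair (i, j) is comparable when patient i had the event (events[i] == 1)
    strictly before patient j's observed time. The pair is concordant when the
    earlier-failing patient also has the higher risk score; tied risk scores
    count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (days to recurrence or censoring), not study data
times = [100, 200, 300, 400, 500]
events = [1, 1, 0, 1, 0]   # 1 = recurrence observed, 0 = censored
risks = [9, 7, 8, 3, 1]    # higher score = higher predicted risk

c_index = concordance_index(times, events, risks)
print(c_index)
```

In this toy example, eight pairs are comparable and seven are concordant, so the C-index is 0.875. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.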

Development and Validation of an HCC-MVI Prediction Model

An interpretable machine-learning model for predicting HCC-MVI was developed using ten algorithms: XGBoost (XGB), random forest (RF), support vector machine (SVM), naïve Bayes (NB), gradient boosting machine (GBM), generalized linear model (GLM), generalized linear model with elastic-net (GLMNET), linear discriminant analysis (LDA), k-nearest neighbors (KNN), and recursive partitioning tree (RPART), with SHapley Additive exPlanations (SHAP) for model interpretation. The cohort was randomly divided into training and test cohorts in a 6:4 ratio. Significant metabolite features were selected using the Boruta algorithm (P <0.01, maximum of 200 iterations) in the training cohort. The models were built using five-fold cross-validation and ten replication runs. Model performance was evaluated based on sensitivity, specificity, accuracy, positive predictive value (PPV), negative predictive value (NPV), and F1 score. The optimal classification threshold was determined by maximizing Youden’s index on the ROC curve. Calibration and decision curve analyses were used to assess prediction accuracy and clinical net benefit. The SHAP method was used to quantify the contribution of each variable to predictions. Finally, a user-friendly web-based application was developed for clinical translation.
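Two of the quantities named above, the Youden-optimal threshold and the Brier score used to summarize calibration, can be illustrated with a short Python sketch. The function names and the toy labels and probabilities are hypothetical, and the original analysis was performed in R, so this shows the definitions rather than the authors' code.

```python
def youden_threshold(y_true, y_prob):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    Each distinct predicted probability is tried as a cutoff; a sample is
    called positive when its probability is >= the cutoff.
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    best_t, best_j = None, -1.0
    for t in sorted(set(y_prob)):
        tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= t)
        tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < t)
        j = tp / pos + tn / neg - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Toy data, not study data: 1 = MVI present, 0 = MVI absent
y_true = [0, 0, 1, 0, 1, 1, 1]
y_prob = [0.1, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]

threshold, j_stat = youden_threshold(y_true, y_prob)
brier = brier_score(y_true, y_prob)
print(threshold, j_stat, round(brier, 3))
```

For the toy data, the cutoff 0.7 gives sensitivity 0.75 and specificity 1.0, so J = 0.75. A lower Brier score indicates predicted probabilities closer to the observed outcomes, with 0 being perfect.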

All analyses were performed using R software (version 4.4.1), with statistical significance set at P < 0.05.

Results

Distinct Blood Metabolomic Profiles Differentiate Patients with HCC from Healthy Controls

The NMR-based metabolomic analysis of plasma samples included 92 patients with HCC and 92 matched healthy controls. The results are presented in Supplementary Tables 1 and 2. Age (P = 0.798) and gender (P = 0.701) did not differ significantly between the groups (Supplementary Figure 1). Both PCA and PLS-DA demonstrated a clear separation between the two groups (Figure 1A and B), supporting the potential of blood metabolomics for HCC detection.

Figure 1 Plasma metabolomic profiles based on NMR revealed distinct differences between patients with HCC and healthy controls. (A and B) PCA (A) and PLS-DA (B) score plots of plasma metabolomes. The blue and orange dashed circles represent the 95% confidence intervals for the Health and HCC groups, respectively, indicating the core distribution of the samples within each cohort. (C) Heatmap of all identified metabolites and blood-lipid indicators (the metabolite data underwent range transformation). (D) Heatmap of significantly altered metabolites and blood-lipid indicators (VIP >1.0, fold change >1.2 or <0.833, p <0.05). (E and F) Enrichment analysis of differential metabolites (E) and blood-lipid indicators (F). *, p <0.05.

Abbreviations: NMR, Nuclear magnetic resonance; HCC, Hepatocellular carcinoma; PCA, Principal component analysis; PLS-DA, Partial least squares-discriminant analysis; VIP, Variable importance in the projection.

A total of 151 metabolites and blood-lipid indicators were identified (Figure 1C), with 20 metabolites and 47 blood-lipid indicators showing significant alterations in HCC (Figure 1D and Supplementary Table 3). Notably, 34 metabolites or blood-lipid indicators exhibited fold-changes >2 or <0.5. The levels of 14 of these, such as choline and 2-hydroxybutyric acid, were elevated in HCC, whereas the levels of the other 20, including low-density lipoprotein (LDL)-4 phospholipids and sarcosine, were reduced (Supplementary Table 3).

Enrichment analysis revealed the significant involvement of 14 metabolic pathways, including glycine and serine metabolism and ammonia recycling (Figure 1E). Lipid enrichment analysis indicated that differentially expressed lipids were primarily associated with medium granular lipoproteins (Figure 1F).

Specific Plasma Metabolites or Blood-Lipid Indicators Demonstrate Potential as Diagnostic Biomarkers for HCC

We identified 67 metabolites or blood-lipid indicators with VIP >1 (Figure 2A). To select robust biomarkers, we divided the cohort randomly into training and test cohorts, retaining only features with an AUC >0.95 in both cohorts. This yielded four metabolites (acetic acid, dimethylsulfone, glycerol, and glycine) and one blood-lipid indicator, LDL-3 cholesterol (L3CH), as candidate biomarkers (Figure 2B–K). All five biomarkers achieved AUCs ≥0.954 in both cohorts, with dimethylsulfone showing perfect discrimination (AUC = 1.000) in both. Acetic acid and glycine levels were markedly increased in HCC patients and showed high diagnostic accuracy, with AUCs of 0.975 (training cohort) and 0.962 (test cohort) for acetic acid and 0.988 (training cohort) and 1.000 (test cohort) for glycine.
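The single-marker AUCs reported here have a simple interpretation: the probability that a randomly chosen patient and a randomly chosen control are ranked correctly by the marker. A minimal Python sketch of that pairwise definition follows; the function names and concentrations are hypothetical, and taking the larger of AUC and 1 − AUC to handle markers that decrease in HCC is an assumption about how direction was handled, not a detail stated by the authors.

```python
def pairwise_auc(controls, cases):
    """Probability that a random case has a higher value than a random control.

    Ties count as half. This equals the area under the ROC curve for a
    single continuous marker.
    """
    wins = 0.0
    for case_value in cases:
        for control_value in controls:
            if case_value > control_value:
                wins += 1.0
            elif case_value == control_value:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def directional_auc(controls, cases):
    """AUC irrespective of whether the marker rises or falls in cases."""
    auc = pairwise_auc(controls, cases)
    return max(auc, 1.0 - auc)


def passes_filter(train_auc, test_auc, cutoff=0.95):
    """Retain a feature only if its AUC exceeds the cutoff in both cohorts."""
    return train_auc > cutoff and test_auc > cutoff


# Toy concentrations, not study data
elevated = pairwise_auc(controls=[1.0, 1.2, 1.4, 1.6], cases=[1.5, 1.8, 2.0, 2.2])
reduced = directional_auc(controls=[5.0, 6.0, 7.0], cases=[1.0, 2.0, 3.0])
print(elevated, reduced)
```

In the first toy marker, 15 of 16 case-control pairs are ordered correctly (AUC 0.9375); in the second, every case is below every control, which gives perfect discrimination once direction is taken into account.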

Figure 2 Specific plasma metabolites and blood-lipid indicators may serve as potential clinically relevant biomarkers for the diagnosis of HCC patients. (A) Bar chart of VIP values for differential metabolites and blood-lipid indicators (VIP >1). (B–K) Box plots and ROC curves for plasma levels of acetic acid (B and C), dimethylsulfone (D and E), glycerol (F and G), glycine (H and I), and L3CH (J and K) in the training and test cohorts. ****, p <0.0001.

Abbreviations: HCC, Hepatocellular carcinoma; VIP, Variable importance in the projection; ROC, Receiver Operating Characteristic; L3CH, LDL-3 Cholesterol.

Development and Validation of Machine Learning Models for Recurrence in HCC Patients

To construct a metabolomics-based model to predict recurrence after radical HCC resection, we selected 88 patients who underwent radical HCC resection. The median recurrence-free survival (RFS) was 457 days. The cohort was randomly divided into training and test cohorts with balanced clinical characteristics (Table 1).

Table 1 Baseline Characteristics of Patients in the Training Cohort and Test Cohort for Predicting HCC Recurrence

Univariate Cox regression analysis identified eleven prognostic metabolites or blood-lipid indicators: high-density lipoprotein (HDL)-4 apolipoprotein A-II (Apo-A2), HDL-4 cholesterol, HDL-4 Apo-A1, VLDL-1 triglycerides, HDL-4 phospholipids, HDL-4 free cholesterol, tyrosine, 2-oxoglutaric acid, lysine, LDL-2 triglycerides, and glycine (Supplementary Table 4). Based on these features, 117 machine-learning ensemble models were developed and validated. The StepCox[forward] + random survival forest (RSF) ensemble model achieved the highest C-index in both training and test cohorts, with values of 0.9 and 0.76, respectively (Figure 3A). Further analyses using the StepCox[forward] + RSF model demonstrated its predictive performance for HCC recurrence, with 1-year AUCs of 0.955 and 0.843 and 2-year AUCs of 0.959 and 0.698 in the training and test cohorts, respectively (Figure 3B and C). Univariate and multivariate Cox regression analyses confirmed the RS derived from this model to be an independent prognostic factor (hazard ratio [HR] from univariate analysis = 1.20, 95% confidence interval [CI]: 1.13–1.27, P <0.001; HR from multivariate analysis = 1.20, 95% CI: 1.11–1.30, P <0.001; Figure 3D and Supplementary Table 5). This indicates that the RS remained prognostic for HCC recurrence after adjustment for the clinical characteristics of patients with HCC. In the comparative analysis of 2-year recurrence prediction, the StepCox[forward] + RSF model achieved an AUC of 0.811, higher than that of both the China Liver Cancer (CNLC) staging system (AUC: 0.607) and the BCLC staging system (AUC: 0.588) (Figure 3E).

Figure 3 Development and validation of a machine learning-based metabolomic model for predicting recurrence in HCC patients. (A) Comparison of the C-index between training and test cohorts across 117 ensemble machine learning algorithms. (B and C) ROC curves of the optimal model in the training (B) and test (C) cohorts. (D) Forest plot from multivariate Cox regression of the optimal model (variables with p <0.05 in univariate analysis were included). (E) ROC curves comparing the optimal model with CNLC and BCLC staging for predicting 2-year recurrence. (F) Survival analysis between high- and low-risk groups stratified by the optimal model.

Abbreviations: HCC, Hepatocellular carcinoma; RSF, Random survival forest; ROC, Receiver Operating Characteristic; AFP, Alpha-fetoprotein; BCLC stage, Barcelona clinic liver cancer stage; CNLC stage, China liver cancer stage; RFS, Recurrence-free survival.

Based on the X-tile results, the cutoff value was set to 14.90 to categorize the patients into high-risk (n = 17) and low-risk (n = 71) groups. Survival analysis revealed a significant difference in RFS between the two groups (P <0.001; Figure 3F).

To facilitate clinical application, the StepCox[forward] + RSF model was deployed as a web application (https://tanhongkun.shinyapps.io/NMR_MET_HCC_earlyrecurrence_predictor/) that automatically predicts HCC recurrence risk based on user-input values of the 11 selected features.

Development and Validation of Machine Learning Models for HCC-MVI Prediction

To develop a metabolomics-based predictive model for MVI in HCC, we constructed interpretable machine learning models using ten algorithms (XGB, RF, SVM, NB, GBM, GLM, GLMNET, LDA, KNN, and RPART), which were interpreted via SHAP. We randomly split the cohort into training (60%) and test (40%) cohorts, which showed no significant differences in baseline clinical characteristics (Supplementary Table 6).

The Boruta feature selection algorithm identified seven metabolites or blood-lipid indicators that were significantly associated with MVI: VLDL-2 free cholesterol (V2FC), HDL-2 free cholesterol (H2FC), N,N-dimethylglycine, HDL-1 free cholesterol (H1FC), lactic acid, creatine, and pyruvic acid. For MVI diagnosis, the ten final models showed AUCs ranging from 0.723 to 1.000 in the training cohort and from 0.613 to 0.921 in the test cohort (Figure 4A and B). The multidimensional performance comparisons are presented in Supplementary Table 7 and Supplementary Figure 2. Compared with the other models, the SVM model demonstrated the best overall diagnostic performance across the training and test cohorts, with AUC values of 0.981 and 0.921, respectively, and was therefore selected as the optimal predictor of MVI. In the training cohort, it attained an accuracy of 92.5%, sensitivity of 100%, and specificity of 86.7%; in the test cohort, these values were 84.8%, 85.7%, and 84.2%, respectively (Figure 4C and D). In the training cohort, the SVM model correctly identified all 23 MVI samples and 26 of 30 non-MVI samples. In the test cohort, it correctly classified 12 of 14 MVI samples and 16 of 19 non-MVI samples (Figure 4E and F). These results suggest that the model has a well-balanced classification capability. The calibration curves for the training and test cohorts yielded Brier scores of 0.080 (95% CI: 0.055–0.110) and 0.110 (95% CI: 0.066–0.169), respectively (Figure 4G and H). No significant calibration discrepancy was observed between the cohorts, indicating consistent calibration performance. As shown in Figure 4I, decision curve analysis revealed that the model provided a higher net benefit than both the treat-all and treat-none strategies across most threshold probabilities, supporting its potential clinical utility. Across the entire cohort, the SVM model achieved an AUC of 0.957, outperforming the clinical predictors of MVI (Figure 4J).
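The reported accuracy, sensitivity, and specificity follow directly from the confusion-matrix counts given above. The short Python sketch below recomputes them as a consistency check; the function name is hypothetical, and the PPV, NPV, and F1 values it derives are arithmetic consequences of those counts rather than figures quoted in the text.

```python
def classification_metrics(tp, fn, tn, fp):
    """Standard metrics from confusion-matrix counts (MVI = positive class)."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Counts stated in the text for the SVM model
# Training: all 23 MVI samples correct; 26 of 30 non-MVI samples correct
training = classification_metrics(tp=23, fn=0, tn=26, fp=4)
# Test: 12 of 14 MVI samples correct; 16 of 19 non-MVI samples correct
test = classification_metrics(tp=12, fn=2, tn=16, fp=3)

for name, metrics in (("training", training), ("test", test)):
    summary = {k: round(100 * v, 1) for k, v in metrics.items()}
    print(name, summary)
# training: sensitivity 100.0, specificity 86.7, accuracy 92.5
# test:     sensitivity 85.7,  specificity 84.2, accuracy 84.8
```

The recomputed percentages match the values reported for Figure 4C and D, which confirms that the confusion matrices and the summary metrics are internally consistent.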

Figure 4 Development and validation of a machine learning-based metabolomic model for predicting HCC with MVI. (A and B) Forest plots display AUC values of ten machine learning algorithms in the training (A) and test (B) cohorts. (C and D) Radar charts summarize performance metrics of the SVM model in the training (C) and test (D) cohorts. (E and F) Confusion matrices of the SVM model in the training (E) and test (F) cohorts. (G and H) Calibration curves of the SVM model in the training (G) and test (H) cohorts. (I) Decision curve analysis for the SVM model in the entire cohort. (J) ROC curves comparing the SVM model and clinical features in predicting HCC with MVI. (K) SHAP summary plot showing global feature importance ranked by mean |SHAP| value. (L) Beeswarm plot depicting the direction and distribution of feature impacts.

Abbreviations: HCC, Hepatocellular carcinoma; MVI, Microvascular invasion; XGB, XGBoost; RF, Random forest; SVM, Support vector machine; NB, Naïve Bayes; GBM, Gradient boosting machine; GLM, Generalized linear model; GLMNET, Generalized linear model with elastic-net; LDA, Linear discriminant analysis; KNN, k-nearest neighbors; RPART, Recursive partitioning tree; PPV, Positive predictive value; NPV, Negative predictive value; BCLC stage, Barcelona clinic liver cancer stage; CNLC stage, China liver cancer stage; SHAP, SHapley Additive exPlanations; V2FC, VLDL-2 free cholesterol; H2FC, HDL-2 free cholesterol; H1FC, HDL-1 free cholesterol.

The SHAP method was used to interpret the SVM model by quantifying the contribution of each feature to the predicted output (Figure 4K and L). The SHAP dependence plots showed that lactic acid, creatine, and pyruvic acid were positively associated with SHAP values, indicating that higher levels pushed the prediction toward MVI. In contrast, V2FC, H1FC, H2FC, and N,N-dimethylglycine were negatively associated with SHAP values, indicating that higher levels pushed the prediction away from MVI (Supplementary Figure 3).
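To make the idea of a SHAP contribution concrete, the sketch below computes exact Shapley values by enumerating feature subsets for a toy scoring function. It is an illustration under stated assumptions, not the authors' procedure: the linear score, its coefficients, the three feature positions, and the single reference point used for "absent" features are all hypothetical, and practical SHAP tools approximate this calculation rather than enumerating every subset.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    Features outside a coalition are replaced by their baseline value.
    Enumeration is exponential in the number of features, so this is only
    practical for small feature sets.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    contributions = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        contributions.append(total)
    return contributions


def toy_score(z):
    """Hypothetical score: features 0 and 2 raise it, feature 1 lowers it."""
    return 0.5 + 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[2]


x = [3.0, 2.0, 4.0]          # hypothetical patient values
baseline = [1.0, 1.0, 2.0]   # hypothetical reference values

phi = shapley_values(toy_score, x, baseline)
print([round(v, 6) for v in phi])
```

For a linear score, each contribution reduces to the coefficient times the deviation from the reference, giving 4, −1, and 1 here. The contributions sum to the difference between the patient's score and the reference score, which is the additivity property that SHAP plots rely on.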

To facilitate clinical translation, the model was deployed as a web-based application (https://tanhongkun.shinyapps.io/NMR_MET_HCC_MVI_predictor/) that automatically estimates the MVI risk based on user-input values of the seven selected metabolite features.

Discussion

The high recurrence rate of HCC critically limits long-term survival, highlighting the need for reliable prognostic tools. Currently, NMR-based metabolomics remains underexplored for predicting HCC prognosis, particularly for assessing MVI. In this preliminary study, we showed that NMR plasma metabolomics revealed significant alterations in patients with HCC compared with healthy controls. Using these spectral data, we developed and validated machine learning models that demonstrated promising performance for recurrence prediction and MVI evaluation in HCC. Our findings support the clinical potential of NMR-based metabolomics in HCC management.

In this preliminary study, five candidate biomarkers (acetic acid, dimethylsulfone, glycerol, glycine, and L3CH) demonstrated potential diagnostic value for HCC. Elevated blood acetic acid levels align with findings from a prospective nested case-control study, which reported higher levels of acetic acid in individuals with HCC, potentially linking short-chain fatty acids derived from the gut microbiome to HCC development.16 Dimethylsulfone is an important sulfur-containing metabolite that is widely present in bodily fluids; its primary sources are dietary intake and intestinal bacterial metabolism.17 The marked reduction in dimethylsulfone levels in HCC may reflect disrupted dietary intake.18 Further research is needed to investigate the causes of altered dimethylsulfone levels in HCC patients. Altered glycerol and glycine patterns further indicate metabolic reprogramming. Increased glycine may fuel tumor growth by supporting glutathione synthesis and has been associated with poor prognosis in rectal cancer.19,20 In contrast, decreased glycerol may reflect its use in the synthesis of glycerol-related metabolites, consistent with the significant alterations in glycerol-related metabolites observed in HCC tissues.21,22 A serum metabolite-targeted NMR study also demonstrated that serum glycerol levels in HCC patients were significantly reduced compared to those in patients with liver cirrhosis.23 The reduction in L3CH might be associated with increased demand for cholesterol in cancer cells for membrane biosynthesis and signaling pathways, corroborating the known association between cholesterol metabolism disorders and HCC.24 Together, these key metabolic perturbations point to abnormally active biosynthetic metabolism in HCC, which meets the demands of rapidly proliferating cancer cells. Importantly, the metabolite set we identified not only provides mechanistic insights but also exhibits strong diagnostic potential, supporting its translation into a clinical diagnostic tool.

Beyond diagnosis, we developed a postoperative recurrence prediction model based on 11 metabolic features. The StepCox[forward] + RSF model achieved a C-index of 0.76 in the test cohort and an AUC of 0.811 for predicting 2-year recurrence, outperforming traditional staging systems such as CNLC and BCLC. The derived RS was an independent prognostic factor, effectively stratifying the patients into distinct recurrence risk groups. Although previous mass spectrometry-based studies have identified prognostic metabolites, their predictive performance for recurrence is limited. Wang et al reported a liquid chromatography–mass spectrometry metabolite score that predicted overall survival with AUCs of 0.654–0.871.25 However, they did not report its predictive capability for RFS. Fang et al developed a gas chromatography–mass spectrometry model for postoperative recurrence with AUCs of 0.624–0.660.26 In contrast to the tissue-based multi-omics models reported by Wu et al, which may reveal mechanistic insights such as spatial heterogeneity but rely on invasive biopsies, our approach uses preoperative plasma samples.27 This offers practical advantages in terms of sample accessibility, noninvasive dynamic monitoring, and clearer clinical translation potential. To our knowledge, this is the first study to establish a recurrence prediction model based on preoperative plasma NMR metabolomics. Our findings underscore the utility of NMR technology, owing to its high reproducibility and absolute quantification capabilities, for building stable and clinically applicable detection tools.

MVI is a key driver of early postoperative recurrence of HCC.28,29 However, accurate preoperative assessment remains clinically challenging. MVI is currently diagnosed by postoperative pathology, which limits preoperative strategy optimization, and imaging-based prediction approaches have not been widely adopted.28,30 Thus, highly accurate, non-invasive tools are needed to evaluate MVI preoperatively and guide individualized surgery (eg, establishing a wider surgical margin) and adjuvant treatment planning. The SVM model in the present study demonstrated a strong predictive performance for MVI, outperforming conventional blood biomarkers, imaging features, and previously reported NMR-based models, such as those by Lee et al.31 This improvement likely stems from the capacity of machine learning to integrate multiple metabolic markers into a composite predictor, thereby enhancing the diagnostic efficacy. Furthermore, NMR technology yields highly reproducible and absolute quantitative data, providing a robust metabolic foundation to effectively capture systemic disturbances associated with MVI.32 The synergy between NMR-based profiling and machine-learning algorithms may be key for achieving high prediction accuracy. From a biological standpoint, our model suggests that MVI is accompanied by systemic metabolic alterations that are detectable in the blood using NMR metabolomics, offering new insights into the aggressive HCC phenotype. Finally, to facilitate clinical translation, we developed a user-friendly web-based application that enables surgeons to non-invasively assess MVI risk preoperatively, supporting informed decision-making for surgical planning and adjuvant therapy, with the potential to improve patient outcomes.

Although our results are encouraging, this study has several limitations. Its retrospective design, single-center nature, and relatively small sample size necessitate validation in large prospective multicenter cohorts. Because tumor markers were not tested in healthy controls, the diagnostic performance of the identified differential metabolites could not be compared with that of tumor markers. Furthermore, because HCC was the only hepatic disease included, we could not determine whether the identified metabolic signatures are specific to HCC or shared with other hepatic pathologies, such as intrahepatic cholangiocarcinoma or metastatic lesions. Future studies should include such disease control groups to establish diagnostic specificity. Additionally, most HCC patients included in this study were HBV-related cases. Future research should rigorously validate these findings in cohorts with different etiologies of HCC (eg, HBV, HCV, and NAFLD) to ensure their generalizability.

Conclusion

In summary, this study provides evidence that NMR-based plasma metabolomics can capture profound HCC-associated metabolic reprogramming. We identified potential diagnostic biomarkers and developed accurate models for predicting recurrence and MVI, two critical yet challenging aspects of HCC management. While further validation is necessary, our work establishes a foundation for leveraging this rapid and reproducible technique to enhance risk stratification and personalized treatment strategies for patients with HCC.

Abbreviations

HCC, Hepatocellular carcinoma; NMR, Nuclear magnetic resonance; MVI, Microvascular invasion; VLDL, Very low-density lipoprotein; LDL, Low-density lipoprotein; HDL, High-density lipoprotein; HR, Hazard ratio; CI, Confidence interval; SHAP, SHapley Additive exPlanations; AFP, Alpha-fetoprotein; PIVKA-II, Protein induced by vitamin K absence or antagonist-II; VIP, Variable importance in the projection; PCA, Principal component analysis; PLS-DA, Partial least squares-discriminant analysis; ROC, Receiver operating characteristic; AUC, Area under the receiver operating characteristic curve; RSF, Random survival forest; RS, Risk score; XGB, XGBoost; RF, Random forest; SVM, Support vector machine; NB, Naïve Bayes; GBM, Gradient boosting machine; GLM, Generalized linear model; GLMNET, Generalized linear model with elastic-net; LDA, Linear discriminant analysis; KNN, k-nearest neighbors; RPART, Recursive partitioning tree; PPV, Positive predictive value; NPV, Negative predictive value; BCLC stage, Barcelona clinic liver cancer stage; CNLC stage, China liver cancer stage; RFS, Recurrence-free survival; Apo, Apolipoprotein; L3CH, LDL-3 cholesterol; V2FC, VLDL-2 free cholesterol; H2FC, HDL-2 free cholesterol; H1FC, HDL-1 free cholesterol.

Data Sharing Statement

The data used for analysis in this study are available from Mingxin Pan on reasonable request.

Ethics Statement

Blood samples were collected with informed consent, in accordance with established biobank protocols and ethical and legal standards. This study was conducted in accordance with the Declaration of Helsinki and was approved by the Ethics Review Committee of Zhujiang Hospital of Southern Medical University (2023-KY-022-01).

Acknowledgments

We sincerely thank all participants involved in this study.

Author Contributions

All authors made a significant contribution to the work reported, whether that is in the conception, study design, execution, acquisition of data, analysis and interpretation, or in all these areas; took part in drafting, revising or critically reviewing the article; gave final approval of the version to be published; have agreed on the journal to which the article has been submitted; and agree to be accountable for all aspects of the work.

Funding

This study was supported by the Key-Area Research and Development Program of Guangdong Province (No. 2023B1111020008), the National Natural Science Foundation of China (No. 82373159), the Guangzhou Key Research and Development Program (No. 2024B03J1381), and the Science and Technology Projects, Guangzhou, China (No. 2023B03J1247).

Disclosure

The authors have no relevant financial or non-financial interests to disclose for this work.

Creative Commons License © 2026 The Author(s). This work is published and licensed by Dove Medical Press Limited. The full terms of this license are available at https://www.dovepress.com/terms and incorporate the Creative Commons Attribution - Non Commercial (unported, 4.0) License. By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed. For permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms.