Journal of Hepatocellular Carcinoma, Volume 13

Integrated Machine Learning and Multi-Omics Identifies a Novel Molecular Signature for Improving the Prognosis of Hepatocellular Carcinoma

Authors Wu Z, Xiong J, Liu Q, Wang C, Li D, Wei L, Ding J

Received 16 October 2025

Accepted for publication 28 February 2026

Published 11 March 2026 Volume 2026:13 574690

DOI https://doi.org/10.2147/JHC.S574690

Checked for plagiarism Yes

Review by Single anonymous peer review

Editor who approved publication: Dr Ahmed Kaseb



Zhongxu Wu,1,* Jianguang Xiong,2,* Qisheng Liu,2,* Chengdang Wang,1,3 Dan Li,4 Liuliu Wei,1 Jian Ding1,3

1Department of Gastroenterology, the First Affiliated Hospital of Fujian Medical University, Fuzhou, 350004, People’s Republic of China; 2Department of Gastroenterology, Xianning Central Hospital, Xianning, Hubei, 437000, People’s Republic of China; 3Department of Gastroenterology, National Regional Medical Center, Binhai Campus of the First Affiliated Hospital, Fujian Medical University, Fuzhou, 350200, People’s Republic of China; 4Department of Gastroenterology, Fujian Medical University Union Hospital, Fuzhou, 350001, People’s Republic of China

*These authors contributed equally to this work

Correspondence: Jian Ding, Email [email protected]

Background: Hepatocellular carcinoma (HCC) exhibits significant molecular heterogeneity and a complex immune microenvironment, which to some extent limit the accuracy of prognosis assessment and the formulation of individualized treatment strategies. This study aimed to identify immune-derived molecular signatures from multi-omics data using machine learning methods for prognosis prediction and risk stratification in HCC.
Methods: Using weighted gene co-expression network analysis (WGCNA) and differential gene expression analysis, an immune-derived molecular signature (IDMS) was screened in both single-cell and bulk transcriptomes. A prognostic model was constructed using multiple machine learning approaches. Subsequently, we investigated differences in mutations, biological functions, and immune cell infiltration within the tumor microenvironment between the high- and low-risk groups. In addition, we analyzed the drug sensitivity of the IDMS and predicted potential drugs.
Results: We identified seven hub genes at the single-cell and bulk transcriptome levels. Using multiple machine learning algorithms, we constructed a prognostic model that performed well in predicting overall survival for patients with HCC. IDMS-integrated nomograms provide a promising quantitative tool for clinical risk management. Notably, a significant difference in microsatellite instability (MSI) was observed between the high- and low-risk groups, suggesting that patients in the high-risk group might respond better to immunotherapy. Additionally, we predicted potential drugs targeting these risk subgroups.
Conclusion: Our research developed an IDMS that could serve as an effective tool for patient stratification management and prognosis prediction. This signature could provide a reference for immunotherapy for patients with HCC and improve their prognosis.

Keywords: hepatocellular carcinoma, single-cell RNA-seq, machine learning, multi-omics, biomarkers, immunotherapy response

Introduction

The incidence of hepatocellular carcinoma (HCC) is on the rise, making it the third most prevalent cause of cancer-related mortality globally.1,2 HCC typically presents with an insidious onset and rapid progression, contributing to a global five-year survival rate of less than 30%.3 Over the past decades, despite the establishment of a multidisciplinary treatment approach—including surgery, radiotherapy, radiofrequency ablation, molecular targeted therapy, and transcatheter arterial chemoembolization—morbidity and mortality have remained largely unchanged, and the overall prognosis remains poor.4 Given the significant heterogeneity of HCC, predictive, preventive, and personalized medicine (PPPM) represents a critical strategy for improving patient outcomes.5 Although several biomarkers related to HCC have been identified in recent years,6–8 their clinical application remains limited. First, bulk transcriptomic analyses often fail to capture the pronounced heterogeneity of the tumor microenvironment (TME). Second, the robustness and generalizability of biomarker signatures developed across independent research groups show considerable variability in validation cohorts,9,10 thereby diminishing their reliability for guiding individualized therapeutic decisions.

Currently, the intricate interactions between immune and cancer cells have emerged as a crucial focus of research efforts.11,12 The TME consists of stromal cells, fibroblasts, immune cells, endothelial cells, and cancer cells.13 Significant associations between immune cells and various tumor cells in the TME have been demonstrated.14–16 Cancer cells can maintain an immunosuppressive microenvironment and evade antitumor responses through multiple mechanisms, such as secreting immunosuppressive factors or recruiting regulatory T cells (Tregs).17,18 Meanwhile, chemotherapy, radiotherapy, and targeted therapies have been shown to enhance the immunogenicity of tumor cells and stimulate immune responses, thereby improving the efficacy of anticancer treatments.19–21 These findings suggest that immune-related genes (IRGs) serve as pivotal molecular mediators bridging tumor heterogeneity and personalized therapeutic strategies, underscoring their substantial translational and research significance.

With the advancement of sequencing technology, the integrated analysis of multi-omics has provided a new approach for exploring the biological characteristics of tumors. Herein, we employed a multi-omics approach combined with machine learning to investigate the potential clinical value of IRGs in HCC and identify an immune-derived molecular signature (IDMS). Our research indicates that the IDMS could be used as an innovative biomarker for predicting the prognosis and immunotherapy response in HCC, thereby contributing to improving patient outcomes through targeted prevention and personalized medical care.

Materials and Methods

Acquisition and Processing of Data

The single-cell dataset (accession GSE162616)22 was sourced from the Gene Expression Omnibus (GEO), converted into a Seurat object, and screened to retain high-quality cells (mitochondrial gene content < 20%, number of features > 300, log10GenesPerUMI > 0.80). The “Harmony” package was employed to remove batch effects, followed by data normalization. The “FindMarkers” function was then used to identify single-cell differentially expressed genes (scDEGs) with |logFC| > 0.25 and an adjusted p-value < 0.05. Bulk RNA-seq data were downloaded from The Cancer Genome Atlas (TCGA), and the associated clinical information was retrieved from the UCSC Xena platform. After excluding samples lacking clinical information, a final cohort of 424 patients from The Cancer Genome Atlas Liver Hepatocellular Carcinoma (TCGA-LIHC) was obtained. To robustly identify key genes with consistent differential expression while minimizing the risk of overlooking biologically relevant molecules because of excessively stringent statistical thresholds, this study applied a DEG screening criterion of |log2FC| > 0.3 and an adjusted p-value < 0.05. Additionally, three independent datasets (GSE14520, GSE76427, GSE121248) were obtained from GEO,23–25 and a comprehensive set of 2,483 IRGs was retrieved from the ImmPort database. The full gene list is provided in Table 1.
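The cell-level quality filter described above can be illustrated with a minimal, standard-library Python sketch. It is not the authors' Seurat/R code. It assumes that log10GenesPerUMI is computed as log10(genes detected) divided by log10(UMI count), a common convention that the text does not spell out, and the function name `passes_qc` and the example cells are hypothetical.

```python
import math

def passes_qc(n_genes, n_umi, pct_mito,
              max_pct_mito=20.0, min_genes=300, min_complexity=0.80):
    """Return True if a cell meets the three QC thresholds quoted in the text."""
    if n_genes <= 1 or n_umi <= 1:
        return False  # guard against log10(0) and division by log10(1) == 0
    # Assumed definition: complexity = log10(genes) / log10(UMIs)
    complexity = math.log10(n_genes) / math.log10(n_umi)
    return (pct_mito < max_pct_mito
            and n_genes > min_genes
            and complexity > min_complexity)

# Hypothetical cells, for illustration only
cells = [
    {"id": "cell_a", "n_genes": 1200, "n_umi": 4000, "pct_mito": 5.0},
    {"id": "cell_b", "n_genes": 250, "n_umi": 600, "pct_mito": 3.0},     # too few genes
    {"id": "cell_c", "n_genes": 900, "n_umi": 3000, "pct_mito": 35.0},   # high mitochondrial content
    {"id": "cell_d", "n_genes": 400, "n_umi": 50000, "pct_mito": 2.0},   # low complexity
]
kept = [c["id"] for c in cells
        if passes_qc(c["n_genes"], c["n_umi"], c["pct_mito"])]
print(kept)
```

In this toy input only `cell_a` satisfies all three criteria; each of the other cells fails exactly one threshold.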

Table 1 Immune-Related Genes (IRGs)

Weighted Gene Co-Expression Network Analysis (WGCNA)

The “WGCNA” package26 was used to compute correlation coefficients between gene pairs and to apply these weighted coefficients to construct a gene co-expression network consistent with a scale-free topology. For the TCGA-LIHC dataset, gene variability was calculated and the top 5,000 genes with the highest median absolute deviation (MAD) were selected. We then assessed the correlation of each module with HCC versus control status. Genes within modules that exhibited an absolute correlation coefficient (|r|) greater than 0.50 were selected for further analysis.
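As a rough illustration of the gene pre-filtering step, the sketch below ranks genes by median absolute deviation and keeps the most variable ones. It uses only the Python standard library and toy values. The helper names `mad` and `top_genes_by_mad` are hypothetical, and the unscaled MAD is used here, whereas R's `mad()` applies a scaling constant by default (which does not change the ranking).

```python
from statistics import median

def mad(values):
    """Unscaled median absolute deviation of a list of numbers."""
    m = median(values)
    return median([abs(v - m) for v in values])

def top_genes_by_mad(expr, n_top):
    """expr maps gene name -> expression values across samples."""
    ranked = sorted(expr, key=lambda g: mad(expr[g]), reverse=True)
    return ranked[:n_top]

# Toy expression matrix (hypothetical genes and values)
expr = {
    "GENE_A": [1.0, 1.1, 0.9, 1.0],    # nearly constant
    "GENE_B": [0.0, 5.0, 10.0, 15.0],  # highly variable
    "GENE_C": [2.0, 3.0, 4.0, 5.0],    # moderately variable
}
print(top_genes_by_mad(expr, 2))
```

With the study's setting, `n_top` would be 5,000 and `expr` would hold the TCGA-LIHC expression matrix.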

Identification of Immune-Derived Molecule Signature and Prognosis Model

We obtained hub genes by intersecting IRGs, scDEGs, DEGs, and WGCNA module genes. Hub genes were then screened using five machine learning algorithms: the least absolute shrinkage and selection operator (LASSO), Boruta, random forest (RF), learning vector quantization (LVQ), and Bagged Trees. The genes selected by all five algorithms were used to define the IDMS for HCC. Subsequently, six machine learning algorithms (LogitBoost, support vector machine (SVM), Naive Bayes (NB), RF, k-nearest neighbors (KNN), and AdaBoost) were applied to construct HCC classification models on the TCGA-LIHC dataset, with GSE14520 as the validation set. Five-fold cross-validation was performed on each model to optimize parameters. To ensure the reliability of the results, the optimization process was repeated ten times, each time using a different random seed for resampling. The model that yielded the highest area under the curve (AUC) on the GSE14520 validation set was selected as the optimal prognosis model. In addition, to assess the predictive performance of the prognosis model, we compared the IDMS with previously reported HCC markers across the TCGA-LIHC, GSE76427, GSE121248, and GSE14520 datasets.
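The model-selection rule (keep the model with the highest validation AUC) can be sketched as follows. This is an illustrative, standard-library computation of AUC as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one. The model names and scores are made up, and the study itself would have relied on R packages rather than this code.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positives and negatives (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical validation labels (1 = HCC, 0 = control) and model scores
labels = [1, 1, 1, 0, 0, 0]
model_scores = {
    "model_x": [0.9, 0.8, 0.4, 0.5, 0.2, 0.1],
    "model_y": [0.6, 0.4, 0.5, 0.7, 0.3, 0.2],
}
aucs = {name: roc_auc(labels, s) for name, s in model_scores.items()}
best = max(aucs, key=aucs.get)  # model with the highest validation AUC
print(aucs, best)
```

The pairwise form is quadratic in sample size, which is acceptable for a sketch. Rank-based implementations give the same value more efficiently.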

Survival Analysis and Establishing a Nomogram

To evaluate the prognostic efficacy of the model, the TCGA-LIHC dataset was stratified into high- and low-risk groups based on the median of the predicted values. Kaplan–Meier survival curves were used to identify differences in overall survival (OS) between the two groups. Time-dependent receiver operating characteristic (ROC) curves were generated to calculate the AUC using “survivalROC”. Furthermore, clinical characteristics and the predicted value of the diagnostic model were incorporated into a multivariate Cox regression analysis. Based on these factors, a nomogram was developed to assess the risk of HCC, and the predictive performance of the nomogram was evaluated using calibration curves.
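A minimal sketch of the two steps described above, a median split of predicted values followed by a Kaplan–Meier estimate, is given below in standard-library Python. The function names and data are hypothetical. Samples exactly at the median are assigned to the low-risk group here, a choice the text does not specify.

```python
from statistics import median

def split_by_median(scores):
    """Label each sample 'high' if its score exceeds the median, else 'low'."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time; events: 1 = death, 0 = censored."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Collect every subject whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve

groups = split_by_median([0.1, 0.4, 0.6, 0.9])
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
print(groups, curve)
```

In practice one curve would be estimated per risk group and the groups compared with a log-rank test, which this sketch does not implement.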

Gene Set Enrichment Analysis (GSEA) and Gene Set Variation Analysis (GSVA)

The “limma” and “clusterProfiler” packages were used to perform GSEA and GSVA on all genes in the TCGA-LIHC dataset. Functional enrichment differences between the high- and low-risk groups were then calculated. Genes with an adjusted p-value < 0.05 were considered statistically significant. All p-value adjustments were performed using the Benjamini-Hochberg method.
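For readers unfamiliar with the Benjamini-Hochberg adjustment mentioned above, the following standard-library Python sketch shows the usual step-up computation of adjusted p-values. The function name and input p-values are illustrative only.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
significant = [p < 0.05 for p in adj]
print(adj, significant)
```

In this example only the smallest raw p-value stays below 0.05 after adjustment, which shows how the correction can remove results that look significant before adjustment.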

Analysis of Mutation and Immune Characteristics

Somatic mutation (SM) data for HCC patients in the TCGA-LIHC dataset were processed using “VarScan”. Copy number variation (CNV) analysis was performed using GISTIC2.0.27 The Mann–Whitney U-test was used to evaluate differences in microsatellite instability (MSI), tumor mutation burden (TMB), and tumor immune dysfunction and exclusion (TIDE) scores among the subgroups.

Analysis of Immune Infiltration and Drug Sensitivity

The abundance of immune cell types was evaluated using CIBERSORT. The Drug–Gene Interaction Database (DGIdb) was used to explore interactions between IDMS and small-molecule compounds for identifying potential therapeutic candidates. Predicted binding sites from CB-Dock2 (https://cadd.labshare.cn/cb-dock2/php/index.php) were used to visualize molecular docking between candidate drugs and their target proteins.

Quantitative Real‑time Polymerase Chain Reaction (qRT‑PCR)

Total RNA was isolated using the RNAiso plus RNA extraction kit (Takara, 9019, Japan), following the manufacturer’s guidelines. The PrimeScript RT reagent kit (Perfect Real Time, Takara, RR037A, Japan) was employed to conduct reverse transcription. qRT‑PCR was conducted using the LineGene 9600Plus fluorescence qPCR system (Bioer, Hangzhou, China). Cycle threshold (Ct) values for each gene were normalized using GAPDH as an internal reference, and relative gene expression was calculated using the comparative Ct method. Data were derived from three biological replicates and analyzed employing Student’s t-test. The primer sequences and detailed cell lines are available in Tables 2 and 3.
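The comparative Ct calculation can be written out as a short standard-library Python sketch. It assumes the standard 2^(-ΔΔCt) form with GAPDH as the reference gene. The function name and the Ct values are hypothetical and do not come from the study.

```python
from statistics import mean

def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the comparative Ct (2^-ddCt) method."""
    # Normalize the target gene to the reference gene within each condition
    d_ct_sample = mean(ct_target_sample) - mean(ct_ref_sample)
    d_ct_control = mean(ct_target_control) - mean(ct_ref_control)
    # Compare the sample condition against the control condition
    dd_ct = d_ct_sample - d_ct_control
    return 2.0 ** (-dd_ct)

# Hypothetical Ct values from three replicates
fold = relative_expression(
    ct_target_sample=[22.0, 22.0, 22.0],   # target gene, tumor cell line
    ct_ref_sample=[18.0, 18.0, 18.0],      # GAPDH, tumor cell line
    ct_target_control=[24.0, 24.0, 24.0],  # target gene, control cell line
    ct_ref_control=[18.0, 18.0, 18.0],     # GAPDH, control cell line
)
print(fold)
```

Here the target gene crosses threshold two cycles earlier in the tumor line after normalization, which corresponds to a four-fold higher relative expression.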

Table 2 The Source of Cell Lines

Table 3 Primer Sequences for IDMS

Statistical Analysis

All data analyses were carried out with R software version 4.2.1 and the corresponding R packages. Student’s t-test was employed to compare data that followed a normal distribution, while the Mann–Whitney U-test was used for data with a non-normal distribution.
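To make the non-parametric comparison concrete, the sketch below computes the Mann–Whitney U statistic with midranks for ties, using only the Python standard library. It stops at the statistic and does not derive a p-value. The function name and inputs are illustrative, and the study's analyses were run in R.

```python
def mann_whitney_u(x, y):
    """U statistic for sample x versus y, using midranks for tied values."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        # Find the run of tied values starting at position i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n_x = len(x)
    return rank_sum_x - n_x * (n_x + 1) / 2.0

u_low = mann_whitney_u([1, 2, 3], [4, 5, 6])
u_high = mann_whitney_u([4, 5, 6], [1, 2, 3])
u_tie = mann_whitney_u([1, 2], [2, 3])
print(u_low, u_high, u_tie)
```

The statistic equals the number of (x, y) pairs in which the x value is larger, with ties counted as one half, so it ranges from 0 to len(x) * len(y).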

Results

Research Process and Immune-Derived Genes in Single-Cell Transcriptome

The overall analysis workflow is shown in Figure 1A. Following quality control of the scRNA-seq dataset, a total of 61,763 cells were retained. After annotating the cells with the “SingleR” package, we identified six distinct cell types and 590 scDEGs (Figure 1B and C).

Figure 1 Flowchart and clustering analysis of scRNA-seq (A) Overall analytical workflow. (B) UMAP plot illustrating six identified cell types. (C) Heatmap of marker genes for each cell type.

Identification of the Hub Genes in Bulk RNA-Seq

Using the “limma” R package, we identified 10,533 DEGs in the TCGA-LIHC dataset, including 9,175 upregulated and 1,358 downregulated genes (Figure 2A). Co-expression modules were constructed using WGCNA, and the optimal soft-thresholding power was determined to be 8 (Figure 2B). Correlations between module eigengenes and the HCC and control groups were evaluated based on the expression profiles of the top 5,000 genes with the highest median absolute deviation (MAD) (Figure 2C–E). Three modules were retained: MEblue, MEblack, and MEbrown. Subsequently, seven hub genes (NR4A2, TMSB10, S100A10, NR4A1, MAP3K8, HSPA1A, and HSP90AA1) were identified for further investigation (Figure 2F).

Figure 2 Identification of the hub genes (A) Volcano plot of differential gene expression analysis in the TCGA-LIHC. (B) Selection of the optimal soft-thresholding power and assessment of network connectivity. (C) Module clustering results for the top 5,000 genes based on MAD. (D) Correlation between module eigengenes and HCC/control groups. (E) Gene dendrogram and module clustering results. (F) Venn diagram of IRGs, scDEGs, DEGs, and module genes.

Screening Immune-Derived Molecular Signature by Multi-Machine Learning

Five machine learning algorithms were applied to select biologically significant biomarkers from the hub genes (Table 4). Seven genes (HSP90AA1, HSPA1A, MAP3K8, NR4A1, NR4A2, S100A10, TMSB10) were identified using the LASSO algorithm (Figure 3A). The Boruta and LVQ algorithms each selected seven genes (Figure 3B and C). Bagged Trees identified four genes (S100A10, NR4A1, HSP90AA1, TMSB10) (Figure 3D), while the random forest algorithm identified six genes (HSP90AA1, S100A10, NR4A1, NR4A2, TMSB10, MAP3K8) (Figure 3E). Finally, we selected the four genes common to all five algorithms as the IDMS: HSP90AA1, NR4A1, S100A10, and TMSB10 (Figure 3F).
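The overlap step can be reproduced from the gene lists reported in this paragraph with a few lines of Python. One assumption is made explicit in the code: because only seven hub genes were candidates, the seven genes selected by Boruta and by LVQ are taken to be all seven hub genes.

```python
# Seven hub genes reported in the text
hub_genes = {"NR4A2", "TMSB10", "S100A10", "NR4A1",
             "MAP3K8", "HSPA1A", "HSP90AA1"}

selected = {
    "LASSO": set(hub_genes),
    "Boruta": set(hub_genes),  # assumption: "seven genes" means all seven hub genes
    "LVQ": set(hub_genes),     # same assumption as for Boruta
    "Bagged Trees": {"S100A10", "NR4A1", "HSP90AA1", "TMSB10"},
    "Random forest": {"HSP90AA1", "S100A10", "NR4A1",
                      "NR4A2", "TMSB10", "MAP3K8"},
}

# Genes selected by every algorithm form the signature
idms = sorted(set.intersection(*selected.values()))
print(idms)
```

The intersection returns HSP90AA1, NR4A1, S100A10, and TMSB10, matching the four-gene IDMS reported above. In this example Bagged Trees, the most restrictive selector, determines the final set.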

Figure 3 Screening immune-derived genes (A) Gene frequency distribution derived from the LASSO algorithm. (B) Genes selected using the LVQ algorithm. The red dotted line indicates the importance threshold for gene screening. (C) Gene selection using the Boruta algorithm. (D) Variable importance and model performance from the Bagged Trees. (E) Variable contribution and model accuracy from the random forest model. (F) The UpSet diagram shows the intersection of the screening results of the five machine learning algorithms and determines the final key genes.

Table 4 The Parameters of the Five Machine Learning Algorithms for Feature Selection

Identifying Prognosis Model by Integrated Machine Learning Analysis

Based on the four characteristic genes identified above, we evaluated the predictive potential of the IDMS using six machine learning algorithms. ROC curve analysis showed that the SVM model could effectively distinguish HCC from the control group, with an AUC of 0.957 (Figure 4A and B). Furthermore, compared with four previously published HCC signatures developed by Liu et al, Shen et al, Qian et al, and Zhou et al,22,28–30 the IDMS exhibited consistently high predictive accuracy across multiple independent datasets, including TCGA-LIHC, GSE76427, and GSE14520. In the GSE121248 dataset, the predictive performance of the IDMS was comparable to that of the biomarkers developed by Qian et al, with a higher mean AUC observed for the IDMS (Figure 4C and D). Based on these findings, we selected the SVM as the optimal risk model for HCC. Subsequently, we annotated the IDMS genes and found that they were located on human chromosomes 1, 2, 12, and 14 (Figure 4E).

Figure 4 Identification of risk model and comparison with other HCC signatures (A) AUC values of risk models constructed using six machine learning algorithms. (B) ROC curve of the SVM model. (C) Circos plot comparing the performance of various published HCC signatures. (D) Heatmap showing predictive performance of the risk model across datasets. (E) Chromosomal mapping of IDMS.

Predictive Performance of SVM for HCC Prognosis

Using the median predicted value generated by the SVM model, a risk factor plot was created to visualize stratification between high- and low-risk groups (Figure 5A). The analysis showed that the expression levels of S100A10, TMSB10, and HSP90AA1 were significantly elevated in the high-risk group, which also had higher mortality (Figure 5B). Kaplan–Meier survival analysis revealed that increased expression of these genes, along with a higher predicted value, was significantly associated with poor prognosis (Figure 5C–F). Moreover, time-dependent ROC analysis demonstrated that these markers had strong prognostic value at 1-, 3-, and 5-year intervals (Figure 5G–J).

Figure 5 Prognostic performance of the SVM for HCC (A) Risk factor plot within risk subgroups. (B) Comparison of gene expression levels among risk subgroups. (C–F) Kaplan–Meier survival curves for predicted value (C), HSP90AA1 (D), S100A10 (E), and TMSB10 (F). (G–J) Time-dependent ROC curves for predicted value (G), HSP90AA1 (H), S100A10 (I), and TMSB10 (J). ***p < 0.001.

Establishment of a Nomogram Integrating with Clinical Characteristics

Clinical characteristics from the TCGA-LIHC dataset were integrated with the predicted values of the SVM model for multivariate Cox regression analysis. Based on this analysis, a nomogram was constructed to estimate the survival probability of patients with HCC (Figure 6A). Among all the included variables, the predicted value of the SVM demonstrated the strongest prognostic value, while the effect of gender was relatively weak. The calibration curves showed that the nomogram was best calibrated at the 5-year time point (Figure 6B). Additionally, decision curve analysis (DCA) demonstrated that the net clinical benefit of the prognostic model followed the pattern: 5-year > 3-year > 1-year (Figure 6C–E).
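The quantity plotted in a decision curve is the net benefit at a given threshold probability. The sketch below shows the standard binary-outcome form, net benefit = TP/n - (FP/n) * pt/(1 - pt), in standard-library Python. It is a simplification: survival DCA at 1, 3, and 5 years uses time-dependent estimates of these counts, which are not implemented here, and the data and function name are hypothetical.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of treating everyone with predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability
    return tp / n - fp / n * threshold / (1.0 - threshold)

# Hypothetical outcomes (1 = event) and predicted risks
labels = [1, 1, 0, 0, 0]
probs = [0.9, 0.6, 0.7, 0.2, 0.1]
nb = net_benefit(labels, probs, 0.5)
print(nb)
```

A decision curve repeats this calculation over a range of thresholds and compares the model against the "treat all" and "treat none" strategies.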

Figure 6 Establishment and evaluation of the nomogram (A) Nomogram integrating predicted value with clinical characteristics. (B) Calibration curves evaluating 1-, 3-, and 5-year survival predictions. (C–E) DCA of the prognostic model at 1-year (C), 3-year (D), and 5-year (E).

Underlying Molecular Mechanisms of IDMS

To investigate the biological processes underlying the prognostic relevance of IDMS, we performed GSEA and GSVA using transcriptomic data. The results revealed that oncogenic signaling pathways—including PI3K_AKT_MTOR, NOTCH, and WNT_BETA—were significantly activated in the high-risk group (Figure 7A–E and Table S1). In contrast, the low-risk group exhibited marked alterations in bile acid and fatty acid metabolism (Figure 7F and G and Table S2). These findings indicate a strong association between HCC risk stratification and dysregulated biological processes and metabolic pathways relevant to cancer progression.

Figure 7 Transcriptome characteristics of the subgroups (A) Ridge plot of biological functions identified by GSEA. (B–E) GSEA enrichment plots showing upregulation of hypoxia metagene (B), glycolysis/gluconeogenesis (C), fatty acid metabolism (D), and CDKN1A-mediated apoptosis via TP53 (E). (F and G) Heatmap (F) and box plot (G) showing GSVA results comparing risk subgroups.

Analysis of Mutation and Immune Characteristics in Different Subgroups

Somatic mutations (SMs) in the IDMS genes were analyzed and visualized using “maftools” (Figure 8A). The results showed two types of SMs, with missense mutations being the most prevalent. In addition, single nucleotide polymorphisms represented the predominant mutation type, and C>T transitions were the most frequently observed single nucleotide variants in HCC samples. Copy number variations (CNVs) were present in all IDMS-related genes in the TCGA-LIHC samples (Figure 8B and C). Among them, S100A10 showed the highest frequency of CNV amplification and the lowest frequency of CNV deletion, particularly in the variant group compared to the non-variant group. We further evaluated immunotherapy sensitivity in HCC samples by comparing the risk groups classified by the SVM model. No statistically significant differences were observed in TIDE scores or TMB among the risk subgroups. In contrast, the MSI score was significantly higher in the high-risk group (Figure 8D–F). MSI is an established biomarker for predicting tumor response to immunotherapy.31 These findings indicate that patients in the high-risk group might have a greater likelihood of benefiting from immunotherapeutic strategies.

Figure 8 Mutation and immune characteristics in IDMS and its subgroups. (A) SM landscape of IDMS in the TCGA-LIHC. (B and C) CNV profiles of IDMS in TCGA-LIHC. (D–F) Comparison of immunotherapy response indicators between high- and low-risk groups: (D) TIDE scores, (E) MSI scores, and (F) TMB scores.

The Correlation Between the IDMS and Single-Cell Characteristics

A large body of evidence shows a close connection between the immune microenvironment and HCC. In this study, the CIBERSORT algorithm was employed to quantify the abundance of 22 immune cell types and to compare their distribution across risk subgroups (Figure 9A). Additionally, a more in-depth analysis of immune cell infiltration uncovered intricate correlations among various immune cell populations. For instance, plasma cells were positively correlated with naïve B cells, whereas monocytes were negatively correlated with follicular helper T cells (Figure 9B). Interestingly, Tregs exhibited a significant negative correlation with NR4A1, while displaying significant positive correlations with S100A10 and TMSB10. Moreover, HSP90AA1 demonstrated a significant negative association with M1-type macrophages, suggesting a potential role of the IDMS in immune tolerance mechanisms (Figure 9C).

Figure 9 The correlation between the IDMS and single-cell characteristics (A) Stacked bar chart showing the distribution of immune cell types across risk subgroups. (B) Heatmap illustrating correlations among immune cell infiltration levels. (C) Heatmap showing correlations between immune cell abundance and IDMS expression levels. *p < 0.05.

Analysis of Drug Sensitivity and Molecular Docking

To predict potential therapeutic agents for high-risk patients, we carried out a comprehensive drug sensitivity analysis using DGIdb and the DrugBank database to screen for drugs or small-molecule compounds interacting with IDMS (Figure 10A and Table 5). In addition, candidate therapeutic compounds and their target genes were evaluated using the Connectivity Map (CMap) (Figure 10B and Table 6). Based on compound susceptibility scores, the top two drugs were selected for molecular docking. The results indicated that HSP90AA1 exhibited moderate binding affinity with geldanamycin (Figure 10C). This interaction involved amino acid residues TYR493, GLU527, VAL530, GLN531, LEU533, LYS534, THR540, LEU541, VAL542, SER543, LYS546, THR603, TYR604, GLY605, TRP606, THR607, and MET610, which engaged in hydrogen bonding, weak hydrogen bonding, and hydrophobic interactions. Furthermore, HSP90AA1 also demonstrated moderate binding affinity with alvespimycin (Figure 10D). The interacting residues included TYR493, VAL530, GLN531, LYS534, THR540, LEU541, VAL542, SER543, LYS546, THR603, TYR604, and GLY605, forming similar hydrogen bonds and hydrophobic contacts. Finally, we evaluated the expression of IDMS in five cell lines and found HSP90AA1, NR4A1, and S100A10 significantly upregulated in tumor cell lines versus the liver cell line (Figure 10E–H).

Figure 10 Drug sensitivity and molecular docking analysis of IDMS (A) Interaction network between IDMS and candidate drugs or small-molecule compounds based on DGIdb and DrugBank analysis. (B) Predicted interactions between NR4A1, HSP90AA1, and candidate drugs based on CMap analysis. (C and D) Molecular docking results of HSP90AA1 with geldanamycin and alvespimycin, showing overall docking structure and interaction strength diagrams from left to right. The protein is color-coded from green (hydrophilic) to red (hydrophobic) to reflect amino acid properties. Hydrogen bonds are indicated by dashed blue lines; weak hydrogen bonds by light blue dashed lines; and hydrophobic interactions by gray dashed lines. (E–H) Validation of mRNA expression levels of HSP90AA1 (E), NR4A1 (F), TMSB10 (G), and S100A10 (H) in one liver cell line (THLE-2) and four HCC cell lines (Hep-3B, BEL-7405, MHCC97H, and HCCLM3) via qRT-PCR.

Table 5 List of IDMS-Related Drugs From the DGIdb Database

Table 6 List of Drug-Related IDMS and Drug Susceptibility Scores in CMap Database

Discussion

HCC remains a major malignancy that significantly threatens human health. Previous studies investigating the molecular basis of HCC have identified numerous potential biomarkers for diagnosis and prevention.32–34 For instance, alpha-fetoprotein (AFP) is a commonly used biomarker for HCC diagnosis; however, its limited sensitivity and specificity restrict its clinical utility.35 In addition, vascular endothelial growth factor A has demonstrated potential for predicting the efficacy of sorafenib treatment.36 Promising molecular markers have also been identified in circulating tumor DNA (ctDNA) and circulating tumor cells.37 Nevertheless, due to a lack of large-scale prospective studies, robust molecular biomarkers for clinical decision-making in HCC remain limited. At present, pathological classification remains a primary tool for determining treatment strategies. However, substantial differences in patient prognosis are frequently observed under this model. Therefore, given that HCC is characterized by limited diagnostic and therapeutic options and marked heterogeneity, identifying reliable molecular biomarkers to support the implementation of PPPM is a critical and urgent need.

Advancements in multi-omics technologies have facilitated the elucidation of tumor molecular mechanisms.38 In this work, we integrated bulk and single-cell transcriptome datasets to explore potential biomarkers and characteristics of immune cell infiltration in HCC. To minimize bias stemming from a single algorithm, this study applied five machine learning algorithms with cross-validation to identify four immune-derived genes that exhibit greater stability and biological significance. Subsequently, we comprehensively analyzed the predictive performance of six machine learning approaches and selected the optimal risk model. Compared with previous immune-related prognostic models built on single-cell transcriptomes and bulk RNA-seq,39,40 our approach combined cross-validation across multiple algorithms with independent validation in multiple cohorts, yielding immune-derived molecular features with higher generalization ability. More importantly, survival analysis and the nomogram indicated that the IDMS has a robust capacity for risk stratification and prognostic value. This result indicates that the IDMS not only performs better in predicting HCC prognosis but can also more effectively identify high-risk patients and guide the development of targeted surveillance and intervention strategies. For high-risk individuals, clinicians may consider increasing monitoring frequency, for example through routine liver ultrasounds, and recommending lifestyle interventions such as smoking cessation and reduced alcohol intake. Furthermore, ctDNA detection may aid the early diagnosis of HCC and support preventive strategies.41

In the past few years, immunotherapy has become a major focus of basic and clinical oncology research, with promising advances in its application to HCC.42,43 However, challenges persist, including low objective response rates and adverse treatment effects.44 Therefore, identifying patients most likely to benefit from immunotherapy is a key strategy for improving outcomes and personalizing treatment plans. Multi-omics and drug sensitivity analysis in this study indicated that patients in the high-risk group were more likely to benefit from immunotherapy, potentially reducing overall treatment costs. Moreover, geldanamycin and alvespimycin may exhibit superior therapeutic efficacy. Geldanamycin inhibits the function of HSP90AA1, leading to the degradation of multiple tumor-related proteins, thereby preventing the proliferation of tumor cells and inducing apoptosis.45,46 Alvespimycin inhibits tumor proliferation, metastasis, and angiogenesis by suppressing pathways such as PI3K/AKT and MAPK, and enhances sensitivity to chemotherapy.47–49 Taken together, these findings suggest that the IDMS is superior to previous markers in the prediction and personalized treatment of HCC, and is more conducive to PPPM-based clinical management of HCC patients.

By analyzing the heterogeneity of the IDMS at the intercellular level, we found that all genes included in the IDMS are implicated in immune evasion or tumorigenesis. Tregs are a subset of T cells with immunosuppressive functions. Within the TME of HCC, persistent infiltration of Tregs promotes an immunosuppressive niche, thereby facilitating tumor progression and immune escape.18,50,51 NR4A1, a transcription factor, has been shown to promote the differentiation of naïve T cells into Tregs and enhance immunosuppression by modulating signaling pathways downstream of the T cell receptor (TCR).52,53 S100A10 has been reported by Wang et al to activate the cPLA2 and 5-LOX axis, leading to CD8+ T cell exhaustion in HCC and promoting sustained tumor progression and metastasis.54 TMSB10, a member of the β-thymosin family, is overexpressed in various cancers.55,56 Knockdown of TMSB10 has been shown to facilitate the polarization of macrophages towards the M1 phenotype, thereby improving antitumor immunity.57 HSP90AA1, a molecular chaperone, interacts with numerous client proteins and is involved in multiple biological processes related to cancer. Notably, plasma levels of HSP90AA1 are significantly elevated in HCC patients and exhibit superior diagnostic performance compared to AFP, with strong associations with tumor progression and metastasis.58,59 Importantly, both S100A10 and TMSB10 are positively correlated with Treg abundance and are highly expressed in HCC tissues, suggesting that they may play key roles in mediating Treg-associated immune escape mechanisms. However, the molecular mechanisms involved are not yet fully elucidated. Moreover, the potential biases that may arise from the publicly available data also require further investigation.

We investigated the underlying mechanisms of the IDMS using multi-omics analysis. DEGs between the risk subgroups were primarily correlated with cell signaling pathways—such as the PI3K, WNT, and MYC pathways—as well as metabolic processes including glycolysis, xenobiotic metabolism, and fatty acid metabolism. These results may partially explain the differences in prognosis observed between IDMS-defined subgroups. Furthermore, metabolism-associated biological pathways may represent promising directions for future research, and the development of targeted inhibitors against these pathways could offer therapeutic benefit in preventing HCC progression.

Nevertheless, this investigation has certain limitations. First, all data were sourced from public databases. Although the immune-derived molecular signature evaluated in this study demonstrated consistent prognostic predictive performance across multiple independent datasets, its clinical utility requires further validation in large-scale, prospective cohort studies. Additionally, our findings are derived from computational bioinformatics analyses, and functional experimental validation is lacking to confirm the biological roles of the genes comprising the IDMS. We hope that the publication of this study will attract broader research interest in the IDMS and contribute to a deeper understanding of its function in the pathogenesis and progression of HCC.

Conclusion

We identified an IDMS that shows potential as a valuable tool for predictive, preventive, and personalized medicine in HCC. Through multi-omics analysis, this study may offer novel perspectives on the molecular mechanisms that underlie the occurrence and development of HCC.

Data Sharing Statement

The raw data in this study are available from the following databases: TCGA database (https://portal.gdc.cancer.gov/), GEO database (http://www.ncbi.nlm.nih.gov/geo), ImmPort database (https://www.immport.org/), DGIdb (https://dgidb.org/), and DrugBank database (https://go.drugbank.com/). The parameters used for the machine learning models are listed in Table 4. For further details, please contact the corresponding author.

Ethical Approval

According to Article 32 of the Ethical Review Measures for Human Life Science and Medical Research in China (issued on February 18, 2023), the data for this study were obtained from public databases and fall within the scope of ethical review exemption.

Consent to Participate

All authors volunteered to participate in this study.

Consent for Publication

All authors have provided their consent for the publication of this paper.

Funding

Funding for this research was provided by the Joint Funds for the Innovation of Science and Technology, Fujian Province (No. 2021Y9064) and the Fujian Provincial Natural Science Foundation of China (No. 2022J01730).

Disclosure

The authors report no conflicts of interest in this work.

References

1. Villanueva A. Hepatocellular carcinoma. N Engl J Med. 2019;380(15):1450–1462.

2. McGlynn KA, Petrick JL, El-Serag HB. Epidemiology of hepatocellular carcinoma. Hepatology. 2021;73:4–13.

3. Allemani C, Matsuda T, Di Carlo V, et al. Global surveillance of trends in cancer survival 2000-14 (CONCORD-3): analysis of individual records for 37 513 025 patients diagnosed with one of 18 cancers from 322 population-based registries in 71 countries. Lancet. 2018;391(10125):1023–1075. doi:10.1016/S0140-6736(17)33326-3

4. Golabi P, Fazel S, Otgonsuren M, et al. Mortality assessment of patients with hepatocellular carcinoma according to underlying disease and treatment modalities. Medicine. 2017;96(9):e5904. doi:10.1097/MD.0000000000005904

5. Lu M, Zhan H, Liu B, et al. N6-methyladenosine-related non-coding RNAs are potential prognostic and immunotherapeutic responsiveness biomarkers for bladder cancer. EPMA J. 2021;12(4):589–604. doi:10.1007/s13167-021-00259-w

6. He S, Qiao J, Wang L, et al. A novel immune-related gene signature predicts the prognosis of hepatocellular carcinoma. Front Oncol. 2022;12:955192. doi:10.3389/fonc.2022.955192

7. Xin C, Huang B, Chen M, et al. Construction and validation of an immune-related lncRNA prognostic model for hepatocellular carcinoma. Cytokine. 2022;156:155923. doi:10.1016/j.cyto.2022.155923

8. Zhang G, Su L, Lv X, et al. A novel tumor doubling time-related immune gene signature for prognosis prediction in hepatocellular carcinoma. Cancer Cell Int. 2021;21(1):522. doi:10.1186/s12935-021-02227-w

9. Beumer BR, Buettner S, Galjart B, et al. Systematic review and meta-analysis of validated prognostic models for resected hepatocellular carcinoma patients. Eur J Surg Oncol. 2022;48(3):492–499. doi:10.1016/j.ejso.2021.09.012

10. Han Y, Song L, Lv L, et al. Unraveling the heterogeneity of tumor immune microenvironment in hepatocellular carcinoma by single-cell RNA sequencing and its implications for prognosis and therapeutic response. Turk J Gastroenterol. 2024;35(12):876–888. doi:10.5152/tjg.2024.24118

11. Zucman-Rossi J, Villanueva A, Nault JC, et al. Genetic landscape and biomarkers of hepatocellular carcinoma. Gastroenterology. 2015;149(5):1226–1239.e4. doi:10.1053/j.gastro.2015.05.061

12. Nakagawa H, Fujita M, Fujimoto A. Genome sequencing analysis of liver cancer for precision medicine. Semin Cancer Biol. 2019;55:120–127. doi:10.1016/j.semcancer.2018.03.004

13. Bingzhe L, Yunpeng W, Dongjiang M, et al. Immunotherapy: reshape the tumor immune microenvironment. Front Immunol. 2022;13:844142. doi:10.3389/fimmu.2022.844142

14. Qin X, Chen Y, Ma S, et al. Immune-related gene TM4SF18 could promote the metastasis of gastric cancer cells and predict the prognosis of gastric cancer patients. Mol Oncol. 2022;16(22):4043–4059. doi:10.1002/1878-0261.13321

15. Zhuang Y, Li S, Liu C, et al. Identification of an individualized immune-related prognostic risk score in lung squamous cell cancer. Front Oncol. 2021;11:546455. doi:10.3389/fonc.2021.546455

16. Chen H, Luo J, Guo J. Construction and validation of a 7-immune gene model for prognostic assessment of esophageal carcinoma. Med Sci Monit. 2020;26:e927392. doi:10.12659/MSM.927392

17. Zheng L, Qin S, Si W, et al. Pan-cancer single-cell landscape of tumor-infiltrating T cells. Science. 2021;374(6574):abe6474. doi:10.1126/science.abe6474

18. Zheng C, Zheng L, Yoo JK, et al. Landscape of infiltrating T cells in liver cancer revealed by single-cell sequencing. Cell. 2017;169(7):1342–1356. doi:10.1016/j.cell.2017.05.035

19. Obeid M, Tesniere A, Ghiringhelli F, et al. Calreticulin exposure dictates the immunogenicity of cancer cell death. Nat Med. 2007;13(1):54–61. doi:10.1038/nm1523

20. Yu WD, Sun G, Li J, et al. Mechanisms and therapeutic potentials of cancer immunotherapy in combination with radiotherapy and/or chemotherapy. Cancer Lett. 2019;452:66–70. doi:10.1016/j.canlet.2019.02.048

21. Salas-Benito D, Pérez-Gracia JL, Ponz-Sarvisé M, et al. Paradigms on immunotherapy combinations with chemotherapy. Cancer Discov. 2021;11(6):1353–1367. doi:10.1158/2159-8290.CD-20-1312

22. Feng Q, Huang Z, Song L, et al. Combining bulk and single-cell RNA-sequencing data to develop an NK cell-related prognostic signature for hepatocellular carcinoma based on an integrated machine learning framework. Eur J Med Res. 2023;28(1):306. doi:10.1186/s40001-023-01300-6

23. Roessler S, Jia HL, Budhu A, et al. A unique metastasis gene signature enables prediction of tumor relapse in early-stage hepatocellular carcinoma patients. Cancer Res. 2010;70(24):10202–10212. doi:10.1158/0008-5472.CAN-10-2607

24. Grinchuk OV, Yenamandra SP, Iyer R, et al. Tumor-adjacent tissue co-expression profile analysis reveals pro-oncogenic ribosomal gene signature for prognosis of resectable hepatocellular carcinoma. Mol Oncol. 2018;12(1):89–113. doi:10.1002/1878-0261.12153

25. Wang SM, Ooi LL, Hui KM. Identification and validation of a novel gene signature associated with the recurrence of human hepatocellular carcinoma. Clin Cancer Res. 2007;13(21):6275–6283. doi:10.1158/1078-0432.CCR-06-2236

26. Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinf. 2008;9:559. doi:10.1186/1471-2105-9-559

27. Mermel CH, Schumacher SE, Hill B, et al. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol. 2011;12(4):R41. doi:10.1186/gb-2011-12-4-r41

28. Liu GM, Zeng HD, Zhang CY, et al. Identification of a six-gene signature predicting overall survival for hepatocellular carcinoma. Cancer Cell Int. 2019;19:138. doi:10.1186/s12935-019-0858-2

29. Zhou Y, Li X, Long G, et al. Identification and validation of a tyrosine metabolism-related prognostic prediction model and characterization of the tumor microenvironment infiltration in hepatocellular carcinoma. Front Immunol. 2022;13:994259. doi:10.3389/fimmu.2022.994259

30. Shen B, Zhang G, Liu Y, et al. Identification and analysis of immune-related gene signature in hepatocellular carcinoma. Genes. 2022;13(10):1834. doi:10.3390/genes13101834

31. Yamamoto H, Watanabe Y, Arai H, et al. Microsatellite instability: a 2024 update. Cancer Sci. 2024;115(6):1738–1748. doi:10.1111/cas.16160

32. Wang D, Zhang L, Sun Z, et al. A radiomics signature associated with underlying gene expression pattern for the prediction of prognosis and treatment response in hepatocellular carcinoma. Eur J Radiol. 2023;167:111086. doi:10.1016/j.ejrad.2023.111086

33. Wang Y, Deng B. Hepatocellular carcinoma: molecular mechanism, targeted therapy, and biomarkers. Cancer Metastasis Rev. 2023;42(3):629–652. doi:10.1007/s10555-023-10084-4

34. Zhang ZM, Tan JX, Wang F, et al. Early diagnosis of hepatocellular carcinoma using machine learning method. Front Bioeng Biotechnol. 2020;8:254. doi:10.3389/fbioe.2020.00254

35. Ma S, Chan KW, Hu L, et al. Identification and characterization of tumorigenic liver cancer stem/progenitor cells. Gastroenterology. 2007;132(7):2542–2556. doi:10.1053/j.gastro.2007.04.025

36. Horwitz E, Stein I, Andreozzi M, et al. Human and mouse VEGFA-amplified hepatocellular carcinomas are highly sensitive to sorafenib treatment. Cancer Discov. 2014;4(6):730–743. doi:10.1158/2159-8290.CD-13-0782

37. Johnson P, Zhou Q, Dao DY, et al. Circulating biomarkers in the diagnosis and management of hepatocellular carcinoma. Nat Rev Gastroenterol Hepatol. 2022;19(10):670–681. doi:10.1038/s41575-022-00620-y

38. Lu M, Zhan X. The crucial role of multiomic approach in cancer research and clinically relevant outcomes. EPMA J. 2018;9(1):77–102.

39. Li B, Zeng T, Chen C, et al. Unraveling the potential mechanism and prognostic value of pentose phosphate pathway in hepatocellular carcinoma: a comprehensive analysis integrating bulk transcriptomics and single-cell sequencing data. Funct Integr Genomics. 2025;25(1):11. doi:10.1007/s10142-024-01521-w

40. Chen Q, Zhang C, Meng T, et al. Prediction of clinical prognosis and drug sensitivity in hepatocellular carcinoma through the combination of multiple cell death pathways. Cell Biol Int. 2024;48(12):1816–1835. doi:10.1002/cbin.12235

41. Ye Q, Ling S, Zheng S, et al. Liquid biopsy in hepatocellular carcinoma: circulating tumor cells and circulating tumor DNA. Mol Cancer. 2019;18(1):114. doi:10.1186/s12943-019-1043-x

42. Shimizu Y, Suzuki T, Yoshikawa T, et al. Next-generation cancer immunotherapy targeting glypican-3. Front Oncol. 2019;9:248. doi:10.3389/fonc.2019.00248

43. Obeid JM, Kunk PR, Zaydfudim VM, et al. Immunotherapy for hepatocellular carcinoma patients: is it ready for prime time? Cancer Immunol Immunother. 2018;67(2):161–174. doi:10.1007/s00262-017-2082-z

44. Pinter M, Jain RK, Duda DG. The current landscape of immune checkpoint blockade in hepatocellular carcinoma: a review. JAMA Oncol. 2021;7(1):113–123. doi:10.1001/jamaoncol.2020.3381

45. Talaei S, Mellatyar H, Asadi A, et al. Spotlight on 17-AAG as an Hsp90 inhibitor for molecular targeted cancer treatment. Chem Biol Drug Des. 2019;93(5):760–786. doi:10.1111/cbdd.13486

46. Wang Z, Fan L, Xu H, et al. HSP90AA1 is an unfavorable prognostic factor for hepatocellular carcinoma and contributes to tumorigenesis and chemotherapy resistance. Transl Oncol. 2024;50:102148. doi:10.1016/j.tranon.2024.102148

47. Wang Y, Li S, Ren T, et al. Mechanism of emodin in treating hepatitis B virus-associated hepatocellular carcinoma: network pharmacology and cell experiments. Front Cell Infect Microbiol. 2024;14:1458913. doi:10.3389/fcimb.2024.1458913

48. Liu Z, Zhang H, Yao J. Metabolomic profiling and network toxicology: mechanistic insights into effect of gossypol acetate isomers in uterine fibroids and liver injury. Pharmaceuticals. 2024;17(10):1363. doi:10.3390/ph17101363

49. Sadaqat M, Qasim M, Tahir Ul Qamar M, et al. Advanced network pharmacology study reveals multi-pathway and multi-gene regulatory molecular mechanism of Bacopa monnieri in liver cancer based on data mining, molecular modeling, and microarray data analysis. Comput Biol Med. 2023;161:107059. doi:10.1016/j.compbiomed.2023.107059

50. Wang H, Zhang H, Wang Y, et al. Regulatory T cell and neutrophil extracellular trap interaction contributes to carcinogenesis in non-alcoholic steatohepatitis. J Hepatol. 2021;75(6):1271–1283. doi:10.1016/j.jhep.2021.07.032

51. Gao Y, You M, Fu J, et al. Intratumoral stem-like CCR4+ regulatory T cells orchestrate the immunosuppressive microenvironment in HCC associated with hepatitis B. J Hepatol. 2022;76(1):148–159. doi:10.1016/j.jhep.2021.08.029

52. You M, Gao Y, Fu J, et al. Epigenetic regulation of HBV-specific tumor-infiltrating T cells in HBV-related HCC. Hepatology. 2023;78(3):943–958. doi:10.1097/HEP.0000000000000369

53. Hiwa R, Brooks JF, Mueller JL, et al. NR4A nuclear receptors in T and B lymphocytes: gatekeepers of immune tolerance. Immunol Rev. 2022;307(1):116–133. doi:10.1111/imr.13072

54. Wang G, Shen X, Jin W, et al. Elucidating the role of S100A10 in CD8+ T cell exhaustion and HCC immune escape via the cPLA2 and 5-LOX axis. Cell Death Dis. 2024;15(8):573. doi:10.1038/s41419-024-06895-0

55. Zhang X, Ren D, Guo L, et al. Thymosin beta 10 is a key regulator of tumorigenesis and metastasis and a novel serum marker in breast cancer. Breast Cancer Res. 2017;19(1):15. doi:10.1186/s13058-016-0785-2

56. Lee SM, Na YK, Hong HS, et al. Hypomethylation of the thymosin β(10) gene is not associated with its overexpression in non-small cell lung cancer. Mol Cells. 2011;32(4):343–348. doi:10.1007/s10059-011-0073-z

57. Zeng J, Yang X, Yang L, et al. Thymosin β10 promotes tumor-associated macrophages M2 conversion and proliferation via the PI3K/Akt pathway in lung adenocarcinoma. Respir Res. 2020;21(1):328. doi:10.1186/s12931-020-01587-7

58. Fu Y, Xu X, Huang D, et al. Plasma heat shock protein 90alpha as a biomarker for the diagnosis of liver cancer: an official, large-scale, and multicenter clinical trial. EBioMedicine. 2017;24:56–63. doi:10.1016/j.ebiom.2017.09.007

59. Shi W, Feng L, Dong S, et al. FBXL6 governs c-MYC to promote hepatocellular carcinoma through ubiquitination and stabilization of HSP90AA1. Cell Commun Signal. 2020;18(1):100. doi:10.1186/s12964-020-00604-y

Creative Commons License © 2026 The Author(s). This work is published and licensed by Dove Medical Press Limited. The full terms of this license are available at https://www.dovepress.com/terms and incorporate the Creative Commons Attribution - Non Commercial (unported, 4.0) License. By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed. For permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms.