Journal of Hepatocellular Carcinoma, Volume 13
A Clinlabomics-Based Machine Learning Model Accurately Differentiates Atypical Hepatocellular Carcinoma from Atypical Benign Focal Hepatic Lesion
Authors Luo QQ, Guo DF, Li QN, Liu MS, Xu L, Zhang KH, Wang T
Received 31 January 2026
Accepted for publication 21 April 2026
Published 6 May 2026 Volume 2026:13 596766
DOI https://doi.org/10.2147/JHC.S596766
Checked for plagiarism Yes
Review by Single anonymous peer review
Editor who approved publication: Prof. Dr. Imam Waked
Qing-Qing Luo,* Ding-Fan Guo,* Qiao-Nan Li, Mao-Sheng Liu, Lu Xu, Kun-He Zhang, Ting Wang
Department of Gastroenterology, Jiangxi Provincial Key Laboratory of Digestive Diseases, Jiangxi Clinical Research Center for Gastroenterology, Digestive Disease Hospital, The First Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang, Jiangxi, People’s Republic of China
*These authors contributed equally to this work
Correspondence: Kun-He Zhang; Ting Wang, Department of Gastroenterology, Jiangxi Provincial Key Laboratory of Digestive Diseases, Jiangxi Clinical Research Center for Gastroenterology, Digestive Disease Hospital, The First Affiliated Hospital, Jiangxi Medical College, Nanchang University, No. 17, Yongwai Zheng Street, Nanchang, Jiangxi, People’s Republic of China, Email [email protected]; [email protected]
Purpose: Differentiating hepatocellular carcinoma with atypical radiological imaging features (aHCC) from atypical benign focal hepatic lesion (aBFHL) is challenging. We aimed to develop a diagnostic model based on the new concept of clinlabomics to address this challenge.
Patients and Methods: A total of 466 patients with a pathological diagnosis (252 aHCC and 214 aBFHL) were retrospectively enrolled, and their clinlabomic data were collected. The patients were split into two sets based on admission time for training and testing models to differentiate aHCC from aBFHL. The models were developed using the three best-performing algorithms and key features selected from 18 clinlabomic indicators. The best model was validated and evaluated for classification ability by receiver operating characteristic (ROC) curve, goodness of fit by calibration curve, and clinical utility by decision curve. Model interpretability was analyzed through SHapley Additive exPlanations, and model application was realized via an easy-to-use online calculator.
Results: Random forest (RF), support vector machine, and linear discriminant analysis were the top three algorithms. Six features (hepatitis B surface antigen, sex, alpha-fetoprotein, aspartate aminotransferase, platelets, and age) were selected. Three models were developed using the selected algorithms and features, and the RF model was the optimal one, with an area under the ROC curve (AUC) of 0.954 and a diagnostic accuracy of 92.5% for the testing set. Notably, this model also performed well for early-stage, small, and AFP-negative aHCCs, with AUCs of 0.976–0.982 and accuracies of 93.2–93.8%. The RF model performed well in terms of calibration, net benefit gain, interpretability, and application.
Conclusion: The clinlabomics-based diagnostic model is valuable in the differential diagnosis of various types of aHCC from aBFHL, including early-stage, small, and AFP-negative aHCCs.
Keywords: atypical hepatocellular carcinoma, atypical focal hepatic lesion, diagnostic model, early diagnosis, clinlabomics
Introduction
Focal hepatic lesion (FHL) is a frequent clinical condition that can be classified as benign or malignant. The most common solid benign FHL is cavernous hemangioma of the liver (CHL), followed by focal nodular hyperplasia (FNH),1 and the most common solid malignant FHL is hepatocellular carcinoma (HCC), accounting for 90% of malignant lesions.2 The increasing use of imaging modalities, including ultrasound (US), computed tomography (CT), and magnetic resonance imaging (MRI), has significantly increased the number of FHLs being detected, and therefore, the differential diagnosis between benign and malignant FHLs has become a crucial clinical concern.3
Liver imaging is essential for the diagnosis of FHL. US is the most widely used imaging modality for the detection of FHL, but more than 50% of solid FHLs detected by US require contrast-enhanced CT/MRI scans for further diagnosis.4 A meta-analysis that compared the diagnostic performance of US, enhanced CT and MRI for FHL showed sensitivities of 85–87% and specificities of 82–89%.5 Although enhanced CT and MRI scans perform better in differentiating malignant from benign FHL, the high cost and side effects of contrast agents have somewhat limited their clinical application. More importantly, FHL without typical imaging features requires further biopsy-based pathological examination for a definitive diagnosis.6 However, liver biopsy is an invasive approach with risks of bleeding and tumor seeding and is not recommended for routine use in the diagnosis of FHL.7 Therefore, non-invasive diagnostic methods are needed to differentiate FHL without typical imaging features. Atypical imaging features were defined by radiologists in the Department of Radiology at the First Affiliated Hospital of Nanchang University, who collectively reviewed the enhancement patterns and other relevant morphological characteristics on contrast-enhanced CT or MRI scans, reached a consensus, and issued a formal report. Accordingly, the atypical hepatic space-occupying lesions in this study corresponded to LR-2 (probably benign), LR-3 (intermediate probability of HCC), or LR-4 (probably HCC) according to the LI-RADS® (Liver Imaging Reporting and Data System), while excluding LR-1 (definitely benign) and LR-5 (definitely HCC).8
FHL without typical imaging features includes atypical HCC (aHCC) and atypical benign focal hepatic lesion (aBFHL).9 A recent meta-analysis showed that, according to two international standards, the sensitivity of imaging for liver nodules ranged from 52% to 74%, suggesting that nearly 26% to 48% of FHLs present with atypical imaging features and are difficult to diagnose definitively.10 These lesions therefore represent a substantial diagnostic challenge in daily clinical practice, often necessitating further invasive procedures such as biopsy. Serum α-fetoprotein (AFP) is widely used for the diagnosis of HCC in clinical practice, but its diagnostic performance is unsatisfactory, especially for small HCC, according to the guidelines of the Asian-Pacific Association for the Study of the Liver,11 with the sensitivity and specificity for the diagnosis of HCC smaller than 5 cm ranging from 0.04 to 0.31 and 0.76 to 1.0, respectively, at a cutoff value of 200 ng/mL, and 0.49 to 0.71 and 0.49 to 0.86, respectively, at a cutoff value of 20 ng/mL. This is due to the fact that AFP is elevated in only 60% to 70% of HCCs and in only 33% to 65% of small HCCs.7,12,13 In our previous study, the sensitivity and specificity of AFP for differentiating aHCC from aBFHL were 67.0% and 86.1%, respectively,14 suggesting that AFP has some value in the diagnosis of atypical FHL, but it is also unsatisfactory, although there have been no similar reports to validate our results.
Recently, a new concept, clinlabomics, has been introduced,15 which allows the combination of daily generated clinical laboratory data with artificial intelligence to establish new diagnostic approaches. The application of clinlabomics in the diagnosis and risk prediction of HCC has emerged. Luo et al16 developed a logistic regression model based on several hematological parameters to diagnose AFP-negative HCC with an area under the receiver operating characteristic (ROC) curve (AUC) of 0.922, sensitivity of 83.0%, and specificity of 93.1%. Kim et al17 developed a machine learning model based on routine blood tests to predict HCC risk in patients with chronic hepatitis B and showed a good performance (C-index 0.79). However, there are few reports on using clinlabomics to diagnose aHCC.
In our previous study, we developed the sAGP index, a combination of standardized AFP, GGT, and platelet count, which demonstrated good diagnostic performance for aHCC (AUC 0.905).14 However, this index had several limitations. First, its accuracy of diagnosis (approximately 83%) left room for improvement, particularly for early-stage and AFP-negative HCC. Second, the sAGP index was derived using four mathematical operations, and its performance may be inferior to diagnostic models developed using advanced machine learning algorithms. Third, the index lacked a user-friendly tool for clinical implementation. These limitations motivated us to explore whether machine learning-based models using a broader set of routine clinical indicators could achieve superior diagnostic performance and clinical utility. In the present study, we used machine-learning algorithms to develop and validate clinlabomics-based diagnostic models based on routine clinical and laboratory data for differentiating aHCC from aBFHL. To the best of our knowledge, this is the first report of machine learning models that use clinical data to diagnose atypical HCC.
Patients and Methods
Patients and Data Collection
Patients with atypical FHL hospitalized at the First Affiliated Hospital of Nanchang University from January 2015 to December 2021 were retrospectively identified in the Pathological Diagnosis System, and their pathological, imaging, and clinical records were reviewed. Patients with a definitive pathological diagnosis, pre-treatment serum AFP < 200 ng/mL, and an uncertain imaging diagnosis due to atypical imaging features were included in the study. Atypical imaging features were defined as described in the Introduction section. Correspondingly, the atypical hepatic space-occupying lesions in this study were classified as LR-2, LR-3, or LR-4 according to the LI-RADS®, while LR-1 and LR-5 lesions were excluded. Patients with any of the following conditions were excluded: (1) concomitant conditions that may interfere with laboratory blood test results; (2) prior anti-cancer therapies that may interfere with laboratory blood test results; (3) missing required data, including but not limited to hepatitis B surface antigen (HBsAg) and contrast-enhanced CT or MRI scan. Eligible patients were divided into two groups, aHCC and aBFHL. The scheme of patient recruitment is shown in Figure 1.
Figure 1 The flowchart of patient enrollment. Abbreviations: FHL, focal hepatic lesion; aHCC, atypical hepatocellular carcinoma; aBFHL, atypical benign focal hepatic lesion.
All 18 clinical and laboratory variables, including demographics (sex, age) and laboratory tests (blood cell analysis, blood biochemistry, hepatitis B virus [HBV] markers, and tumor markers), were completely extracted from the hospital’s electronic medical record system, with no missing data requiring imputation. All laboratory test results were from the first blood test after admission and before treatment. This study was approved by the Medical Research Ethics Committee of the First Affiliated Hospital of Nanchang University.
Establishment and Evaluation of Models to Differentiate aHCC from aBFHL
The workflow for model development and validation is illustrated in Figure 2. The R package “mlr3” was employed for model development.18 Patients admitted from January 2015 to December 2019 were designated as the training set, and other patients (admitted from January 2020 to December 2021) were designated as the testing set.
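The admission-time split described above can be illustrated with a minimal sketch. The records below are hypothetical placeholders rather than study data, and the function name `temporal_split` is an assumption introduced only for illustration; the study itself used the R package mlr3.

```python
from datetime import date

# Hypothetical records: (patient_id, admission_date, label). Not study data.
patients = [
    ("p1", date(2015, 3, 2), "aHCC"),
    ("p2", date(2019, 12, 31), "aBFHL"),
    ("p3", date(2020, 1, 1), "aHCC"),
    ("p4", date(2021, 11, 20), "aBFHL"),
]

# Cutoff taken from the text: admissions through December 2019 form the
# training set; admissions from January 2020 onward form the testing set.
TEST_START = date(2020, 1, 1)


def temporal_split(records, test_start=TEST_START):
    """Split records into (train, test) by admission date."""
    train = [r for r in records if r[1] < test_start]
    test = [r for r in records if r[1] >= test_start]
    return train, test


train_set, test_set = temporal_split(patients)
print([r[0] for r in train_set], [r[0] for r in test_set])
```

Splitting by time rather than at random means that the testing set contains only patients admitted after every training patient, which resembles how a deployed model would meet future cases.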
Utilizing clinlabomic data from the training set, nine state-of-the-art machine learning algorithms were used to develop classifiers to distinguish aHCC from aBFHL, in which a bootstrap resampling strategy was implemented, with the sample size set at 80% of the training set and repeated 100 times to ensure robustness. The performance of these classifiers was evaluated using metrics such as AUC, accuracy, and F1 score, and the top three algorithms based on the performance were selected for subsequent model training. Boruta19 and the least absolute shrinkage and selection operator (LASSO)20 algorithms were used for feature selection, and the selected features were used for model training. For the LASSO regression (performed using the glmnet package in R), the optimal penalty parameter λ was determined through 10-fold cross-validation on the training set, selecting the λ value that minimized the binomial deviance (λ.min). For the Boruta algorithm, we used the default settings of the Boruta package in R, with maxRuns = 100 to ensure stability of the feature selection process, and a p value threshold of 0.01 for confirming and rejecting features.
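The resampling scheme (samples of 80% of the training set, repeated 100 times) can be sketched as follows. This is a simplified illustration under stated assumptions: it evaluates the AUC of a fixed score on each resample, whereas the study refitted nine classifiers within the resampling, which is not reproduced here. The function names, the synthetic scores, and the choice to draw with replacement are assumptions.

```python
import random


def auc(scores, labels):
    """AUC as the probability that a positive case outranks a negative one
    (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc(scores, labels, frac=0.8, repeats=100, seed=0):
    """Draw `repeats` resamples of size frac * n (with replacement, an
    assumption) and return the AUC computed on each resample."""
    rng = random.Random(seed)
    n = len(scores)
    k = int(frac * n)
    results = []
    for _ in range(repeats):
        idx = [rng.randrange(n) for _ in range(k)]
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:
            continue  # skip degenerate resamples that contain one class only
        results.append(auc([scores[i] for i in idx], y))
    return results


# Synthetic scores and labels (1 = aHCC, 0 = aBFHL); not study data.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
aucs = bootstrap_auc(scores, labels)
print(len(aucs), sum(aucs) / len(aucs))
```

Averaging a metric over many resamples gives a more stable basis for ranking algorithms than a single split, which is the stated purpose of the repetition.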
In the validation phase, the testing set was used to evaluate the trained models. Diagnostic performance was evaluated using the measures of diagnostic accuracy (AUC, sensitivity, specificity, accuracy, positive/negative predictive values, positive/negative likelihood ratios, and diagnostic odds ratio). The fit of the model was evaluated through calibration curves, and the clinical utility of the model was evaluated using decision curve analysis. The best performing model was selected as the final model, and its interpretability was assessed using Shapley additive explanation (SHAP) analysis.21 Ultimately, the final model was developed into an easy-to-use online calculator of HCC risk using the Shiny framework (https://shiny.posit.co/).
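The diagnostic accuracy measures listed above all follow from the four cells of a confusion matrix. The sketch below uses their standard definitions; the counts are hypothetical and do not correspond to any result in this study, and the function name `diagnostic_metrics` is an assumption.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard diagnostic accuracy measures from a 2x2 confusion matrix.

    Assumes every cell is non-zero; the likelihood ratios and the
    diagnostic odds ratio are undefined when fp or fn equals zero.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_pos": sensitivity / (1 - specificity),
        "lr_neg": (1 - sensitivity) / specificity,
        # Equivalent to lr_pos / lr_neg.
        "dor": (tp * tn) / (fp * fn),
    }


# Hypothetical counts chosen for easy arithmetic; not study data.
m = diagnostic_metrics(tp=40, fp=5, fn=10, tn=45)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Because the diagnostic odds ratio divides by the product of the two error cells, it grows very quickly as errors become rare, so large values are expected for models with high sensitivity and specificity.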
Statistical Analysis
All statistical analyses were conducted using R software version 4.3.1 and SPSS Statistics version 25.0 (IBM Corp., Armonk, NY, USA). Differences in clinlabomic indicators between the two groups were compared using Student’s t-test, Mann–Whitney U-test, or Pearson’s chi-squared test according to the type and distribution of variables. Confidence intervals (CIs) at the 95% level were obtained for AUC and other diagnostic performance metrics. P < 0.05 was considered statistically significant.
Results
Patient Characteristics
A total of 466 patients with atypical FHL were included in the present study, of which 252 cases were aHCC and 214 cases were aBFHL. All aHCC patients were staged according to the eighth edition of the TNM staging system.22 The clinicopathological characteristics of these patients and the stage of aHCC are shown in Table 1.
Table 1 Clinicopathological Characteristics of Patients and the Stage of aHCC [n (%)]
Characteristics and Diagnostic Values of Demographic and Clinlabomic Indicators
The demographic and laboratory characteristics and their diagnostic performance (AUC) are shown in Table 2. There were 10 indicators with an AUC greater than or equal to 0.7, of which HBsAg, AST, and AFP showed the highest diagnostic value, with AUCs greater than 0.8. The normal rates of liver function tests and AFP ranged from 57.5% to 98.0% in the aHCC group and 75.2% to 99.1% in the aBFHL group (Figure 3). Although these indicators were normal in most patients of both groups, significant differences between the two groups were observed in most indicators.
Table 2 Characteristics and Diagnostic Values of Demographic and Clinlabomic Indicators
Development of Models on the Training Set
Patients were divided into training and testing sets according to the time of admission. The training set included 306 patients (166 aHCC and 140 aBFHL), and the testing set included 160 patients (86 aHCC and 74 aBFHL). Table 3 shows the demographic and clinlabomic characteristics of patients in the training and testing sets.
Table 3 Demographic and Clinlabomic Characteristics of Patients in Training and Testing Sets
Based on the 18 clinlabomic indicators in the training set, nine state-of-the-art machine learning classifiers were developed to discriminate aHCC from aBFHL, and their classification performances were compared by AUC, accuracy (ACC) and F1 score (Figure 4A–C). Random forest (RF), support vector machine (SVM) and linear discriminant analysis (LDA) classifiers performed best, and therefore these three algorithms were selected for the subsequent model training. In the feature selection by Boruta and LASSO algorithms, HBsAg, sex, AFP, AST, PLT, and age were selected by both algorithms and therefore used as feature variables for subsequent model training (Figure 4D–F). Subsequently, we developed RF, SVM and LDA diagnostic models using the six clinlabomic indicators and determined the optimal hyperparameter values for each model using the grid search method with AUC as the optimization goal. The results of 10-fold cross-validation showed that the average AUC values of the three models reached 0.931–0.942 (Figure 4G–I). From a computational perspective, the three candidate models (RF, SVM, and LDA) were trained on a standard desktop computer (Intel Core i7, 16GB RAM) with total training times ranging from approximately 1.8 to 8.2 minutes under the 10-fold cross-validation framework. Importantly, once trained, the inference time for any of these models on a new patient case is near-instantaneous (< 0.1 second), making them highly feasible for real-time clinical deployment.
Validation of the Models on the Testing Set
The performance of the three models in discriminating aHCC from aBFHL was evaluated using the testing set. Confusion matrix analysis showed that the RF model accurately classified the highest number of cases among the three models (Figure 5A–C). The RF model also had the largest AUC in the ROC curve analysis (Figure 5D) and provided higher net gains than SVM and LDA at most decision thresholds in the decision curve analysis (Figure 5E). The calibration curves show that the SVM model has the best predictive agreement, while the RF model has the lowest Brier score, but with a small predictive bias (Figure 5F–H). Overall, the RF model is the best model and is therefore selected for further analysis.
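The two quantities behind the calibration and decision curve comparisons, the Brier score and the net benefit at a decision threshold, have simple closed forms. The sketch below uses their standard definitions on synthetic predictions; the data and function names are assumptions and are unrelated to the models in Figure 5.

```python
def brier_score(probs, labels):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def net_benefit(probs, labels, threshold):
    """Net benefit at one decision threshold:
    TP/n - FP/n * threshold / (1 - threshold)."""
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


# Synthetic predicted probabilities and outcomes (1 = aHCC); not study data.
probs = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]

bs = brier_score(probs, labels)
nb_model = net_benefit(probs, labels, threshold=0.5)
# "Treat all" reference strategy: every case is classified as positive.
nb_all = net_benefit([1.0] * len(labels), labels, threshold=0.5)
print(bs, nb_model, nb_all)
```

A decision curve repeats the net benefit calculation over a range of thresholds and plots the model against the "treat all" and "treat none" (net benefit of zero) strategies; a lower Brier score indicates predicted probabilities that lie closer to the observed outcomes.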
Clinical Utility and SHAP Analysis of the RF Model
The clinical utility of the RF model, the best model, was evaluated, and the results showed that the model performed well in the differential diagnosis of early-stage aHCC (TNM stage I), small aHCC (tumor diameter < 3 cm), and AFP-negative aHCC (AFP < 20 ng/mL) from aBFHL (Figure 6A–C), outperforming our previously developed index sAGP in terms of AUC (0.976–0.982 vs. 0.839–0.922). Based on the probability of aHCC predicted by the RF model, good discrimination between aHCC and aBFHL was observed (Figure 6D). Furthermore, the diagnostic accuracy metrics of the RF model for the training set, testing set, early-stage aHCC, small aHCC, and AFP-negative aHCC were calculated and compared with the index sAGP (Table 4), and the results showed that the diagnostic accuracies were 92.5–95.4% and the diagnostic odds ratios were 268.8–874.7, which were much higher than those of the sAGP. The impact of the six feature variables on the predictive outcome of the RF model was investigated using SHAP analysis, and the results showed that male sex, positive HBsAg, and elevated levels of AFP, AST, and age contributed positively to the SHAP value, whereas elevated PLT levels contributed negatively to the SHAP value (Figure 6E). Of these six features, HBsAg and sex had the greatest impact on the average absolute SHAP values, indicating their key role in the model. Figure 6F and G show the SHAP value of each feature and its contribution to the prediction of aHCC risk in an individual patient. Finally, an easy-to-use online application of the RF model for calculating the probability of aHCC has been developed and is freely accessible via a cloud platform (https://dingfan.shinyapps.io/aHCC-ML/) (Figure 6H).
Table 4 Diagnostic Accuracy Metrics of the RF Model and sAGP
Figure 6 Clinical utility and SHAP analysis of the random forest (RF) model. (A–C): Receiver operating characteristic curves for three subtypes of aHCC. (D): The scatterplot of individual probabilities of aHCC predicted by the RF model. (E): The summary plots of the SHAP analysis, which illustrate the distribution of SHAP values of each feature and the effect of feature value on SHAP value in the RF model. The color gradient from green to red indicates feature values from small to large; for binary features, green indicates negative (HBsAg) and female (sex), and red indicates positive (HBsAg) and male (sex). (F): The mean absolute SHAP value, which indicates the average impact of a feature on model output. (G): SHAP force plot illustrating the contribution of each feature to the model’s predicted probability of aHCC for an individual patient (Patient #109). The horizontal axis represents the predicted probability of aHCC. The base value E[f(x)]=0.547 denotes the average model prediction over the training set. Features shown in red push the prediction toward a higher probability of aHCC. The length of each red bar corresponds to the magnitude of that feature’s contribution. The final predicted probability f(x)=0.940 is displayed at the top. (H): The web application for the RF model to calculate individual aHCC risk and the online platform used for this analysis is available at: https://dingfan.shinyapps.io/aHCC-ML/. Abbreviations: AFP, alpha-fetoprotein; aHCC, atypical hepatocellular carcinoma; TNM, the tumor-node-metastasis staging system; RF, random forest model; AUC, area under the receiver operating characteristic curve; sAGP, the index [(standardized α-fetoprotein + standardized γ-glutamyltransferase)/standardized platelet]; HBsAg, hepatitis B surface antigen; AST, aspartate aminotransferase; PLT, platelet; SHAP, Shapley additive explanation.
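The force plot relies on the additivity property of Shapley values: the base value plus the sum of the per-feature contributions equals the prediction for that patient. For the patient in panel G, this implies that the contributions sum to 0.940 − 0.547 = 0.393, assuming the plot is expressed in probability units. The sketch below demonstrates the property with exact Shapley values on a toy function. The toy model, its coefficients, and the single reference patient used as the baseline are all assumptions for illustration; this is not the RF model, and SHAP itself averages over background data rather than using one reference point.

```python
from itertools import permutations


def toy_model(f):
    """Toy risk function with an interaction term; coefficients are invented."""
    return (0.2 + 0.3 * f["hbsag"] + 0.2 * f["male"]
            + 0.1 * f["hbsag"] * f["male"] + 0.005 * (f["age"] - 40))


def shapley_values(model, x, baseline):
    """Exact Shapley values: average marginal contribution of each feature
    over all orderings in which features switch from baseline to x."""
    names = list(x)
    phi = {name: 0.0 for name in names}
    orders = list(permutations(names))
    for order in orders:
        current = dict(baseline)
        prev = model(current)
        for name in order:
            current[name] = x[name]
            new = model(current)
            phi[name] += new - prev
            prev = new
    return {name: total / len(orders) for name, total in phi.items()}


# Hypothetical patient and reference values; not study data.
x = {"hbsag": 1, "male": 1, "age": 60}
baseline = {"hbsag": 0, "male": 0, "age": 40}

phi = shapley_values(toy_model, x, baseline)
base_value = toy_model(baseline)
print(phi, base_value + sum(phi.values()), toy_model(x))
```

The interaction term is shared equally between the two features involved, which is why each receives more than its main-effect coefficient in this example.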
Discussion
The clinical diagnosis of HCC is primarily based on typical imaging features. However, not all HCC patients present with the typical “fast-in and fast-out” imaging pattern, thereby posing a challenge in the differential diagnosis of HCC and benign FHL. To address this issue, we previously developed a novel non-invasive index, sAGP, from conventional laboratory tests (AFP, GGT, and PLT) and found that it was valuable for the diagnosis of aHCC, with AUCs greater than 0.85 for early-stage, small, and AFP-negative aHCC. This work encouraged us to seek better results by developing a diagnostic model using clinlabomic data.
In the present study, new cases (2021) were collected and merged with the original cases (2015–2020) to form a new patient cohort, and for easy application, only clinical indicators that are always available in the medical records of these patients were selected as variables, ie, age and sex, blood cell analysis, and liver disease-related tests (liver function, AFP, and HBV markers). Adhering to the standard procedures of machine learning modeling, the classifiers of nine state-of-the-art machine learning algorithms were developed and compared, and RF, SVM, and LDA had the best performance and were selected as the algorithms for further model training. In parallel, we selected features using the Boruta and LASSO methods and identified six crucial features: HBsAg, sex, AFP, AST, PLT, and age. Using the six features, we developed RF, SVM, and LDA models to discriminate aHCC from aBFHL, and the RF model showed the best diagnostic performance among the three models. It performed excellently on the training set and testing set, as well as for early, small, and AFP-negative aHCC, with AUCs greater than 0.95 and ACCs greater than 90%, an increase of approximately 10% in ACC over our previous index sAGP. Finally, we interpreted the RF model using SHAP analysis and developed an easy-to-use online application to calculate the risk probability of aHCC for an individual patient.
In the context of existing diagnostic approaches for aHCC, our proposed clinlabomics-based model represents a notable advancement in both diagnostic performance and clinical practicality. To date, apart from our previous study utilizing the sAGP index,14 there has been a paucity of research applying routine clinical and laboratory indicators specifically for aHCC diagnosis. The predominant strategies in the literature have been imaging-based, as summarized in Table 5. These include ultrasomics features from contrast-enhanced ultrasound (CEUS),23 deep learning analysis of multiphasic MRI,24 and spatio-temporal diagnostic semantics from CEUS video frames.25 While such methods have reported AUCs ranging from 0.86 to 0.93 and accuracies exceeding 94%, they are often constrained by reliance on specialized equipment, subjective interpretation, and limited practical deployment. For instance, Laroia et al26 demonstrated that both conventional qualitative imaging and the LI-RADSv2018 lexicon, while highly sensitive (92.0–97.0%), suffered from low specificity (30.0–55.5%) in diagnosing atypical HCC. Similarly, the ultrasomics model by Li et al23 achieved only moderate performance in differentiating aHCC from FNH, with an AUC of 0.86. In contrast, our model leverages six routine clinlabomic indicators (HBsAg, sex, AFP, AST, PLT, and age) to construct an RF classifier that achieves superior and stable performance, with AUCs ranging from 0.939 to 0.982 and accuracies between 92.5% and 95.4% across multiple clinically relevant subgroups. Beyond its robust performance, our model addresses critical translational gaps by offering both SHAP-based interpretability and a publicly accessible online calculator (https://dingfan.shinyapps.io/aHCC-ML/), enabling real-time, individualized risk prediction.
To our knowledge, this is the first diagnostic model specifically tailored for imaging-atypical HCC using routine clinlabomic data, offering a cost-effective, non-invasive, and easily implementable alternative to imaging-based methods.
Table 5 Overview of Machine and Deep Learning Approaches in Atypical HCC Diagnosis
Six indicators were included in the diagnostic model: HBsAg, sex, AFP, AST, PLT, and age. Although sex and age are the most basic clinical indicators, they stood out from 18 indicators to be among the six variables in the model. The univariate analysis showed that the proportion of male patients was much higher in the aHCC group than in the aBFHL group (89.3% vs. 33.2%), and the mean age of the aHCC patients was approximately 10 years older than that of the aBFHL patients (56.3 vs. 45.8 years). The SHAP analysis also yielded results consistent with the conventional analysis above and showed that sex was more important than age. It is well known that men are more susceptible to HCC than women, with morbidity and mortality rates two to three times higher in men than in women in most regions, and HCC is the leading cause of cancer death in men in some countries.27 Aging plays an important role in tumorigenesis, and 60% of cancers are found in elderly patients. HCC is relatively rare in the first 40 years of life, and the average age at the diagnosis of HCC is 55–59 years in China and 63–65 years in Europe and North America.28
In this study, HBsAg was the feature with the largest SHAP value in the RF model and was also one of three indicators with AUC greater than 0.8 for differentiating aHCC from aBFHL in the univariate analysis. The HBsAg positive rate in the aHCC group was four times higher than that in the aBFHL group (86.1% vs. 21.5%), and its AUC was even higher than AFP (0.823 vs. 0.811) in univariate analysis. HBV infection can lead to chronic inflammation of the liver followed by the development of HCC. Tseng et al29 followed up 2688 HBsAg-positive patients without cirrhosis for a mean of 14.7 years and found that the baseline HBsAg level was significantly associated with the development of HCC. A meta-analysis including 10 studies (N = 12541) exhibited that HBsAg levels ≥ 100 IU/mL, especially ≥ 1000 IU/mL, were associated with an increased risk of HCC development.30 HBsAg status is also associated with the specificity of AFP in the diagnosis of HCC.31 However, in countries or regions where HBV is not the primary cause of HCC, HBsAg may not be a significant variable in the differential diagnosis of aHCC from aBFHL and needs to be validated.
Serum AFP was one of the six variables in the diagnostic model. AFP is the most widely used tumor marker for the detection of HCC, but it is currently not recommended for the diagnosis of HCC due to its poor diagnostic performance in small HCC. The Asian-Pacific guidelines11 recommend AFP in combination with ultrasound for HCC surveillance at a cutoff of 200 ng/mL. In the present study, HCC patients with AFP < 200 ng/mL were defined as “atypical” and used together with atypical imaging to define “aHCC”. Indeed, the AFP cutoff of 200 ng/mL has important implications for the diagnosis of HCC. In a study with a large sample of HCC patients with AFP < 200 ng/mL, the negative predictive value of ultrasound-guided percutaneous fine-needle aspiration (FNA) and pathological examination was only about 60%,32 indicating the high risk of missed diagnosis. AFP-L3 and protein induced by vitamin K deficiency or antagonist II (PIVKA-II), as complementary markers for AFP, also had lower sensitivities for HCC with AFP < 200 ng/mL (56.9% and 62.1%, respectively).33 However, AFP was important in the differential diagnosis of aHCC in the present study. Although AFP levels were low in both groups and normal in almost all patients in the aBFHL group and almost 50% of patients in the aHCC group, the median AFP level in the aHCC group was almost four times higher than that in the aBFHL group (8.7 vs. 2.4 ng/mL), with good performance in differentiating aHCC from aBFHL (AUC = 0.811). In addition, the SHAP value of AFP was among the top three in the RF model. Therefore, the diagnostic value of AFP for aHCC should not be ignored.
AST, another variable in the diagnostic model, is a reliable and sensitive marker of liver injury. In the present study, the level of AST was normal in the majority of aHCC patients (61.9%) and aBFHL patients (81.8%), but it differed significantly between the two groups and had a good diagnostic performance in univariate analysis (ranked in the top two among the 18 indicators) and a high SHAP value similar to AFP in the RF model, suggesting that AST is a valuable indicator for the differential diagnosis of aHCC. Previous reports have demonstrated that AST in combination with other liver tests could improve the diagnostic performance in HCC. The γ-GT/AST ratio had an AUC of 0.779 for the diagnosis of HCC, and the AUC increased to 0.925 and 0.837 when combined with PIVKA-II and AFP, respectively.34 ALT/AST had an AUC of 0.804 for the diagnosis of AFP-negative HCC.35 Ioannou et al36 also found that the AST/√ALT ratio was an independent predictor of HCC development in cirrhosis of all etiologies.
PLT was also a feature included in the diagnostic model, with an AUC of 0.776 for distinguishing aHCC from aBFHL in univariate analysis. The level of PLT was lower in aHCC than in aBFHL in our study, which may be related to the fact that most aHCC had the background of liver cirrhosis or were at an early stage with low levels of tumor-educated PLT. PLT plays an important role in cancer development and progression and has value in diagnosis, prognosis, and monitoring response to therapy, suggesting that tumor cells have direct and indirect interactions with tumor-educated PLT.37 A multivariable logistic model constructed with clinical and hematological variables including PLT showed good performance in discriminating AFP-negative and small-sized HCC from benign liver disease and healthy controls.38
The interpretability of a machine learning and artificial intelligence system is crucial for the predicted outcome to be trusted in medical and healthcare practice. In this study, we interpreted the RF model through SHAP analysis, which clearly showed the magnitude of the contribution of the six variables to the model output through SHAP values, and demonstrated the process of the model in predicting the probability of aHCC for a specific patient through the force plot. To facilitate the application of the RF model, we developed an online calculator of the model, which can easily calculate the probability of aHCC for a patient. SHAP analysis has been used to explain the relative importance of predictor variables in a model to promote personalized medicine, such as risk prediction for chronic kidney disease,39 identification of perineural invasion for intrahepatic cholangiocarcinoma,40 and prognostic evaluation for spinal cord glioma.41 The combination of SHAP analysis and online calculator development is a well-suited approach for model interpretation and clinical application, providing clear explanations for personalized prediction and a more intuitive understanding of the effect of key features in the models. From a practical standpoint, the computational cost of our model is minimal. While training required a few minutes, the inference time per patient is negligible, and the model has been deployed as a freely accessible online calculator. This ensures that, despite its machine learning foundation, the tool can be easily integrated into routine clinical workflows without requiring specialized hardware or software.
In univariate analysis, we found that the majority of patients had normal liver function test results (the mean normal rate of 10 liver indicators was 70.2% in the aHCC group and 85.2% in the aBFHL group). Even when results were abnormal, the magnitude of the abnormality was limited. For example, AST exceeded twice the upper limit of normal in only 9.9% of the aHCC group and 1.9% of the aBFHL group (data not shown). Therefore, from a conventional clinical point of view, these liver function indicators would not be considered useful in the diagnosis of aHCC. However, most of these indicators were valuable in distinguishing aHCC from aBFHL (8 of 10 indicators had an AUC > 0.6 and 5 of them had an AUC > 0.7), and the RF model established with six conventional clinical indicators performed well in distinguishing aHCC from aBFHL, suggesting that clinical laboratory indicators within the normal range may still have good diagnostic power, especially when used in combination. These results illustrate the diagnostic advantages of the new concept of clinlabomics and suggest that the promotion of clinlabomics may lead to new opportunities in the development of diagnostics.
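The point that values inside the normal range can still discriminate between groups can be made concrete with a minimal sketch. The Python code below computes a univariate AUC as the probability that a randomly chosen aHCC value ranks as "more suspicious" than a randomly chosen aBFHL value (the Mann-Whitney interpretation). The platelet counts are invented for illustration and are not study data, and the reference interval of 125 to 350 × 10^9/L is an assumption.

```python
def auc(positives, negatives, higher_is_positive=True):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties count as half a correct ranking.
    """
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p == n:
                wins += 0.5
            elif (p > n) == higher_is_positive:
                wins += 1.0
    return wins / (len(positives) * len(negatives))


# Made-up platelet counts (10^9/L); every value lies within the assumed
# reference interval of 125-350, so none would be flagged as abnormal.
ahcc_plt = [140, 155, 170, 185, 210, 240]
abfhl_plt = [180, 205, 230, 255, 280, 310]

# Lower PLT is treated as pointing toward aHCC, as observed in the study.
plt_auc = auc(ahcc_plt, abfhl_plt, higher_is_positive=False)
print(f"AUC = {plt_auc:.3f}")
```

Here 30 of the 36 possible pairs are ranked correctly, giving an AUC of about 0.833 even though no individual result is abnormal.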
Despite the promising findings, several limitations of this study should be acknowledged. First, the diagnostic model relies heavily on HBsAg status, which emerged as the most influential predictor in our SHAP analysis. As our cohort was derived from an HBV-endemic region where HBsAg positivity constitutes a dominant risk factor for hepatocellular carcinoma, the model’s performance may not directly extrapolate to populations with different etiological profiles, such as those where hepatitis C virus (HCV) infection, alcohol-related liver disease, or non-alcoholic fatty liver disease (NAFLD) are more prevalent. Consequently, careful validation and potential recalibration of the model are necessary in non-HBV predominant cohorts before it can be considered for broader clinical application. Second, the single-center and retrospective design of this study introduces the potential for selection bias and limits the generalizability of our findings. The absence of external validation using an independent, multicenter cohort represents a significant limitation. Therefore, well-designed prospective studies across multiple centers are essential to confirm the robustness, reproducibility, and true clinical utility of our model across diverse populations and clinical settings.
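To indicate what recalibration could involve, one widely used option is logistic recalibration. It is given here as an illustration under stated assumptions, not as the procedure planned for this model. The original model's predicted probability p is kept, and only an intercept a and a slope b are re-estimated in the new cohort; the symbols a, b, and p_recal are introduced here for illustration.

```latex
\operatorname{logit}(p_{\mathrm{recal}}) = a + b\,\operatorname{logit}(p),
\qquad
\operatorname{logit}(p) = \ln\frac{p}{1-p}
```

With a = 0 and b = 1 the predictions are unchanged. A non-zero a corrects systematic over- or underestimation, for example when the proportion of aHCC differs in the new cohort, and b < 1 shrinks predictions that are too extreme.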
Conclusions
We have developed and validated a diagnostic model for differentiating aHCC from aBFHL using routine clinical and laboratory indicators and following the novel concept of clinlabomics. To our knowledge, this is the first diagnostic model specifically developed for atypical hepatocellular carcinoma using routine clinical and laboratory indicators. The model performed well in discriminating aHCC, including early, small and AFP-negative aHCC, from aBFHL in our HBV-endemic cohort. However, given the model’s heavy reliance on HBsAg status, validation and potential recalibration in non-HBV cohorts (eg, NAFLD or HCV populations) are essential prerequisites before any broader clinical application can be considered. Furthermore, prospective multicenter validation is a prerequisite for clinical implementation of this model, given its single-center derivation and the exceptionally high diagnostic performance observed.
Data Sharing Statement
The datasets generated and/or analyzed during the current study are not publicly available because they contain information that could compromise research participant privacy, but they are available from the corresponding author (Prof. Zhang Kun-he) on reasonable request and with permission of the local Institutional Review Board.
Ethics Approval Statement
The study protocol conformed to the ethical guidelines of the 1975 Declaration of Helsinki and was approved by the Medical Ethics Committee of the First Affiliated Hospital of Nanchang University (ethics approval number: (2025) CDYFYYLK (10-032)). As this study involved only the collection and analysis of clinical data, with no biological sample collection or invasive interventions conducted at any point, there was no foreseeable risk of harm to the participants. Additionally, strict measures were put in place to protect personal privacy. For these reasons, individual informed consent for the study was waived by the Medical Ethics Committee of the First Affiliated Hospital of Nanchang University, in accordance with its ethical review guidelines.
Author Contributions
All authors made a significant contribution to the work reported, whether that is in the conception, study design, execution, acquisition of data, analysis and interpretation, or in all these areas; took part in drafting, revising or critically reviewing the article; gave final approval of the version to be published; have agreed on the journal to which the article has been submitted; and agree to be accountable for all aspects of the work.
Funding
This study was supported by the Jiangxi Provincial Health Commission’s Science and Technology Plan (202410181) and the Jiangxi Provincial Natural Science Foundation (20252BAC240011). We thank the Key Laboratory Project of Digestive Diseases in Jiangxi Province (2024SSY06101), and Jiangxi Clinical Research Center for Gastroenterology (20223BCG74011) for providing support in terms of the platform and venue.
Disclosure
The authors report no conflicts of interest in this work. This paper has been uploaded to SSRN as a preprint: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5044998.
References
1. Marrero JA, Ahn J, Rajender Reddy K, et al. ACG clinical guideline: the diagnosis and management of focal liver lesions. Am J Gastroenterol. 2014;109(9):1328–1347. doi:10.1038/ajg.2014.213
2. Hennedige T, Venkatesh SK. Imaging of hepatocellular carcinoma: diagnosis, staging and treatment monitoring. Cancer Imaging. 2013;12(3):530–547. doi:10.1102/1470-7330.2012.0044
3. Zhao X, Liang P, Yong L, et al. Radiomics study for differentiating focal hepatic lesions based on unenhanced CT images. Front Oncol. 2022;12:650797. doi:10.3389/fonc.2022.650797
4. Rónaszéki AD, Dudás I, Zsély B, et al. Microvascular flow imaging to differentiate focal hepatic lesions: the spoke-wheel pattern as a specific sign of focal nodular hyperplasia. Ultrasonography. 2023;42(1):172–181. doi:10.14366/usg.22028
5. Xie L, Guang Y, Ding H, et al. Diagnostic value of contrast-enhanced ultrasound, computed tomography and magnetic resonance imaging for focal liver lesions: a meta-analysis. Ultrasound Med Biol. 2011;37(6):854–861. doi:10.1016/j.ultrasmedbio.2011.03.006
6. Yoon J, Park SH, Ahn SJ, et al. Atypical manifestation of primary hepatocellular carcinoma and hepatic malignancy mimicking lesions. J Korean Soc Radiol. 2022;83(4):808–829. doi:10.3348/jksr.2021.0178
7. Heimbach JK, Kulik LM, Finn RS, et al. AASLD guidelines for the treatment of hepatocellular carcinoma. Hepatology. 2018;67(1):358–380. doi:10.1002/hep.29086
8. Chernyak V, Fowler KJ, Kamaya A, et al. Liver imaging reporting and data system (LI-RADS) version 2018: imaging of hepatocellular carcinoma in at-risk patients. Radiology. 2018;289(3):816–830. doi:10.1148/radiol.2018181494
9. Kim JH, Joo I, Lee JM. Atypical appearance of hepatocellular carcinoma and its mimickers: how to solve challenging cases using gadoxetic acid-enhanced liver magnetic resonance imaging. Korean J Radiol. 2019;20(7):1019–1041. doi:10.3348/kjr.2018.0636
10. Shin J, Lee S, Yoon JK, et al. Diagnostic performance of the 2018 EASL vs. LI-RADS for hepatocellular carcinoma using CT and MRI: a systematic review and meta-analysis of comparative studies. J Magn Reson Imaging. 2023;58(6):1942–1950. doi:10.1002/jmri.28716
11. Omata M, Cheng AL, Kokudo N, et al. Asia-Pacific clinical practice guidelines on the management of hepatocellular carcinoma: a 2017 update. Hepatol Int. 2017;11(4):317–370. doi:10.1007/s12072-017-9799-9
12. Trevisani F, D’Intino PE, Morselli-Labate AM, et al. Serum alpha-fetoprotein for diagnosis of hepatocellular carcinoma in patients with chronic liver disease: influence of HBsAg and anti-HCV status. J Hepatol. 2001;34(4):570–575. doi:10.1016/s0168-8278(00)00053-2
13. She S, Xiang Y, Yang M, et al. C-reactive protein is a biomarker of AFP-negative HBV-related hepatocellular carcinoma. Int J Oncol. 2015;47(2):543–554. doi:10.3892/ijo.2015.3042
14. Luo QQ, Li QN, Cai D, et al. The index sAGP is valuable for distinguishing atypical hepatocellular carcinoma from atypical benign focal hepatic lesions. J Hepatocell Carcinoma. 2024;11:317–325. doi:10.2147/JHC.S443273
15. Wen X, Leng P, Wang J, et al. Clinlabomics: leveraging clinical laboratory data by data mining strategies. BMC Bioinf. 2022;23(1):387. doi:10.1186/s12859-022-04926-1
16. Luo CL, Rong Y, Chen H, et al. A logistic regression model for noninvasive prediction of AFP-negative hepatocellular carcinoma. Technol Cancer Res Treat. 2019;18:1533033819846632. doi:10.1177/1533033819846632
17. Kim HY, Lampertico P, Nam JY, et al. An artificial intelligence model to predict hepatocellular carcinoma risk in Korean and Caucasian patients with chronic hepatitis B. J Hepatol. 2022;76(2):311–318. doi:10.1016/j.jhep.2021.09.025
18. Lang M, Binder M, Richter J, et al. mlr3: a modern object-oriented machine learning framework in R. J Open Source Software. 2019;4:1903. doi:10.21105/joss.01903
19. Kursa MB, Rudnicki WR. Feature selection with the Boruta package. J Stat Softw. 2010;36(11):1–13. doi:10.18637/jss.v036.i11
20. Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. J Stat Softw. 2010;33(1):1–22. doi:10.18637/jss.v033.i01
21. Lundberg SM, Nair B, Vavilala MS, et al. Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nat Biomed Eng. 2018;2(10):749–760. doi:10.1038/s41551-018-0304-0
22. Lee CW, Tsai HI, Yu MC, et al. A proposal for T1 subclassification in hepatocellular carcinoma: reappraisal of the AJCC. Hepatol Int. 2022;16(6):1353–1367. doi:10.1007/s12072-022-10422-8
23. Li W, Lv XZ, Zheng X, et al. Machine learning-based ultrasomics improves the diagnostic performance in differentiating focal nodular hyperplasia and atypical hepatocellular carcinoma. Front Oncol. 2021;11:544979. doi:10.3389/fonc.2021.544979
24. Oestmann PM, Wang CJ, Savic LJ, et al. Deep learning-assisted differentiation of pathologically proven atypical and typical hepatocellular carcinoma (HCC) versus non-HCC on contrast-enhanced MRI of the liver. Eur Radiol. 2021;31(7):4981–4990. doi:10.1007/s00330-020-07559-1
25. Huang Q, Pan F, Li W, et al. Differential diagnosis of atypical hepatocellular carcinoma in contrast-enhanced ultrasound using spatio-temporal diagnostic semantics. IEEE J Biomed Health Inform. 2020;24(10):2860–2869. doi:10.1109/JBHI.2020.2977937
26. Laroia ST, Yadav K, Rastogi A, et al. Diagnostic efficacy of dynamic liver imaging using qualitative diagnostic algorithm versus LI-RADS v2018 lexicon for atypical versus classical HCC lesions: a decade of experience from a tertiary liver institute. Eur J Radiol Open. 2020;7:100219. doi:10.1016/j.ejro.2020.100219
27. Bray F, Laversanne M, Sung H, et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2024;74(3):229–263. doi:10.3322/caac.21834
28. El-Serag HB. Epidemiology of viral hepatitis and hepatocellular carcinoma. Gastroenterology. 2012;142(6):1264–1273.e1. doi:10.1053/j.gastro.2011.12.061
29. Tseng TC, Liu CJ, Yang HC, et al. High levels of hepatitis B surface antigen increase risk of hepatocellular carcinoma in patients with low HBV load. Gastroenterology. 2012;142(5):1140–1149.e3. doi:10.1053/j.gastro.2012.02.007
30. T TV, Poovorawan K, Charoen P, et al. Association between hepatitis B surface antigen levels and the risk of hepatocellular carcinoma in patients with chronic hepatitis B infection: systematic review and meta-analysis. Asian Pac J Cancer Prev. 2019;20(8):2239–2246. doi:10.31557/APJCP.2019.20.8.2239
31. Lee HS, Chung YH, Kim CY. Specificities of serum alpha-fetoprotein in HBsAg+ and HBsAg- patients in the diagnosis of hepatocellular carcinoma. Hepatology. 1991;14(1):68–72. doi:10.1002/hep.1840140112
32. Chen QW, Cheng CS, Chen H, et al. Effectiveness and complications of ultrasound guided fine needle aspiration for primary liver cancer in a Chinese population with serum α-fetoprotein levels ≤200 ng/mL--a study based on 4,312 patients. PLoS One. 2014;9(8):e101536. doi:10.1371/journal.pone.0101536
33. Choi JY, Jung SW, Kim HY, et al. Diagnostic value of AFP-L3 and PIVKA-II in hepatocellular carcinoma according to total-AFP. World J Gastroenterol. 2013;19(3):339–346. doi:10.3748/wjg.v19.i3.339
34. Wang Q, Chen Q, Zhang X, et al. Diagnostic value of gamma-glutamyltransferase/aspartate aminotransferase ratio, protein induced by vitamin K absence or antagonist II, and alpha-fetoprotein in hepatitis B virus-related hepatocellular carcinoma. World J Gastroenterol. 2019;25(36):5515–5529. doi:10.3748/wjg.v25.i36.5515
35. Li J, Tao H, Zhang E, et al. Diagnostic value of gamma-glutamyl transpeptidase to alkaline phosphatase ratio combined with gamma-glutamyl transpeptidase to aspartate aminotransferase ratio and alanine aminotransferase to aspartate aminotransferase ratio in alpha-fetoprotein-negative hepatocellular carcinoma. Cancer Med. 2021;10(14):4844–4854. doi:10.1002/cam4.4057
36. Ioannou GN, Green P, Lowy E, et al. Differences in hepatocellular carcinoma risk, predictors and trends over time according to etiology of cirrhosis. PLoS One. 2018;13(9):e0204412. doi:10.1371/journal.pone.0204412
37. Li S, Lu Z, Wu S, et al. The dynamic role of platelets in cancer progression and their therapeutic implications. Nat Rev Cancer. 2024;24(1):72–87. doi:10.1038/s41568-023-00639-6
38. Yu Z, Chen D, Zheng Y, et al. Development and validation of a diagnostic model for AFP-negative hepatocellular carcinoma. J Cancer Res Clin Oncol. 2023;149(13):11295–11308. doi:10.1007/s00432-023-04997-4
39. Tsai MC, Lojanapiwat B, Chang CC, et al. Risk prediction model for chronic kidney disease in Thailand using artificial intelligence and SHAP. Diagnostics. 2023;13(23):3548. doi:10.3390/diagnostics13233548
40. Liu Z, Luo C, Chen X, et al. Noninvasive prediction of perineural invasion in intrahepatic cholangiocarcinoma by clinicoradiological features and computed tomography radiomics based on interpretable machine learning: a multicenter cohort study. Int J Surg. 2024;110(2):1039–1051. doi:10.1097/JS9.0000000000000881
41. Karabacak M, Schupper AJ, Carr MT, et al. Development and internal validation of machine learning models for personalized survival predictions in spinal cord glioma patients. Spine J. 2024;24(6):1065–1076. doi:10.1016/j.spinee.2024.02.002
© 2026 The Author(s). This work is published and licensed by Dove Medical Press Limited. The full terms of this license are available at https://www.dovepress.com/terms and incorporate the Creative Commons Attribution - Non Commercial (unported, 4.0) License. By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed. For permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms.
