Machine Learning-Derived Predictive Risk Score for Prediabetes and Type 2 Diabetes Development in Parous Women

Amélie Taschereau,¹ Jenna Wong,² Soren Harnois-Leblanc,² Marie A Brunet,³,⁴ Myriam Doyon,⁴ Mélina Arguin,⁴ Sheryl L Rifas-Shiman,² Emily Oken,² Patrice Perron,⁴,⁵ Pierre-Étienne Jacques,¹,⁴,⁶ Luigi Bouchard,⁴,⁷,⁸,* Marie-France Hivert²,⁴,⁹,*

¹Département de Biologie, Université de Sherbrooke, Sherbrooke, QC, Canada; ²Department of Population Medicine, Harvard Pilgrim Health Care Institute, Boston, MA, USA; ³Department of Pediatrics, Université de Sherbrooke, Sherbrooke, QC, Canada; ⁴Centre de recherche du Centre hospitalier universitaire de Sherbrooke (CRCHUS), Sherbrooke, QC, J1H 5N3, Canada; ⁵Department of Medicine, University of Sherbrooke, Sherbrooke, QC, Canada; ⁶Institut de recherche sur le cancer de l’Université de Sherbrooke (IRCUS), Sherbrooke, QC, Canada; ⁷Department of Biochemistry and Functional Genomics, Université de Sherbrooke, Sherbrooke, QC, Canada; ⁸Department of Medical Biology, CIUSSS of Saguenay-Lac-Saint-Jean, Saguenay, QC, Canada; ⁹Department of Medicine, Massachusetts General Hospital, Boston, MA, USA

*These authors contributed equally to this work

Correspondence: Marie-France Hivert, Department of Population Medicine, Harvard Medical School, Harvard Pilgrim Health Care Institute, Boston, MA, USA, Tel +1 617-580-1487, Email [email protected]

Background/Aims: Pregnancy is a window of opportunity for closer links with clinical care, and to identify women at risk of chronic disease. Because of the elevated risk of type 2 diabetes (T2D) associated with gestational diabetes mellitus (GDM), most existing prediction models for post-delivery T2D focus on women with GDM, leaving many parous women without clear risk stratification. This study aimed to develop a prediction model for prediabetes or T2D risk in the general population of parous women, based on clinical pregnancy variables.
Methods: We assessed prediabetes/T2D five years after delivery in the Genetics of Glucose Regulation in Gestation and Growth (Gen3G) cohort (N=403). Using a machine-learning approach, we developed a risk prediction model from which we derived a simple, clinically usable risk index: the Gestational 4-variable Prediabetes/type 2 diabetes (G4PD) index. The G4PD index was then validated in the Project Viva cohort at three years (n=562) and seventeen years (n=541) after delivery.
Results: The G4PD index included gestational weight gain, pre-gestational body mass index, first-trimester maternal age, and a GDM variable reflecting hyperglycemia severity during pregnancy. In Gen3G, the model achieved a cross-validated estimate of the area under the receiver operating characteristic curve (ROC-AUC) of 0.696. The G4PD index achieved a ROC-AUC of 0.682 in the 17-year Project Viva dataset, with similar results in the 3-year dataset. Beyond overall discrimination, the model effectively stratified women into clinically meaningful risk categories: those in the lowest group (< 2) had an expected risk of ~2% and ~15% at three and seventeen years after delivery, respectively, whereas those with the highest scores (≥ 7 or ≥ 5) had substantially higher expected risks (~7% and ~37% at the respective time points).
Conclusion: The G4PD index, derived from clinical pregnancy variables, moderately predicts the risk of prediabetes/T2D over several years.

Keywords: pregnancy, prediabetes, type 2 diabetes, risk stratification, prediction model, machine-learning algorithm

Introduction

Gestational diabetes mellitus (GDM), defined as any degree of glucose intolerance first diagnosed during pregnancy, affects approximately 14% of pregnancies worldwide.¹ GDM is a major risk factor for prediabetes and type 2 diabetes (T2D), increasing the risk of developing T2D later in life up to 10-fold.² Numerous studies have developed prediction models for the development of prediabetes or T2D after GDM to better identify women at higher risk who could benefit the most from personalized preventive programs;³ however, a substantial proportion of future T2D cases among parous women occur in those without a prior GDM diagnosis.⁴

Independent of GDM, pregnancy places significant stress on maternal metabolism to support optimal fetal growth.⁵ Furthermore, a continuous association has been reported between maternal glucose levels during pregnancy, below diabetic levels, and adverse pregnancy outcomes.⁶ Consistently, a recent US-based study found that gestational hyperglycemia below the threshold used for GDM diagnosis was also associated with a higher risk of subsequent T2D.⁴ Notably, among parous women with T2D, only 32% had a history of GDM,⁴ and some studies suggest that pregnancy itself may be a risk factor for future T2D.⁷ Therefore, prediction scores for prediabetes/T2D development should not be restricted to women with GDM.

Despite known risk factors in the non-GDM parous population, we are aware of only two prediction models for postpartum T2D, both of which lack external validation.⁸,⁹ In the current study, we aimed to develop a risk stratification model by applying a machine learning approach using clinical variables assessed during pregnancy to derive an index predicting prediabetes/T2D in a general population of parous women from Canada followed up to five years after pregnancy. We then externally validated our index in an independent prospective cohort from the USA that followed women from pregnancy up to seventeen years post-delivery.

Methods

Study Population

Development and Internal Validation Cohort: Gen3G

The derivation cohort is the Genetics of Glucose regulation in Gestation and Growth (Gen3G) cohort. Gen3G is a prospective pre-birth cohort from Sherbrooke, Canada, which has been described in detail elsewhere.¹⁰,¹¹ Between 2010 and 2013, we recruited a total of 1024 women, during their first trimester of pregnancy, aged ≥18 years, with a singleton pregnancy. We excluded women with pre-existing diabetes, diabetes diagnosed during the first trimester (based on the 2008 Canadian Diabetes Association criteria¹²), or who reported taking medications affecting glycemia. In accordance with the Declaration of Helsinki, all women provided written informed consent prior to their inclusion in the study and at each post-delivery follow-up visit.

We followed women throughout their pregnancy and collected sociodemographic information, personal and familial medical history, as well as biological and anthropometric measurements at three time points: first trimester (median 9.6 weeks’ gestation), late second trimester (median 26.4 weeks’ gestation) and delivery (median 39.3 weeks’ gestation).¹⁰ We reassessed mothers at approximately three and five years after delivery, during which we updated their medical history and collected blood samples. Additionally, at the 5-year follow-up visit, a subsample of 333 women underwent a 75g oral glucose tolerance test (75g-OGTT)¹¹ (Supplementary Figure 1). The institutional review board at the Centre intégré universitaire de santé et de services sociaux de l’Estrie – Centre hospitalier universitaire de Sherbrooke (CIUSSS de l’Estrie – CHUS) approved all Gen3G study protocols (ethics approval number: 2010-198 – 07-027-A1).

External Validation Cohort: Project Viva

The external validation cohort is the Project Viva cohort. Project Viva is a prospective pre-birth cohort from eastern Massachusetts, USA, which has been described in detail previously.¹³,¹⁴ In 1999–2002, we recruited a total of 2670 women who were at ≤22 weeks’ gestation with a singleton pregnancy. For the present study, we excluded women with pre-existing type 1 diabetes (T1D) or T2D (self-reported by the mother at enrollment) (n = 16) (Supplementary Figure 1). All women provided written informed consent prior to their inclusion in the study and at each post-delivery follow-up visit, in accordance with the Declaration of Helsinki.

We followed women throughout their pregnancy, during which research staff collected sociodemographic information, medical history, as well as biological and anthropometric measures at three different time points: first trimester (median 9.9 weeks’ gestation), late second trimester (median 27.9 weeks’ gestation), and delivery (median 39.7 weeks’ gestation).¹³ We reassessed mothers at approximately three and seventeen years after delivery, during which we collected fasting blood samples from those who consented. At the 17-year follow-up visit, a subsample of 204 mothers also underwent a 75g-OGTT. The institutional review board at Harvard Pilgrim Health Care Institute approved Project Viva study protocols (HPHCI IRB - project number 235301).

Study Outcome

Gen3G

Based on the American Diabetes Association (ADA) criteria, women from Gen3G were classified as having prediabetes or T2D at the 5-year follow-up visit if they met at least one of the following thresholds: HbA1c ≥5.7%, fasting glucose ≥5.6 mmol/L, or 2h post-OGTT glucose ≥7.8 mmol/L.¹⁵ We classified women as not having prediabetes/T2D if they had at least one available biological glycemic value, all available values fell within normal ranges (HbA1c <5.7%, fasting glucose <5.6 mmol/L, and 2h post-OGTT glucose <7.8 mmol/L), and they were not taking medications affecting glycemia.
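
The classification rule above can be sketched in a few lines of Python. This is an illustration only, not the study code: the function and argument names are hypothetical, and the handling of a woman with normal values who takes glycemia-affecting medication (returned as unclassified) is an assumption, since the text does not state how such cases were coded.

```python
from typing import Optional

# ADA thresholds quoted in the text (HbA1c in %, glucose in mmol/L).
HBA1C_CUTOFF = 5.7
FASTING_CUTOFF = 5.6
OGTT_2H_CUTOFF = 7.8


def classify_outcome(
    hba1c: Optional[float] = None,
    fasting_glucose: Optional[float] = None,
    ogtt_2h_glucose: Optional[float] = None,
    on_glycemic_medication: bool = False,
) -> Optional[bool]:
    """Return True (prediabetes/T2D), False (normal glycemia) or None (unclassifiable)."""
    measured = [
        (hba1c, HBA1C_CUTOFF),
        (fasting_glucose, FASTING_CUTOFF),
        (ogtt_2h_glucose, OGTT_2H_CUTOFF),
    ]
    # Keep only the biomarkers that were actually measured.
    available = [(value, cutoff) for value, cutoff in measured if value is not None]
    # One value at or above its threshold is enough to classify as a case.
    if any(value >= cutoff for value, cutoff in available):
        return True
    # Assumption: no biomarker, or normal values under medication, stays unclassified.
    if not available or on_glycemic_medication:
        return None
    return False
```

For example, `classify_outcome(hba1c=5.4, fasting_glucose=5.0)` yields a non-case even though no OGTT value is available, which mirrors the "at least one available value" rule.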

Project Viva

We assessed the prediabetes/T2D outcome at seventeen years after delivery using HbA1c, fasting glucose, and/or 2h post-OGTT glucose levels, using the ADA criteria as in Gen3G. We excluded one woman who developed T1D after pregnancy, identified through self-report in the follow-up questionnaire. We considered the 17-year visit as our main external validation dataset because the available glycemic biomarkers were the same as in Gen3G.

We also assessed prediabetes/T2D at three years after delivery in Project Viva using the available biomarkers (HbA1c and fasting glucose). Although the 2h post-OGTT glucose was not available at the 3-year follow-up visit, this reflects real-world clinical practice, where the OGTT is less frequently used (due to cost, discomfort, time requirements, etc). In addition, the ADA no longer recommends the use of OGTT for routine clinical diagnosis of diabetes in non-pregnant women,¹⁶ despite OGTT being more sensitive for diagnosing prediabetes and T2D.¹⁷

Clinical Predictors

We selected candidate clinical predictors based on a review of the literature and classified them into five domains: sociodemographic factors, medical history, pre-gestational predictors, pregnancy-related variables, and adverse pregnancy outcomes. The following sections describe the assessment methods for the four variables retained in the final index across both cohorts. See Supplementary materials for comprehensive details on how all 15 candidate predictors were measured and categorized. Table 1 presents the complete list of predictors along with their prespecified functional forms used in our analyses, while Supplementary Table 1 details the definitions and timing of measurements for each predictor in both cohorts.

Table 1 Candidate Predictors for the Development of Prediabetes/T2D Assessed Five years After Delivery in Gen3G (Development and Internal Validation Cohort) and at Both Seventeen and Three years After Delivery in Project Viva (External Validation Cohort)

GDM Diagnosis

We created a 4-level ordinal GDM variable based on the severity of hyperglycemia or intensity of required treatment. As different approaches were used to diagnose GDM in the two cohorts, we defined ordinal levels that were not identical but reflected increasing severity of hyperglycemia in each cohort.

In Gen3G, GDM diagnosis was defined using the International Association of Diabetes and Pregnancy Study Groups (IADPSG) criteria, using a 75g 2-hour OGTT conducted at ~28 weeks of pregnancy in women not previously identified with hyperglycemia. Blood samples were collected at fasting, 1h, and 2h, with diagnosis based on the following thresholds: fasting blood glucose ≥5.1 mmol/L, 1h glucose ≥10.0 mmol/L, or 2h glucose ≥8.5 mmol/L.¹⁸ We created a 4-level GDM variable based on insulin requirements during pregnancy and the timing of diagnosis, as follows: 0 = no GDM, 1 = GDM without insulin requirement, 2 = GDM treated with insulin, 3 = early GDM (diagnosed clinically before ~28 weeks in the presence of risk factors).
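
As an illustration, the Gen3G coding described above can be written as two small Python functions. The names are hypothetical, and one point is an assumption: the text does not say how an early GDM that was also treated with insulin was coded, so this sketch lets early diagnosis take precedence (level 3).

```python
def iadpsg_gdm(fasting: float, one_hour: float, two_hour: float) -> bool:
    """IADPSG criteria on a 75g OGTT (mmol/L): any one value at or above its threshold."""
    return fasting >= 5.1 or one_hour >= 10.0 or two_hour >= 8.5


def gen3g_gdm_level(has_gdm: bool, insulin_treated: bool = False,
                    early_diagnosis: bool = False) -> int:
    """4-level ordinal GDM variable as described for Gen3G (0 to 3)."""
    if not has_gdm:
        return 0
    # Assumption: early GDM (before ~28 weeks) outranks insulin status.
    if early_diagnosis:
        return 3
    return 2 if insulin_treated else 1
```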

In Project Viva, GDM was defined using the Carpenter and Coustan criteria at ~28 weeks of pregnancy, following clinical practice. All women first underwent a non-fasting 50g glucose challenge test (GCT) with a blood draw 1h after ingestion. If the 1h glucose level was ≥7.8 mmol/L, women were referred to a fasting 100g-OGTT, with blood samples collected at fasting and at 1h, 2h, and 3h after the glucose load. Abnormal 100g-OGTT blood glucose values were >5.3 mmol/L at fasting, >10 mmol/L at 1h, >8.6 mmol/L at 2h, or >7.8 mmol/L at 3h. Women with normal GCT results were categorized as having normal glycemia; those with an abnormal GCT but a normal OGTT were categorized as having isolated hyperglycemia; those with only one abnormal value at the OGTT were categorized as having intermediate glucose intolerance; and those with at least two abnormal values at the OGTT were diagnosed with GDM.¹⁹ As Project Viva did not assess the timing of GDM diagnosis and insulin requirement, we created a 4-level GDM variable reflecting glycemic severity in line with prior Project Viva reports²⁰ as follows: 0 = normal glycemia, 1 = isolated hyperglycemia, 2 = intermediate glucose intolerance, and 3 = GDM.
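
The two-step Project Viva categorization can be sketched as follows. This is a minimal illustration with hypothetical names; the thresholds are those quoted in the text, and raising an error when OGTT values are missing after an abnormal GCT is a choice made for this sketch, not a rule stated in the study.

```python
from typing import Optional, Sequence

GCT_CUTOFF = 7.8  # mmol/L, 1h after the 50g challenge
# Carpenter and Coustan thresholds (mmol/L): fasting, 1h, 2h, 3h.
OGTT_100G_CUTOFFS = (5.3, 10.0, 8.6, 7.8)


def viva_gdm_level(gct_1h: float, ogtt_100g: Optional[Sequence[float]] = None) -> int:
    """4-level glycemic severity variable as described for Project Viva (0 to 3)."""
    if gct_1h < GCT_CUTOFF:
        return 0  # normal glycemia
    if ogtt_100g is None or len(ogtt_100g) != 4:
        raise ValueError("an abnormal GCT requires four 100g-OGTT values")
    # A value is abnormal when strictly above its threshold.
    n_abnormal = sum(v > c for v, c in zip(ogtt_100g, OGTT_100G_CUTOFFS))
    if n_abnormal == 0:
        return 1  # isolated hyperglycemia
    if n_abnormal == 1:
        return 2  # intermediate glucose intolerance
    return 3  # GDM
```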

Maternal Age

Participating women reported their age during the index pregnancy in both cohorts.

Pre-Gestational Body Mass Index (BMI)

We used maternal height and pre-gestational weight to calculate pre-gestational BMI using the standard formula (ie, kg/m²). Participants self-reported their pre-gestational weight (kg) in the first trimester of pregnancy, and research staff measured height (m) objectively using a standardized protocol in the first trimester of pregnancy. Self-reported pre-gestational weight is considered a reliable estimate of the pre-gestational weight.²¹ In Project Viva, participants self-reported their height at the first-trimester visit.

Gestational Weight Gain (GWG)

GWG was calculated as the difference between the last recorded weight (in the electronic medical records) during pregnancy prior to delivery (Gen3G: median 36.6 weeks, 17-year Project Viva dataset: median 39.3 weeks) and the pre-gestational weight (self-reported by the mother during the first trimester visit).
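
For concreteness, the two anthropometric predictors reduce to the arithmetic below. The function names and the example values are illustrative assumptions and do not come from either cohort.

```python
def pre_gestational_bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m², from pre-gestational weight and height."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def gestational_weight_gain(last_pregnancy_weight_kg: float,
                            pre_gestational_weight_kg: float) -> float:
    """GWG in kg: last recorded weight before delivery minus pre-gestational weight."""
    return last_pregnancy_weight_kg - pre_gestational_weight_kg


# Hypothetical participant: 65.0 kg before pregnancy, 1.65 m, 78.5 kg at last visit.
bmi = pre_gestational_bmi(65.0, 1.65)
gwg = gestational_weight_gain(78.5, 65.0)
```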

Statistical Analyses

Descriptive Statistics

We reported the median and interquartile range (IQR) and used the Wilcoxon–Mann–Whitney test to compare continuous variables; we used the Chi-square test or Fisher’s exact test, as appropriate, for categorical variables. We considered differences statistically significant at p <0.05.

Model Development and Internal Validation (Gen3G)

Figure 1 outlines the steps of the study analysis. We used the Gen3G cohort to train our final prediction models (described below). For internal validation, we divided the dataset into five mutually exclusive blocks, using stratified sampling based on the number of participants with prediabetes/T2D (outcome) to ensure an equal distribution across blocks (Figure 1, Step 1). These blocks were used for five-fold cross-validation to estimate the performance of the models.²² This process involved fitting each model using four out of five blocks (“derivation set”) and evaluating its performance on the remaining block (“validation set”) (Figure 1, Step 2). This procedure was repeated five times, with a different block used as the validation set in each iteration, and the performance metrics were averaged across all five folds to estimate the cross-validated performance of the prediction models (Figure 1, Step 3). Finally, we fit the final models using the entire Gen3G dataset (Figure 1, Step 4).
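
The stratified five-fold scheme can be illustrated with a short, dependency-free Python sketch. It is not the analysis code (the study used R): the function names, the seed, and the `fit_and_score` callback are assumptions introduced here, where the callback stands for any routine that fits a model on the derivation indices and returns a metric computed on the validation indices.

```python
import random
from typing import Callable, List, Sequence


def stratified_folds(outcomes: Sequence[int], k: int = 5, seed: int = 0) -> List[int]:
    """Assign each participant to one of k folds, balancing cases and non-cases."""
    rng = random.Random(seed)
    fold_of = [0] * len(outcomes)
    offset = 0
    for label in sorted(set(outcomes)):
        idx = [i for i, y in enumerate(outcomes) if y == label]
        rng.shuffle(idx)
        # Deal indices round-robin; the offset keeps total fold sizes even.
        for pos, i in enumerate(idx):
            fold_of[i] = (pos + offset) % k
        offset += len(idx)
    return fold_of


def cross_validate(outcomes: Sequence[int],
                   fit_and_score: Callable[[List[int], List[int]], float],
                   k: int = 5, seed: int = 0) -> float:
    """Average a performance metric over k derivation/validation splits."""
    fold_of = stratified_folds(outcomes, k, seed)
    scores = []
    for fold in range(k):
        derivation = [i for i, f in enumerate(fold_of) if f != fold]
        validation = [i for i, f in enumerate(fold_of) if f == fold]
        scores.append(fit_and_score(derivation, validation))
    return sum(scores) / k
```

With 56 cases among 403 participants, as in Gen3G, this assignment places 11 or 12 cases in each block.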

Figure 1 Study design.

Abbreviations: Gen3G, Genetics of Glucose regulation in Gestation and Growth cohort; ROC-AUC, area under the receiver operating characteristic curve; PR-AUC, area under the precision-recall curve; T2D, type 2 diabetes.

The Full and Reduced Models

To maximize internal performance, we fit random forest (RF) models using all candidate predictors (called the full models). RF is a flexible, non-parametric algorithm that captures interactions and nonlinearities, often outperforming other methods,²³ but its complexity and large predictor sets limit clinical applicability. Therefore, we also built more parsimonious models with a smaller set of predictors (called the reduced models).

To determine the number of clinical variables for the reduced model, we used the optimal number of candidate predictor parameters (CPPs), based on event count in the dataset. The Gen3G dataset included 406 participants with at least one glycemic biomarker (HbA1c, fasting glucose or 2h post-OGTT glucose) at five years follow-up; after excluding three with missing predictors, 403 remained, of whom 56 developed prediabetes/T2D (Supplementary Figure 1). Using Riley et al’s criteria²⁴ and the R package pmsampsize,²⁵ we calculated that 4 was the optimal number of CPPs to initially include in our model to reduce overfitting.²⁴ Predictors were ranked by variable importance from the full RF model, measured as mean decrease in Gini impurity, averaged over all trees of the RF.²⁶ Using the top 4 predictors, we then trained a least absolute shrinkage and selection operator (LASSO) regression model, which performed automated variable selection by applying shrinkage to the model coefficients.²⁷

Hyperparameter Tuning

For all models, we used a grid search procedure to fine-tune the value of their hyperparameters. This procedure involved iteratively evaluating the cross-validated performance of the algorithms over a set of plausible values for each hyperparameter in the derivation set.²⁸ For the RF model, we tuned the number of trees (ntree), the number of features considered at each split (mtry), and the minimum size of terminal nodes (nodesize), which limits tree depth. For the LASSO model, we optimized the regularization strength (lambda). We used five-fold cross-validation in the derivation set during grid search. Full details and tuned hyperparameter values used in the final models are provided in Supplementary Table 2.
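
A grid search of this kind amounts to an exhaustive loop over hyperparameter combinations. The sketch below is generic and hedged: `cv_score` is a hypothetical callback that would return the cross-validated performance for one combination, and the grid values shown in the usage note are placeholders, not the values tuned in the study (those are in Supplementary Table 2).

```python
from itertools import product
from typing import Callable, Dict, Optional, Sequence, Tuple


def grid_search(grid: Dict[str, Sequence],
                cv_score: Callable[[Dict], float]) -> Tuple[Optional[Dict], float]:
    """Return the hyperparameter combination with the highest cross-validated score."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    # Evaluate every combination of candidate values.
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = cv_score(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

A call such as `grid_search({"ntree": [100, 500], "mtry": [2, 4], "nodesize": [1, 5]}, cv_score)` would evaluate eight combinations.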

Risk Scoring System

Using the methods described by Sullivan et al,²⁹ we converted the final LASSO logistic regression model into a risk scoring system for risk stratification, referred to as the Gestational 4-variable Prediabetes/Diabetes risk (G4PD) index. The number of points assigned to each category of an ordinal variable was calculated by dividing the difference in regression units between that category and the reference category (always assigned 0 points) by a constant (the number of regression units associated with a five-year increase in age in the model) and rounding the result to the nearest integer. For continuous variables, we first categorized them using clinically relevant or quantile cut points. Points for each resulting category were then assigned based on the difference in regression units between its midpoint and the midpoint of the reference category, again divided by the constant and rounded to the nearest integer.²⁹
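
The point assignment can be made concrete with a small worked example. All coefficients and category midpoints below are hypothetical and chosen only to show the arithmetic; the actual coefficients are reported in Supplementary Table 5. Rounding halves upward is also an assumption, since the text only specifies rounding to the nearest integer.

```python
import math


def round_half_up(x: float) -> int:
    """Round to the nearest integer, with halves rounded upward (assumption)."""
    return math.floor(x + 0.5)


def continuous_points(beta_per_unit: float, category_midpoint: float,
                      reference_midpoint: float, constant: float) -> int:
    """Points for one category of a categorized continuous predictor."""
    return round_half_up(beta_per_unit * (category_midpoint - reference_midpoint) / constant)


def ordinal_points(beta_vs_reference: float, constant: float) -> int:
    """Points for one level of an ordinal predictor relative to its reference level."""
    return round_half_up(beta_vs_reference / constant)


# Hypothetical coefficients (log-odds), for illustration only.
BETA_AGE_PER_YEAR = 0.05
CONSTANT = 5 * BETA_AGE_PER_YEAR  # regression units for a five-year increase in age

age_points = continuous_points(BETA_AGE_PER_YEAR, 32.0, 22.0, CONSTANT)  # midpoints in years
gdm_points = ordinal_points(1.2, CONSTANT)  # a GDM level versus no GDM
gwg_points = continuous_points(-0.04, 20.0, 12.5, CONSTANT)  # midpoints in kg
total_score = age_points + gdm_points + gwg_points
```

Under these made-up coefficients, a negative coefficient yields negative points, which is how an index can take values below zero.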

External Validation (Project Viva)

We assessed the ability of the G4PD index to predict the risk of prediabetes/T2D in Project Viva. We first calculated the G4PD index for each participant. We then fit a logistic regression model with prediabetes/T2D as the outcome and the G4PD index as the independent predictor, separately for the 17- and 3-year follow-up Project Viva datasets. We further assessed calibration of the G4PD index by comparing the expected and observed number of prediabetes/T2D cases after delivery within 4 strata, created by dividing the cohort into quartiles and rounding the cut-off points to the nearest integer of the risk score distribution in each dataset. The expected risk of prediabetes/T2D for each woman was calculated as 1 / (1 + e^{−(intercept + β × risk score)}), where β represents the coefficient of the G4PD index in the regression model. The total number of expected cases in each stratum was calculated by summing the expected risk across all individuals within that stratum.³⁰,³¹ We considered the observed and expected proportions to be similar if the expected proportion fell within the exact 95% confidence interval (CI) around the observed value.³⁰–³²
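
The calibration check can be sketched as below. This is an illustration under stated assumptions: the exact CI is taken to be the Clopper-Pearson binomial interval (the text says only "exact"), it is computed by bisection to avoid external libraries, and the function names and the `cutoffs` argument (the lower bounds of the upper strata) are hypothetical.

```python
import math
from bisect import bisect_right
from typing import Callable, Dict, List, Sequence, Tuple


def expected_risk(score: float, intercept: float, beta: float) -> float:
    """Logistic transform of the linear predictor, as in the text."""
    return 1.0 / (1.0 + math.exp(-(intercept + beta * score)))


def _binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p) with 0 < p < 1, summed in log space."""
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return total


def _solve(f: Callable[[float], float], target: float, increasing: bool) -> float:
    """Bisection on (0, 1) for f(p) = target, where f is monotone in p."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(events: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Clopper-Pearson interval for a proportion (assumed meaning of 'exact')."""
    lower = 0.0 if events == 0 else _solve(
        lambda p: 1.0 - _binom_cdf(events - 1, n, p), alpha / 2, increasing=True)
    upper = 1.0 if events == n else _solve(
        lambda p: _binom_cdf(events, n, p), alpha / 2, increasing=False)
    return lower, upper


def calibration_by_stratum(scores: Sequence[float], outcomes: Sequence[int],
                           cutoffs: Sequence[float], intercept: float,
                           beta: float) -> List[Dict]:
    """Compare expected and observed cases within risk score strata."""
    members: List[List[Tuple[float, int]]] = [[] for _ in range(len(cutoffs) + 1)]
    for s, y in zip(scores, outcomes):
        members[bisect_right(cutoffs, s)].append((s, y))
    rows = []
    for stratum in members:
        n = len(stratum)
        if n == 0:
            continue  # empty strata are skipped
        observed = sum(y for _, y in stratum)
        # Expected cases: sum of individual expected risks in the stratum.
        expected = sum(expected_risk(s, intercept, beta) for s, _ in stratum)
        lower, upper = exact_ci(observed, n)
        rows.append({"n": n, "observed": observed, "expected": expected,
                     "ci": (lower, upper), "agrees": lower <= expected / n <= upper})
    return rows
```

Each returned row flags whether the expected proportion falls inside the interval around the observed proportion, which is the agreement criterion described above.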

We also assessed the ability of the final models to predict the risk of prediabetes/T2D in Project Viva. We used the final models to calculate the probability of developing prediabetes/T2D for all participants in the two Project Viva datasets (Figure 1, Step 5). In both cohorts, we assessed overall performance using three metrics: the scaled Brier score, the area under the receiver operating characteristic curve (ROC-AUC), and the area under the precision-recall curve (PR-AUC) (see Supplementary materials for further details). We completed analyses on datasets with complete data for predictors and outcomes in both Gen3G and Project Viva (no missing values). We performed all statistical analyses in R version 4.4.3 in RStudio-server version 2024.12.1+563.
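
For readers unfamiliar with the three metrics, the sketch below gives one common definition of each in plain Python. These are assumptions about the estimators: the exact definitions used in the study are in the Supplementary materials, and in particular PR-AUC is computed here as average precision, which is only one of several ways to summarize the precision-recall curve.

```python
from typing import Sequence


def scaled_brier(y: Sequence[int], p: Sequence[float]) -> float:
    """1 - Brier / Brier_ref, where the reference model predicts the prevalence for everyone."""
    n = len(y)
    brier = sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / n
    prevalence = sum(y) / n
    return 1.0 - brier / (prevalence * (1.0 - prevalence))


def roc_auc(y: Sequence[int], p: Sequence[float]) -> float:
    """Probability that a random case is ranked above a random non-case (ties count half)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def pr_auc(y: Sequence[int], p: Sequence[float]) -> float:
    """Average precision: precision weighted by the gain in recall at each threshold."""
    n_pos = sum(y)
    pairs = sorted(zip(p, y), reverse=True)
    tp = fp = 0
    prev_recall = area = 0.0
    i = 0
    while i < len(pairs):
        j = i
        # Process all participants sharing the same predicted risk together.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            tp += pairs[j][1]
            fp += 1 - pairs[j][1]
            j += 1
        recall = tp / n_pos
        area += (recall - prev_recall) * tp / (tp + fp)
        prev_recall = recall
        i = j
    return area
```

A scaled Brier score of 0 corresponds to a model no better than predicting the prevalence for everyone, and a ROC-AUC of 0.5 corresponds to random ranking.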

Results

Participant Characteristics

The characteristics of study participants from the Gen3G cohort and Project Viva cohort (17-year dataset) are summarized in Table 2. In brief, the two cohorts were similar with respect to smoking status at first trimester, family history of T2D, pre-gestational BMI, parity, chronic or gestational hypertension, preeclampsia, GDM, and prematurity. However, participants in Gen3G more frequently self-identified as White (96% versus 67%), were younger (median 26.3 versus 32.7 years old), and had less frequently completed university than Project Viva participants (Table 2). Median GWG was lower in Gen3G (13.3 kg) than in Project Viva (15.4 kg). The frequency of macrosomia was also lower in Gen3G (12%) than in Project Viva (18%). The proportion of participants with prediabetes or T2D reached 14% at the 5-year follow-up in Gen3G and 23% in Project Viva at the 17-year follow-up (4% at the 3-year follow-up). Characteristics stratified by prediabetes/T2D development status in each dataset are presented in Supplementary Table 3.

Table 2 Characteristics of the Study Participants From the Gen3G (Development and Internal Validation) and Project Viva (External Validation) Cohorts

Performance of the Full Models Estimated in Gen3G Using Cross-Validation

To maximize internal performance, we first fit RF models with all 15 predictors. In Gen3G, the full RF model had a cross-validated estimate of the scaled Brier score of 0.158±0.090, indicating a 15.8% reduction in mean squared error compared to a non-informative model. This model also showed good discrimination with ROC-AUC 0.760±0.097 and PR-AUC 0.461±0.101, both estimates showing improvement compared to the baseline model (Supplementary Table 4).

Performance of the Reduced Models Estimated in Gen3G Using Cross-Validation

To enhance the clinical applicability, we also developed a linear model using a smaller number of predictors. During the cross-validation procedure, the full RF model consistently identified the same four variables as the most important predictors across all folds. In decreasing order of importance, averaged across all folds, these variables were: GWG, pre-gestational BMI, maternal age at first trimester, and the 4-level GDM variable. There was a consistent separation in the mean decrease Gini impurity values between these top four predictors and the remaining eleven predictors in every fold (Supplementary Figure 2).

We used these four predictors in a LASSO regression to build the reduced model. The cross-validated estimate of the scaled Brier score was 0.151±0.105, ROC-AUC was 0.696±0.106, and PR-AUC was 0.462±0.122 (Supplementary Table 4). The regression coefficients of the predictors retained in the final reduced model, which was fit using the entire Gen3G dataset, are presented in Supplementary Table 5. We converted the final reduced model into a risk scoring system for predicting prediabetes/T2D after delivery, namely the G4PD index. The G4PD index ranges from −2 to 27, with GDM severity contributing up to 19 points, and is presented in Table 3.

Table 3 The Four Clinical Variables Assessed During Pregnancy and Used to Calculate the G4PD Index for Prediabetes/T2D in Parous Women

Performance of All Models in Project Viva

We assessed the performance of the G4PD index in Project Viva. Using the 17-year dataset, the scaled Brier score was 0.084, ROC-AUC was 0.682, and PR-AUC was 0.442. When we compared the expected and observed event rates to assess calibration of the G4PD index, the expected event rate was within the 95% CI of the observed event rate in all risk strata. Women with the lowest scores (<2) had an expected risk of ~2% and ~15% at three and seventeen years after delivery, respectively, whereas those with the highest scores (≥7 or ≥5) had an expected risk of ~7% and ~37% at the respective time points (Table 4).

Table 4 Expected and Observed Probability of Prediabetes/T2D at Three years and Seventeen years After Delivery in Project Viva (External Validation), by Risk Score Strata of the Risk Score System

We evaluated the performance of the final models in the 17-year Project Viva dataset. Using the full RF model, the scaled Brier score was 0.055, ROC-AUC was 0.689, and PR-AUC was 0.448. Using the reduced model (4 variables), the corresponding metrics were 0.036, 0.676, and 0.433 (Supplementary Table 4). The performance of the G4PD index as well as the final models for predicting prediabetes/T2D three years after delivery in Project Viva are presented in Supplementary Table 4.

Discussion

In this study, we developed a prediabetes/T2D risk prediction model for parous women using a machine learning approach and based on clinical variables commonly measured during pregnancy. The four main predictors retained were GWG, pre-gestational BMI, first-trimester maternal age, and a GDM variable reflecting the severity of gestational hyperglycemia, the latter having the most weight in the risk stratification model. In the Gen3G cohort, the final model achieved a cross-validated estimate of the ROC-AUC of 0.696. The cross-validated estimate of the PR-AUC was 0.462, which indicates that our model performs 3.3 times better than a random classifier, for which the PR-AUC would correspond to the proportion of positive cases in the test set (Gen3G = 0.139). When evaluated in an independent cohort, the G4PD index showed good calibration across its entire range.
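
The 3.3-fold figure follows directly from numbers reported above, as this short check shows. The variable names are illustrative; the inputs are the Gen3G case count (56 of 403) and the cross-validated PR-AUC of the reduced model.

```python
# Baseline PR-AUC of a random classifier equals the outcome prevalence.
cases, participants = 56, 403
prevalence = cases / participants

pr_auc_reduced_model = 0.462  # cross-validated estimate in Gen3G
lift_over_random = pr_auc_reduced_model / prevalence

print(f"prevalence = {prevalence:.3f}, lift = {lift_over_random:.1f}")
```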

Impact of Cohort Differences on Model Performance

The different sociodemographic profiles and follow-up durations in Gen3G and Project Viva provide a stringent test of model generalizability, allowing assessment of performance across heterogeneous populations. The final model achieved better overall performance in the Gen3G cohort (cross-validated estimate of scaled Brier score: 0.151) than the index did in both the 3- and 17-year Project Viva datasets (scaled Brier score: 0.017 and 0.084, respectively). The decrease in performance observed in Project Viva compared with Gen3G may be partly explained by differences between the cohorts in sociodemographic variables such as ethnicity, age and educational level. These variables are known to influence cardiometabolic trajectories, access to follow-up care, and the likelihood of prediabetes/T2D development after pregnancy.³³ Consequently, the model’s generalizability may be reduced in more diverse or lower-resource populations. This highlights the importance of external validation and potential recalibration before the model is implemented more widely in clinical practice.

Comparison with Previous Studies of T2D Predicting Algorithms in Parous Women

To our knowledge, only two studies have developed post-delivery T2D prediction models in the general population of parous women,⁸,⁹ both limited to Asian cohorts without external validation. Kumar et al used CatBoost to predict T2D 4–8 years after delivery (ROC-AUC 0.86), with GDM and mid-gestation BMI as key predictors;⁸ their stronger performance may reflect a higher case proportion (30% vs 14% in Gen3G), which reduces class imbalance and allows the model to better learn the minority class.³⁴ Lee et al used logistic regression to predict T2D within five years (ROC-AUC 0.74), identifying GDM, pre-gestational BMI, hyperglycemia (a single abnormal value at the 100g OGTT), nulliparity, hypertension, hyperlipidemia, and family history of T2D as predictors.⁹ Their stronger performance likely reflects a larger sample (n = 9353 vs 403), which enables inclusion of a greater number of candidate predictors.

Study Findings of Individual Predictors in Context of Prior Literature

Our finding that the 4-level GDM variable was associated with an increased risk of post-delivery prediabetes/T2D is consistent with current literature.² This association likely reflects a combination of genetic predisposition, revealed during pregnancy, and ongoing environmental factors. Indeed, a recent multi-ancestry genome-wide association study has confirmed a similar genetic architecture between GDM and T2D.³⁵ GDM is therefore recognized as a marker for identifying women genetically predisposed to T2D, with pregnancy acting as a window for risk detection or potentially as an environmental trigger for disease progression.³⁶ Beyond genetic susceptibility, excess adiposity, high-fat and high-calorie diets, and physical inactivity are major risk factors for both GDM and T2D,³⁷ supporting the important contribution of environmental factors to the observed association between GDM and T2D.

Our finding of a positive association between post-delivery prediabetes/T2D and higher maternal age is expected. It is likely explained by reduced beta-cell function from loss of mass, lower proliferation, and diminished incretin response related to aging processes.³⁸ Age also reduces insulin sensitivity through mechanisms such as visceral fat accumulation, sarcopenia, mitochondrial dysfunction, chronic inflammation, and oxidative stress.³⁹ These changes in pregnancy may promote GDM⁴⁰ and contribute to later maternal risk of prediabetes/T2D.²

Our finding that pre-gestational BMI was positively associated with post-delivery prediabetes/T2D could be explained in several ways. Due to elevated adiposity, higher pre-gestational BMI may be accompanied by insulin resistance,⁴¹ which is exacerbated by the physiological adaptation to pregnancy that facilitates glucose transfer to the growing fetus.⁵ Combined, these demands may exceed normal beta-cell functional capacity, or even precipitate beta-cell dysfunction, consequently increasing future maternal risk of prediabetes/T2D.⁴² Additionally, higher pre-gestational BMI may also indicate environmental factors (diet, sedentary lifestyle, pollutants) that are present before, during, and after pregnancy and that underlie common risks for prediabetes/T2D in later life.⁴³

Our finding that participants with lower GWG were more likely to develop prediabetes/T2D after delivery is consistent with some, but not all, previous studies. While several studies have similarly reported an inverse association between GWG and subsequent maternal risk of T2D,⁸,⁹ Kamihara et al⁴⁴ observed a positive association, whereas Coelho et al⁴⁵ found no significant association. The inverse association we observed between GWG and maternal risk of T2D may be partially explained by the fact that participants with lower GWG tend to have a higher pre-gestational BMI.⁴⁶ In line with this, Kamihara et al⁴⁴ excluded women with pre-pregnancy BMI >25 kg/m², which may explain the discrepancy between their findings and ours. Within the G4PD index, GWG modestly complements the main predictors of future maternal T2D risk.

Clinical Implications of Study Findings

This study identified four important clinical predictors for risk stratification of prediabetes/T2D in parous women. The G4PD index demonstrated good calibration and effectively stratified women into clinically meaningful risk groups. Women with the lowest scores had observed risks of prediabetes/T2D of ~2% at three years and ~15% at seventeen years, while those with the highest scores had observed risks of ~7% and ~37% at the same timepoints, respectively. Thus, the G4PD index could provide a valuable approach for identifying women at persistently elevated long-term risk. However, the performance of our index needs to improve before it can be adopted clinically.
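
As an illustration of how observed risks per score group, such as those summarized above, can be tabulated, the following minimal Python sketch computes the observed proportion and an approximate 95% confidence interval for each group. The counts are hypothetical placeholders and are not Gen3G or Project Viva data. The function names are also hypothetical, and the Wilson score interval is used here only as a simple stand-in that requires no external library; it is not presented as the interval method used in this study.

```python
from math import sqrt


def wilson_interval(events, n, z=1.96):
    """Wilson score interval for a binomial proportion (illustrative stand-in)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = events / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clip to [0, 1] to guard against floating-point overshoot.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


def observed_risk_by_group(counts):
    """Return (label, observed risk, lower, upper) for each score group.

    `counts` maps a group label to (number of events, number of women).
    """
    rows = []
    for label, (events, n) in counts.items():
        lower, upper = wilson_interval(events, n)
        rows.append((label, events / n, lower, upper))
    return rows


# Hypothetical counts for two score groups: (events, total).
hypothetical_counts = {
    "lowest scores": (3, 150),
    "highest scores": (10, 140),
}

for label, risk, lower, upper in observed_risk_by_group(hypothetical_counts):
    print(f"{label}: {risk:.1%} (95% CI {lower:.1%} to {upper:.1%})")
```

Reporting an interval alongside each observed risk makes the uncertainty in small score groups visible, which matters when the number of women with prediabetes/T2D is limited.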

Strengths and Limitations

A strength of this study is the inclusion of easily obtainable variables in the G4PD index, all of which can be self-reported by women, making it a practical tool for population-level screening. Another strength is the external validation of the index over both the short term (three years) and the long term (seventeen years). A limitation of this study is that glycemic outcomes were assessed differently across cohorts because of variations in follow-up time and glycemic testing (ie, HbA1c, fasting glucose, 2-hour post-OGTT glucose). For instance, the absence of an OGTT at three years post-delivery in Project Viva may have led to an underestimation of the true incidence of prediabetes/T2D, which may partly explain the reduced model performance observed upon replication in this dataset. However, this allowed us to test the robustness of the models when the available glycemic measures differ, as is common in real-world practice. Finally, the relatively small number of women with prediabetes/T2D in both cohorts limited the number of predictors that could be included in the models.

Conclusion

We derived and externally validated the G4PD index, intended for risk stratification of prediabetes/T2D in parous women, using four clinical variables measured during pregnancy: GWG, pre-gestational BMI, maternal age, and a 4-level GDM variable reflecting hyperglycemia severity. Our risk score was robust across different prediction timelines. However, additional variables are likely needed to improve the prediction performance before this type of approach can be clinically valuable and adopted.

Abbreviations

75g-OGTT, 75g Oral Glucose Tolerance Test; ADA, American Diabetes Association; BMI, Body Mass Index; CIUSSS de l’Estrie – CHUS, Centre intégré universitaire de santé et de services sociaux de l’Estrie – Centre hospitalier universitaire de Sherbrooke; CPP, Candidate Predictor Parameter; DBP, Diastolic Blood Pressure; G4PD, Gestational 4-variable Prediabetes/Diabetes risk model; GCT, Glucose Challenge Test; GDM, Gestational Diabetes Mellitus; Gen3G, Genetics of Glucose regulation in Gestation and Growth cohort; GWG, Gestational Weight Gain; LASSO, Least Absolute Shrinkage and Selection Operator; PCOS, Polycystic Ovarian Syndrome; PR-AUC, Area Under the Precision-Recall Curve; ROC-AUC, Area Under the Receiver Operating Characteristic Curve; SBP, Systolic Blood Pressure; T1D, Type 1 Diabetes; T2D, Type 2 Diabetes.

Data Sharing Statement

The data supporting the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgments

We thank the participants of the Gen3G cohort for their commitment and support over the last 12 years. We would also like to thank the Project Viva participants and study staff.

Author Contributions

A.T.: Conceptualization, Formal analysis, Investigation, Writing – original draft. J.W.: Conceptualization, Formal analysis, Investigation, Writing – review and editing. S.H.L.: Investigation, Writing – review and editing. M.A.B.: Investigation, Writing – review and editing. M.D.: Data curation, Writing – review and editing. M.A.: Data curation, Writing – review and editing. S.R.S.: Data curation, Writing – review and editing. E.O.: Conceptualization, Writing – review and editing. P.P.: Conceptualization, Writing – review and editing. P.-É.J.: Investigation, Supervision, Writing – review and editing. L.B.: Conceptualization, Investigation, Funding acquisition, Supervision, Writing – review and editing. M.-F.H.: Conceptualization, Investigation, Funding acquisition, Supervision, Writing – review and editing. All authors have agreed on the journal to which the article has been submitted and agree to be accountable for all aspects of the work.

Funding

Gen3G was supported over the years by a Fonds de recherche du Québec—Santé operating grant [to M.-F.H., grant number 20697]; Canadian Institutes of Health Research (CIHR) operating grants [to M.-F.H., grant number MOP 115071, and to L.B., grant numbers PJT152989 and PJT190076]; Diabète Québec; internal funding support from the Centre de recherche du CHUS and the Université de Sherbrooke; and the American Diabetes Association [to M.-F.H., grant number 1-15-ACE-26]. A.T. was the recipient of a doctoral research award from the CIHR. S.H.L. is supported by the Thomas O. Pyle Fellowship (Harvard Medical School) and a postdoctoral award from the American Diabetes Association [7-23-PDFT2DY-03]. L.B. and P.-É.J. are senior research scholars of the FRQS, and M.A.B. is a Junior 2 research scholar of the FRQS. None of the above funding sources participated in the design of the study.

Disclosure

The authors have no competing interest to declare.

References

1. Wang H, Li N, Chivese T, et al. IDF diabetes atlas: estimation of global and regional gestational diabetes mellitus prevalence for 2021 by international association of diabetes in pregnancy study group’s criteria. Diabet Res Clin Pract. 2022;183:109050. doi:10.1016/j.diabres.2021.109050

2. Vounzoulaki E, Khunti K, Abner SC, Tan BK, Davies MJ, Gillies CL. Progression to type 2 diabetes in women with a known history of gestational diabetes: systematic review and meta-analysis. BMJ. 2020;369:m1361. doi:10.1136/bmj.m1361

3. Magboul NME, Dkeen NOM, Mohammed HAH, et al. Machine learning for predicting the transition from gestational diabetes to type 2 diabetes: a systematic review. Cureus. 2025;17(5). doi:10.7759/cureus.84314

4. Selen DJ, Thaweethai T, Schulte CCM, et al. Gestational glucose intolerance and risk of future diabetes. Diabetes Care. 2023;46(1):83–13. doi:10.2337/dc22-1390

5. Lain KY, Catalano PM. Metabolic changes in pregnancy. Clin Obstet Gynecol. 2007;50(4):938. doi:10.1097/GRF.0b013e31815a5494

6. Metzger BE, Lowe LP; HAPO Study Cooperative Research Group. Hyperglycemia and adverse pregnancy outcomes. N Engl J Med. 2008;358(19):1991–2002. doi:10.1056/NEJMoa0707943

7. Guo P, Zhou Q, Ren L, Chen Y, Hui Y. Higher parity is associated with increased risk of Type 2 diabetes mellitus in women: a linear dose-response meta-analysis of cohort studies. J Diabetes Complications. 2017;31(1):58–66. doi:10.1016/j.jdiacomp.2016.10.005

8. Kumar M, Ang LT, Ho C, et al. Machine learning–derived prenatal predictive risk model to guide intervention and prevent the progression of gestational diabetes mellitus to type 2 diabetes: prediction model development study. JMIR Diab. 2022;7(3):e32366. doi:10.2196/32366

9. Lee SU, Hong S, Choi SK, et al. Glucose tolerance test with a single abnormal value as a predictor of type 2 diabetes mellitus: a multicenter retrospective study. Sci Rep. 2024;14(1):6792. doi:10.1038/s41598-024-57535-8

10. Guillemette L, Allard C, Lacroix M, et al. Genetics of glucose regulation in gestation and growth (Gen3G): a prospective prebirth cohort of mother–child pairs in Sherbrooke, Canada. BMJ Open. 2016;6(2):e010031. doi:10.1136/bmjopen-2015-010031

11. Taschereau A, Doyon M, Arguin M, et al. Cohort profile: the genetics of glucose regulation in gestation and growth (Gen3G) - a prospective prebirth cohort of mother-child pairs in Sherbrooke, Canada, 3-year and 5-year follow-up visits. BMJ Open. 2025;15(3):e093434. doi:10.1136/bmjopen-2024-093434

12. Russo V, Martelli A, Mauro A; Canadian Diabetes A. Canadian diabetes association 2008 clinical practice guidelines for the prevention and management of diabetes in Canada. Veterinary Research Communications. 2008;32(Supp 1):S171–2. doi:10.1007/s11259-008-9110-6

13. Oken E, Baccarelli AA, Gold DR, et al. Cohort profile: project viva. Int J Epidemiol. 2015;44(1):37–48. doi:10.1093/ije/dyu008

14. Rifas-Shiman SL, Aris IM, Switkowski KM, et al. Cohort profile update: project viva mothers. Int J Epidemiol. 2023;52(6):e332–e339. doi:10.1093/ije/dyad137

15. ElSayed NA, Aleppo G, Bannuru RR; American Diabetes Association Professional Practice Committee. 2. Diagnosis and classification of diabetes: standards of care in diabetes—2024. Diabetes Care. 2023;47(Supplement_1):S20–S42. doi:10.2337/dc24-S002

16. Expert Committee on the Diagnosis and Classification of Diabetes Mellitus. Report of the expert committee on the diagnosis and classification of diabetes mellitus. Diabetes Care. 2003;26 Suppl 1(suppl_1):S5–20. doi:10.2337/diacare.26.2007.s5

17. Engelgau MM, Thompson TJ, Herman WH, et al. Comparison of fasting and 2-hour glucose and HbA1c levels for diagnosing diabetes: diagnostic criteria and performance revisited. Diabetes Care. 1997;20(5):785–791. doi:10.2337/diacare.20.5.785

18. Metzger BE, Gabbe SG, Persson B, et al; International Association of Diabetes and Pregnancy Study Groups Consensus Panel. International association of diabetes and pregnancy study groups recommendations on the diagnosis and classification of hyperglycemia in pregnancy. Diabetes Care. 2010;33(3):676–682. doi:10.2337/dc09-1848

19. Carpenter MW, Coustan DR. Criteria for screening tests for gestational diabetes. Am J Obstet Gynecol. 1982;144(7):768–773. doi:10.1016/0002-9378(82)90349-0

20. Gingras V, Rifas-Shiman SL, Derks IPM, Aris IM, Oken E, Hivert MF. Associations of gestational glucose tolerance with offspring body composition and estimated insulin resistance in early adolescence. Diabetes Care. 2018;41(12):e164–e166. doi:10.2337/dc18-1490

21. Sharma AJ, Bulkley JE, Stoneburner AB, et al. Bias in self-reported prepregnancy weight across maternal and clinical characteristics. Matern Child Health J. 2021;25(8):1242–1253. doi:10.1007/s10995-021-03149-9

22. Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning. Springer; 2009. doi:10.1007/978-0-387-84858-7

23. Uddin S, Khan A, Hossain ME, Moni MA. Comparing different supervised machine learning algorithms for disease prediction. BMC Med Inform Decis Mak. 2019;19(1):281. doi:10.1186/s12911-019-1004-8

24. Riley RD, Ensor J, Snell KIE, et al. Calculating the sample size required for developing a clinical prediction model. BMJ. 2020;368:1–12.

25. Ensor J. pmsampsize: sample size for development of a prediction model. 2023. Available from: https://cran.r-project.org/web/packages/pmsampsize/index.html. Accessed March 25, 2025.

26. Liaw A, Wiener M, Denkins YM. Classification and regression by randomforest. J Experim Therapeutics Oncol. 2002;2(5):286–297. doi:10.1046/j.1359-4117.2002.01053.x

27. Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc Ser B Methodol. 1996;58(1):267–288. doi:10.1111/j.2517-6161.1996.tb02080.x

28. Luo G. A review of automatic selection methods for machine learning algorithms and hyper-parameter values. Netw Model Anal Health Inform Bioinforma. 2016;5(1):18. doi:10.1007/s13721-016-0125-6

29. Sullivan LM, Massaro JM, D’Agostino RB. Presentation of multivariate data for clinical use: the Framingham study risk score functions. Stat Med. 2004;23(10):1631–1660. doi:10.1002/sim.1742

30. van Walraven C, Wong J, Forster AJ. Derivation and validation of a diagnostic score based on case-mix groups to predict 30-day death or urgent readmission. Open Med. 2012;6(3):e90–e100. doi:10.1503/cmaj.091117

31. van Walraven C, Wong J, Forster AJ. LACE+ index: extension of a validated index to predict early death or urgent readmission after hospital discharge using administrative data. Open Med. 2012;6(3):e80–e90. doi:10.1258/jrsm.99.8.406

32. Daly L. Simple SAS macros for the calculation of exact binomial and poisson confidence limits. Comput Biol Med. 1992;22(5):351–361. doi:10.1016/0010-4825(92)90023-G

33. Janevic T, McCarthy K, Liu SH, et al. Racial and ethnic inequities in development of type 2 diabetes after gestational diabetes mellitus. Obstet Gynecol. 2023;142(4):901–910. doi:10.1097/AOG.0000000000005324

34. Johnson JM, Khoshgoftaar TM. Survey on deep learning with class imbalance. J Big Data. 2019;6(1):27. doi:10.1186/s40537-019-0192-5

35. Brito Nunes C, Rukins V, Cisse AH, et al. Multi-ancestry, trans-generational GWAS meta-analysis of gestational diabetes and glycaemic traits during pregnancy reveals limited evidence of pregnancy-specific genetic effects. 2025. doi:10.1101/2025.08.19.25333735

36. Permutt MA, Wasson J, Cox N. Genetic epidemiology of diabetes. J Clin Invest. 2005;115(6):1431–1439. doi:10.1172/JCI24758

37. Bengtson AM, Ramos SZ, Savitz DA, Werner EF. Risk factors for progression from gestational diabetes to postpartum type 2 diabetes: a review. Clin Obstet Gynecol. 2021;64(1):234–243. doi:10.1097/GRF.0000000000000585

38. Chang AM, Halter JB. Aging and insulin secretion. Am J Physiol Endocrinol Metab. 2003;284(1):E7–12. doi:10.1152/ajpendo.00366.2002

39. Shou J, Chen PJ, Xiao WH. Mechanism of increased risk of insulin resistance in aging skeletal muscle. Diabetol Metab Syndr. 2020;12(1):14. doi:10.1186/s13098-020-0523-x

40. Li Y, Ren X, He L, Li J, Zhang S, Chen W. Maternal age and the risk of gestational diabetes mellitus: a systematic review and meta-analysis of over 120 million participants. Diabet Res Clin Pract. 2020;162:108044. doi:10.1016/j.diabres.2020.108044

41. Kahn BB, Flier JS. Obesity and insulin resistance. J Clin Invest. 2000;106(4):473–481. doi:10.1172/JCI10842

42. Metzger BE, Cho NH, Roston SM, Radvany R. Prepregnancy weight and antepartum insulin secretion predict glucose tolerance five years after gestational diabetes mellitus. Diabetes Care. 1993;16(12):1598–1605. doi:10.2337/diacare.16.12.1598

43. Huvinen E, Engberg E, Meinilä J, et al. Lifestyle and glycemic health 5 years postpartum in obese and non-obese high diabetes risk women. Acta Diabetol. 2020;57(12):1453–1462. doi:10.1007/s00592-020-01553-1

44. Kamihara Y, Ogawa K, Morisaki N, Arata N, Wada S. Association between gestational weight gain and chronic disease risks in later life. Sci Rep. 2024;14(1):659. doi:10.1038/s41598-023-50844-4

45. Coelho S, Canha M, Leite AR, et al. Relation between weight gain during pregnancy and postpartum reclassification in gestational diabetes. Endocrine. 2023;82(2):296–302. doi:10.1007/s12020-023-03441-4

46. Institute of Medicine (US) Committee on Nutritional Status During Pregnancy and Lactation. Nutrition During Pregnancy: Part I Weight Gain: Part II Nutrient Supplements. National Academies Press (US); 1990. Available from http://www.ncbi.nlm.nih.gov/books/NBK235228/. Accessed June 20, 2025.

Creative Commons License © 2026 The Author(s). This work is published and licensed by Dove Medical Press Limited. The full terms of this license are available at https://www.dovepress.com/terms and incorporate the Creative Commons Attribution - Non Commercial (unported, 4.0) License. By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed. For permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms.