Patient Related Outcome Measures » Volume 16

Psychometric Evaluation of the Hypoparathyroidism Symptom Diary: Data from a Prospective Phase 3b/4 Study

Authors Wang S, Rockwood NJ, Yarr S, Korver D, Castriota F, Martin S, Ayodele O

Received 13 May 2025

Accepted for publication 17 January 2026

Published 10 March 2026 Volume 2025:16 Pages 285–308

DOI https://doi.org/10.2147/PROM.S539994

Checked for plagiarism Yes

Review by Single anonymous peer review

Peer reviewer comments 2

Editor who approved publication: Dr Mithi Ahmed-Richards



Suwei Wang,1 Nicholas J Rockwood,2 Stuart Yarr,2 Dane Korver,2 Felicia Castriota,1 Susan Martin,2 Olulade Ayodele1

1Takeda Development Center Americas Inc., Lexington, MA, USA; 2RTI Health Solutions, Research Triangle Park, Durham, NC, USA

Correspondence: Suwei Wang, Takeda Development Center Americas Inc., 95 Hayden Avenue, Lexington, MA, 02421, USA, Tel +1 224 554-6500

Purpose: To assess the psychometric properties of the disease-specific Hypoparathyroidism Symptom Diary (HypoPT-SD) patient-reported outcome (PRO) tool, which consists of a 7-item symptom subscale, a 4-item impact subscale, a single item for anxiety, and a single item for sadness or depression, using data from the BALANCE randomized, placebo-controlled Phase 3b/4 study (NCT03324880).
Methods: Eligible patients had symptomatic hypoparathyroidism (HypoPT) at baseline and were aged 18–85 years (inclusive). Patients received recombinant human parathyroid hormone (1–84) or placebo. The HypoPT-SD was completed daily; data recorded at baseline and Weeks 4, 12, and 26 (end of treatment [EOT]) were included in this analysis. Inter-item and item–total correlations were used to assess HypoPT-SD structure, Cronbach’s coefficient α was used to assess internal consistency reliability, and intraclass correlation coefficients were used to assess test–retest reliability. Construct validity was evaluated using correlations between HypoPT-SD scores and scores from other conceptually similar PRO tools. Ability to detect change was assessed, and thresholds for meaningful within-patient change were estimated.
Results: The psychometric analysis population (N=93) was predominantly female (88.2%) and white (96.8%), with a mean age of 48.5 years. Inter-item correlations ranged from 0.35 to 0.85 at baseline and from 0.49 to 0.93 at EOT. Item–total correlations ranged from 0.57 to 0.83 at baseline and from 0.69 to 0.88 at EOT. Cronbach’s α values at baseline were 0.90 (symptom subscale) and 0.88 (impact subscale). Intraclass correlation coefficients for both subscales in stable patients exceeded 0.70. Significant cross-sectional correlations were observed with most of the conceptually linked PRO tools analyzed, and HypoPT-SD scores were responsive to change. Score reductions of 1.5 (symptom subscale) and 0.8 (impact subscale) were identified as thresholds for meaningful within-patient improvement.
Conclusion: The HypoPT-SD is a reliable measure of key symptoms and impacts of HypoPT.

Keywords: hypoparathyroidism, patient-reported outcomes, psychometric, quality of life

Introduction

Hypoparathyroidism (HypoPT) is a rare condition, most frequently occurring as a post-surgical sequela of thyroid procedures,1 that is characterized by absent or reduced levels of parathyroid hormone (PTH), leading to hypocalcemia and, often, hyperphosphatemia.2,3 In addition to physical manifestations such as muscle cramps and spasms, paresthesia, numbness, and fatigue, HypoPT is also associated with depression, an inability to concentrate, memory loss, and anxiety.2,4,5

Studies have demonstrated that patients with HypoPT – the vast majority of whom were receiving conventional therapy with calcium or vitamin D – had lower health-related quality of life (HRQoL) than matched control populations without HypoPT or than the general population.6–9

Many studies investigating symptom burden and HRQoL in patients with HypoPT have used validated but not disease-specific patient-reported outcome (PRO) tools.6–9 The unmet need for a validated, HypoPT-specific PRO tool has led to the development of disease-specific instruments.10–12 We previously reported the development and psychometric evaluation of the HypoPT Symptom Diary (HypoPT-SD), a disease-specific PRO instrument designed to measure the symptom burden of HypoPT and its impact on daily life.13,14 The HypoPT-SD was initially evaluated using data from a cross-sectional, observational study of 52 patients with HypoPT receiving conventional therapy (vitamin D and calcium supplements).13

A subsequent analysis used data from two Phase 4 studies (52-week study: NCT03364738; 36-month study: NCT02910466) in which patients received recombinant human parathyroid hormone (1–84) (rhPTH[1–84]).15 In one of these Phase 4 studies (NCT02910466), HypoPT-SD data (with a recall period of the past 24 hours) were collected from 38 patients during a 36-month treatment period. The other Phase 4 study (NCT03364738) collected HypoPT-SD data (with a recall period of the past 7 days) from 22 patients during a 52-week treatment period. In both studies, the HypoPT-SD was administered at selected time points in a patient population with stable disease.15 This restricted the opportunity to investigate how HypoPT-SD scores performed across a range of disease symptom severities and impacts, and how well the HypoPT-SD scores reflected the changes experienced by patients.

The present study aimed to assess the psychometric properties of the HypoPT-SD when used as a daily diary with a 24-hour recall period in a group of patients with a range of HypoPT disease severities.

Materials and Methods

PRO Tool Design

The HypoPT-SD was developed in accordance with the US Food and Drug Administration (FDA) guidance for PROs.16 It comprises a symptom subscale (items 1–7), single items for anxiety (item 8) and sadness or depression (item 9), and an impact subscale (items 10–13). The symptom subscale items and the anxiety and sadness or depression items are scored on a 5-point scale (0–4), and the impact subscale items are scored on a 3-point scale (0–2) (Supplemental Material 1). Lower scores indicate lower symptom severity or impact on daily living, and higher scores indicate greater symptom severity or impact. Full details of the HypoPT-SD have been published previously.13,14

Data Source

Psychometric analyses were performed using data from a Phase 3b/4 randomized, double-blind, placebo-controlled study designed to evaluate symptom improvement and metabolic control among adult patients (18–85 years of age [inclusive]) with symptomatic HypoPT treated with rhPTH(1–84) (BALANCE; NCT03324880).17 The protocol, all protocol amendments, the final approved informed consent document, relevant supporting information, and all types of subject recruitment information were submitted to and approved by the relevant institutional review board or independent ethics committee (Supplemental Material 2).

Eligible patients (N=93) were randomized 1:1 to receive rhPTH(1–84) as an adjunctive treatment with active vitamin D and/or calcium supplements (n=45), or a placebo with active vitamin D and/or calcium supplements (n=48). The trial included a 3-week screening period, a 16-week dose-titration period, a 10-week maintenance-dosing period, and a 4-week safety follow-up period. The psychometric analyses focused on data collected at baseline, Week 4, Week 12, and Week 26 (end of treatment [EOT]). All patients with at least one non-missing item score at either baseline or EOT were included in the analyses (N=93).

For patients who discontinued or withdrew from the study, only data collected up to the point of withdrawal were included in the psychometric analyses. No imputation was performed for missing data after a patient’s withdrawal, and post-withdrawal data points were considered missing.

To ensure consistency and data integrity across all sites and time points, the study followed a harmonized protocol and statistical analysis plan, which specified the timing, administration, and data collection procedures for all PRO and psychometric tools.

Response Scale and Scoring

Patients completed the HypoPT-SD at home on an electronic device at a consistent time each day (approximately 5–12 hours after the morning self-administration of rhPTH(1–84) or placebo). The item score at each visit (baseline, Week 4, Week 12, and EOT) was computed as the mean of the daily responses to that item over the 14 days preceding the visit. An item score was treated as missing unless responses were available on at least 4 of 7 days in each of the two 7-day periods within the 14-day window.
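A minimal Python sketch of the visit-level item scoring described above follows. It assumes the 14 daily responses for one item are held in a list ordered oldest to newest, with `None` for a missed diary day, and it reads the completeness rule as requiring at least 4 responses in each of the two 7-day halves. The function name `visit_item_score` is hypothetical and is not taken from the study's analysis code.

```python
from statistics import mean
from typing import Optional, Sequence


def visit_item_score(
    daily: Sequence[Optional[float]], min_days_per_week: int = 4
) -> Optional[float]:
    """Return the 14-day mean for one diary item, or None if the score is missing.

    `daily` holds 14 daily responses (oldest first); None marks a missed day.
    Assumed rule: each 7-day half must contain at least `min_days_per_week`
    non-missing responses, otherwise the item score is treated as missing.
    """
    if len(daily) != 14:
        raise ValueError("expected exactly 14 daily responses")
    for half in (daily[:7], daily[7:]):
        if sum(v is not None for v in half) < min_days_per_week:
            return None
    # Average every available daily response across the full 14-day window.
    return mean(v for v in daily if v is not None)
```

For example, seven responses of 2 followed by four responses of 3 and three missed days give 26/11 (about 2.36), whereas only three responses in the second week make the item score missing.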

The symptom subscale score at each visit (baseline, Week 4, Week 12, and EOT) was computed as the mean of the symptom item scores from each visit, with the score treated as missing if more than 3 of the 7 symptom item scores were missing. The impact subscale score at each visit was computed as the mean of the impact item scores from each visit, with the score treated as missing if any of the impact item scores were missing. Individual 14-day mean scores were calculated for the anxiety (item 8) and sadness or depression (item 9) items.
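The two subscale missing-data rules can be sketched the same way. This assumes the visit-level item scores have already been computed, with `None` marking a missing item score; `symptom_subscale` and `impact_subscale` are hypothetical names used only for illustration.

```python
from statistics import mean
from typing import Optional, Sequence


def symptom_subscale(items: Sequence[Optional[float]]) -> Optional[float]:
    """Mean of items 1-7; missing if more than 3 of the 7 item scores are missing."""
    if len(items) != 7:
        raise ValueError("the symptom subscale has 7 items")
    present = [v for v in items if v is not None]
    if len(items) - len(present) > 3:
        return None
    return mean(present)


def impact_subscale(items: Sequence[Optional[float]]) -> Optional[float]:
    """Mean of items 10-13; missing if any of the 4 item scores is missing."""
    if len(items) != 4:
        raise ValueError("the impact subscale has 4 items")
    if any(v is None for v in items):
        return None
    return mean(items)
```

The asymmetry is deliberate in the text: the 7-item symptom subscale tolerates up to three missing items (so at least four contribute), whereas the 4-item impact subscale requires every item.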

Response Distribution

Descriptive statistics for HypoPT-SD item and subscale scores at baseline, Week 4, Week 12, and EOT were used to evaluate non-response, minimum (best) and maximum (worst) score effects, and patterns of change. Floor (minimum; best) and ceiling (maximum; worst) effects were considered present when 20% or more of respondents recorded the minimum (best) or maximum (worst) possible score on the scale, respectively.
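As an illustration of the 20% rule, the hypothetical helper below reports the share of respondents at the minimum (best) and maximum (worst) of a scale. It assumes missing scores are excluded from the denominator, which the text does not specify. Because visit scores are 14-day means, a patient reaches an item's scale minimum or maximum only if every daily response was at that extreme.

```python
from typing import Optional, Sequence


def floor_ceiling(
    scores: Sequence[Optional[float]],
    scale_min: float,
    scale_max: float,
    threshold: float = 0.20,
) -> dict:
    """Share of respondents at the scale minimum (best) and maximum (worst).

    Percentages use non-missing scores as the denominator (an assumption).
    """
    observed = [s for s in scores if s is not None]
    if not observed:
        raise ValueError("no non-missing scores")
    n = len(observed)
    floor_pct = sum(s == scale_min for s in observed) / n
    ceiling_pct = sum(s == scale_max for s in observed) / n
    return {
        "floor_pct": floor_pct,
        "ceiling_pct": ceiling_pct,
        "floor_effect": floor_pct >= threshold,
        "ceiling_effect": ceiling_pct >= threshold,
    }
```

For a symptom item, `scale_min=0` and `scale_max=4`; for an impact item, `scale_min=0` and `scale_max=2`.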

Inter-Item Correlation

Inter-item correlations within the HypoPT-SD symptom and impact subscales were computed from data at baseline and EOT using Pearson correlation coefficients (r). Values obtained are in the range of −1 to 1; −1 and 1 are the strongest possible negative and positive correlations, respectively, and a value of 0 indicates no correlation.

In general, items with weak correlations (|r|<0.3) may not be sufficiently related to warrant inclusion in a composite score designed to measure a single underlying construct, whereas items with strong correlations (|r|>0.8) may be redundant.

Internal Consistency and Reliability

To assess the internal consistency reliability of the HypoPT-SD, Cronbach’s coefficient α was computed for each subscale at baseline. A Cronbach’s α value of 0 indicates that the items within a subscale are uncorrelated, whereas a value of 1 indicates perfectly correlated item scores. Optimal Cronbach’s α values lie in the range of 0.7–0.9.18,19
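Cronbach’s α for k items is k/(k − 1) × (1 − Σσᵢ²/σₜ²), where σᵢ² is the variance of item i and σₜ² is the variance of the summed score. The sketch below computes it from complete-case item scores, together with the "α if item deleted" values that the Results use to check whether dropping any single item would raise α. The function names are hypothetical, and sample variances are used throughout so that the ratio is consistent.

```python
from statistics import variance


def cronbach_alpha(rows: list[list[float]]) -> float:
    """Cronbach's alpha from complete-case item scores (one inner list per patient).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of the total score)
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def alpha_if_item_deleted(rows: list[list[float]]) -> list[float]:
    """Alpha recomputed with each item dropped in turn (entry j = item j removed)."""
    k = len(rows[0])
    return [
        cronbach_alpha([[v for j, v in enumerate(row) if j != drop] for row in rows])
        for drop in range(k)
    ]
```

If no entry of `alpha_if_item_deleted` exceeds the full-subscale α, no single item is lowering the subscale's reliability.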

Test–Retest Reliability

Test–retest reliability was evaluated by calculating intraclass correlation coefficients (ICCs) using HypoPT-SD scores at baseline and Week 4 for all patients with sufficient data (n=86) and for a subset of stable patients (n=26). Stability was defined as no change in Patient Global Impression of Severity (PGI-S) score between baseline and Week 4. ICC estimates were computed using a 2-way (patients and time) mixed-effects analysis of variance (ANOVA) model with absolute agreement for single measures. ICCs exceeding 0.70 are generally accepted as indicating stability between measurements taken at different time points.20
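The absolute-agreement, single-measure ICC described above can be written in terms of the two-way ANOVA mean squares (McGraw and Wong's ICC(A,1)): ICC = (MS_R − MS_E) / (MS_R + (k − 1)MS_E + (k/n)(MS_C − MS_E)), where MS_R, MS_C, and MS_E are the patient, time, and residual mean squares for n patients and k occasions. The point estimate is the same under the two-way random and two-way mixed models. This sketch assumes complete data (one row per patient, one column per occasion) and omits confidence intervals; `icc_a1` is a hypothetical name.

```python
from statistics import mean


def icc_a1(data: list[list[float]]) -> float:
    """Two-way, absolute-agreement, single-measure ICC from ANOVA mean squares.

    `data` has one row per patient and one column per occasion
    (eg, [baseline, week4]); complete cases only.
    """
    n, k = len(data), len(data[0])
    if n < 2 or k < 2:
        raise ValueError("need at least two patients and two occasions")
    grand = mean(v for row in data for v in row)
    row_means = [mean(row) for row in data]
    col_means = [mean(row[j] for row in data) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between patients
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between occasions
    ss_total = sum((v - grand) ** 2 for row in data for v in row)
    ss_err = ss_total - ss_rows - ss_cols                   # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

Because absolute agreement is used, a uniform shift between baseline and Week 4 lowers the ICC even when patients keep the same rank order. This is why the stable-patient subset is the more informative reliability sample.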

Construct Validity

Cross-sectional correlational analyses and tests of convergent and divergent validity hypotheses were conducted at baseline and EOT to explore the associations between HypoPT-SD item and subscale scores and scores from the following conceptually linked measures: (1) PGI-S;21 (2) Work Productivity and Activity Impairment Questionnaire for HypoPT (WPAI:HypoPT);22 (3) EuroQol-5 dimension 5-level (EQ-5D-5L);23 (4) Functional Assessment of Chronic Illness Therapy – Fatigue Scale (FACIT-Fatigue);24 (5) Functional Assessment of Cancer Therapy – Cognitive Function (FACT-Cog);25 and (6) the 36-item Short Form Health Survey version 2 (SF-36v2).26

For the FACIT-Fatigue, FACT-Cog, SF-36v2, and EQ Visual Analogue Scale (VAS), higher scores indicate better outcomes, whereas higher scores for the HypoPT-SD, PGI-S, WPAI:HypoPT, and EQ-5D-5L indicate worse outcomes.20–26

Known-Groups Validity

To assess the ability of the HypoPT-SD to discriminate between patients with different disease severities, ANOVAs were conducted to compare HypoPT-SD symptom and impact subscale scores at baseline and EOT between groups of patients with different PGI-S responses.
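The overall known-groups test reduces to a one-way F statistic comparing mean subscale scores across PGI-S response groups. The sketch below is a plain one-way ANOVA and is not necessarily the exact model behind the LS means reported in the Results. It returns F and its degrees of freedom; a p-value would come from the F distribution's survival function (for example, SciPy's `scipy.stats.f.sf(F, df_between, df_within)`). The function name is hypothetical.

```python
from statistics import mean
from typing import Sequence


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> tuple[float, int, int]:
    """Overall F statistic for differences in mean score across groups.

    Returns (F, df_between, df_within). Each inner sequence holds the scores
    of one PGI-S response group.
    """
    k = len(groups)
    scores = [x for g in groups for x in g]
    n = len(scores)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more scores than groups")
    grand = mean(scores)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```
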

Ability to Detect Change

Responsiveness was assessed by analyzing changes in HypoPT-SD item and subscale scores stratified by PGI-S and Patient Global Impression of Change (PGI-C) classifications21 and by clinical composite responder status. A clinical composite responder was defined as a patient meeting all of the following criteria: (1) albumin-corrected serum calcium (ACSC) between 1.87 mmol/L (7.5 mg/dL) and the upper limit of normal (ULN) of the central laboratory normal range; (2) active vitamin D dose decreased by at least 50% from baseline; and (3) oral calcium supplement dose reduced by at least 50% from baseline (this criterion was considered met if the patient’s baseline calcium dose was <1000 mg and did not increase during the study). Mean change from baseline in HypoPT-SD scores was compared across groups defined by these external criteria using ANOVA; the overall F-test and selected pairwise comparisons were reported.
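The composite responder rule can be expressed as a single Boolean check. This is a sketch under stated assumptions. The record layout (`DoseLabRecord` and its field names) is hypothetical. The ACSC bounds are treated as inclusive. The ULN is passed in because it comes from the central laboratory range. Criterion 3's low-dose exception is checked only at the assessment visit, because "did not increase during the study" would require every on-study dose.

```python
from dataclasses import dataclass


@dataclass
class DoseLabRecord:
    """Hypothetical per-visit record; field names are illustrative only."""
    acsc_mmol_l: float        # albumin-corrected serum calcium, mmol/L
    active_vit_d_dose: float  # daily active vitamin D dose (any consistent unit)
    oral_calcium_mg: float    # daily oral calcium supplement dose, mg


def is_composite_responder(
    baseline: DoseLabRecord,
    eot: DoseLabRecord,
    acsc_uln_mmol_l: float,
    acsc_lower_mmol_l: float = 1.87,
) -> bool:
    """Apply the three composite criteria at a single assessment visit (assumed EOT)."""
    # (1) ACSC between 1.87 mmol/L (7.5 mg/dL) and the central-lab ULN (inclusive here).
    acsc_ok = acsc_lower_mmol_l <= eot.acsc_mmol_l <= acsc_uln_mmol_l
    # (2) Active vitamin D dose reduced by at least 50% from baseline.
    vit_d_ok = eot.active_vit_d_dose <= 0.5 * baseline.active_vit_d_dose
    # (3) Oral calcium reduced by at least 50%, or baseline < 1000 mg with no increase.
    #     Only the assessment-visit dose is compared with baseline in this sketch.
    if baseline.oral_calcium_mg < 1000:
        calcium_ok = eot.oral_calcium_mg <= baseline.oral_calcium_mg
    else:
        calcium_ok = eot.oral_calcium_mg <= 0.5 * baseline.oral_calcium_mg
    return acsc_ok and vit_d_ok and calcium_ok
```
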

Threshold for Meaningful Within-Patient Change

Change in PGI-S score from baseline to EOT and PGI-C score at EOT were proposed as potential anchor measures for determining responder thresholds for the HypoPT-SD symptom and impact subscales. Correlations between changes in HypoPT-SD scores and the PGI-S and PGI-C anchors were calculated. A correlation of at least 0.37 in magnitude is often used to indicate that a proposed anchor is acceptable, because this corresponds to a large effect size under Cohen’s rule of thumb.27–30

Patients were classified by their degree of improvement on each anchor (eg, a 2-point or 1-point improvement on the PGI-S, or “much improved” on the PGI-C). The a priori primary anchor definitions of meaningful within-patient change were a 2-point improvement on the PGI-S and a response of “much improved” on the PGI-C.

The mean changes in the HypoPT-SD symptom and impact subscale scores from baseline to EOT in the responder subgroup were computed and used as the responder definitions, thus characterizing meaningful change in HypoPT-SD symptom and impact subscale scores. Alongside the anchor-based methods, two widely used distribution-based estimates (half the standard deviation [SD] and the standard error of measurement [SE]) were computed to support the estimation of meaningful within-patient change. Distribution-based estimates are often considered a lower bound for meaningful within-patient change.31
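One common derivation of the 0.37 figure, assumed here because the text does not show it, converts Cohen's large-effect benchmark d = 0.8 into a correlation using r = d / √(d² + 4), which holds for two equal-sized groups. The short sketch below reproduces the value.

```python
import math


def d_to_r(d: float) -> float:
    """Convert Cohen's d to a point-biserial correlation (equal group sizes assumed)."""
    return d / math.sqrt(d ** 2 + 4)


print(round(d_to_r(0.8), 3))  # 0.371, the large-effect benchmark d = 0.8
```
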
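The anchor-based step amounts to summarizing change scores within the responder anchor level. The sketch below assumes one change score (EOT minus baseline, so negative values are improvements) and one anchor category label per patient. It returns the mean and median change for the requested level. The function name and the category strings are hypothetical.

```python
from statistics import mean, median
from typing import Optional, Sequence


def anchor_based_threshold(
    changes: Sequence[Optional[float]], anchor: Sequence[str], level: str
) -> tuple[float, float]:
    """Mean and median subscale change (EOT minus baseline) within one anchor level.

    Negative changes indicate improvement because lower HypoPT-SD scores are
    better. Patients with a missing change score are skipped.
    """
    group = [c for c, a in zip(changes, anchor) if a == level and c is not None]
    if not group:
        raise ValueError(f"no patients with a change score in anchor level {level!r}")
    return mean(group), median(group)
```

Reporting the median alongside the mean guards against a few extreme change scores pulling the threshold.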

The FDA recommends that, for measures that are positioned as key trial endpoints, sponsors present probability density function (PDF) and empirical cumulative distribution function (CDF) plots of the distribution of PRO change scores by the levels of the selected anchor measure.32 The CDF plot aids in the identification of an appropriate threshold of meaningful within-patient change in the PRO scores based on the anchor using the 50th percentile (median) estimate for the selected anchor level.16
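The empirical CDF underlying these plots, and the point at which it reaches 0.5 for one anchor level, can be sketched as follows. The input is assumed to be the change scores of a single anchor group (for example, patients reporting "much improved" on the PGI-C). The function names are hypothetical.

```python
from bisect import bisect_right
from typing import Callable, Sequence


def ecdf(values: Sequence[float]) -> Callable[[float], float]:
    """Empirical CDF of change scores: F(x) = share of values <= x."""
    xs = sorted(values)
    if not xs:
        raise ValueError("empty sample")
    n = len(xs)
    return lambda x: bisect_right(xs, x) / n


def ecdf_median(values: Sequence[float]) -> float:
    """Smallest observed change score at which the ECDF reaches 0.5.

    For an odd number of patients this equals the sample median; for an even
    number it is the lower of the two middle values.
    """
    f = ecdf(values)
    return next(x for x in sorted(values) if f(x) >= 0.5)
```

Plotting `ecdf` for each anchor level on shared axes gives the kind of curves described above. Well-separated curves for adjacent anchor levels suggest that the anchor discriminates meaningfully between degrees of change.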

Results

Baseline Characteristics and Demographics

The psychometric analysis set comprised 93 patients, of whom 88.2% were female and 96.8% were white, and the mean (SD) age was 48.5 (11.3) years (Table 1).

Table 1 Patient Demographics and Characteristics at Screening17

HypoPT-SD Scores Distributions

The mean and median item-level scores for the symptom subscale items (items 1–7) and impact subscale items (items 10–13) at baseline were above the midpoints of the 0–4 and 0–2 response scales, respectively (Figures 1A and B). The mean (SD) HypoPT-SD symptom subscale score was 2.4 (0.7) at baseline and 1.3 (0.9) at EOT (Figure 1C). The mean (SD) impact subscale score was 1.3 (0.4) at baseline and 0.9 (0.5) at EOT (Figure 1D).

Figure 1 HypoPT-SD item score distributions at baseline (A) and EOT (B), and symptom (C) and impact (D) subscale scores at baseline and EOT. Data are mean ± SD.

Abbreviations: EOT, end of treatment; HypoPT-SD, Hypoparathyroidism Symptom Diary; SD, standard deviation.

For the symptom subscale items (items 1–7), the minimum (best) possible scores were reported by 0.0–3.3% of patients at baseline, and 4.3–21.7% of patients at EOT. For the impact subscale items (items 10–13), the minimum (best) scores were reported by 0.0–5.4% of patients at baseline, and 11.6–29.0% of patients at EOT (Supplemental Material 3).

For the symptom subscale items (items 1–7), the maximum (worst) possible scores were reported by 0.0–15.2% of patients at baseline, and 0.0–5.8% of patients at EOT. For the impact subscale items (items 10–13), the maximum (worst) possible scores were reported by 15.2–23.9% of patients at baseline, and 5.8–14.5% of patients at EOT (Supplemental Material 3).

The response distributions for the PGI-C for HypoPT (recorded at baseline, Week 12, and EOT) and PGI-S for HypoPT (recorded at baseline, Week 4, Week 12, and EOT) are shown in Supplemental Material 4.

Inter-Item Correlation

The structure of the HypoPT-SD was evaluated at baseline and EOT using inter-item correlations. The inter-item correlations for the symptom subscale items ranged from 0.35 to 0.85 at baseline (Table 2) and from 0.49 to 0.93 at EOT (Table 3). Specifically, correlations between item 7 (slowed or confused thinking) and some of the other items (ie, item 1 [muscle cramps], item 4 [muscle spasms or twitching], and item 5 [feelings of heaviness in arms or legs]) tended to be slightly weaker than correlations among the other item pairs, especially at baseline. At baseline, items 1 (muscle cramps) and 4 (muscle spasms or twitching) (r=0.85), and items 2 (tingling) and 3 (numbness) (r=0.79) were the most strongly correlated item pairs. At EOT, items 1 (muscle cramps) and 4 (muscle spasms or twitching) (r=0.93), items 1 (muscle cramps) and 2 (tingling) (r=0.84), and items 2 (tingling) and 4 (muscle spasms or twitching) (r=0.83) were the most strongly correlated pairs (Tables 2 and 3).

Table 2 HypoPT-SD Inter-Item and Item–Total Correlations at Baseline

Table 3 HypoPT-SD Inter-Item and Item–Total Correlations at EOT

The HypoPT-SD impact subscale items tended to be more consistently correlated at both baseline and EOT than the symptom subscale items. The inter-item correlations for the impact subscale items ranged from 0.56 to 0.84 at baseline (Table 2) and from 0.62 to 0.85 at EOT (Table 3). At both time points, item 11 (impact on exercise) and item 12 (impact on work) were the most strongly correlated items (r=0.84 at baseline, r=0.85 at EOT), whereas item 10 (impact on sleep) and item 13 (impact on relationships) were the most weakly correlated items (r=0.56 at baseline, r=0.62 at EOT). Item 8 (anxiety) and item 9 (sadness or depression) were highly correlated with one another at baseline (r=0.88) and at EOT (r=0.94) (Tables 2 and 3).

Item–Total Correlation

In this section, “total” refers to the symptom subscale total score or the impact subscale total score.

Corrected symptom item–total correlations were between 0.57 and 0.82 at baseline and between 0.69 and 0.88 at EOT. All symptom items correlated more strongly with the symptom subscale total score than with the impact subscale total score at baseline and EOT, except that the item 7 (slowed or confused thinking) score correlated less strongly with the symptom subscale total score (r=0.57) than with the impact subscale total score (r=0.61) at baseline (Tables 2 and 3).

The corrected item–total correlations for the impact items were between 0.66 and 0.83 at baseline and between 0.71 and 0.85 at EOT. Each of the impact items correlated more strongly with the impact subscale total score than the symptom subscale total score at both time points (Tables 2 and 3).

Internal Consistency and Reliability

Cronbach’s α values at baseline were 0.90 for the symptom subscale and 0.88 for the impact subscale (Supplemental Material 5). Removing any of the items from either of the subscales did not result in an increased Cronbach’s α value for either subscale (Supplemental Material 5).

Test–Retest Reliability

Most of the ICCs that were calculated from all patients with sufficient data (n=86) were below the 0.70 threshold (Table 4). Using patients with stable PGI-S scores (n=26) between baseline and Week 4, ICCs for all items exceeded 0.70, except for item 10 (impact on sleep; ICC=0.61 [95% confidence interval (CI) 0.20, 0.82]). The ICCs for both the symptom and impact subscale total scores in stable patients exceeded 0.70, with estimates of 0.80 (95% CI 0.19, 0.93) for symptom subscale scores and 0.77 (95% CI 0.37, 0.91) for impact subscale scores (Table 4).

Table 4 HypoPT-SD Test–Retest Reliability

Construct Validity

At both baseline and EOT, HypoPT-SD item scores correlated positively with scores for the PGI-S (0.38–0.75 at baseline, 0.28–0.73 at EOT) and EQ-5D-5L index domain scores (0.11–0.76 at baseline, 0.09–0.78 at EOT) (Tables 5 and 6). In general, HypoPT-SD scores were also positively correlated with WPAI:HypoPT scores at both time points, although there were a few small negative correlations (Tables 5 and 6).

Table 5 Construct Validity Correlations Among the HypoPT-SD Symptom and Impact Subscale and Other Conceptually Linked Patient-Reported Outcome Tools at Baseline

Table 6 Construct Validity Correlations Among the HypoPT-SD Symptom and Impact Subscale and Other Conceptually Linked Patient-Reported Outcome Tools at EOT

Almost all HypoPT-SD item scores correlated negatively with scores for the FACIT-Fatigue, FACT-Cog, SF-36v2, and EQ VAS at baseline and EOT. The only exceptions were small positive correlations between item 8 (anxiety) and item 9 (sadness or depression) of the HypoPT-SD and the SF-36v2 physical component score (PCS) at EOT (r=0.12 and r=0.09, respectively) (Tables 5 and 6).

The majority of the convergent validity hypotheses were supported; all but two of the convergent validity correlations that did not reach the |r|>0.50 cutoff had |r|>0.40. Not all divergent validity hypotheses were supported, and most discrepancies arose because items 7 (slowed or confused thinking), 11 (impact on exercise), 12 (impact on work), and 13 (impact on relationships) did not discriminate adequately between the FACIT-Fatigue and FACT-Cog measures (Tables 5 and 6).

Known-Groups Validity

The HypoPT-SD symptom subscale mean scores differed significantly overall across the PGI-S-defined severity groups at both baseline and EOT (p<0.0001) (Figures 2A and B). At baseline, there were significant pairwise differences in scores between the “moderate”, “severe”, and “very severe” groups (p<0.001). However, the “no symptoms” group (n=1) (LS mean [SEM] 2.90 [0.50]) did not have lower scores than the “moderate” (1.86 [0.09]) and “severe” (2.45 [0.08]) groups (Figure 2A). At EOT, none of the pairwise comparisons of HypoPT-SD symptom subscale scores among the “mild”, “moderate”, and “severe” PGI-S groups were statistically significant (Figure 2B).

Figure 2 HypoPT-SD subscale scores by PGI-S classification at baseline (A [symptom] and C [impact]) and EOT (B [symptom] and D [impact]). LS mean differences were compared using ANOVA. X axes show the responses from the PGI-S. Overall adjusted p values for A–D: p<0.0001; pairwise comparisons are only reported for groups with ≥5 patients. *p<0.05, ***p<0.001, ****p<0.0001.

Abbreviations: ANOVA, analysis of variance; EOT, end of treatment; HypoPT-SD, Hypoparathyroidism Symptom Diary; LS, least-squares; ns, not significant; PGI-S, Patient Global Impression of Severity; SEM, standard error of the mean.

The impact subscale mean scores for the PGI-S-defined severity groups followed the expected pattern at baseline and EOT, with significant (p<0.0001) overall differences across all groups at both time points (Figures 2C and 2D). Furthermore, at baseline, all pairwise comparisons were statistically significant (p<0.0001) (Figure 2C). At EOT, only the pairwise comparison between the “mild” and “severe” groups was statistically significant (p=0.011) (Figure 2D).

Ability to Detect Change

Greater changes in HypoPT-SD symptom and impact subscale scores from baseline to EOT were observed in groups of patients who reported improvements in PGI-S or PGI-C scores than in groups who did not report improvements (Figures 3 and 4, Tables 7 and 8).

Table 7 HypoPT-SD Symptom Subscale Ability to Detect Change

Table 8 HypoPT-SD Impact Subscale Ability to Detect Change

Figure 3 Change in HypoPT-SD (A) symptom and (B) impact subscales by change in PGI-S from baseline to EOT. Plus signs represent the mean, and the horizontal line inside each box represents the median; the horizontal lines at the bottom and top of each box represent the 25th and 75th percentiles, respectively; the vertical lines below and above each box extend 1.5 × IQR below the 25th percentile and above the 75th percentile, respectively. Values outside that range are shown as individual points.

Abbreviations: EOT, end of treatment; HypoPT-SD, Hypoparathyroidism Symptom Diary; IQR, interquartile range; PGI-S, Patient Global Impression of Severity.

Figure 4 Change in HypoPT-SD (A) symptom and (B) impact subscales from baseline to EOT by PGI-C. Plus signs represent the mean, and the horizontal line inside each box represents the median; the horizontal lines at the bottom and top of each box represent the 25th and 75th percentiles, respectively; the vertical lines below and above each box extend 1.5 × IQR below the 25th percentile and above the 75th percentile, respectively. Values outside that range are shown as individual points.

Abbreviations: EOT, end of treatment; HypoPT-SD, Hypoparathyroidism Symptom Diary; IQR, interquartile range; PGI-C, Patient Global Impression of Change.

Overall differences in change scores across the groups defined by the PGI-C and by change in PGI-S were statistically significant (p<0.05) for both subscales. However, despite some relatively large effect sizes (eg, Cohen’s d of −1.18 comparing 2-point improvement with no change), none of the pairwise differences in change in symptom subscale scores between PGI-S or PGI-C response categories were statistically significant. Although there were also some large effect sizes for the impact subscale, the only significant pairwise difference (p=0.0011) was between the 2-point improvement (mean −0.81) and no change (−0.09) groups defined by change in PGI-S (Figures 3 and 4, Tables 7 and 8).
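The effect sizes quoted above are standardized mean differences between anchor groups. The text does not state which SD was used. The sketch below assumes the common pooled-SD form of Cohen's d applied to two groups of change scores, so a negative d means the first group improved more. The function name is hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def cohens_d(a: Sequence[float], b: Sequence[float]) -> float:
    """Standardized mean difference (mean(a) - mean(b)) / pooled SD.

    With change scores, a negative d means group `a` improved more than `b`
    (lower HypoPT-SD scores are better).
    """
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two observations")
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)
```

Because d does not depend on sample size, small anchor groups can show large effects that still fail to reach statistical significance, which is the pattern described above.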

The observed differences in changes from baseline to EOT in HypoPT-SD scores were minimal for both subscales in patients categorized as responders or non-responders using the clinical composite criteria (symptom subscale least-squares [LS] mean [SE]; responder −0.97 [0.21] vs non-responder −1.10 [0.13]) (impact subscale LS mean [SE]; responder −0.55 [0.12] vs non-responder −0.43 [0.08]). There were no significant differences in mean change in the symptom subscale scores in responders compared with non-responders (p=0.60). There was also no significant difference in mean change on the impact subscale scores in responders compared with non-responders (p=0.40) (Tables 7 and 8).

The relationships between changes in the HypoPT-SD symptom and impact subscale scores and the PGI-C and change in PGI-S are particularly important, because the PGI-C and PGI-S are potential anchors for estimating thresholds of meaningful change. Changes in subscale scores were strongly correlated with the potential anchors (|r|≥0.50). Change in the symptom subscale score correlated more strongly with the PGI-C (r=0.71) than with change in PGI-S (r=0.58), whereas change in the impact subscale score correlated slightly more strongly with change in PGI-S (r=0.64) than with the PGI-C (r=0.58) (Supplemental Material 6).

Thresholds for Minimum Meaningful Change

Using the anchor-based method for the HypoPT-SD symptom subscale, a 2-point improvement in the PGI-S score and “much improved” on the PGI-C scale both corresponded to a mean improvement (score reduction) of 1.5 (median 1.5). For the impact subscale, a 2-point improvement in the PGI-S score and “much improved” on the PGI-C scale both corresponded to a mean improvement of 0.8 (Supplemental Materials 7 and 8).

One PDF plot and one empirical CDF plot were produced for each subscale and anchor (PGI-C or change in PGI-S), resulting in eight plots (Supplemental Materials 7 and 8). For the CDF plots using the PGI-C as the anchor measure (Supplemental Materials 7A and 7C), curves for stronger anchor measures did not overlap, except at the extreme values. The 50th percentile (median) value for the “much improved” curve shown in Supplemental Materials 7A and 7B indicates a value of 1.5 as meaningful within-patient improvement (responder threshold) for change in the HypoPT-SD symptom subscale. Similarly, the median value for the “much improved” curve in Supplemental Materials 7C and 7D indicates a value of 0.8 as meaningful within-patient improvement (responder threshold) for change in the HypoPT-SD impact subscale.

Within the PDF plots for the symptom subscale (Supplemental Materials 8A and 8B), there was substantial overlap between the “no change” and “minimally improved” groups (especially with the PGI-C anchor), although the “minimally improved” group had a cluster of change scores that were more extreme (negative) than the “no change” group, leading to a larger mean change score. The distribution for the “much improved” group was more distinct (especially with the PGI-C anchor), which provides additional support for the use of this group when estimating the threshold (Supplemental Materials 8A and 8B). A similar pattern was seen for the PDF plots for the impact subscale score (Supplemental Materials 8C and 8D).

Distribution-based methods (half-SD and SE) were also used to estimate meaningful within-patient change. For these computations, baseline SDs (symptom SD=0.7, impact SD=0.4; Supplemental Material 3) and test–retest ICCs for the PGI-S stable group (symptom ICC=0.80, impact ICC=0.77; Table 4) were used. The half-SD and SE were estimated to be 0.33 and 0.30, respectively, for the symptom subscale, and 0.21 and 0.20, respectively, for the impact subscale. These estimates are well below the anchor-based improvement thresholds of 1.5 and 0.8, indicating that the anchor-based thresholds exceed both a moderate effect size (half-SD) and the measurement error (SE) of the scores and may therefore be reasonable estimates.
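The two distribution-based estimates follow directly from the baseline SD and the test–retest reliability: half-SD = 0.5 × SD, and SE = SD × √(1 − ICC), using the standard formula for the standard error of measurement, which is assumed here because the text names the method without giving the formula. The sketch below uses the rounded inputs quoted in the text. The function names are hypothetical.

```python
import math


def half_sd(baseline_sd: float) -> float:
    """Half the baseline SD (corresponds to a moderate effect size, d = 0.5)."""
    return 0.5 * baseline_sd


def se_measurement(baseline_sd: float, icc: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return baseline_sd * math.sqrt(1.0 - icc)


# Rounded inputs reported in the text (SD to 1 decimal place).
for name, sd, icc in [("symptom", 0.7, 0.80), ("impact", 0.4, 0.77)]:
    print(name, round(half_sd(sd), 2), round(se_measurement(sd, icc), 2))
```

With these rounded SDs, the script prints 0.35 and 0.31 for the symptom subscale and 0.2 and 0.19 for the impact subscale. The reported 0.33/0.30 and 0.21/0.20 presumably reflect the unrounded baseline SDs.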

Discussion

This study evaluated the psychometric properties of the HypoPT-SD when administered as a daily diary to 93 patients with symptomatic HypoPT. In contrast to previous studies, this analysis included patients with a wide range of HypoPT severities at baseline and at EOT. Including patients whose disease severity and impact changed during the study allowed a thorough assessment of the measure’s ability to detect change and to establish a threshold for meaningful within-patient change.13–15 Overall, both the HypoPT-SD symptom and impact subscales demonstrated adequate psychometric properties. There was substantial variability in individual item and subscale scores across the study period. Although some floor (minimum) and ceiling (maximum) effects were observed, a maximum (worst) score at baseline is generally not problematic because it leaves room for improvement, and a minimum (best) score at EOT may reflect effective treatment. Therefore, none of the observed floor or ceiling effects were of concern.

Individual item scores in both subscales correlated with one another and with the total subscale scores, supporting the internal consistency of the HypoPT-SD. Internal consistency reliability was within the optimal range suggested by Streiner et al and Bland and Altman (Cronbach’s α of 0.90 and 0.88 for the symptom and impact subscales, respectively).18,19 Furthermore, removing any single item from either subscale did not increase Cronbach’s α, which provides strong support for the internal consistency reliability of both subscales. Additionally, the test–retest ICCs exceeded the 0.70 cutoff in patients with stable PGI-S scores (0.80 and 0.77 for the symptom and impact subscales, respectively).
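The "alpha if item deleted" check described above can be sketched with the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of the total score). The sketch recomputes α with each item removed in turn and assumes complete data for every respondent. The three-item example is invented. Its third item runs against the other two, and dropping it raises α from 0.30 to roughly 0.95. That is the kind of increase that would flag a poorly fitting item, and the text reports that no such increase occurred for either HypoPT-SD subscale.

```python
import statistics


def cronbach_alpha(items):
    """Cronbach's alpha for k items; items[j][i] is item j's score for respondent i."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    if n < 2 or any(len(item) != n for item in items):
        raise ValueError("items must share the same two or more respondents")
    totals = [sum(item[i] for item in items) for i in range(n)]
    total_var = statistics.variance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    sum_item_var = sum(statistics.variance(item) for item in items)
    # alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)
    return k / (k - 1) * (1 - sum_item_var / total_var)


def alpha_if_item_deleted(items):
    """Alpha recomputed with each item removed in turn (needs three or more items)."""
    return [cronbach_alpha(items[:j] + items[j + 1:]) for j in range(len(items))]


# Hypothetical three-item example; the third item runs against the other two.
items = [[1, 2, 3, 4], [2, 2, 3, 5], [4, 1, 3, 2]]
overall = cronbach_alpha(items)
dropped = alpha_if_item_deleted(items)
print(round(overall, 2), [round(a, 2) for a in dropped])
```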

Construct validity of the HypoPT-SD symptom and impact subscales was demonstrated by significant cross-sectional correlations with most of the conceptually linked PRO tools analyzed. The small negative correlations between HypoPT-SD and WPAI:HypoPT scores were unexpected. However, the WPAI:HypoPT correlation analysis had the smallest sample size in the study. Within this measure, three scores had substantially more missing data (≥54.8%): percent work time missed, percent impairment while working, and percent overall work impairment due to HypoPT. By comparison, missing data were 11.8% for the percent activity impairment score and ≤10.8% for the other supporting measures. This is likely because most WPAI questions are work-specific and not all patients in this study were working. The majority of convergent validity hypotheses were supported. Not all divergent validity hypotheses were supported. Most discrepancies arose because items 7 (slowed or confused thinking), 11 (impact on exercise), 12 (impact on work), and 13 (impact on relationships) did not discriminate adequately between the FACIT-Fatigue and FACT-Cog measures. This is not surprising, given that both fatigue and cognitive dysfunction can be associated with slowed or confused thinking (ie, brain fog, as assessed in item 7) and with impacts on exercise, work, and relationships (as assessed in items 11–13). Taken together, the results provide strong support for the construct validity of the HypoPT-SD item and subscale scores.
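Because the work-related WPAI:HypoPT scores were missing for many patients, each correlation rests on a different number of complete pairs. The sketch below assumes Pearson correlations with pairwise (available-case) deletion and returns the effective sample size alongside r. It also adds a simple comparison for a divergent-validity hypothesis. The coefficient, the missing-data handling, the `discriminates` rule, and all scores are assumptions for illustration.

```python
import math


def pairwise_pearson(x, y):
    """Pearson r over patients with both scores present (None marks missing).

    Returns (r, n_used) so the effective sample size is reported alongside r.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 3:
        raise ValueError("too few complete pairs")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("a score is constant among complete pairs")
    return sxy / math.sqrt(sxx * syy), n


def discriminates(r_convergent, r_divergent):
    """Hypothetical rule: the conceptually closer measure should correlate more strongly."""
    return abs(r_convergent) > abs(r_divergent)


# Hypothetical scores; None = not answered (e.g. patient not working).
diary = [1.2, 2.0, 0.5, 3.1, 1.8, 2.6]
work_impairment = [20, None, None, 60, None, 50]
r, n_used = pairwise_pearson(diary, work_impairment)
print(f"r = {r:.2f} from {n_used} of {len(diary)} patients")
```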

This analysis demonstrated that HypoPT-SD scores were responsive to change: mean changes in HypoPT-SD scores differed significantly across groups defined by PGI-C and by change in PGI-S. Furthermore, patients in this analysis had a range of disease severities that changed between baseline and EOT. This allowed thresholds for meaningful within-patient improvement to be characterized for the first time. These findings, obtained with the HypoPT-SD administered as a daily diary, are aligned with and support those of a previously published psychometric evaluation in patients with stable HypoPT.15
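Responsiveness here rests on comparing mean change across anchor-defined groups. This section does not name the test. ANOVA and F appear in the abbreviation list, and least-squares (LS) means suggest that a covariate-adjusted model may have been used. The sketch below therefore assumes only a plain one-way ANOVA on change scores, with invented group data.

```python
def one_way_anova(groups):
    """One-way ANOVA on change scores grouped by an anchor (e.g. PGI-C category).

    Returns (F, df_between, df_within).
    """
    groups = [list(g) for g in groups if len(g) > 0]
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need two or more groups and more patients than groups")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variability")
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical symptom change scores by PGI-C group (negative = improvement).
no_change = [0.1, -0.2, 0.0, -0.3]
minimally_improved = [-0.4, -1.2, -0.2, -0.9]
much_improved = [-1.4, -1.9, -1.1, -2.2]
print(one_way_anova([no_change, minimally_improved, much_improved]))
```

For a p-value, the F statistic can be referred to an F distribution with (df_between, df_within) degrees of freedom, for example with SciPy's `scipy.stats.f.sf`. Alternatively, `scipy.stats.f_oneway` performs the whole test directly.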

Other HypoPT-specific PRO tools have been developed. The Hypoparathyroidism Patient Experience Scale (HPES) has undergone psychometric evaluation using data from an observational study and a Phase 2 trial, and has been used for symptom evaluation in a Phase 3 trial.11,33 HPES also includes symptom and impact measures, but comprises a total of 43 items compared with only 13 items in the HypoPT-SD.11,12,33,34 The HypoPT questionnaire (HPQ) is a 28-item questionnaire that spans five domains and was developed through an analytical–empirical approach. The psychometric properties of the initial 40-item questionnaire were tested using data from a German cross-sectional study, but the revised 28-item questionnaire has not yet been evaluated.10 To date, the HypoPT-SD has not been compared directly with these HypoPT-specific PRO tools.

Strengths and Limitations

The strengths of this analysis include the variability of disease severity and the change in disease severity during the observation period. This study has also proposed two thresholds for meaningful within-patient change for both the HypoPT-SD symptom and impact subscale scores, which could be used as disease-specific PRO endpoint targets in future clinical trials of patients with HypoPT.
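In a future trial, thresholds of this kind would typically be applied as a responder definition. In the sketch below, a patient counts as a responder when their score falls by at least the chosen threshold. Lower HypoPT-SD scores are better, so improvement is a negative change. The threshold value, the scores, and the exclusion of patients with a missing score are all hypothetical.

```python
def responder_rate(baseline, follow_up, threshold, tol=1e-9):
    """Proportion of patients whose score decreased (improved) by >= threshold.

    Lower HypoPT-SD scores are better, so improvement is a negative change.
    Patients missing either score are excluded (a simplifying assumption).
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    changes = [f - b for b, f in zip(baseline, follow_up) if b is not None and f is not None]
    if not changes:
        raise ValueError("no patients with both scores")
    # tol guards against floating-point error for changes exactly at the threshold.
    responders = sum(1 for c in changes if c <= -threshold + tol)
    return responders / len(changes)


# Hypothetical scores and a hypothetical 1.0-point threshold.
baseline = [2.4, 2.0, 1.6, 3.0, None]
week_26 = [1.2, 1.9, 0.8, 2.5, 1.0]
print(responder_rate(baseline, week_26, threshold=1.0))
```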

These results should be interpreted in the context of the following limitations.

1. The 4-week interval between test and retest was relatively long for this analysis. Some participants may have experienced changes in symptom severity or overall health status during this period, which could lead to an underestimation of the true test–retest reliability of the instrument. However, a subgroup analysis restricted to patients whose PGI-S scores remained stable between baseline and Week 4 did show good test–retest reliability for the HypoPT-SD.

2. Data collection periods differed between the HypoPT-SD and the supporting PRO tools. The HypoPT-SD was completed daily for the 14 days before each visit (baseline, Week 4, Week 12, and EOT). Each supporting PRO tool was completed once within the 2 days before each of those visits. These differences may have affected the direct comparability of outcomes between tools. Specifically, the HypoPT-SD’s daily data collection may have been more sensitive to short-term changes or day-to-day variability in symptoms. The other PROs may be less influenced by transient fluctuations and more reflective of the patient’s overall status at the time of assessment. As a result, correlations between the HypoPT-SD and other PROs may be attenuated, and cross-sectional associations should be interpreted in light of the different recall periods and data collection frequencies.
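The first limitation concerns how the test–retest ICC behaves when some patients genuinely change. This section does not state which ICC form was used. The sketch below assumes the two-way random-effects, absolute-agreement, single-measurement form, ICC(2,1), one of the forms described by Koo and Li,20 computed only in patients whose PGI-S rating is unchanged. The absolute-agreement form treats a systematic shift between occasions as error. True change between test and retest therefore lowers the estimate, which is why the stable-PGI-S restriction matters. The record field names and values are hypothetical.

```python
def icc_2_1(pairs):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    pairs: one (test, retest) score tuple per patient.
    """
    n, k = len(pairs), 2
    if n < 2:
        raise ValueError("need at least two patients")
    grand = sum(a + b for a, b in pairs) / (n * k)
    row_means = [(a + b) / k for a, b in pairs]
    col_means = [sum(p[j] for p in pairs) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between patients
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between occasions
    ss_total = sum((x - grand) ** 2 for p in pairs for x in p)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denominator == 0:
        raise ValueError("no variability in scores")
    return (ms_rows - ms_error) / denominator


def stable_pairs(records):
    """Keep (test, retest) scores for patients with unchanged PGI-S ratings."""
    return [(r["score_t1"], r["score_t2"]) for r in records if r["pgis_t1"] == r["pgis_t2"]]


# Hypothetical records; the field names are illustrative, not from the study.
records = [
    {"score_t1": 2.1, "score_t2": 2.0, "pgis_t1": 3, "pgis_t2": 3},
    {"score_t1": 1.0, "score_t2": 1.2, "pgis_t1": 2, "pgis_t2": 2},
    {"score_t1": 3.0, "score_t2": 1.1, "pgis_t1": 4, "pgis_t2": 2},  # changed: excluded
    {"score_t1": 2.8, "score_t2": 2.6, "pgis_t1": 4, "pgis_t2": 4},
]
print(round(icc_2_1(stable_pairs(records)), 2))
```

The resulting estimate would then be compared with the 0.70 cutoff used in the text.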

Conclusion

The cumulative evidence from this study indicates that the HypoPT-SD is a reliable and valid measure of key symptoms and impacts of HypoPT. These analyses lay the psychometric groundwork for the use of the HypoPT-SD in future clinical trials in adults with HypoPT.

Abbreviations

ACSC, albumin-corrected serum calcium; ANOVA, analysis of variance; CDF, cumulative distribution function; CI, confidence interval; COVID-19, coronavirus disease 2019; EOT, end of treatment; EQ-5D-5L, EuroQol 5-dimension 5-level; F, F-statistic; FACIT-Fatigue, Functional Assessment of Chronic Illness Therapy – Fatigue Scale; FACT-Cog, Functional Assessment of Cancer Therapy – Cognitive Function; FDA, Food and Drug Administration; HPQ, HypoPT questionnaire; HRQoL, health-related quality of life; HypoPT, hypoparathyroidism; HypoPT-SD, HypoPT Symptom Diary; ICC, intraclass correlation coefficient; LS, least-squares; max, maximum; MCS, mental component summary; min, minimum; ns, not significant; PCS, physical component summary; PDF, probability density function; PGI-C, Patient Global Impression of Change; PGI-S, Patient Global Impression of Severity; PRO, patient-reported outcome; PTH, parathyroid hormone; SD, standard deviation; SE, standard error; SEM, standard error of the mean; SF-36v2, 36-item Short Form Health Survey version 2; t, t-statistic; ULN, upper limit of normal; VAS, Visual Analogue Scale; WPAI, Work Productivity and Activity Impairment; WPAI:HypoPT, Work Productivity and Activity Impairment Questionnaire for HypoPT.

Data Sharing Statement

The datasets, including the redacted study protocol, redacted statistical analysis plan, and individual participants’ data supporting the results reported in this article, will be made available within 3 months from initial request to researchers who provide a methodologically sound proposal. The data will be provided after de-identification, in compliance with applicable privacy laws, data protection, and requirements for consent and anonymization.

Ethics Approval and Consent to Participate

The data in the manuscript were originally collected in the BALANCE clinical trial (ClinicalTrials.gov ID NCT03324880). This manuscript describes the analysis of de-identified data from this trial. The RTI institutional review board deems this analysis to be “Not Research with Human Subjects”, as defined by the criteria of: analysis of coded, de-identified data/specimens, or anonymous data/specimens or gathering information from living individuals when the data may wholly and accurately be characterized as “about what and not about whom”. For the original BALANCE trial, the investigators submitted the following to an institutional review board or independent ethics committee, as applicable: the study protocol (including details of secondary analyses), any protocol amendments, the final approved informed consent document, relevant supporting information, and all types of patient recruitment information. These were approved (as appropriate) before study initiation. The study was conducted in accordance with current applicable regulations, International Council for Harmonisation guidelines, European Union Directive 2001/20/EC and its updates/revisions, the principles of the Declaration of Helsinki, and local ethical and legal requirements. All patients provided informed consent before entering the study, including consent for the secondary analysis of collected data. The psychometric evaluation was carried out by the authors using de-identified data in accordance with the data sharing policy described above.

Acknowledgments

Medical writing support was provided by Mark Elms PhD of PharmaGenesis London, London, UK and funded by Takeda Development Center Americas Inc. in accordance with Good Publication Practice 2022 guidelines.35

Author Contributions

All authors made a significant contribution to the work reported, whether that is in the conception, study design, execution, acquisition of data, analysis, and interpretation, or in all these areas; took part in drafting, revising, or critically reviewing the article; gave final approval of the version to be published; have agreed on the journal to which the article has been submitted; and agree to be accountable for all aspects of the work.

Funding

This study was sponsored by Takeda Development Center Americas Inc., Lexington, MA, USA.

Disclosure

SW, FC, and OA are employed by Takeda Development Center Americas Inc. and hold stock/stock options in Takeda. NR, SY, DK, and SM are employed by RTI Health Solutions. The authors report no other conflicts of interest in this work.

References

1. Clarke BL, Brown EM, Collins MT, et al. Epidemiology and diagnosis of hypoparathyroidism. J Clin Endocrinol Metab. 2016;101:2284–308. doi:10.1210/jc.2015-3908

2. Bilezikian JP, Khan A, Potts JT Jr, et al. Hypoparathyroidism in the adult: epidemiology, diagnosis, pathophysiology, target-organ involvement, treatment, and challenges for future research. J Bone Miner Res. 2011;26:2317–2337. doi:10.1002/jbmr.483

3. Shoback D. Clinical practice. Hypoparathyroidism. N Engl J Med. 2008;359:391–403. doi:10.1056/NEJMcp0803050

4. Hadker N, Egan J, Sanders J, Lagast H, Clarke BL. Understanding the burden of illness associated with hypoparathyroidism reported among patients in the PARADOX study. Endocr Pract. 2014;20:671–679. doi:10.4158/EP13328.OR

5. Siggelkow H, Clarke BL, Germak J, et al. Burden of illness in not adequately controlled chronic hypoparathyroidism: findings from a 13-country patient and caregiver survey. Clin Endocrinol. 2020;92:159–168. doi:10.1111/cen.14128

6. Astor MC, Løvås K, Debowska A, et al. Epidemiology and health-related quality of life in hypoparathyroidism in Norway. J Clin Endocrinol Metab. 2016;101:3045–3053. doi:10.1210/jc.2016-1477

7. Hepsen S, Akhanli P, Sakiz D, et al. The effects of patient and disease-related factors on the quality of life in patients with hypoparathyroidism. Arch Osteoporos. 2020;15:75. doi:10.1007/s11657-020-00759-8

8. Kontogeorgos G, Mamasoula Z, Krantz E, Trimpou P, Landin-Wilhelmsen K, Laine CM. Low health-related quality of life in hypoparathyroidism and need for PTH analog. Endocr Connect. 2022;11:e210379. doi:10.1530/EC-21-0379

9. Büttner M, Musholt TJ, Singer S. Quality of life in patients with hypoparathyroidism receiving standard treatment: a systematic review. Endocrine. 2017;58:14–20. doi:10.1007/s12020-017-1377-3

10. Wilde D, Wilken L, Stamm B, et al. The HPQ-development and first administration of a questionnaire for hypoparathyroid patients. JBMR Plus. 2019;4:e10245. doi:10.1002/jbm4.10245

11. Brod M, McLeod L, Markova D, et al. Psychometric validation of the Hypoparathyroidism Patient Experience Scales (HPES). J Patient Rep Outcomes. 2021;5:70. doi:10.1186/s41687-021-00320-2

12. Brod M, Waldman LT, Smith A, Karpf D. Assessing the patient experience of hypoparathyroidism symptoms: development of the hypoparathyroidism patient experience scale-symptom (HPES-symptom). Patient. 2020;13:151–162. doi:10.1007/s40271-019-00388-5

13. Coles T, Chen K, Nelson L, et al. Psychometric evaluation of the hypoparathyroidism symptom diary. Patient Relat Outcome Meas. 2019;10:25–36. doi:10.2147/PROM.S179310

14. Martin S, Chen K, Harris N, Vera-Llonch M, Krasner A. Development of a patient-reported outcome measure for chronic hypoparathyroidism. Adv Ther. 2019;36:1999–2009. doi:10.1007/s12325-019-00999-2

15. Nelson L, Ing S, Rubin MR, et al. Psychometric analysis of the patient-reported Hypoparathyroidism Symptom Diary symptom subscale using data from two clinical trials. Patient Relat Outcome Meas. 2023;14:355–367. doi:10.2147/PROM.S414794

16. Food & Drug Administration. Guidance for industry. Patient-reported outcome measures: use in medical product development to support labeling claims. 2009. Available from: https://www.fda.gov/media/77832/download. Accessed October 09, 2023.

17. National Institutes of Health National Library of Medicine ClinicalTrials.gov. A study to learn if recombinant human parathyroid hormone [rhPTH(1-84)] can improve symptoms and metabolic control in adults with hypoparathyroidism (BALANCE). 2023. Available from: https://clinicaltrials.gov/study/NCT03324880. Accessed October 09, 2023.

18. Streiner DL, Norman GR, Cairney J. Health Measurement Scales. A Practical Guide to Their Development and Use. 5th ed. Oxford, UK: Oxford University Press; 2014.

19. Bland JM, Altman DG. Cronbach’s alpha. BMJ. 1997;314(7080):572. doi:10.1136/bmj.314.7080.572

20. Koo TK, Li MY. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med. 2016;15:155–163. doi:10.1016/j.jcm.2016.02.012

21. Guy W. ECDEU Assessment Manual for Psychopharmacology. Rockville, MD, USA: National Institute of Mental Health; 1976.

22. Reilly MC, Zbrozek AS, Dukes EM. The validity and reproducibility of a work productivity and activity impairment instrument. Pharmacoeconomics. 1993;4:353–365. doi:10.2165/00019053-199304050-00006

23. EuroQol. EQ-5D-5L. 2021. Available from: https://euroqol.org/eq-5d-instruments/eq-5d-5l-about/. Accessed October 09, 2023.

24. Cella D, Lai JS, Chang CH, Peterman A, Slavin M. Fatigue in cancer patients compared with fatigue in the general United States population. Cancer. 2002;94:528–538. doi:10.1002/cncr.10245

25. Functional Assessment of Chronic Illness Therapy. Functional Assessment of Cancer Therapy - Cognitive Function (FACT-Cog). 2008. Available from: https://www.facit.org/measures/FACT-Cog. Accessed October 09, 2023.

26. Ware JE Jr. SF-36 health survey update. Spine. 2000;25:3130–3139. doi:10.1097/00007632-200012150-00008

27. Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Hillsdale, NJ, USA: Lawrence Erlbaum Associates; 1988.

28. Fayers P, Hays R. Assessing Quality of Life in Clinical Trials. 2nd ed. Oxford, UK: Oxford University Press; 2005.

29. Hays RD, Farivar SS, Liu H. Approaches and recommendations for estimating minimally important differences for health-related quality of life measures. COPD. 2005;2:63–67. doi:10.1081/COPD-200050663

30. Fayers PM, Hays RD. Don’t middle your MIDs: regression to the mean shrinks estimates of minimally important differences. Qual Life Res. 2014;23:1–4. doi:10.1007/s11136-013-0443-4

31. King MT, Dueck AC, Revicki DA. Can methods developed for interpreting group-level patient-reported outcome data be applied to individual patient management? Med Care. 2019;57(Suppl 5 Suppl 1):S38–S45. doi:10.1097/MLR.0000000000001111

32. Food & Drug Administration. Incorporating clinical outcome assessments into endpoints for regulatory decision-making. 2019. Available from: https://www.fda.gov/media/132505/download. Accessed February 16, 2024.

33. Brod M, Waldman LT, Smith A, Karpf D. Living with hypoparathyroidism: development of the hypoparathyroidism patient experience scale-impact (HPES-Impact). Qual Life Res. 2021;30:277–291. doi:10.1007/s11136-020-02607-1

34. Khan AA, Rubin MR, Schwarz P, et al. Efficacy and safety of parathyroid hormone replacement with TransCon PTH in hypoparathyroidism: 26-week results from the phase 3 PaTHway trial. J Bone Miner Res. 2023;38:14–25. doi:10.1002/jbmr.4726

35. DeTora LM, Lane T, Sykes A, DiBiasi F, Toroser D, Citrome L. Good Publication Practice (GPP) guidelines for company-sponsored biomedical research: 2022 update. Ann Intern Med. 2023;176:eL220490. doi:10.7326/L22-0490

Creative Commons License © 2026 The Takeda Pharmaceutical Company Limited. This work is published and licensed by Dove Medical Press Limited. The full terms of this license are available at https://www.dovepress.com/terms and incorporate the Creative Commons Attribution - Non Commercial (unported, 4.0) License. By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed. For permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms.