Translation and validation of the Lee Chronic Graft-versus-Host Disease Symptom Scale in Denmark

Anne M. Clausen1, 2, Brian T. Kornblit3, Maria T. Larsen3, Duruta Weber1, Maja Pedersen3, Tine Rosenberg4, Andreas K. Pedersen5, 6, 7, Cristina Alfaro-Díaz8, 9 & Mary Jarden3, 10

20 May 2026

Abstract

Chronic graft-versus-host disease (cGVHD) affects 30-70% of allogeneic haematopoietic stem cell transplant (HSCT) recipients, with incidence varying by donor type, patient age and previous acute GVHD [1, 2]. This multisystem complication impairs mucocutaneous, ocular, pulmonary, gastrointestinal, hepatic and musculoskeletal functions, causing chronic immunologic dysfunction and increased morbidity [3]. Despite therapeutic advances, cGVHD remains a clinical challenge, significantly compromising patients’ functional status and quality of life [1, 4, 5].

cGVHD causes long-term impairments in physical, social and emotional functioning following transplantation [6]. Patients with active and moderate cGVHD report poorer well-being than those with mild or resolved cGVHD [7, 8].

To support disease monitoring, treatment evaluation and patient-centred care in cGVHD management, validated patient-reported outcome (PRO) measures are needed to systematically capture symptom burden and its effect on daily functioning.

This study aimed to translate and culturally adapt the Lee cGVHD Symptom Scale into Danish, evaluate its psychometric properties and examine its correlations with two quality of life instruments in Danish clinical settings: the European Organisation for Research and Treatment of Cancer Core Quality of Life Questionnaire (EORTC QLQ-C30) and the Patient-Reported Outcomes version of the Common Terminology Criteria for Adverse Events (PRO-CTCAE).

Methods

This cross-sectional study was conducted at two sites and involved two phases: 1) translation and cultural adaptation of the original English Lee cGVHD Symptom Scale into Danish, and 2) psychometric validation of the translated instrument.

The Lee cGVHD Symptom Scale [9, 10] is a validated questionnaire recommended by the 2014 National Institutes of Health cGVHD Consensus Response Criteria Working Group [11]. It systematically assesses symptom burden from the patient’s perspective, supporting disease monitoring, treatment evaluation and patient-centred care.

The questionnaire evaluates symptoms of multi-organ cGVHD manifestations through 30 items grouped into seven subscales: Skin, Eyes and mouth, Breathing, Eating and digestion, Muscles and joints, Energy, and Mental and emotional. Each item is rated on a five-point Likert scale (range: 0-4; from ‘not at all’ to ‘extremely’) with a seven-day recall period. Subscales are scored individually and can also be combined into a total score [12].
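The scoring logic described above can be sketched as follows. This is a minimal illustration, not the published scoring manual: the item identifiers, the number of items per subscale, the rescaling of each subscale mean to 0-100, the rule requiring at least half of a subscale's items to be answered, and the total score as the mean of available subscale scores are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical item-to-subscale map: identifiers and item counts are
# placeholders, not the layout of the published questionnaire.
SUBSCALES = {
    "skin": ["skin_1", "skin_2"],
    "energy": ["energy_1", "energy_2", "energy_3"],
}


def subscale_score(responses, items):
    """Mean of answered items (each 0-4), rescaled to 0-100.

    Returns None when fewer than half of the items were answered
    (an assumed missing-data rule).
    """
    answered = [responses[i] for i in items if responses.get(i) is not None]
    if len(answered) * 2 < len(items):
        return None
    return mean(answered) / 4 * 100


def total_score(responses, subscales=SUBSCALES):
    """Mean of the subscale scores that could be computed (assumed rule)."""
    scores = [subscale_score(responses, items) for items in subscales.values()]
    scores = [s for s in scores if s is not None]
    return mean(scores) if scores else None


if __name__ == "__main__":
    patient = {"skin_1": 4, "skin_2": 0, "energy_1": 2, "energy_2": 2, "energy_3": None}
    print(subscale_score(patient, SUBSCALES["skin"]))
    print(total_score(patient))
```

Keeping the item map separate from the scoring functions means the same code could be pointed at the real 30-item layout once it is taken from the scale documentation.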

Study procedures

Translation and cross-cultural adaptation

The Danish version of the Lee cGVHD Symptom Scale was developed through a systematic multi-step process, following the Professional Society for Health Economics and Outcomes Research (ISPOR) recommendations for translation and cultural adaptation [13]. With permission from the scale’s developer, two Danish natives fluent in English independently translated the original scale, and HE and AMC reconciled the translations. A back-translation was then performed by a native English speaker who was unfamiliar with the original version.

Cognitive debriefing

Cognitive debriefing interviews were conducted with five Danish cGVHD patients to evaluate item clarity and relevance. The interviews were conducted individually by telephone. Minor linguistic adjustments were made based on patient feedback before final approval by the scale’s developer. The Danish version of the scale (Table S1a), the cognitive debriefing interview guide (Table S1b) and item-level notes (Table S1c) are provided in the Supplementary material.

Psychometric testing

The psychometric properties of the Danish Lee cGVHD Symptom Scale were evaluated in accordance with the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) [14], with a focus on content validity, construct validity and internal consistency, as described in the Statistics section.

Participants and procedures

Participants were consecutively recruited from the outpatient haematology departments at Copenhagen University Hospital - Rigshospitalet, Denmark, and Odense University Hospital, Denmark. Eligible participants were adults (≥ 18 years) with a haematological cancer diagnosis, able to understand and read Danish, > 100 days post-transplantation, and diagnosed with cGVHD according to the 2014 National Institutes of Health (NIH) consensus criteria, defined as an NIH score > 1 in at least one affected organ.

Data collection

Data were collected at enrolment (T0) and 7-10 days later (T1). At T0, participants completed the Lee cGVHD Symptom Scale, the EORTC QLQ-C30, the PRO-CTCAE, and questions on education and employment. The EORTC QLQ-C30 and the PRO-CTCAE were used to assess correlations and validate the Danish Lee cGVHD Symptom Scale in a Danish clinical context. At T1, participants repeated the Lee cGVHD Symptom Scale. Physicians recorded clinical data at T0, including cancer diagnosis, transplantation date, graft type, GVHD prophylaxis, cGVHD severity and steroid treatment. Study data were collected and managed using Research Electronic Data Capture (REDCap) tools hosted at Odense University Hospital and the University of Southern Denmark. REDCap is a secure, web-based software platform designed to support data capture for research studies [15].

Ethical considerations

This non-interventional study was exempt from review by the Research Ethics Committee. Approval was granted by the Danish Data Protection Agency (registration no. 24/17735). Written informed consent was obtained from all participants prior to inclusion.

Statistical analysis

Construct validity was examined using exploratory factor analysis (EFA) to evaluate structural validity, supported by Bartlett’s test of sphericity (p < 0.001) and the Kaiser-Meyer-Olkin measure [16]. Horn’s parallel analysis guided the determination of the number of factors. Hypothesis testing for construct validity involved comparison of scores between patients with clinician-rated mild versus severe cGVHD. Internal consistency was evaluated using Cronbach’s α for the total scale and each hypothesised subscale. Test-retest reliability was evaluated using intraclass correlation coefficients (ICC) based on responses from participants who completed the scale twice within a 7-10-day interval; ICC values > 0.70 were considered acceptable [17]. Convergent and divergent validity were assessed using Spearman correlations between the Lee cGVHD Symptom Scale, the EORTC QLQ-C30 and the PRO-CTCAE. Correlations > 0.70 were considered adequate for convergent validity, whereas low correlations were expected for divergent validity. We assumed that missing data were either missing at random or missing completely at random, and thus handled them implicitly in the EFA using maximum likelihood estimation [18]. Analyses were conducted using Stata 18.

Trial registration: not relevant.

Results

A total of 72 patients were included, of whom 65 completed the test-retest. Participant characteristics are shown in Table 1. EFA, guided by parallel analysis, supported a six-factor solution, with item loadings clustering into six subscales as shown in Table 2. However, the original seven-factor model was retained for comparability and clinical relevance.

The scale showed good overall internal consistency (Cronbach’s α = 0.862). Subscale reliability varied, with acceptable alphas except for Respiration (α = 0.490) (Figure 1). Test-retest reliability ranged widely (ICC: 0.23-0.93), with the highest stability in Eye-related and Psychological symptoms. The items Vomiting and Coloured sputum demonstrated low test-retest reliability (ICC = 0.23 and 0.37, respectively; Supplementary Table S2). Minimal detectable change values were below 1 for all subscales.
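To make the test-retest quantities concrete, the sketch below computes an ICC from paired T0/T1 scores and derives a minimal detectable change from it. The article does not state which ICC model or which minimal detectable change formula was used, so both choices here are assumptions: a two-way random-effects, absolute-agreement, single-measurement ICC (often written ICC(2,1)), and MDC95 = 1.96 × √2 × SEM with SEM = SD × √(1 − ICC).

```python
from math import sqrt
from statistics import mean, stdev


def icc_2_1(pairs):
    """ICC(2,1) for n subjects measured at k = 2 time points (T0, T1)."""
    n, k = len(pairs), 2
    grand = mean(v for p in pairs for v in p)
    row_means = [mean(p) for p in pairs]                       # per subject
    col_means = [mean(p[j] for p in pairs) for j in range(k)]  # per time point
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for p in pairs for v in p)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def mdc95(baseline_scores, icc):
    """Minimal detectable change at 95% confidence (assumed formula)."""
    # Clamp at zero so rounding noise in the ICC cannot produce a negative root.
    sem = stdev(baseline_scores) * sqrt(max(0.0, 1 - icc))
    return 1.96 * sqrt(2) * sem


if __name__ == "__main__":
    # Toy data: every subject scores exactly one point higher at T1.
    pairs = [(1, 2), (2, 3), (3, 4)]
    print(icc_2_1(pairs))
    print(mdc95([p[0] for p in pairs], icc_2_1(pairs)))
```

In the toy data the rank order of subjects is perfectly preserved, yet the absolute-agreement ICC is below 1 because the systematic shift between time points counts as disagreement.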

Convergent validity between corresponding subscales of the EORTC QLQ-C30 and the Lee cGVHD Symptom Scale was generally consistent with expectations, except for a moderate correlation between Nausea and Eating (r = 0.492) (Table S3). Item-level analyses with the EORTC QLQ-C30 and the PRO-CTCAE confirmed these findings, although correlations for Vomiting and Dyspnoea were lower than expected (Table S4). Divergent validity, assessed through correlations between non-corresponding subscales, ranged from 0.094 to 0.501 (Table S5).
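The correlations reported above are Spearman coefficients, which can be sketched as a Pearson correlation computed on ranks, with tied values sharing their average rank. The helper names and toy scores below are illustrative assumptions and do not reproduce the study data.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    lee_subscale = [10, 25, 25, 60, 80]    # hypothetical subscale scores
    comparator = [5, 30, 20, 55, 90]       # hypothetical comparator scores
    rho = spearman(lee_subscale, comparator)
    print(round(rho, 3), "meets 0.70 threshold" if rho > 0.70 else "below 0.70")
```

Using ranks rather than raw scores makes the coefficient insensitive to the different scoring ranges of the instruments being compared.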

Symptom scores tended to increase with clinician-rated cGVHD severity, although total score differences were not statistically significant (p = 0.054). Significant differences were found in the Breathing (p = 0.001) and Muscle (p = 0.029) subscales, supporting partial construct validity (Table 3).

Discussion

Summary of findings

In this study, the Lee cGVHD Symptom Scale was translated and cross-culturally adapted into Danish, and its psychometric properties were evaluated in a cohort of Danish patients more than 100 days after HSCT with cGVHD and active symptoms.

The Danish version demonstrated good overall internal consistency, with acceptable consistency for most subscales except Respiration. The test-retest reliability was generally acceptable, although several symptoms (rashes, coloured sputum, vomiting and fevers) showed poor reproducibility. Convergent validity with the EORTC QLQ-C30 was largely consistent with expectations.

These findings provide the basis for a more detailed discussion of the scale’s structural validity and psychometric performance.

The EFA of the Danish version of the Lee cGVHD Symptom Scale revealed a six-subscale structure, which differs from the original seven subscales in the English version. It remains uncertain whether this variation is unique to the Danish version, as comparable analyses of the original English version and other translations are unavailable. Moreover, the Kaiser-Meyer-Olkin measure of sampling adequacy was 0.644, indicating only a mediocre level of adequacy, so the six-factor structure should be interpreted with caution. Further research could provide additional insights into the subscale structure of the Danish version.

Low internal consistency in the Respiration subscale may reflect heterogeneity in symptom content, suggesting that separating items could improve reliability. Similarly, the low test-retest reliability for certain symptoms likely reflects day-to-day fluctuations rather than measurement errors. Recall period length may also influence symptom stability, with shorter intervals potentially enhancing precision [19].

Our findings are broadly consistent with previous validations of the Lee cGVHD Symptom Scale, including the original English version [10] and the Portuguese (Brazilian) version [20]. We demonstrated good overall internal consistency (α = 0.86), comparable to the English (range: 0.84-0.85) and Portuguese (range: 0.62-0.83) results [12, 20]. As reported previously, the Respiration subscale demonstrated low reliability (α = 0.49 in our translation, 0.40 in English, and 0.65 in Portuguese), indicating challenges in assessing this domain reliably across populations.

Our study differs from previous validations in several respects. First, we conducted an EFA, which, to our knowledge, has not been reported elsewhere. Second, we observed lower test-retest reliability for certain items (e.g., Vomiting and Coloured sputum) than was reported in the English validation, a discrepancy that warrants further investigation. Finally, although symptom scores generally increased with clinician-rated severity, the association did not reach statistical significance, which may reflect limitations in sample size or variability in symptom assessment methods.

Strengths and limitations

The strengths of this study include a rigorous translation process with independent forward and backward translations, recruitment from two hospitals to ensure a broader patient population, and the use of validated PRO instruments (the EORTC QLQ-C30 and the PRO-CTCAE) to strengthen validity assessment.

The study also has limitations, including a relatively small sample size, a limited number of severe disease cases and a narrow age range of participants (60-71 years), which may limit generalisability to younger cGVHD patients. Although the sample size met the lower threshold recommended by the COSMIN guidelines, it may have limited the statistical power.

Test-retest reliability was assessed only in clinically stable patients, as specified in the protocol. The absence of data from patients with changing clinical conditions limits the ability to evaluate the scale’s reliability in capturing symptom fluctuations. Future research should address this issue to enhance the tool’s applicability in dynamic clinical settings. Missing data were handled using maximum likelihood estimation, as they were assumed to be missing at random given the low proportion of missing data.

Implications for clinical practice

Although some domains showed low internal consistency or test-retest reliability, this may reflect true symptom variability rather than measurement error. Individual symptoms can still be clinically meaningful, and recall periods may influence reporting; shorter intervals may capture fluctuations more accurately in immunosuppressed populations. Tailoring recall periods and emphasising clinically relevant items may enhance the instrument’s utility and value. Overall, the instrument should be used to complement clinical decision-making, taking into consideration the patient’s context and other clinical assessments.

Conclusions

The Danish version of the Lee cGVHD Symptom Scale demonstrates reliability and validity for assessing cGVHD symptoms. While internal consistency and test-retest reliability are generally acceptable, certain subscales may benefit from further refinement. The scale is suitable for both clinical and research use in Danish cGVHD populations, though further validation in larger samples is recommended.

Correspondence Anne M. Clausen. E-mail: anne.moller.clausen@rsyd.dk

Accepted 7 April 2026

Published 20 May 2026

Conflicts of interest BTK reports financial support from or interest in the Joint Research Fund of Odense University Hospital and Rigshospitalet. AMC reports financial support from or interest in the Department of Clinical Research, University of Southern Denmark, the PhD Fund of Odense University Hospital, the PhD Fund of the Region of Southern Denmark, FADL’s Forlag and Munksgaard. All authors have submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest. These are available together with the article at ugeskriftet.dk/dmj

Use of AI The online version of Paperpal was used to enhance grammar and provide suggestions for language editing.

References can be found with the article at ugeskriftet.dk/dmj

Cite this as Dan Med J 2026;73(6):A09250759

doi 10.61409/A09250759

Open Access under Creative Commons License CC BY-NC-ND 4.0

Supplementary material https://content.ugeskriftet.dk/sites/default/files/2026-04/a09250759_supplementary.pdf

References

  1. Lee CJ, Wang T, Chen K, et al. Severity of chronic graft-versus-host disease and late effects following allogeneic hematopoietic cell transplantation for adults with hematologic malignancy. Transplant Cell Ther. 2024;30(1):97.e1-97.e14. https://doi.org/10.1016/j.jtct.2023.10.010
  2. DeFilipp Z, Alousi AM, Pidala JA, et al. Nonrelapse mortality among patients diagnosed with chronic GVHD: an updated analysis from the Chronic GVHD Consortium. Blood Adv. 2021;5(20):4278-4284. https://doi.org/10.1182/bloodadvances.2021004941
  3. Baumrin E, Loren AW, Falk SJ, et al. Chronic graft-versus-host disease. Part I: Epidemiology, pathogenesis, and clinical manifestations. J Am Acad Dermatol. 2024;90(1):1-16. https://doi.org/10.1016/j.jaad.2022.12.024
  4. El-Jawahri A, Pidala J, Khera N, et al. Impact of psychological distress on quality of life, functional status, and survival in patients with chronic graft-versus-host disease. Biol Blood Marrow Transplant. 2018;24(11):2285-2292. https://doi.org/10.1016/j.bbmt.2018.07.020
  5. Wenzel F, Pralong A, Scheid C, et al. Burden, resources, and needs of patients with severe graft-versus-host disease - a qualitative study. Palliat Support Care. 2025;23:e69. https://doi.org/10.1017/S147895152400172X
  6. Gruber I, Koelbl O, Herr W, et al. Impact of chronic graft-versus-host disease on quality of life and cognitive function of long-term transplant survivors after allogeneic hematopoietic stem cell transplantation with total body irradiation. Radiat Oncol. 2022;17(1):195. https://doi.org/10.1186/s13014-022-02161-9
  7. Hansen JL, Juckett MB, Foster MA, et al. Psychological and physical function in allogeneic hematopoietic cell transplant survivors with chronic graft-versus-host disease. J Cancer Surviv. 2023;17(3):646-656. https://doi.org/10.1007/s11764-023-01354-9
  8. Kurosawa S, Yamaguchi T, Oshima K, et al. Resolved versus active chronic graft-versus-host disease: impact on post-transplantation quality of life. Biol Blood Marrow Transplant. 2019;25(9):1851-1858. https://doi.org/10.1016/j.bbmt.2019.05.016
  9. Merkel EC, Mitchell SA, Lee SJ. Content validity of the Lee Chronic Graft-versus-Host Disease Symptom Scale as assessed by cognitive interviews. Biol Blood Marrow Transplant. 2016;22(4):752-758. https://doi.org/10.1016/j.bbmt.2015.12.026
  10. Teh C, Onstad L, Lee SJ. Reliability and validity of the modified 7-Day Lee Chronic Graft-versus-Host Disease Symptom Scale. Biol Blood Marrow Transplant. 2020;26(3):562-567. https://doi.org/10.1016/j.bbmt.2019.11.020
  11. Lee SJ, Wolff D, Kitko C, et al. Measuring therapeutic response in chronic graft-versus-host disease. National Institutes of Health Consensus Development Project on Criteria for Clinical Trials in Chronic Graft-versus-Host Disease: IV. The 2014 Response Criteria Working Group Report. Biol Blood Marrow Transplant. 2015;21(6):984-999. https://doi.org/10.1016/j.bbmt.2015.02.025
  12. Lee SK, Cook EF, Soiffer R, Antin JH. Development and validation of a scale to measure symptoms of chronic graft-versus-host disease. Biol Blood Marrow Transplant. 2002;8(8):444-452. https://doi.org/10.1053/bbmt.2002.v8.pm12234170
  13. Wild D, Grove A, Martin M, et al. Principles of good practice for the translation and cultural adaptation process for patient-reported outcomes (PRO) measures: report of the ISPOR Task Force for Translation and Cultural Adaptation. Value Health. 2005;8(2):94-104. https://doi.org/10.1111/j.1524-4733.2005.04054.x
  14. Mokkink LB, Terwee CB, Patrick DL, et al. The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study. Qual Life Res. 2010;19(4):539-549. https://doi.org/10.1007/s11136-010-9606-8
  15. Harris PA, Taylor R, Thielke R, et al. Research electronic data capture (REDCap) - a metadata-driven methodology and workflow process for providing translational research informatics support. J Biomed Inform. 2009;42(2):377-381. https://doi.org/10.1016/j.jbi.2008.08.010
  16. Kaiser HF. An index of factorial simplicity. Psychometrika. 1974;39(1):31-36. https://doi.org/10.1007/BF02291575
  17. Snedecor GW, Cochran WG. Statistical methods. 6th ed. Ames: Iowa State University Press; 1973.
  18. Nielsen LK, Mercieca-Bebber R, Möller S, et al. Relationship between reasons for intermittent missing patient-reported outcomes data and missing data mechanisms. Qual Life Res. 2024;33(9):2387-2400. https://doi.org/10.1007/s11136-024-03707-y
  19. Paudel R, Enzinger AC, Uno H, et al. Effects of a change in recall period on reporting severe symptoms: an analysis of a pragmatic multisite trial. J Natl Cancer Inst. 2024;116(7):1137-1144. https://doi.org/10.1093/jnci/djae049
  20. de Souza CV, Vigorito AC, Miranda ECM, et al. Translation, cross-cultural adaptation, and validation of the Lee Chronic Graft-versus-Host Disease Symptom Scale in a Brazilian population. Biol Blood Marrow Transplant. 2016;22(7):1313-1318. https://doi.org/10.1016/j.bbmt.2016.03.013