Abstract
INTRODUCTION. Chronic graft-versus-host disease (cGVHD) impairs quality of life after allogeneic haematopoietic stem cell transplantation. This study aimed to translate the Lee cGVHD Symptom Scale, a validated patient-reported outcome measure, into Danish and evaluate its psychometric properties.
METHODS. The scale was translated using forward–backward translation and cognitive debriefing interviews. Questionnaire data from the Danish version were collected at baseline and after 7-10 days. Construct validity was assessed with exploratory factor analysis (EFA), and convergent and divergent validity were assessed by correlations with the European Organization for Research and Treatment of Cancer Core Quality of Life Questionnaire (EORTC QLQ-C30) and the Patient-Reported Outcomes version of the Common Terminology Criteria for Adverse Events (PRO-CTCAE). Internal consistency was assessed using Cronbach’s alpha, and test-retest reliability using intraclass correlation coefficients (ICC).
RESULTS. A total of 72 patients participated; 65 completed the test-retest. EFA supported a six-factor structure, with some deviations from the original subscales. Cronbach’s alpha for the total scale was good (0.86). The test-retest ICC ranged from 0.23 to 0.93. Most hypothesised correlations for convergent and divergent validity were confirmed.
CONCLUSIONS. The Danish Lee cGVHD Symptom Scale is a valid and reliable instrument for assessing cGVHD symptoms. Overall consistency and reliability were acceptable, though refinement of some subscales is recommended. The scale is suitable for both clinical and research use in Danish cGVHD populations, with further validation suggested in larger samples.
FUNDING. The Joint Research Fund of Odense University Hospital and Rigshospitalet. Funding no.: 83-A3982.
TRIAL REGISTRATION. Not relevant.
Chronic graft-versus-host disease (cGVHD) affects 30-70% of allogeneic haematopoietic stem cell transplant (HSCT) recipients, with incidence varying by donor type, patient age and previous acute GVHD [1, 2]. This multisystem complication impairs mucocutaneous, ocular, pulmonary, gastrointestinal, hepatic and musculoskeletal functions, causing chronic immunologic dysfunction and increased morbidity [3]. Despite therapeutic advances, cGVHD remains a clinical challenge, significantly compromising patients’ functional status and quality of life [1, 4, 5].
cGVHD causes long-term impairments in physical, social and emotional functioning following transplantation [6]. Patients with active and moderate cGVHD report poorer well-being than those with mild or resolved cGVHD [7, 8].
To support disease monitoring, treatment evaluation and patient-centred care in cGVHD management, validated patient-reported outcome (PRO) measures are needed to systematically capture symptom burden and its effect on daily functioning.
This study aimed to translate and culturally adapt the Lee cGVHD Symptom Scale into Danish, evaluate its psychometric properties and examine its correlations with two established patient-reported instruments in Danish clinical settings: the European Organization for Research and Treatment of Cancer Core Quality of Life Questionnaire (EORTC QLQ-C30) and the Patient-Reported Outcomes version of the Common Terminology Criteria for Adverse Events (PRO-CTCAE).
Methods
This cross-sectional study was conducted at two sites and involved two phases: 1) translation and cultural adaptation of the original English Lee cGVHD Symptom Scale into Danish, and 2) psychometric validation of the translated instrument.
The Lee cGVHD Symptom Scale [9, 10] is a validated questionnaire recommended by the 2014 National Institutes of Health cGVHD Consensus Response Criteria Working Group [11]. It systematically assesses symptom burden from the patient’s perspective, supporting disease monitoring, treatment evaluation and patient-centred care.
The questionnaire evaluates symptoms of multi-organ cGVHD manifestations through 30 items grouped into seven subscales: Skin, Eyes and mouth, Breathing, Eating and digestion, Muscles and joints, Energy, and Mental and emotional. Each item is rated on a five-point Likert scale (range: 0-4; from ‘not at all’ to ‘extremely’) with a seven-day recall period. Subscales are scored individually and can also be combined into a total score [12].
Study procedures
Translation and cross-cultural adaptation
The Danish version of the Lee cGVHD Symptom Scale was developed through a systematic multi-step process, following the Professional Society for Health Economics and Outcomes Research (ISPOR) recommendations for translation and cultural adaptation [13]. With permission from the scale’s developer, two Danish natives fluent in English independently translated the original scale, and HE and AMC reconciled the translations. A back-translation was then performed by a native English speaker who was unfamiliar with the original version.
Cognitive debriefing
Cognitive debriefing interviews were conducted with five Danish cGVHD patients to evaluate item clarity and relevance. Interviews were conducted individually by telephone, and minor linguistic adjustments were made based on patient feedback, before final approval by the scale’s developer. The Danish version of the scale (Table S1a), the cognitive debriefing interview guide (Table S1b) and item-level notes (Table S1c) are provided in the Supplementary material.
Psychometric testing
The psychometric properties of the Danish Lee cGVHD Symptom Scale were evaluated in accordance with the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) [14], with a focus on content validity, construct validity and internal consistency, as described in the Statistics section.
Participants and procedures
Participants were consecutively recruited from the outpatient haematology departments at Copenhagen University Hospital - Rigshospitalet, Denmark, and Odense University Hospital, Denmark. Eligible participants were adults (≥ 18 years) with a haematological cancer diagnosis, able to understand and read Danish, > 100 days post-transplantation, and diagnosed with cGVHD according to the 2014 National Institutes of Health (NIH) consensus criteria, defined as an NIH score > 1 in at least one affected organ.
Data collection
Data were collected at enrolment (T0) and 7-10 days later (T1). At T0, participants completed the Lee cGVHD Symptom Scale, the EORTC QLQ-C30, the PRO-CTCAE, and questions on education and employment. The EORTC QLQ-C30 and the PRO-CTCAE were used to assess correlations and validate the Danish Lee cGVHD Symptom Scale in a Danish clinical context. At T1, participants repeated the Lee cGVHD Symptom Scale. Physicians recorded clinical data at baseline (T0), including cancer diagnosis, transplantation date, graft type, GVHD prophylaxis, cGVHD severity and steroid treatment. Study data were collected and managed using Research Electronic Data Capture (REDCap) tools hosted at Odense University Hospital and the University of Southern Denmark. REDCap is a secure, web-based software platform designed to support data capture for research studies [15].
Ethical considerations
This non-interventional study was exempt from review by the Research Ethics Committee. Approval was granted by the Danish Data Protection Agency (registration no. 24/17735). Written informed consent was obtained from all participants prior to inclusion.
Statistical analysis
Construct validity was examined using exploratory factor analysis (EFA) to evaluate structural validity, supported by Bartlett’s test of sphericity (p < 0.001) and the Kaiser-Meyer-Olkin measure of sampling adequacy [16]. Horn’s parallel analysis guided the determination of the number of factors. Hypothesis testing for construct validity involved comparison of scores between patients with clinician-rated mild versus severe cGVHD. Internal consistency was evaluated using Cronbach’s α for the total scale and each hypothesised subscale. Test-retest reliability was evaluated using intraclass correlation coefficients (ICC) based on responses from participants who completed the scale twice within a 7-10-day interval; ICC values > 0.70 were considered acceptable [17]. Convergent and divergent validity were assessed using Spearman correlations between the Lee cGVHD Symptom Scale, the EORTC QLQ-C30 and the PRO-CTCAE. Correlations > 0.70 were considered adequate for convergent validity, whereas low correlations were expected for divergent validity. We assumed that missing data were either missing at random or missing completely at random, and thus handled them implicitly in the EFA using maximum likelihood estimation [18]. Analyses were conducted using Stata 18.
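The internal-consistency calculation described above can be illustrated with a minimal Python sketch. It is not the Stata code used in the study: the function name cronbach_alpha and the example responses are hypothetical, and the sketch assumes complete responses (no missing items) and sample variances.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete item responses.

    rows: list of respondents, each a list of k item scores.
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Sample variance of each item across respondents
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    # Sample variance of each respondent's summed score
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 0-4 Likert responses: four respondents, three items
demo = [[0, 1, 1], [1, 1, 2], [2, 3, 3], [4, 3, 4]]
alpha = cronbach_alpha(demo)
print(f"alpha = {alpha:.3f}")
```

Alpha rises when items covary strongly relative to their individual variances, which is why a subscale that mixes heterogeneous symptoms can yield a low value even if each item is measured well.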
Results
A total of 72 patients were included, of whom 65 completed the test-retest. Participant characteristics are shown in Table 1. EFA, guided by parallel analysis, supported a six-factor solution, with item loadings clustering into six subscales as shown in Table 2. However, the original seven-factor model was retained for comparability and clinical relevance.
The scale showed good overall internal consistency (Cronbach’s α = 0.862). Subscale reliability varied, with acceptable alphas except for Respiration (α = 0.490) (Figure 1). Test-retest reliability ranged widely (ICC: 0.23-0.93), with the highest stability in Eye-related and Psychological symptoms. The items Vomiting and Coloured sputum demonstrated low test-retest reliability (ICC = 0.23 and 0.37, respectively; Supplementary Table S2). Minimal detectable change values were below 1 for all subscales.
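To make the test-retest quantities concrete, the following Python sketch computes an ICC and a minimal detectable change. The article does not state which ICC form or MDC formula was used, so both are assumptions here: a two-way random-effects, absolute-agreement, single-measurement ICC(2,1), and MDC95 = 1.96 × √2 × SEM with SEM = SD × √(1 − ICC). The function names and example data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    scores: list of subjects, each a list of k repeated measurements.
    """
    n, k = len(scores), len(scores[0])
    grand = mean([v for row in scores for v in row])
    row_means = [mean(row) for row in scores]
    col_means = [mean([row[j] for row in scores]) for j in range(k)]
    # Two-way ANOVA sums of squares (subjects, occasions, residual)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k / n * (msc - mse))


def mdc95(baseline, icc):
    """Minimal detectable change at 95% confidence (assumed formula)."""
    sem = stdev(baseline) * sqrt(1 - icc)  # standard error of measurement
    return 1.96 * sqrt(2) * sem


# Hypothetical subscale scores at T0 and T1 for six stable patients
retest = [[1.0, 1.2], [2.5, 2.0], [0.5, 0.5], [3.0, 3.2], [1.5, 1.0], [2.0, 2.4]]
icc = icc_2_1(retest)
mdc = mdc95([row[0] for row in retest], icc)
print(f"ICC(2,1) = {icc:.2f}, MDC95 = {mdc:.2f}")
```

Under these assumptions, an item with little between-patient variance (for example, a symptom that most patients rate 0) can show a low ICC even when individual answers change very little, which is one way rare symptoms such as vomiting may end up with poor coefficients.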
Convergent validity between corresponding subscales of the EORTC QLQ-C30 and the Lee cGVHD Symptom Scale was generally consistent with expectations, except for a moderate correlation between Nausea and Eating (r = 0.492) (Table S3). Item-level analyses with the EORTC QLQ-C30 and the PRO-CTCAE confirmed these findings, although correlations for Vomiting and Dyspnoea were lower than expected (Table S4). Divergent validity, assessed through correlations between non-corresponding subscales, ranged from 0.094 to 0.501 (Table S5).
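The correlation analysis behind these convergent and divergent validity results can be sketched in Python as follows. This is an illustration only, not the study's Stata code: the function names and the example scores are hypothetical. Ties, which are common with five-point Likert items, are handled by average ranks.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired scores from two instruments for six patients
lee_subscale = [0.5, 1.0, 1.0, 2.5, 3.0, 0.0]
comparator = [10, 30, 20, 60, 50, 0]
rho = spearman(lee_subscale, comparator)
print(f"rho = {rho:.3f}")
```

In the study's terms, a coefficient above 0.70 between corresponding subscales would count as adequate convergent validity, whereas low coefficients between non-corresponding subscales would support divergent validity.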
Symptom scores tended to increase with clinician-rated cGVHD severity, although total score differences were not statistically significant (p = 0.054). Significant differences were found in the Breathing (p = 0.001) and Muscles and joints (p = 0.029) subscales, supporting partial construct validity (Table 3).
Discussion
Summary of findings
In this study, the Lee cGVHD Symptom Scale was translated and cross-culturally adapted into Danish, and its psychometric properties were evaluated in a cohort of Danish patients with cGVHD and active symptoms more than 100 days after HSCT.
The Danish version demonstrated good overall internal consistency, with acceptable consistency for most subscales except Respiration. The test-retest reliability was generally acceptable, although several symptoms (rashes, coloured sputum, vomiting and fevers) showed poor reproducibility. Convergent validity with the EORTC QLQ-C30 was largely consistent with expectations.
These findings provide the basis for a more detailed discussion of the scale’s structural validity and psychometric performance.
The EFA of the Danish version of the Lee cGVHD Symptom Scale revealed a six-subscale structure, which differs from the original seven subscales in the English version. It remains uncertain whether this variation is unique to the Danish version, as comparable analyses of the original English version and other translations are unavailable. Moreover, the Kaiser-Meyer-Olkin measure of sampling adequacy was 0.644, indicating only a mediocre level of adequacy, so the six-factor structure should be interpreted with caution. Further research could provide additional insights into the subscale structure of the Danish version.
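For readers unfamiliar with the Kaiser-Meyer-Olkin measure, the Python sketch below shows how the overall index is obtained from a correlation matrix and how Kaiser's descriptive labels place 0.644 in the "mediocre" band. It is an illustration under stated assumptions, not the study's Stata output: the function names and the example matrix are hypothetical, and the matrix inverse is computed with a simple Gauss-Jordan routine suitable only for small, well-conditioned matrices.

```python
from math import sqrt


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting for a small square matrix."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def kmo(corr):
    """Overall KMO: squared correlations relative to squared correlations
    plus squared partial correlations, over all off-diagonal pairs."""
    n = len(corr)
    inv = invert(corr)
    r2 = p2 = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                partial = -inv[i][j] / sqrt(inv[i][i] * inv[j][j])
                r2 += corr[i][j] ** 2
                p2 += partial ** 2
    return r2 / (r2 + p2)


def kaiser_label(value):
    """Descriptive bands proposed by Kaiser (1974)."""
    bands = [(0.9, "marvellous"), (0.8, "meritorious"), (0.7, "middling"),
             (0.6, "mediocre"), (0.5, "miserable")]
    for cutoff, label in bands:
        if value >= cutoff:
            return label
    return "unacceptable"


# Hypothetical correlation matrix for three items
example = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
print(f"KMO = {kmo(example):.3f} ({kaiser_label(kmo(example))})")
print(kaiser_label(0.644))
```

The index approaches 1 when partial correlations are small relative to the raw correlations, that is, when the shared variance among items is attributable to common factors rather than to isolated item pairs.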
Low internal consistency in the Respiration subscale may reflect heterogeneity in symptom content, suggesting that separating items could improve reliability. Similarly, the low test-retest reliability for certain symptoms likely reflects day-to-day fluctuations rather than measurement errors. Recall period length may also influence symptom stability, with shorter intervals potentially enhancing precision [19].
Our findings are broadly consistent with previous validations of the Lee cGVHD Symptom Scale, including the original English version [10] and the Portuguese (Brazilian) version [20]. We demonstrated good overall internal consistency (α = 0.86), comparable to the English (range: 0.84-0.85) and Portuguese (range: 0.62-0.83) results [12, 20]. As reported previously, the Respiration subscale demonstrated low reliability (α = 0.49 in our translation, 0.40 in English, and 0.65 in Portuguese), indicating challenges in assessing this domain reliably across populations.
Our study differs from previous validations in several respects. First, we conducted an EFA, which, to our knowledge, has not been reported elsewhere. Second, we observed lower test-retest reliability for certain items (e.g., Vomiting and Coloured sputum) than reported in the English validation, a discrepancy that warrants further investigation. Finally, although symptom scores generally increased with clinician-rated severity, the association did not reach statistical significance, which may reflect limitations in sample size or variability in symptom assessment methods.
Strengths and limitations
The strengths of this study include a rigorous translation process with independent forward and backward translations, recruitment from two hospitals to ensure a broader patient population, and the use of validated PRO instruments (the EORTC QLQ-C30 and the PRO-CTCAE) to strengthen validity assessment.
Conversely, the study has limitations, including a relatively small sample size, a limited number of severe disease cases and a narrow age range of participants (60-71 years), which may limit generalisability to younger cGVHD patients. Although the sample size met the lower threshold recommended by the COSMIN guidelines, statistical power may have been limited.
Test-retest reliability was assessed only in clinically stable patients, as specified in the protocol. The absence of data from patients with changing clinical conditions limits the ability to evaluate the scale’s reliability in capturing symptom fluctuations. Future research should address this issue to enhance the tool’s applicability in dynamic clinical settings. Missing data were handled using maximum likelihood estimation, as they were assumed to be missing at random given the low proportion of missing data.
Implications for clinical practice
Although some domains showed low internal consistency or test-retest reliability, this may reflect true symptom variability rather than measurement error. Individual symptoms can still be clinically meaningful, and recall periods may influence reporting; shorter intervals may capture fluctuations more accurately in immunosuppressed populations. Tailoring recall periods and emphasising clinically relevant items may enhance the instrument’s utility. Overall, the instrument should be used to complement clinical decision-making, taking into consideration the patient’s context and other clinical assessments.
Conclusions
The Danish version of the Lee cGVHD Symptom Scale demonstrates reliability and validity for assessing cGVHD symptoms. While internal consistency and test-retest reliability are generally acceptable, certain subscales may benefit from further refinement. The scale is suitable for both clinical and research use in Danish cGVHD populations, though further validation in larger samples is recommended.
Correspondence Anne M. Clausen. E-mail: anne.moller.clausen@rsyd.dk
Accepted 7 April 2026
Published 20 May 2026
Conflicts of interest BTK reports financial support from or interest in the Joint Research Fund of Odense University Hospital and Rigshospitalet. AMC reports financial support from or interest in the Department of Clinical Research, University of Southern Denmark, the PhD Fund of Odense University Hospital, the PhD Fund of the Region of Southern Denmark, FADL’s Forlag and Munksgaard. All authors have submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest. These are available together with the article at ugeskriftet.dk/dmj
Use of AI The online version of Paperpal was used to enhance grammar and provide suggestions for language editing.
References can be found with the article at ugeskriftet.dk/dmj
Cite this as Dan Med J 2026;73(6):A09250759
doi 10.61409/A09250759
Open Access under Creative Commons License CC BY-NC-ND 4.0
Supplementary material https://content.ugeskriftet.dk/sites/default/files/2026-04/a09250759_supplementary.pdf
References
1. Lee CJ, Wang T, Chen K, et al. Severity of chronic graft-versus-host disease and late effects following allogeneic hematopoietic cell transplantation for adults with hematologic malignancy. Transplant Cell Ther. 2024;30(1):97.e1-97.e14. https://doi.org/10.1016/j.jtct.2023.10.010
2. DeFilipp Z, Alousi AM, Pidala JA, et al. Nonrelapse mortality among patients diagnosed with chronic GVHD: an updated analysis from the Chronic GVHD Consortium. Blood Adv. 2021;5(20):4278-4284. https://doi.org/10.1182/bloodadvances.2021004941
3. Baumrin E, Loren AW, Falk SJ, et al. Chronic graft-versus-host disease. Part I: Epidemiology, pathogenesis, and clinical manifestations. J Am Acad Dermatol. 2024;90(1):1-16. https://doi.org/10.1016/j.jaad.2022.12.024
4. El-Jawahri A, Pidala J, Khera N, et al. Impact of psychological distress on quality of life, functional status, and survival in patients with chronic graft-versus-host disease. Biol Blood Marrow Transplant. 2018;24(11):2285-2292. https://doi.org/10.1016/j.bbmt.2018.07.020
5. Wenzel F, Pralong A, Scheid C, et al. Burden, resources, and needs of patients with severe graft-versus-host disease - a qualitative study. Palliat Support Care. 2025;23:e69. https://doi.org/10.1017/S147895152400172X
6. Gruber I, Koelbl O, Herr W, et al. Impact of chronic graft-versus-host disease on quality of life and cognitive function of long-term transplant survivors after allogeneic hematopoietic stem cell transplantation with total body irradiation. Radiat Oncol. 2022;17(1):195. https://doi.org/10.1186/s13014-022-02161-9
7. Hansen JL, Juckett MB, Foster MA, et al. Psychological and physical function in allogeneic hematopoietic cell transplant survivors with chronic graft-versus-host disease. J Cancer Surviv. 2023;17(3):646-656. https://doi.org/10.1007/s11764-023-01354-9
8. Kurosawa S, Yamaguchi T, Oshima K, et al. Resolved versus active chronic graft-versus-host disease: impact on post-transplantation quality of life. Biol Blood Marrow Transplant. 2019;25(9):1851-1858. https://doi.org/10.1016/j.bbmt.2019.05.016
9. Merkel EC, Mitchell SA, Lee SJ. Content validity of the Lee Chronic Graft-versus-Host Disease Symptom Scale as assessed by cognitive interviews. Biol Blood Marrow Transplant. 2016;22(4):752-758. https://doi.org/10.1016/j.bbmt.2015.12.026
10. Teh C, Onstad L, Lee SJ. Reliability and validity of the modified 7-Day Lee Chronic Graft-versus-Host Disease Symptom Scale. Biol Blood Marrow Transplant. 2020;26(3):562-567. https://doi.org/10.1016/j.bbmt.2019.11.020
11. Lee SJ, Wolff D, Kitko C, et al. Measuring therapeutic response in chronic graft-versus-host disease. National Institutes of Health Consensus Development Project on Criteria for Clinical Trials in Chronic Graft-versus-Host Disease: IV. The 2014 Response Criteria Working Group Report. Biol Blood Marrow Transplant. 2015;21(6):984-999. https://doi.org/10.1016/j.bbmt.2015.02.025
12. Lee SK, Cook EF, Soiffer R, Antin JH. Development and validation of a scale to measure symptoms of chronic graft-versus-host disease. Biol Blood Marrow Transplant. 2002;8(8):444-452. https://doi.org/10.1053/bbmt.2002.v8.pm12234170
13. Wild D, Grove A, Martin M, et al. Principles of good practice for the translation and cultural adaptation process for patient-reported outcomes (PRO) measures: report of the ISPOR Task Force for Translation and Cultural Adaptation. Value Health. 2005;8(2):94-104. https://doi.org/10.1111/j.1524-4733.2005.04054.x
14. Mokkink LB, Terwee CB, Patrick DL, et al. The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: an international Delphi study. Qual Life Res. 2010;19(4):539-549. https://doi.org/10.1007/s11136-010-9606-8
15. Harris PA, Taylor R, Thielke R, et al. Research electronic data capture (REDCap) - a metadata-driven methodology and workflow process for providing translational research informatics support. J Biomed Inform. 2009;42(2):377-381. https://doi.org/10.1016/j.jbi.2008.08.010
16. Kaiser HF. An index of factorial simplicity. Psychometrika. 1974;39(1):31-36. https://doi.org/10.1007/BF02291575
17. Snedecor GW, Cochran WG. Statistical methods. 6th ed. Ames: Iowa State University Press, 1973
18. Nielsen LK, Mercieca-Bebber R, Möller S, et al. Relationship between reasons for intermittent missing patient-reported outcomes data and missing data mechanisms. Qual Life Res. 2024;33(9):2387-2400. https://doi.org/10.1007/s11136-024-03707-y
19. Paudel R, Enzinger AC, Uno H, et al. Effects of a change in recall period on reporting severe symptoms: an analysis of a pragmatic multisite trial. J Natl Cancer Inst. 2024;116(7):1137-1144. https://doi.org/10.1093/jnci/djae049
20. de Souza CV, Vigorito AC, Miranda ECM, et al. Translation, cross-cultural adaptation, and validation of the Lee Chronic Graft-versus-Host Disease Symptom Scale in a Brazilian population. Biol Blood Marrow Transplant. 2016;22(7):1313-1318. https://doi.org/10.1016/j.bbmt.2016.03.013