Original Article

The Danish version of the Western Ontario Rotator Cuff Index

Lone D. Brix¹, Karen T. Bjørnholdt², Lone Nikolajsen³, Kirsten Kallestrup¹ & Theis M. Thillemann⁴

1 February 2020

The Western Ontario Rotator Cuff Index (WORC) is a self-reported instrument to assess the health-related quality of life of patients with shoulder complaints as a consequence of rotator cuff disease [1]. WORC assesses five domains: physical symptoms, sports/recreation, work, lifestyle and emotions and is completed by the patient without interpretation of the response by a clinician [1]. Hence, it is the patient’s assessment of the outcome of a treatment, which reduces the risk of observer bias.

To our knowledge, the WORC is the only patient-reported outcome measure that measures outcomes of condition-specific rotator cuff disorders including impingement, and it has been translated and validated into a number of languages [2-10]. Moreover, the psychometric properties of the WORC have been tested and have shown good validity, reliability and responsiveness [1, 11, 12]. Previous adaptions have found moderate to high correlations between the WORC and the Oxford Shoulder Score (OSS) and the WORC and the Disabilities of Arm, Shoulder, and Hand (DASH) questionnaire [1, 6, 7, 9, 10, 12]. However, the validity, reliability, and responsiveness of the WORC have not yet been investigated in a Danish-speaking population.

The aim of this study was therefore to translate and adapt the WORC into a Danish version (D-WORC) and to evaluate the validity, reliability and responsiveness of the D-WORC in a cohort of native Danish-speaking patients undergoing surgery for arthroscopic subacromial decompression (ASD) or rotator cuff repair (RCR). We hypothesised that the D-WORC was positively correlated with other shoulder-specific questionnaires (OSS and DASH), but negatively correlated with a generic questionnaire (Medical Outcomes Study Short-Form 36 (SF-36)).

Methods

Translation and cross-cultural adaption

The principles of translation and cross-cultural adaption were applied to the original version of the WORC according to recommendations and guidelines proposed by Wild et al. and as described in the Dutch adaption [13, 14]. Written consent to translate and apply the questionnaire was obtained from the original developers.

The version that was translated into Danish underwent field-testing during face-to-face interviews with ten patients who had undergone ASD and ten patients who had undergone RCR to establish its face validity. None of the 20 participants interviewed reported irregularities in the questions or difficulty in understanding the questions. This version was defined as the final version of the D-WORC used in the subsequent validation phase without further change. The final version of the D-WORC was accepted by the original developers [1] and subjected to further psychometric testing.

Psychometric testing

Patients

The study was a prospective, observational cohort study with a three-month follow-up period on patients with rotator cuff disorders. Patients were recruited at our institutions from December 2015 to April 2017.
After obtaining informed written consent, 126 patients scheduled for outpatient arthroscopic shoulder surgery were enrolled at the Day Surgery Unit at Horsens Regional Hospital, Denmark.

Patients were included in the study if they met the following inclusion criteria: diagnosed subacromial disease such as impingement, biceps tendinitis and/or rotator cuff tears; were candidates for surgical treatment; were able to communicate in Danish and would give their informed consent for participation. The exclusion criteria were age < 18 years or psychiatric illness. The study was approved by the Danish Data Protection Agency (1-16-02-653-15).

Questionnaires

Patients were assessed at three different time points: preoperatively (T0), three days after the preoperative consultation (T1) and three months after surgery (T2). At T0, the following questionnaires were administered: the Danish version of the WORC (D-WORC), the OSS, the DASH and the SF-36. Additionally, questions regarding baseline characteristics including gender, age, side of affected shoulder, pain intensity, level of education, work status, expectations to resume daily living/work, activity level, and the Single Assessment Numeric Evaluation (SANE) were completed by the patients. At T1, only the D-WORC was administered. At T2, the administered questionnaires included the D-WORC, the OSS, the DASH, the SF-36, pain intensities, the SANE and a global rating scale (GRS). The GRS is a seven-point Likert scale ranging from 0 (much better/much improved) to 6 (much worse/much deteriorated).

Statistical analysis

Patients were analysed both as a total group and in two sub-groups: group ASD (subacromial disease such as impingement, biceps tendinitis) and group RCR (rotator cuff tears). According to quality criteria for measurement properties of health status questionnaires, a sample size of at least 50 in each subgroup was required [15]. In case of one missing value in a domain, the domain score was calculated using the average of the other items in the domain. If more than two items were missing in a domain, the WORC questionnaire was excluded from analyses [1].
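The missing-value rule can be illustrated with a minimal Python sketch. It is not the authors' STATA code. The function name `domain_score` is hypothetical, and because the text does not state how exactly two missing items were handled, the sketch imputes only when a single item is missing and otherwise returns `None`.

```python
from typing import Optional, Sequence


def domain_score(items: Sequence[Optional[float]]) -> Optional[float]:
    """Sum the items of one domain, imputing a single missing item.

    `None` marks an unanswered item. Assumption: only the one-missing case
    is imputed; any domain with more missing items is returned as None.
    """
    answered = [v for v in items if v is not None]
    n_missing = len(items) - len(answered)
    if n_missing == 0:
        return float(sum(answered))
    if n_missing == 1 and answered:
        # Replace the missing item with the mean of the answered items.
        fill = sum(answered) / len(answered)
        return float(sum(answered) + fill)
    return None


complete = domain_score([10, 20, 30, 40])
one_missing = domain_score([10, 20, None, 30])
too_many_missing = domain_score([10, None, None, None])
```

Here `one_missing` is the sum of the three answered items plus their mean, which is equivalent to scaling the answered items up to the full domain length.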

All statistical analyses were conducted with STATA software version 15.0 with the alpha level set at 0.05. Results are presented as either mean ± standard deviation (SD) (parametric data) or as frequencies or medians with interquartile range (IQR) (non-parametric). The co-variance of the instruments was calculated using Pearson’s correlation coefficient (PCC) for parametric data and Spearman’s correlation coefficient (SCC) for non-parametric data. Correlations were categorised as high if > 0.70, moderate if between 0.5 and 0.70, and low if < 0.50.
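As an illustration of the cut-offs above, the following Python sketch computes a Pearson coefficient and grades it. The function names are hypothetical, and grading on the absolute value (so that negative correlations are graded by strength) is an assumption that the text does not state explicitly.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient for two equally long samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def categorise_correlation(r: float) -> str:
    """Grade a coefficient as high (> 0.70), moderate (0.50-0.70) or low (< 0.50)."""
    size = abs(r)  # assumption: the sign is ignored when grading strength
    if size > 0.70:
        return "high"
    if size >= 0.50:
        return "moderate"
    return "low"


r_perfect = pearson([1, 2, 3], [2, 4, 6])
labels = [categorise_correlation(r) for r in (0.71, 0.67, -0.39)]
```

Applied to the coefficients reported in the Results, the grading returns high for 0.71, moderate for 0.67 and low for –0.39.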

Based on hypothesised correlations, convergent validity was determined by estimating either the SCC or the PCC between the total score of the D-WORC at T0 and the scores of the OSS, the DASH and the SF-36 at T0. We hypothesised that the D-WORC would be positively correlated with the OSS (0.70) and the DASH (0.70) and negatively correlated with the SF-36 physical sum score (–0.5) and the SF-36 mental sum score (–0.5) at T0.

The item response rate, floor/ceiling effects and patient feedback were the three indexes for comprehensiveness assessment [15]. The response rate was considered good if it was > 95% for each item in the scale. Floor and ceiling effects were considered in each subscale if > 15% of the patients achieved the lowest possible score (floor effect) or the highest possible score (ceiling effect).
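The 15% criterion can be sketched in Python as follows. The function name and the example scores are hypothetical. The lowest and highest possible scores are passed in as parameters because the scoring range is not given in this section.

```python
from typing import Sequence, Tuple


def floor_ceiling_effects(
    scores: Sequence[float],
    lowest: float,
    highest: float,
    threshold: float = 0.15,
) -> Tuple[bool, bool]:
    """Return (floor_effect, ceiling_effect) for one subscale.

    An effect is flagged when more than `threshold` of the patients
    obtained the lowest or the highest possible score.
    """
    n = len(scores)
    floor_share = sum(1 for s in scores if s == lowest) / n
    ceiling_share = sum(1 for s in scores if s == highest) / n
    return floor_share > threshold, ceiling_share > threshold


# Hypothetical subscale scores for ten patients on a 0-100 range.
example = [0, 0, 50, 60, 70, 80, 90, 100, 40, 30]
floor_effect, ceiling_effect = floor_ceiling_effects(example, lowest=0, highest=100)
```

In this made-up sample, two of ten patients (20%) sit at the floor and one of ten (10%) at the ceiling, so only the floor effect is flagged.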

The internal consistency of the D-WORC at T0 was measured using Cronbach’s alpha (α) and was considered to be excellent if α > 0.9, good if α > 0.8 and acceptable if α > 0.7. The test-retest reliability was assessed between T0 and T1 for the D-WORC using the intraclass correlation coefficient (ICC): a two-way analysis of variance in a random effect model with associated 95% confidence interval (95% CI) [16]. The ICC values were considered excellent (ICC ≥ 0.8), good (ICC: 0.61-0.80), moderate (ICC: 0.41-0.60), fair (ICC: 0.21-0.40) or poor (ICC ≤ 0.21).
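Both reliability statistics can be sketched in plain Python. This illustrates the general formulas and is not the authors' STATA procedure. The text does not state which form of the two-way random-effects ICC was used, so the sketch assumes the single-measurement, absolute-agreement form, often written ICC(2,1). Function names and example data are hypothetical.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; each row holds one patient's item scores."""
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def icc_2_1(rows: Sequence[Sequence[float]]) -> float:
    """Two-way random-effects ICC, single measurement, absolute agreement.

    Each row holds one patient's scores on the k occasions (here T0, T1).
    """
    n, k = len(rows), len(rows[0])
    grand = sum(sum(r) for r in rows) / (n * k)
    row_means = [sum(r) / k for r in rows]
    col_means = [sum(r[j] for r in rows) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for r in rows for v in r)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


alpha_identical_items = cronbach_alpha([[1, 1], [2, 2], [3, 3]])
icc_identical = icc_2_1([[1, 1], [2, 2], [3, 3]])
icc_shifted = icc_2_1([[1, 2], [2, 3], [3, 4]])
```

The last example shows why the choice of ICC form matters: a constant shift of one point between the two occasions leaves the ranking of patients unchanged, yet the absolute-agreement ICC drops from 1 to 2/3.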

Our hypothesis was that the D-WORC would be highly reliable (ICC ≥ 0.8), like in other adaption studies [2-5]. Reliability was visualised using a Bland-Altman plot with limits of agreement (LOA) as the mean difference ± 1.96 times its standard deviation.
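The limits of agreement can be computed from paired test-retest scores as in the Python sketch below. The function name and the example data are hypothetical, and the direction of the difference (first measurement minus second) is an assumption, since the text does not specify it.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def limits_of_agreement(
    test: Sequence[float], retest: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (mean difference, lower LOA, upper LOA) for paired scores."""
    diffs = [a - b for a, b in zip(test, retest)]  # assumed direction: test - retest
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)  # sample standard deviation of the differences
    return mean_diff, mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff


# Hypothetical paired scores for three patients.
mean_diff, loa_lower, loa_upper = limits_of_agreement([10, 12, 14], [9, 12, 12])
```

For these made-up pairs the differences are 1, 0 and 2, giving a mean difference of 1 and limits of about –0.96 and 2.96.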

To assess criterion responsiveness, we chose to use the GRS as the gold standard for measuring change over time because of its high face validity [17]. At T2, patients were asked to rate the effect of the surgery on a GRS, which is a 7-point Likert scale ranging from 0 (much better/much improved) to 6 (much worse/much deteriorated) [17].

Since the assessment of construct responsiveness relies on the tested hypotheses, several a priori hypotheses were constructed [17]. Our hypothesis was that the correlation between the change in the D-WORC from T0 to T2 and the GRS was high in all patients and in the two subgroups: group ASD and group RCR. We also hypothesised that the correlations between the change in the OSS and the DASH from T0 to T2 and the GRS were moderate in all patients, group ASD, and group RCR. Finally, we hypothesised that the D-WORC would have a higher area under the curve (AUC) than the OSS and the DASH.
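An AUC of this kind can be sketched in Python as the probability that a randomly chosen improved patient has a larger change score than a randomly chosen non-improved patient. The sketch assumes that the GRS is dichotomised into improved and not improved. The cut-off is not stated in the text, so the grouping is left to the caller, and the function name and data are hypothetical.

```python
from typing import Sequence


def auc(improved: Sequence[float], not_improved: Sequence[float]) -> float:
    """Area under the ROC curve from change scores in two groups.

    Equivalent to the Mann-Whitney statistic divided by the number of
    pairs; ties count as half a win.
    """
    wins = 0.0
    for a in improved:
        for b in not_improved:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(improved) * len(not_improved))


auc_perfect = auc([3, 4], [1, 2])   # every improved patient changed more
auc_chance = auc([1, 2], [1, 2])    # groups are indistinguishable
auc_reversed = auc([1, 3], [2, 4])  # improved patients mostly changed less
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect discrimination between the two GRS groups.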

Trial registration: Danish Data Protection Agency: 1-16-02-653-15.

Results

A total of 126 patients were enrolled in the study. Seven had their surgery cancelled at the patient’s request, leaving 119 patients at T0 of whom 109 (91.6%) completed the questionnaire at T0, 96 at T1 (80.7%), and 80 at T2 (67.2%). The item response rate varied from 97.5% to 100% at T0, from 97.5% to 100% at T1 and from 96.3% to 100% at T2. Data were available from 113 patients for evaluation of validity, 95 for evaluation of reliability, 75 for construct responsiveness and 71 for criterion responsiveness. The baseline characteristics are shown in Table 1. No significant differences were found between the non-responders and responders regarding baseline characteristics.

Table 2 provides an overview of the correlations between the WORC and the DASH, the OSS and the SF-36. The correlation was high between the D-WORC and the DASH (PCC = 0.71; 95% CI: 0.60-0.79) and moderate between the D-WORC and the OSS (PCC = 0.67; 95% CI: 0.55-0.76). The correlation was low between the D-WORC and the physical sum score of the SF-36 (PCC = –0.39; 95% CI: –0.54 to –0.22) as well as between the D-WORC and the mental sum score of the SF-36 (PCC = –0.39; 95% CI: –0.54 to –0.21).

Only one patient had the highest possible score in three subscales of the D-WORC (work, sports and emotions) at T2. None of the participants had the lowest or the highest possible score in any of the subscales at T0, or in the total score of the D-WORC at T0 or T2.

Table 3 provides an overview of the reliability of the D-WORC. The test-retest reliability of the D-WORC was found to be good (ICC = 0.80; 95% CI: 0.69-0.87). The single-item ICCs ranged from 0.60 to 0.82 (moderate to excellent). The Bland-Altman plot revealed a test-retest mean difference of 76.4 (SD 201.4) with a lower LOA of –318.3 (95% CI: –387.8 to –248.9) and an upper LOA of 471.2 (95% CI: 401.7-540.6) for the D-WORC.
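As a check, the reported limits of agreement can be reproduced from the reported mean difference and SD using the definition in the Methods (mean difference ± 1.96 × SD):

```latex
\begin{align*}
\mathrm{LOA}_{\text{lower}} &= 76.4 - 1.96 \times 201.4 = 76.4 - 394.7 \approx -318.3 \\
\mathrm{LOA}_{\text{upper}} &= 76.4 + 1.96 \times 201.4 = 76.4 + 394.7 \approx 471.1
\end{align*}
```

The lower limit matches the reported value. The upper limit differs from the reported 471.2 only in the last digit, presumably because the published mean and SD are themselves rounded.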

The internal consistency of the D-WORC was excellent for both groups combined (α = 0.94; 95% CI: 0.92-0.95), group ASD (α = 0.94; 95% CI: 0.92-0.96) and group RCR (α = 0.93; 95% CI: 0.89-0.95).

Table 4 offers an overview of the construct and criterion responsiveness with low to high correlations. The D-WORC (AUC = 0.88; 95% CI: 0.78-0.97) had a higher AUC than the OSS (AUC = 0.78; 95% CI: 0.66-0.90) and the DASH (AUC = 0.74; 95% CI: 0.61-0.86).

Discussion

The present study showed that the Danish adaption of the WORC, i.e. the D-WORC, is reliable and responsive for assessing individuals with rotator cuff disorders treated with ASD and/or RCR. The correlations between the D-WORC and the DASH are in agreement with other adaptations (–0.65 and –0.86) [6, 7, 12]. This was expected since the D-WORC and the DASH have many similar items, e.g. the number of symptoms the patient has experienced in the past week as related to the problematic shoulder. Compared to other adaptations (0.69-0.84), the correlation between the D-WORC and the OSS was slightly lower than expected [2, 5, 8]. This was also the case with the SF-36 physical sum score (0.52-0.65) [7, 9, 10]. However, the SF-36 mental sum score was in line with the SF-36 mental sum score found in the Brazilian adaption (0.30) [9]. As predicted, we only found a low to modest correlation between the SF-36 and the disease-specific WORC because a global health status tool such as the SF-36 is likely to be insensitive to changes in one joint. It is important to bear in mind the known cultural differences of, e.g., the SF-36 between the different adaptions; hence, the impact of this should be investigated in future studies [18, 19].

The response rate was lower than hypothesised and did not reach the recommended 80% at T1 and T2, and selection bias cannot be ruled out [20]. However, several of the patients were excluded from the analyses due to cancelled operations and not because they were lost to follow-up. Since a high item response rate and no floor or ceiling effects were observed, we conclude that the D-WORC shows good comprehensiveness and is suitable for the study population.

The test-retest reliability was found to be good. Since a heterogeneous group of patients was investigated, it is important to keep in mind that the ICC is highly dependent on the variation of the study sample and is only generalisable to samples with a similar variation. The recommended ICC for an assessment tool is > 0.70 for a large group (as in research) or > 0.90 for individuals [9, 21]. Hence, present results indicate that the D-WORC has sufficient reliability for use in a large group but may not be suitable for individual patient assessment.

The relatively high values for LOA of the five domains indicate a rather large difference between measurement error and real change over time. However, the LOAs of the D-WORC score are comparable to the LOAs found in other adaptions, which may be due to the heterogeneous study populations [5, 6]. As seen in our and in the Swedish adaption, the highest LOA was found in the sports domain because some participants never do push-ups or carry out throwing actions. This could have contributed to the lower internal consistency of the sports domain, but this did not affect the reliability [2].

The overall scale of the D-WORC and its subscales have a good responsiveness, suggesting that they can detect changes in the functional status of patients who have undergone ASD and RCR with good sensitivity. We found that the D-WORC correlates better with change scores and has a higher AUC than the OSS and the DASH. Ekeberg et al. also found that the WORC had a higher AUC than the OSS and the Shoulder Pain and Disability Index (SPADI) in patients with rotator cuff disease receiving corticosteroid injection therapy. A possible explanation for this might be that the D-WORC is more disease-specific, i.e. related to rotator cuff disorders, than the OSS and DASH.

The present study has a few shortcomings. Firstly, the overall sample size was more than sufficient, but when studying the two subgroups, group ASD and group RCR, less than the required sample size of 50 was reached at T1 and T2 in group RCR. Secondly, a factor analysis is commonly done before Cronbach’s α is calculated. However, the sample size in the present study was too limited to perform a factor analysis. For the same reason, we chose not to perform a confirmatory and a Rasch analysis. Finally, it can be argued that the GRS could be influenced by recall bias, since patients were asked to compare their shoulder disorder three months after surgery with their shoulder disorder preoperatively. We recommend that future studies with larger sample sizes be undertaken to perform confirmatory and Rasch analyses.

Conclusions

The WORC was successfully translated and adapted into a Danish version. Despite the relatively high LOA values, the D-WORC seems to be a reliable measurement tool for assessing health-related quality of life in patients undergoing ASD and/or RCR in the Danish population. The authors suggest that the D-WORC can be used in the functional status evaluation of large groups of patients undergoing ASD and/or RCR, since the D-WORC may not be suitable for individual patient assessment. The cross-cultural validity, including confirmatory analysis and Rasch analysis, needs further investigation.

Correspondence: Lone D. Brix. E-mail: lonebrix@rm.dk

Accepted: 2 December 2019

Conflicts of interest: none. Disclosure forms provided by the authors are available with the full text of this article at Ugeskriftet.dk/dmj

References

Kirkley A, Alvarez C, Griffin S. The development and evaluation of a disease-specific quality-of-life questionnaire for disorders of the rotator cuff: The Western Ontario Rotator Cuff Index. Clin J Sport Med 2003;13:84-92.
Zhaeentan S, Legeby M, Ahlstrom S et al. A validation of the Swedish version of the WORC index in the assessment of patients treated by surgery for subacromial disease including rotator cuff syndrome. BMC Musculoskelet Disord 2016;17:165.
Wessel RN, Lim TE, van Mameren H et al. Validation of the Western Ontario Rotator Cuff index in patients with arthroscopic rotator cuff repair: a study protocol. BMC Musculoskelet Disord 2011;12:64.
Wiertsema SH. Reproducibility of the Dutch version of the Western Ontario rotator cuff Index. J Shoulder Elbow Surg 2013;22:165-70.
Ekeberg OM, Bautz-Holter E, Tveita EK et al. Agreement, reliability and validity in 3 shoulder questionnaires in patients with rotator cuff disease. BMC Musculoskelet Disord 2008;9:68.
Kawabata M, Miyata T, Nakai D et al. Reproducibility and validity of the Japanese version of the Western Ontario Rotator Cuff Index. J Orthop Sci 2013;18:705-11.
Mousavi SJ, Hadian MR, Abedi M et al. Translation and validation study of the Persian version of the Western Ontario Rotator Cuff Index. Clin Rheumatol 2009;28:293-9.
Wang W, Xie QY, Jia ZY et al. Cross-cultural translation of the Western Ontario Cuff Index in Chinese and its validation in patients with rotator cuff disorders. BMC Musculoskelet Disord 2017;18:178.
Lopes AD, Ciconelli RM, Carrera EF et al. Validity and reliability of the Western Ontario Rotator Cuff Index (WORC) for use in Brazil. Clin J Sport Med 2008;18:266-72.
St-Pierre C, Dionne CE, Desmeules F et al. Reliability, validity, and responsiveness of a Canadian French adaptation of the Western Ontario Rotator Cuff (WORC) index. J Hand Ther 2015;28:292-9.
St-Pierre C, Desmeules F, Dionne CE et al. Psychometric properties of self-reported questionnaires for the evaluation of symptoms and functional limitations in individuals with rotator cuff disorders: a systematic review. Disabil Rehabil 2016;38:103-22.
de Witte PB, Henseler JF, Nagels J et al. The Western Ontario rotator cuff index in rotator cuff disease patients: a comprehensive reliability and responsiveness validation study. Am J Sports Med 2012;40:1611-9.
Wild D, Grove A, Martin M et al. Principles of Good Practice for the Translation and Cultural Adaptation Process for Patient-Reported Outcomes (PRO) Measures: report of the ISPOR Task Force for Translation and Cultural Adaptation. Value Health 2005;8:94-104.
Wessel RN, Wolterbeek N, Fermont AJ et al. The conceptually equivalent Dutch version of the Western Ontario Rotator Cuff Index (WORC)(c). BMC Musculoskelet Disord 2013;14:362.
Terwee CB, Bot SD, de Boer MR et al. Quality criteria were proposed for measurement properties of health status questionnaires. J Clin Epidemiol 2007;60:34-42.
Weir JP. Quantifying test-retest reliability using the intraclass correlation coefficient and the SEM. J Strength Cond Res 2005;19:231-40.
de Vet HC, Terwee CB, Mokkink LB et al. Measurement in medicine - a practical guide. UK: Cambridge University Press, 2011.
Health measurement scales: a practical guide to their development and use (5th edition). Aust N Z J Public Health 2016;40:294-5.
Bjørner JB, Damsgaard MT, Watt T et al. Dansk manual til SF-36 Et spørgeskema om helbredsstatus. København: Lif Lægemiddelindustriforeningen, 1997.
Fincham JE. Response rates and responsiveness for surveys, standards, and the Journal. Am J Pharm Educ 2008;72:43.
Kirkley A, Griffin S, Dainty K. Scoring systems for the functional assessment of the shoulder. Arthroscopy 2003;19:1109-20.