
Validation of the Danish version of the Oxford Elbow Score

Hans Christian Plaschke, Andreas Jørgensen, Theis Muncholm Thillemann, Stig Brorson & Bo Sanderhoff Olsen

1 Oct 2013

Patient-reported outcome measures (PROMs) quantify the patients' own experience of a health condition and its therapy [1]. PROMs are useful for the evaluation of surgical interventions because they express how the patient feels about a certain treatment. Since PROMs are independent of the surgical team, they are less biased than standard clinical assessments [2].

Many scoring systems have been used in elbow disorders. However, only a few of these have been validated, and many assess only a few aspects of elbow function [3]. The Oxford Elbow Score (OES) is a 12-item patient-administered questionnaire that measures the quality of life of patients with disorders of the elbow joint (Table 1). The OES comprises three domains: function, pain and social-psychological status. Each domain is covered by four questions whose answers are recorded on a five-point Likert scale. Each item within a domain contributes equally to the total score of the given domain. Each domain score is transformed to a 100-point metric scale with 0 being the worst and 100 being the best score [4].
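The domain transformation described above can be sketched as follows. This is a minimal illustration, assuming each item is coded from 0 (worst) to 4 (best) so that the raw sum for a domain runs from 0 to 16; the item coding and the function name oes_domain_score are assumptions made for the example and are not taken from the questionnaire itself.

```python
def oes_domain_score(item_scores):
    """Convert four item responses to a 0-100 domain score.

    Assumption: each item is coded 0 (worst) to 4 (best).
    """
    if len(item_scores) != 4:
        raise ValueError("each OES domain has four items")
    if any(s not in (0, 1, 2, 3, 4) for s in item_scores):
        raise ValueError("items are assumed to be coded 0-4")
    # Every item carries equal weight: the raw sum (0-16) is rescaled linearly.
    return 100.0 * sum(item_scores) / 16.0


if __name__ == "__main__":
    print(oes_domain_score([4, 3, 4, 2]))
```

Under this coding, a patient answering with the best category on all four items of a domain scores 100 and one answering with the worst category on all four scores 0.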

The Disabilities of the Arm, Shoulder and Hand questionnaire (DASH) is a 30-item questionnaire for measuring symptoms and function in patients with musculoskeletal disorders of the upper limb. The DASH is measured on a 100-point scale with 100 being the worst and 0 being the best score [5, 6]. The Danish version of the DASH has previously been validated [7]. The DASH is not specific for the elbow.

The Mayo Elbow Performance Score (MEPS) is a physician-administered tool that covers four domains: pain (45 points), ulno-humeral movement (20 points), stability (10 points) and the patient's ability to accomplish five everyday functional tasks (25 points) [8].

The OES has been found to be reliable and valid in previous studies from England and the Netherlands [4, 9, 10]. In a study by Dawson et al [11], the OES was also found to have good responsiveness and ability to detect changes six months after surgery. The purpose of this study was to translate the OES into a Danish version (D-OES) for publication, to validate the Danish translation and to retest the psychometric properties of the D-OES within a Danish setting for patients with total elbow arthroplasty (TEA).

MATERIAL AND METHODS

A total of 150 patients who had undergone TEA surgery were identified. In all, 130 patients (87%) were included at Herlev Hospital from October 2010 to October 2011. The study group comprised 156 TEAs inserted in Eastern Denmark from 1981 to 2008. Both primary and revision TEAs were included. The average age was 68 years (range 41-91 years).

To test the convergent validity, we compared the D-OES to the MEPS, which has been a standard tool for measuring outcome after total elbow arthroplasty. We further compared the D-OES to the Danish validated DASH score. Patients were evaluated with the MEPS within a clinical setting, whereas the patients had filled out the DASH and the D-OES at home.

The OES was translated into Danish according to guidelines provided by Beaton et al [12]. Two bilingual persons who were Danish native speakers translated the OES into Danish. One of the translators had clinical experience and one did not. Two independent bilingual native English speakers then back-translated the Danish version into English. This back translation (Figure 1) was then compared to the original OES, evaluated for mistakes and finally accepted by the authors of the original OES.

The median follow-up time from surgery to evaluation was 6.7 years (range 2-20 years). The patients were all considered to be in a stable state concerning the affected elbow.

A total of 97% of the D-OES questionnaires and 92% of the DASH questionnaires were completed correctly. All the patients were evaluated with the MEPS. In all, 50 patients were asked to fill in the D-OES twice, 14 days apart. Of these, 45 patients (90%) completed both D-OES questionnaires.

Statistics

The psychometric properties of the Danish version were tested in terms of reliability and validity.

Test-retest reliability is a measure of the consistency of an instrument when it is administered repeatedly to the same respondents. The test-retest reliability is expressed as the intraclass correlation coefficient. This type of reliability assumes that the condition is stable. No treatment was given between the two evaluations. Cronbach’s alpha was calculated as a measure of internal consistency, and the Pearson correlation coefficient was calculated between the D-OES, the MEPS and the DASH. SPSS Statistics version 20.0 was used to calculate reliability, internal consistency and correlation.
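For readers who want to see what the two summary statistics compute, the following is a minimal sketch in plain Python. It is not the SPSS procedure used in the study; the function names and the toy data are assumptions for illustration only.

```python
from statistics import mean, variance


def cronbach_alpha(items):
    """Cronbach's alpha. items: one list of scores per item, aligned by respondent."""
    k = len(items)
    totals = [sum(responses) for responses in zip(*items)]
    sum_item_variances = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - sum_item_variances / variance(totals))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


if __name__ == "__main__":
    # Hypothetical scores: higher is better on the first scale, worse on the second.
    better_is_high = [20, 45, 70, 95]
    worse_is_high = [80, 60, 35, 10]
    print(pearson_r(better_is_high, worse_is_high))
```

Because the DASH runs in the opposite direction to the OES and the MEPS (100 is the worst score), agreement between the DASH and either of the other two instruments shows up as a negative correlation coefficient.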

Rasch analysis was performed using a rating scale model with Winsteps Rasch Measurement Version 3.75.1. The following analyses were performed: Construction of the person and item map (Wright map), testing of the fit between the data and the model, estimation of the person and item reliability and separation coefficient, testing of the ordering of the categories, and analysis of the dimensionality.

Trial registration: not relevant.

RESULTS

The internal consistency calculated as Cronbach’s alpha was 0.998 (95% confidence interval (CI): 0.997-0.999). Expressed by Pearson’s correlation coefficient, the convergent validity of the D-OES functional, social-psychological and pain domains was 0.78, 0.80 and 0.81, respectively, against the MEPS and –0.66, –0.58 and –0.49, respectively, against the DASH. The coefficients against the DASH are negative because the DASH is scored in the opposite direction (100 is the worst score). Between the MEPS and the DASH, the correlation coefficient was –0.572.

The intraclass correlation coefficient was 0.998, 0.996 and 0.996 for the functional, social-psychological and pain domains, respectively.
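The source does not state which form of the intraclass correlation coefficient was computed. As a hedged illustration only, the sketch below implements one common choice, the two-way random-effects, absolute-agreement, single-measure form often written ICC(2,1), from the usual ANOVA mean squares. The function name and the example data are assumptions.

```python
def icc_2_1(scores):
    """ICC(2,1) for a table with one row per patient and one column per occasion.

    Assumption: two-way random effects, absolute agreement, single measure.
    """
    n = len(scores)          # patients
    k = len(scores[0])       # occasions (2 for test-retest)
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


if __name__ == "__main__":
    # Hypothetical test and retest domain scores for three patients.
    print(icc_2_1([[10, 20], [50, 60], [90, 100]]))
```

With this form, identical test and retest scores give a coefficient of 1, and a systematic shift between the two occasions lowers the coefficient even when the ranking of patients is unchanged.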

The Wright map values for items and patients were ordered on the logit scale, and the default mean item difficulty was set to zero. The majority of the patients were located opposite and above the items, which indicates an overall good D-OES score. The mean person estimate was 2.2 (95% CI: 1.8-2.6).

In Table 2, the items are placed according to item difficulty with the most difficult item at the top. Chi-square fit statistics were calculated to determine how well the data fit the Rasch model. The infit mean square (MNSQ) is the information-weighted mean of the squared residuals, i.e. the squared differences between observed and expected responses. This statistic is sensitive to unexpected responses near the person’s ability level. The outfit MNSQ is the conventional unweighted mean of the squared standardized residuals and is more sensitive to outliers than the infit MNSQ. Values should range between 0.6 and 1.4 for rating scales or 0.5 to 1.7 for clinical observations [13]. All the items in the OES had infit and outfit MNSQ values between 0.74 and 1.36.
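The difference between the two fit statistics can be sketched in a few lines. This is a minimal illustration, not the Winsteps implementation; it assumes that the model-expected response and the model variance for each person on the item are already available, and the function name and numbers are hypothetical.

```python
def fit_mean_squares(observed, expected, variances):
    """Return (infit, outfit) MNSQ for one item across persons.

    observed:  responses actually given
    expected:  responses expected under the Rasch model
    variances: model variance of each response
    """
    squared_residuals = [(o - e) ** 2 for o, e in zip(observed, expected)]
    # Outfit: unweighted mean of squared standardized residuals.
    outfit = sum(r / v for r, v in zip(squared_residuals, variances)) / len(observed)
    # Infit: squared residuals weighted by the model variance (the information).
    infit = sum(squared_residuals) / sum(variances)
    return infit, outfit


if __name__ == "__main__":
    # Hypothetical item: the third person gives a grossly unexpected response
    # where the model variance is small (person far from the item difficulty).
    infit, outfit = fit_mean_squares(
        observed=[2, 2, 4], expected=[2.0, 2.0, 0.5], variances=[1.0, 1.0, 0.25]
    )
    print(infit, outfit)
```

In the hypothetical example, the single off-target surprise inflates the outfit more than the infit, which is why the outfit is described as the more outlier-sensitive statistic.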

The item reliability coefficient was 0.97, and the item separation coefficient was 5.37. The person reliability index was 0.90, and the person separation index was 3.04. The person separation index was used to calculate the number of distinct levels of quality of life (strata) that could be distinguished: strata = (4 × person separation index + 1)/3 = (4 × 3.04 + 1)/3 = 4.4.
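The reported separation and reliability values are linked by the standard Rasch relation reliability = G²/(1 + G²), where G is the separation index. The sketch below reproduces the arithmetic from the reported figures; the function names are hypothetical.

```python
def reliability_from_separation(g):
    """Reliability implied by a separation index G: G^2 / (1 + G^2)."""
    return g ** 2 / (1 + g ** 2)


def strata(g):
    """Number of statistically distinct levels: (4G + 1) / 3."""
    return (4 * g + 1) / 3


if __name__ == "__main__":
    print(round(reliability_from_separation(3.04), 2))  # person reliability
    print(round(reliability_from_separation(5.37), 2))  # item reliability
    print(round(strata(3.04), 1))                       # person strata
```

With the reported person separation of 3.04 and item separation of 5.37, these formulas return the reported reliabilities of 0.90 and 0.97 and the reported 4.4 strata.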

The five categories of the D-OES were represented with increasing count (Table 3). The observed average is the average of the (person measures – item difficulties) modelled to produce the responses observed in the category. Outfit MNSQ is the average of the outfit MNSQs associated with the responses in each category. The expected value for every category is 1.0. This statistic is sensitive to grossly unexpected responses, and only values greater than 1.5 are problematic. The Rasch Andrich thresholds are the calibrated measures of the transitions between adjacent categories. They are the points on the latent scale where the probability of being observed in either of two adjacent categories is equal. The Rasch Andrich thresholds are expected to increase monotonically on a rating scale when the categories are ordered.
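The role of the thresholds can be illustrated with the category probabilities of the Andrich rating scale model. This is a minimal sketch; the threshold values are hypothetical and are not the calibrations reported in Table 3.

```python
import math


def category_probabilities(theta, delta, thresholds):
    """Rating scale model: probability of each category 0..m.

    theta: person measure, delta: item difficulty,
    thresholds: Andrich thresholds tau_1..tau_m (all in logits).
    """
    # Cumulative sums of (theta - delta - tau_j); category 0 has an empty sum.
    logits = [0.0]
    for tau in thresholds:
        logits.append(logits[-1] + (theta - delta - tau))
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def thresholds_ordered(thresholds):
    """True when the thresholds increase monotonically."""
    return all(a < b for a, b in zip(thresholds, thresholds[1:]))


if __name__ == "__main__":
    taus = [-2.0, -0.5, 0.5, 2.0]  # hypothetical thresholds for five categories
    probs = category_probabilities(theta=-0.5, delta=0.0, thresholds=taus)
    print(probs[1], probs[2])  # equal at the second threshold
    print(thresholds_ordered(taus))
```

When the person measure minus the item difficulty equals a given threshold, the two categories adjacent to that threshold are equally probable, which is the property used in the definition above.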

A Rasch principal component analysis (PCA) of the residuals of the OES was performed. The raw variance of the OES explained by the Rasch measure was 73.3% (73.1% was expected by the model). The unexplained variance in the first contrast was 9.3% (4.2 eigenvalue units), and that in the second contrast was 5.5% (2.5 eigenvalue units). The first contrast consisted of the four pain items. The second contrast consisted of the four social-psychological items. When data fit the Rasch model, the Rasch dimension is the only dimension. For any other dimension to be considered present in the data, it must explain at least two items' worth of variance (2.0 eigenvalue units).
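The percentages and eigenvalue units reported above are two expressions of the same quantity. The sketch below shows the conversion under the assumption that the total unexplained variance corresponds to one eigenvalue unit per item (12 units for the 12-item OES); the function name is hypothetical.

```python
def contrast_percent(eigenvalue_units, n_items, explained_percent):
    """Share of total variance carried by a contrast, in percent.

    Assumption: the unexplained variance totals n_items eigenvalue units.
    """
    unexplained_percent = 100.0 - explained_percent
    return eigenvalue_units / n_items * unexplained_percent


if __name__ == "__main__":
    print(contrast_percent(4.2, 12, 73.3))  # first contrast
    print(contrast_percent(2.5, 12, 73.3))  # second contrast
```

Under that assumption, 4.2 and 2.5 eigenvalue units correspond to roughly 9.3% and 5.5% of the total variance, in line with the reported figures up to rounding, and both contrasts exceed the 2.0-unit criterion.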

DISCUSSION

We found a high internal consistency and a high test-retest reliability for all three domains of the D-OES. In the Dutch validation study [9], the investigators also found a high test-retest reliability for the function (0.87), the pain (0.89) and the social-psychological domain (0.87). Dawson et al [11] found equally high reliabilities of 0.89, 0.98 and 0.87, respectively.

The D-OES corresponded well with the MEPS in all three domains, whereas the DASH corresponded less well with both the MEPS and the three domains of the D-OES. The majority of our patients had received a TEA because of rheumatoid arthritis. These patients often also had wrist or shoulder problems, to which the DASH score is sensitive because it is not specific to the elbow. This could explain the differences in correspondence between the DASH and both the D-OES and the MEPS, as the two latter measures are specific to the elbow.

The original OES is valid, reliable and sensitive as a tool for assessing patients with elbow disorders [4, 10]. To ensure the validity for patients operated with TEA, we performed the Rasch analysis.

The data fitted the stringent Rasch model with good infit MNSQ and outfit MNSQ values. This indicates that the data were neither underfit and thereby lacking predictability, nor overfit and thereby overly predictable, for any of the items. The person separation index and the person reliability index were excellent. The item separation coefficient and the item reliability coefficient were also excellent, supporting that the sample size was large enough to confirm the item difficulty hierarchy. The category rating scale of the D-OES worked well. The patients were able to discriminate the five levels of the items, and the Rasch Andrich thresholds increased monotonically. PCA showed that the Danish version of the OES was multidimensional. The four items of the pain domain and the four items of the social-psychological domain were recognized as separate domains.

The original version of the OES was based on English culture. The purpose of this study was to translate the original OES into a Danish version including any necessary cross-cultural adaptation to avoid misinterpretations. To ensure that respondents understand the questions as intended, it is important to translate using an accurate and approved method. We did not encounter questions that challenged national traditions or characteristics, as the functional tasks, the social-psychological concerns and the perception of pain are not thought to differ between the UK and Denmark. The OES was therefore not subject to any cross-cultural adaptation.

Limitations

We did not include sensitivity to change in our study. The ability to detect change after an intervention such as surgery would have added strength to the validation process. Even though the OES PROM is targeted towards the affected elbow, we did not investigate whether or not the patients performed the functional tasks with the unaffected side instead. The study group consisted of patients with TEA only, and the D-OES is therefore valid for this category of patients only. The value of the D-OES in evaluating patients with other elbow disorders needs to be examined in future studies.

CONCLUSION

The Danish 12-item OES is now published as a valid and reliable multidimensional elbow-specific questionnaire that can be used as a quality of life measure in TEA patients.

Correspondence: Hans Christian Plaschke, Skulder- og Albuesektionen, Ortopædkirurgisk Afdeling, Herlev Hospital, 2730 Herlev, Denmark. E-mail: hcplaschke@dadlnet.dk

Accepted: 14 August 2013

Conflicts of interest: None. Disclosure forms provided by the authors are available with the full text of this article at www.danmedj.dk

Acknowledgement: Many thanks to Tobias Wirenfeldt Klausen for statistical analysis.

References

  1. Patrick D, Guyatt GH, Acquadro C. Patient-reported outcomes. In: Higgins JPT, Green S, eds. Cochrane handbook for systematic reviews of interventions. Chichester, UK: John Wiley & Sons, 2008:531-45.

  2. Walters SJ. Quality of life outcomes in clinical trials and health-care evaluation: a practical guide to analysis and interpretation. Chichester, UK: John Wiley & Sons, Ltd, 2009.

  3. Longo UG, Franceschi F, Loppini M et al. Rating systems for evaluation of the elbow. Br Med Bull 2008;87:131-61.

  4. Dawson J, Doll H, Boller I et al. The development and validation of a patient-reported questionnaire to assess outcomes of elbow surgery. J Bone Joint Surg Br 2008;90:466-73.

  5. Hudak PL, Amadio PC, Bombardier C. The Upper Extremity Collaborative Group (UECG). Development of an upper extremity outcome measure: the DASH (disabilities of the arm, shoulder and hand). Am J Ind Med 1996;29:602-8. Erratum in: Am J Ind Med 1996;30:372.

  6. Gummesson C, Atroshi I, Ekdahl C. The disabilities of the arm, shoulder and hand (DASH) outcome questionnaire: longitudinal construct validity and measuring self-rated health change after surgery. BMC Musculoskelet Disord 2003;4:11.

  7. Herup A, Merser S, Boeckstyns M. Validation of questionnaire for conditions of the upper extremity. Ugeskr Læger 2010;172:3333-6.

  8. Morrey BF, Adams RA. Semiconstrained arthroplasty for the treatment of rheumatoid arthritis of the elbow. J Bone Joint Surg Am 1992;74:479-90.

  9. De Haan J, Goei H, Schep N et al. The reliability, validity and responsiveness of the Dutch version of the Oxford elbow score. J Orthop Surg Res 2011;6:39.

  10. De Haan J, Schep N, Tuinebreijer W et al. Rasch analysis of the Dutch version of the Oxford elbow score. Patient Relat Outcome Meas 2011;2:145-9.

  11. Dawson J, Doll H, Boller I et al. Comparative responsiveness and minimal change for the Oxford Elbow Score following surgery. Qual Life Res 2008;17:1257-67.

  12. Beaton DE, Bombardier C, Guillemin F et al. Guidelines for the process of cross-cultural adaptation of self-report measures. Spine 2000;25:3186-91.

  13. Tennant A, Conaghan PG. The Rasch measurement model in rheumatology: what is it and why use it? When should it be applied, and what should one look for in a Rasch paper? Arthritis Rheum 2007;57:1358-62.