Abstract
INTRODUCTION. Discharge diagnoses in Danish medical records are transferred to health data registries and are often used for research. However, discharge diagnoses of common acute infections have not yet been validated. This study aimed to evaluate the diagnostic accuracy of discharge diagnoses in medical records for infectious conditions in general and specifically for community-acquired pneumonia, urinary tract infections and skin and soft tissue infections in a population of acutely admitted medical patients.
METHODS. This was a multicentre retrospective diagnostic accuracy study including acutely admitted patients. Patients were eligible if the treating physician initially suspected an infectious condition. The diagnosis from the discharge letters was the index test. The reference standard was the final diagnosis determined by a medical expert panel.
RESULTS. We included 954 patients. Positive predictive value, sensitivity and specificity (with 95% CI) were, respectively, 95% (93-96%), 84% (82-89%) and 80% (77-82%) for any infection; 83% (80-85%), 77% (75-80%) and 94% (92-95%) for community-acquired pneumonia; 87% (85-89%), 82% (80-85%) and 97% (96-98%) for urinary tract infection; and 100% (100-100%), 90% (88-92%) and 100% (100-100%) for skin and soft tissue infections.
CONCLUSIONS. Our findings indicate that discharge diagnoses registered in the Danish National Patient Registry for the examined conditions should be used with caution for research purposes.
FUNDING. None.
TRIAL REGISTRATION. Not relevant.
Community-acquired pneumonia (CAP), urinary tract infection (UTI) and skin and soft tissue infections (SSTI) are the most common infectious diseases leading to acute admissions [1]. In 2022, there were 41,518 hospital admissions in Denmark due to infectious conditions. More than half of these were due to pneumonia (n = 22,722), about a quarter to UTI (n = 11,441) and 6,168 to SSTI [2, 3]. Because these conditions are very common, they are often the focus of clinical and epidemiological research, frequently based on data from the Danish health registries.
Large volumes of registry health data are a cost-effective and valuable resource for research. Universal, routine data registration prevents nonresponse and recall bias [4]. However, researchers lack control over data collection and quality, so the value of registry-based research depends heavily on the accuracy and completeness of the underlying databases [5]. In particular, it relies on accurate diagnosis coding; ideally, all diagnosis codes should therefore be validated before they are used in epidemiological research [6].
Inaccurate diagnoses can have clinical consequences for patients. However, although errors in discharge diagnosis codes are well known, they do not necessarily reflect errors in diagnosis and treatment in clinical practice. In clinical practice, the diagnosis codes are accompanied by a discharge summary whose free text usually describes the patient’s trajectory more accurately.
In Denmark, large amounts of health registry data are widely available and often used for research purposes [7]. Information on all discharges from Danish hospitals has been registered in the Danish National Patient Registry (DNPR) since 1977. Among a wide array of data, the DNPR includes the primary discharge diagnosis and, when relevant, one or more secondary discharge diagnoses [8]. The discharge diagnoses are assigned by the discharging physician and classified according to the Danish version of the International Classification of Diseases, tenth revision (ICD-10) [8].
Studies have revealed considerable variation in the validity of diagnoses, treatments and other data in the DNPR [8]. Existing validation studies of infectious disease diagnoses in the DNPR cover relatively few diagnoses and primarily focus on infection as a comorbidity of the primary condition being validated [9, 10]. The diagnostic accuracy of infectious disease diagnosis codes in acutely admitted patients has yet to be determined. Establishing it would support the use of these diagnoses in epidemiological studies of this patient group.
This study aimed to evaluate the diagnostic accuracy of discharge diagnoses registered in medical records for any infectious condition, and specifically for CAP, UTI and SSTI, in a population of acutely hospitalised patients with provisional diagnoses of infection.
Methods
Design and setting
This was a multicentre diagnostic accuracy study with retrospective data collection. Study reporting was guided by the Standards for Reporting of Diagnostic Accuracy Studies (STARD) statement [11].
This is a sub-study of the Improved Diagnostics of Infectious Diseases in Emergency Departments (INDEED) trial [1], which investigated new diagnostic tools for infectious diseases in the acute setting. Patients were included from March 2021 to February 2022 in the acute medical units of three Danish emergency hospitals in the Region of Southern Denmark (Odense University Hospital, Hospital Sønderjylland and Lillebælt Hospital). Six study assistants with healthcare backgrounds were responsible for recruiting participants and collecting data. Patient inclusion occurred Monday to Friday, between 8:00 am and 6:00 pm.
Study population
Patients were eligible if the treating physician suspected an infection after an initial medical assessment. The exclusion criteria were admission within the past 14 days (to avoid hospital-acquired infections), a verified positive SARS-CoV-2 test (to prevent coronavirus disease 2019 from dominating the study population), severe immunosuppressive treatment, critical illness requiring life-saving treatment as judged by the receiving physician, and pregnancy. Details regarding these exclusion criteria are available in the INDEED study protocol [12]. Since informed consent was required for participation, patients who were unable (e.g., lacking mental capacity or facing a language barrier) or unwilling to provide informed consent could not be included. Patients could also not be included when the study assistants lacked the capacity to recruit them.
The study assistants identified potentially eligible patients through the hospitals’ logistics system (Cetrea Anywhere), which provides a continuous overview of admitted patients. Eligible patients were recruited consecutively during the predefined working hours.
Index test
Primary and secondary discharge diagnosis codes from the discharge letter were used as the index test (listed in S1). After discharge, the study assistants, who were blinded to the reference standard, entered these data into an electronic survey tool (Research Electronic Data Capture (REDCap)); they held regular meetings to address doubts and questions.
For this study, we grouped the individual diagnoses into two categories: infection or no infection. Infection diagnoses were categorised into CAP, UTI, SSTI or other/unknown infection. These groupings were considered the index tests.
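To make the grouping step concrete, the sketch below maps a patient’s primary and secondary ICD-10 codes to a single category. It is a minimal illustration under stated assumptions: the code prefixes in `CATEGORY_PREFIXES` are hypothetical examples (the study’s actual code list is in S1), codes are assumed to be plain ICD-10 codes such as `N39.0`, and checking the primary diagnosis before the secondary diagnoses is an assumed rule, not a documented study procedure.

```python
# Hypothetical mapping from ICD-10 code prefixes to study categories.
# Illustrative only: the study's actual code list is given in S1.
# Order matters: specific categories are checked before the catch-all.
CATEGORY_PREFIXES: list[tuple[str, tuple[str, ...]]] = [
    ("CAP", ("J12", "J13", "J14", "J15", "J16", "J17", "J18")),
    ("UTI", ("N10", "N30", "N390")),
    ("SSTI", ("A46", "L03")),
    ("other/unknown infection", ("A", "B")),  # crude catch-all for chapter A00-B99
]


def normalise(code: str) -> str:
    """Upper-case, trim and drop the dot, e.g. 'n39.0' -> 'N390'."""
    return code.upper().replace(".", "").strip()


def categorise_code(code: str) -> str:
    """Return the first category whose prefixes match the code."""
    c = normalise(code)
    for category, prefixes in CATEGORY_PREFIXES:
        if c.startswith(prefixes):  # str.startswith accepts a tuple
            return category
    return "no infection"


def categorise_patient(primary: str, secondary: list[str]) -> str:
    """Assumed rule: the first infection category found, checking the
    primary diagnosis before the secondary diagnoses in listed order."""
    for code in [primary, *secondary]:
        category = categorise_code(code)
        if category != "no infection":
            return category
    return "no infection"
```

For example, a hypothetical patient with primary code `I50.9` and secondary code `N39.0` would be grouped as UTI, because the primary code matches no infection prefix.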
Reference standard
The reference standard was the patient’s infection diagnosis as determined by two experts after a medical chart review. The two experts were an emergency medicine consultant and an infectious diseases consultant, both experienced in acute infections. Each expert independently determined whether the patient had an infection and assigned a primary ICD-10 diagnosis in the study data collection tool. The experts had access to medical charts from the current hospitalisation, including all test results (blood analysis, diagnostic imaging and microbiology), the discharge letter and the patient’s medical history. There were no limitations on the data they could extract. The experts were blinded to each other’s decisions. In case of disagreement, the experts revisited the medical charts and reached a consensus. Eight experts, working in pairs, participated in the study.
As with the index test, all the expert diagnoses were categorised as CAP, UTI, SSTI, other/unknown infection or no infection. These groupings were considered reference standards.
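Because each patient ends up with one index category and one reference category, condition-specific accuracy can be obtained by dichotomising both into "target condition" versus "everything else". The sketch below assumes this one-versus-rest approach (the text does not spell out the rule) and counts true and false positives and negatives for a chosen target; the helper names and the toy pairs in the test are hypothetical.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class TwoByTwo(NamedTuple):
    tp: int  # index positive, reference positive
    fp: int  # index positive, reference negative
    fn: int  # index negative, reference positive
    tn: int  # index negative, reference negative


def is_positive(category: str, target: str) -> bool:
    """'any infection' is positive for every category except 'no infection';
    for a specific condition, only a matching category is positive."""
    if target == "any infection":
        return category != "no infection"
    return category == target


def two_by_two(pairs: Iterable[tuple[str, str]], target: str) -> TwoByTwo:
    """pairs holds one (index_category, reference_category) tuple per patient."""
    counts = Counter(
        (is_positive(index, target), is_positive(reference, target))
        for index, reference in pairs
    )
    # Counter returns 0 for combinations that never occur.
    return TwoByTwo(
        tp=counts[(True, True)],
        fp=counts[(True, False)],
        fn=counts[(False, True)],
        tn=counts[(False, False)],
    )
```

Under this rule, a patient coded as CAP whose reference diagnosis is UTI counts as a false positive for CAP but as a true positive for "any infection".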
Statistical analyses
Patient characteristics were reported with descriptive statistics, using number and percentage or median and IQR, as appropriate.
We estimated the diagnostic accuracy of the discharge diagnoses for any infectious condition, CAP, UTI and SSTI by calculating the positive predictive value (PPV), sensitivity and specificity with 95% CI. Analyses were performed using Stata version 18.0 (StataCorp LLC, College Station, Texas).
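The three measures follow directly from a 2 × 2 table of index test against reference standard. The text does not state which confidence interval method was used; as an assumption, the sketch below uses the Wilson score interval, one common choice for binomial proportions. Function names are illustrative.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile


def wilson_ci(successes: int, n: int, z: float = Z_95) -> tuple[float, float]:
    """Wilson score interval for the binomial proportion successes / n."""
    if n == 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


def accuracy(tp: int, fp: int, fn: int, tn: int) -> dict[str, tuple[float, float, float]]:
    """Return each measure as (estimate, lower 95% limit, upper 95% limit)."""
    measures = {
        "PPV": (tp, tp + fp),          # confirmed among index-test positives
        "sensitivity": (tp, tp + fn),  # detected among reference-standard positives
        "specificity": (tn, tn + fp),  # negative among reference-standard negatives
    }
    return {name: (k / n, *wilson_ci(k, n)) for name, (k, n) in measures.items()}
```

For instance, `accuracy(90, 10, 10, 90)["PPV"]` gives approximately (0.900, 0.826, 0.945). A practical property of the Wilson interval is that it stays informative at the boundary: for 69 of 69 it gives roughly 94.7% to 100%, whereas a simple Wald interval collapses to a single point.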
Patients with a missing or indeterminate index test or reference standard were excluded from the analyses. No predetermined sample size was set for this study, as we intended to use all available cases from the INDEED study.
Research ethics and informed consent
Data processing was approved by the Region of Southern Denmark (no. 20/60508; cf. Art. 30 of the EU General Data Protection Regulation), and the study was approved by the Regional Committee of Health Research Ethics for Southern Denmark (S-20200188). Prior to inclusion, all participants provided informed consent.
Results
Participants
A total of 954 patients were included in the study. Due to inconclusive or missing index tests, four patients were excluded from analyses, leaving a total of 950 patients. There were no inconclusive or missing reference standards. Patient flow is depicted in Figure 1.
Basic patient characteristics are provided in Table 1. Patient ages ranged from 18 to 100 years, with over half older than 70 years. Length of hospital stay ranged from 0 to 93 days, with a median of three days (IQR 1-6).
Diagnostic accuracy
The associations between the index tests and the reference standards are presented in Figure 1. The index test classified 686 patients as having an infection and 264 as having no infection. The main infection diagnoses were CAP (248 patients), UTI (188 patients) and SSTI (69 patients); the reference standard confirmed the diagnosis in 205, 163 and 69 of these patients, respectively. The diagnostic accuracy is presented in Table 2. The PPV was 100.0% (100.0-100.0%) for SSTI, 94.8% (93.3-96.2%) for any infection, 86.7% (84.5-88.9%) for UTI and 82.7% (80.3-85.1%) for CAP. Sensitivity ranged from 89.6% for SSTI to 77.4% for CAP. Specificity ranged from 100% for SSTI to 79.6% for any infection.
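The reported counts reproduce the reported PPVs when the diagnosis counts (248, 188 and 69) are read as index-test positives and the reference-standard counts (205, 163 and 69) as confirmed cases among them, i.e. true positives (TP; FP, FN and TN denote false positives, false negatives and true negatives):

```latex
\begin{aligned}
\mathrm{PPV} &= \frac{TP}{TP+FP}, \qquad
\text{sensitivity} = \frac{TP}{TP+FN}, \qquad
\text{specificity} = \frac{TN}{TN+FP},\\[4pt]
\mathrm{PPV}_{\mathrm{CAP}} &= \frac{205}{248} \approx 82.7\%, \qquad
\mathrm{PPV}_{\mathrm{UTI}} = \frac{163}{188} \approx 86.7\%, \qquad
\mathrm{PPV}_{\mathrm{SSTI}} = \frac{69}{69} = 100\%.
\end{aligned}
```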
Discussion
In this study, we aimed to validate the diagnosis codes in medical records for the most common infectious diseases causing acute hospitalisation. We found that PPV, sensitivity and specificity for any infection, CAP, UTI and SSTIs ranged from 77.4% to 100%; highest for SSTI and lowest for CAP.
We have not identified previous studies validating common infection diagnoses in acutely admitted patients. The most recent review of studies validating DNPR data identified 114 published studies, most of which validated other diagnosis codes [8]. The review found large variation in data validity: PPVs ranged from < 15% to 100%, and the average PPV for all medical diagnoses in the DNPR was 73%. Most studies in the review reported PPVs above this average, and PPVs above 80% were common. Thus, although our estimates are higher than the review average, they align with the range and distribution of many other individual validation studies.
To our knowledge, no universal thresholds define when PPV, sensitivity and specificity may be considered acceptable. Their interpretation always depends on the context, including the prevalence of the condition in the population and the clinical situation, such as the consequences of false-positive and false-negative results. Nevertheless, because the data are not 100% accurate, they should be used with caution in research.
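The dependence of PPV on prevalence can be made explicit with Bayes’ theorem, PPV = sens·p / (sens·p + (1 − spec)(1 − p)), where p is the prevalence. The sketch below is illustrative only: it assumes that sensitivity and specificity stay fixed across populations, which may not hold in practice, and it uses the CAP estimates from this study (77.4% and, rounded, 94%) purely as example inputs.

```python
def ppv_at_prevalence(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV implied by Bayes' theorem, assuming sensitivity and specificity
    do not change with prevalence."""
    true_pos = sensitivity * prevalence                # P(test positive and diseased)
    false_pos = (1 - specificity) * (1 - prevalence)   # P(test positive and not diseased)
    return true_pos / (true_pos + false_pos)


# Example inputs: CAP sensitivity and (rounded) specificity from this study.
for prevalence in (0.30, 0.05):
    ppv = ppv_at_prevalence(0.774, 0.94, prevalence)
    print(f"prevalence {prevalence:.2f}: PPV {ppv:.3f}")
# prevalence 0.30: PPV 0.847
# prevalence 0.05: PPV 0.404
```

With identical sensitivity and specificity, the expected PPV falls from about 85% to about 40% when the prevalence drops from 30% to 5%, which illustrates why the context matters when interpreting these values.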
A strength of this study was the large population with a high proportion of infected patients, which provided enough index tests to keep statistical uncertainty low. Additionally, the data set was almost complete, with only four incomplete cases, limiting potential bias from missing data. Although the study included many patients, more than half of those assessed for eligibility were excluded owing to the exclusion criteria, potentially introducing selection bias. For instance, many of the excluded patients (e.g., immunosuppressed patients or those lacking mental capacity) may have had complicating conditions that increased the likelihood of the treating physician failing to diagnose an infectious condition. This type of selection bias could have inflated our accuracy estimates.
A limitation of this study was that the index-test diagnoses were collected from discharge letters rather than directly from the DNPR, so we cannot verify that our data agree perfectly with those in the DNPR. Even so, the diagnoses in the discharge letters are the basis for the data exported to the DNPR. Given that data transfer from hospitals to the DNPR is automated and that the DNPR has safety mechanisms to detect data errors, we consider it reasonable to assume high or near-perfect agreement between the data we validated and the data in the DNPR [8, 13].
The experts assigning the final diagnoses were not blinded to the diagnoses in the discharge letters, which could have introduced confirmation bias. This may have led to an overestimation of the measured accuracy. Detailed diagnostic criteria and standardised diagnostic forms could minimise bias in reference data collection. However, we opted not to use this approach, as insufficiently specified criteria carry their own risk of misclassification.
The study population consists solely of acutely admitted patients from three Danish hospitals that differ considerably in size and location. Due to the homogeneous healthcare system and comprehensive record-keeping in Denmark, we assume that the results can be generalised to the nationwide Danish context for acutely admitted patients, and possibly to other Nordic countries with somewhat similar healthcare and record-keeping systems. Given the high number of excluded patients, the results should be interpreted and extrapolated with caution, as the study population represents a selected subgroup.
Conclusions
We found that the PPV, sensitivity and specificity for any infection, CAP, UTI and SSTI ranged from 77.4% to 100%, with the highest values for SSTI and the lowest for CAP. These findings indicate that discharge diagnoses of these conditions in medical records, and by proxy in the DNPR, should be used with caution for research purposes.
Correspondence John Bo Kristensen. E-mail: John.bo.kristensen@rsyd.dk
Accepted 30 January 2026
Published 20 May 2026
Conflicts of interest none. All authors have submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest. These are available together with the article at ugeskriftet.dk/dmj
Acknowledgements We acknowledge the INDEED study group for comprehensive data collection work. Additionally, we acknowledge Open Patient Data Explorative Network (OPEN), Department of Clinical Research, University of Southern Denmark, for providing the software tools for data capture, and Caroline Moos for professional language editing of the article.
References can be found with the article at ugeskriftet.dk/dmj
Cite this as Dan Med J 2026;73(6):A05250358
doi 10.61409/A05250358
Open Access under Creative Commons License CC BY-NC-ND 4.0
Supplementary material a05250358-Supplementary.pdf
References
- Skjøt-Arkil H, Cartuliares MB, Heltborg A, et al. Clinical characteristics and diagnostic accuracy of preliminary diagnoses in adults with infections in Danish emergency departments: a multicentre combined cross-sectional and diagnostic study. BMJ Open. 2024;14(12):e090259. https://doi.org/10.1136/bmjopen-2024-090259
- Danmarks Statistik. Statistikbanken. www.statistikbanken.dk/statbank5a/selectvarval/define.asp?PLanguage=0&subword=tabsel&MainTable=INDL01&PXSId=235925&tablestyle=&ST=SD&buttons=0 (6 Jan 2024)
- Sundhedsdatastyrelsen. Landspatientregisteret: avanceret udtræk. www.esundhed.dk/Emner/Operationer-og-diagnoser/Landspatientregisteret-Avanceret-udtraek (6 Jan 2024)
- Sørensen HT, Sabroe S, Olsen J. A framework for evaluation of secondary data sources for epidemiological research. Int J Epidemiol. 1996;25(2):435-442. https://doi.org/10.1093/ije/25.2.435
- Schneeweiss S, Avorn J. A review of uses of health care utilization databases for epidemiologic research on therapeutics. J Clin Epidemiol. 2005;58(4):323-337. https://doi.org/10.1016/j.jclinepi.2004.10.012
- Andersen TF, Madsen M, Jørgensen J, et al. The Danish National Hospital Register. A valuable source of data for modern health sciences. Dan Med Bull. 1999;46(3):263-268
- Frank L. Epidemiology. When an entire country is a cohort. Science. 2000;287(5462):2398-2399. https://doi.org/10.1126/science.287.5462.2398
- Schmidt M, Schmidt SAJ, Sandegaard JL, et al. The Danish National Patient Registry: a review of content, data quality, and research potential. Clin Epidemiol. 2015;7:449-490. https://doi.org/10.2147/CLEP.S91125
- Ingeman A, Andersen G, Hundborg HH, Johnsen SP. Medical complications in patients with stroke: data validity in a stroke registry and a hospital discharge registry. Clin Epidemiol. 2010;2:5-13. https://doi.org/10.2147/CLEP.S8908
- Holland-Bill L, Xu H, Sørensen HT, et al. Positive predictive value of primary inpatient discharge diagnoses of infection among cancer patients in the Danish National Registry of Patients. Ann Epidemiol. 2014;24(8):593-597, 597.e1-e18. https://doi.org/10.1016/j.annepidem.2014.05.011
- Cohen JF, Korevaar DA, Altman DG, et al. STARD 2015 guidelines for reporting diagnostic accuracy studies: explanation and elaboration. BMJ Open. 2016;6(11):e012799. https://doi.org/10.1136/bmjopen-2016-012799
- Skjøt-Arkil H, Heltborg A, Lorentzen MH, et al. Improved diagnostics of infectious diseases in emergency departments: a protocol of a multifaceted multicentre diagnostic study. BMJ Open. 2021;11(9):e049606. https://doi.org/10.1136/bmjopen-2021-049606
- Wille-Jørgensen PA, Meisner S. The validity of data in registration of operations. A quality analysis. Ugeskr Læger. 1997;159:7328-7330