INTRODUCTION
In Denmark, the first case of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)-related disease 2019 (COVID-19) was diagnosed on 27 February 2020. The epidemic was initially countered by a containment strategy that subsequently transitioned into a mitigation strategy [1]. With this shift came a change in the test strategy from an initial focus on potentially infected travellers from high-risk countries to testing persons with severe symptoms and symptomatic persons from vulnerable groups or persons holding critical societal functions, and then to testing persons with mild symptoms. Most recently, the Danish Ministry of Health began offering testing to all adults, even if symptom-free. The early prioritised strategy reflected the limited test capacity available at the time. The standard method used to diagnose COVID-19 is real-time reverse transcription quantitative polymerase chain reaction (RT-qPCR) of SARS-CoV-2 RNA in respiratory samples [2]. Detection of antibodies to SARS-CoV-2 in plasma or serum may indicate past or present exposure to SARS-CoV-2 infection [2].
One of the many challenges of any novel disease is the initial absence of a gold standard against which to evaluate tests. A negative RT-qPCR for SARS-CoV-2 is therefore often interpreted as “disease free”. However, RT-qPCR may lead to false-negative findings due to insufficient and unrepresentative sampling. As the COVID-19 epidemic evolves and the test strategy changes, it is important to understand the basic concepts of test performance, including the gold standard selected and measures of sensitivity, specificity and predictive values. Unfortunately, misconceptions regarding test performance exist; in particular, sensitivity and positive predictive values (PPVs) are often confused. The distinction is, however, of great importance because predictive values depend on disease prevalence and testing has gradually extended to low-prevalence populations.
In this study, we reanalysed published data on RT-qPCR to diagnose COVID-19 in order to illustrate i) how predictive values of RT-qPCR depend on disease prevalence, sensitivity and specificity and ii) how so-called latent class analysis (LCA) might be used to estimate the sensitivity and specificity of multiple clinical or paraclinical tests, e.g., RT-qPCR, where a gold standard is lacking. We use these results to discuss challenges in diagnosing COVID-19 and the potential clinical and societal implications.
METHODS
Estimation of predictive values for real-time reverse transcription quantitative polymerase chain reaction
The sensitivity (Se) of a test is the conditional probability of a positive test result given presence of the disease (the percentage of diseased individuals identified by the test), whereas specificity (Sp) is the conditional probability of a negative test result given absence of disease. The PPV is defined as the conditional probability of having a disease given a positive test (the percentage of individuals with a positive test who truly have the disease). The negative predictive value (NPV) is the conditional probability of being without the disease given a negative test result (the percentage of individuals with a negative test who do not have the disease). While the sensitivity and specificity are characteristics of the test, the predictive values measure the clinical relevance of a test result. With P denoting the disease prevalence in the tested population, they are calculated as:

$$\mathrm{PPV} = \frac{Se \times P}{Se \times P + (1 - Sp) \times (1 - P)}$$

$$\mathrm{NPV} = \frac{Sp \times (1 - P)}{Sp \times (1 - P) + (1 - Se) \times P}$$
Using these equations, we generated plots for the PPV and NPV of RT-qPCR for SARS-CoV-2 infection at different levels of COVID-19 prevalence. For the main analysis, we used the sensitivity (95%) and specificity (99%) for RT-qPCR, as reported by the Danish Health Authority on 14 April 2020 [3]. However, because a sensitivity down to 30% has been reported depending on the site of sampling [4], we also repeated the analyses for sensitivity ranging from 30% to 80%. Moreover, we set specificity to 99%, the lower level suggested by the Danish Health Authority. However, this figure may be an underestimate. Cross-reactivity to other endemic respiratory viruses has not been found under reference conditions [5], and contamination and similar errors are minimised by strict procedures in clinical practice. We therefore also repeated the analyses using a higher specificity of 99.98%, which was also supported by our LCA (see below).
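The dependence of the predictive values on prevalence can be sketched in a few lines of Python. This is only an illustration of the standard Bayes-rule expressions for PPV and NPV, not the MedCalc or R code used for the published plots; the function name `predictive_values` and the chosen prevalence grid are assumptions made for this example.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary test via Bayes' rule.

    All arguments are proportions between 0 and 1.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


if __name__ == "__main__":
    # Sensitivity 95% with the two specificities considered in the text;
    # the prevalence grid is an illustrative choice.
    for specificity in (0.99, 0.9998):
        for prevalence in (0.001, 0.01, 0.10):
            ppv, npv = predictive_values(0.95, specificity, prevalence)
            print(f"Sp={specificity:.4f} prevalence={prevalence:.3f} "
                  f"PPV={ppv:.3f} NPV={npv:.4f}")
```

The function does not guard against the degenerate case in which no positive (or no negative) results are possible, e.g., a prevalence of zero combined with a specificity of 100%.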
Concept and application of latent class analysis
We examined the potential of LCA for estimating the sensitivity and specificity of RT-qPCR, which were not otherwise available at the beginning of the epidemic. LCA is a statistical method where latent classes are constructed as a proxy for the true but unknown disease status of the individuals. LCA combines information on multiple observed variables (e.g., different diagnostic test results) to group persons with similar distributions into an unobserved “latent class” (i.e., based on conditional probabilities) [6]. In other words, LCA uncovers hidden groups in a dataset, e.g., groups of different risk-accepting behaviour or disease subgroups. Each subgroup (latent class) is unique, but individuals within a subgroup are similar (homogeneous). The latent classes are constructed by numerous iterations for establishing the maximum likelihood of a model given the observed data [7]. Each latent class exhibits local independence (i.e., is homogeneous) and is defined by its size (π[latent class i]) and by the conditional probabilities of an observable variable (π[manifest class j│latent class i]). The correct number of classes can be assessed using various methods, including Akaike’s information criterion (AIC) and the Bayesian information criterion (BIC), to ensure the most appropriate fit (the model with the lowest AIC or BIC) [7, 8]. A hypothetical example illustrating the concept of LCA is available in Supplementary Methods 1.
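For readers unfamiliar with the two information criteria, the sketch below shows their usual log-likelihood-based definitions. It is an illustration only: the text does not state which variant of AIC and BIC the software reports, and the log-likelihoods and parameter counts in the usage lines are hypothetical numbers chosen for the example.

```python
import math


def aic(log_likelihood, n_parameters):
    """Akaike's information criterion: 2k - 2 ln(L)."""
    return 2 * n_parameters - 2 * log_likelihood


def bic(log_likelihood, n_parameters, n_observations):
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return n_parameters * math.log(n_observations) - 2 * log_likelihood


# Hypothetical fits of two competing models to 1,014 observations.
candidates = {
    "two classes": {"log_likelihood": -1010.0, "n_parameters": 5},
    "three classes": {"log_likelihood": -1009.5, "n_parameters": 8},
}
for name, fit in candidates.items():
    print(name,
          round(aic(fit["log_likelihood"], fit["n_parameters"]), 1),
          round(bic(fit["log_likelihood"], fit["n_parameters"], 1014), 1))
```

In both criteria, a lower value indicates a better trade-off between fit and model complexity; BIC penalises additional parameters more heavily than AIC once the sample exceeds a handful of observations.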
We estimated the sensitivity and specificity of RT-qPCR to diagnose COVID-19 by applying LCA to test results for chest CT and RT-qPCR, reported in a cross-sectional study conducted by Ai et al in Wuhan, China from 6 January to 6 February 2020 [9]. The purpose of their study was to examine if chest CT may provide a relevant supplement in diagnosing COVID-19. The study included 1,014 patients suspected of COVID-19 who had both chest CT and RT-qPCR recorded. Serial scans and assays were assessed when available. TaqMan One-Step RT-qPCR kits approved by the China Food and Drug Administration were used. The test results are presented in Table 1 (reproduced from Figure 1 in the study by Ai et al).
We first used unrestricted LCA to differentiate between two latent classes: I) “COV+” with characteristic (highly probable) COVID-19 patients and II) “COV–” with non-COVID-19 individuals. In the optimal scenario, we would expect the latent class “COV+” to include truly infected persons (concomitant positive RT-qPCR and chest CT), whereas “COV–” would include truly uninfected persons (concomitant negative RT-qPCR and chest CT). The choice of a model with two latent classes was based on a comparison of the AIC, showing potential overfit of a model with a higher number of classes (AIC of ten for three vs four for two latent classes). Recalling the aforementioned definitions of sensitivity and specificity, the conditional probability of a positive RT-qPCR test within the latent class “COV+” (π[RT-qPCR+│COV+]) would serve as an estimate of the sensitivity for the RT-qPCR test, whereas the conditional probability of a negative RT-qPCR within the latent class “COV–” (π[RT-qPCR–│COV–]) is an estimate of its specificity.
Unrestricted LCA with only two observable parameters may result in poor model definition, and it has therefore been recommended to add scientifically based constraint(s) on the model to overcome this issue [10]. The unrestricted latent class model can be considered temporary in such situations. We therefore imposed a restriction on the false-positive rate for RT-qPCR, which is the conditional probability of RT-qPCR positivity for the latent class “COV–” (π[RT-qPCR+│COV–]). We set the start value at 0.01% in the iterative process for the restriction.
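To make the idea of a restricted two-class model concrete, the following Python sketch fits such a model by expectation-maximisation (EM). It is not the lEM syntax used in the study (see Supplementary Methods 2) and is not expected to reproduce the published estimates. Several choices are assumptions made for this illustration: the false-positive rate is held fixed at 0.01% rather than merely started there, the starting values are arbitrary, and the counts in `example_counts` are hypothetical, not the data of Ai et al.

```python
import math


def bernoulli(p, x):
    """Probability of outcome x (1 = positive, 0 = negative) when P(positive) = p."""
    return p if x == 1 else 1.0 - p


def fit_restricted_lca(counts, fixed_fpr=0.0001, n_iter=500):
    """EM fit of a two-class latent class model for two binary tests.

    counts maps (pcr, ct) result patterns to patient counts.
    P(PCR+ | COV-) is held fixed at fixed_fpr (assumption of this sketch).
    """
    n_total = sum(counts.values())
    # Arbitrary starting values (assumptions, not taken from the study).
    size_pos, se_pcr, ct_given_pos, ct_given_neg = 0.5, 0.8, 0.8, 0.2
    loglik_trace = []
    for _ in range(n_iter):
        # E-step: posterior probability of class "COV+" for each pattern.
        post = {}
        loglik = 0.0
        for (pcr, ct), n in counts.items():
            f_pos = size_pos * bernoulli(se_pcr, pcr) * bernoulli(ct_given_pos, ct)
            f_neg = ((1.0 - size_pos) * bernoulli(fixed_fpr, pcr)
                     * bernoulli(ct_given_neg, ct))
            post[(pcr, ct)] = f_pos / (f_pos + f_neg)
            loglik += n * math.log(f_pos + f_neg)
        loglik_trace.append(loglik)
        # M-step: posterior-weighted proportions; the PCR false-positive
        # rate is not updated because it is restricted.
        n_pos = sum(n * post[k] for k, n in counts.items())
        n_neg = n_total - n_pos
        size_pos = n_pos / n_total
        se_pcr = sum(n * post[k] * k[0] for k, n in counts.items()) / n_pos
        ct_given_pos = sum(n * post[k] * k[1] for k, n in counts.items()) / n_pos
        ct_given_neg = sum(n * (1.0 - post[k]) * k[1]
                           for k, n in counts.items()) / n_neg
    return {
        "size_cov_pos": size_pos,
        "sensitivity_pcr": se_pcr,
        "specificity_pcr": 1.0 - fixed_fpr,
        "p_ct_pos_given_cov_pos": ct_given_pos,
        "p_ct_pos_given_cov_neg": ct_given_neg,
        "loglik_trace": loglik_trace,
    }


# Hypothetical counts for illustration only; NOT the data of Ai et al.
example_counts = {(1, 1): 550, (1, 0): 50, (0, 1): 150, (0, 0): 250}
fit = fit_restricted_lca(example_counts)
print({k: round(v, 4) for k, v in fit.items() if k != "loglik_trace"})
```

A table of two binary tests provides only three degrees of freedom, while this model still has four free parameters after the restriction. The estimates can therefore depend on the starting values, which is one way of seeing why the model definition is vulnerable when only two observed variables are available.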
We used the sensitivity and specificity for RT-qPCR obtained from the LCA to estimate predictive values by COVID-19 prevalence and compared them with values from the main analysis. Of note, we had originally based all plots on our LCA. However, because of concerns about vulnerable model definitions with only two variables and because new data on the sensitivity and specificity of RT-qPCR emerged during the review process, we based the final plots on the performance estimates from the Danish Health Authority.
We used the computer programme lEM developed by Vermunt [7, 11] for the LCA (syntax available in Supplementary Methods 2), the MedCalc computer software for calculating predictive values [12] and R statistics version 3.6 for other calculations and to produce graphs [13].
Trial registration: not relevant.
RESULTS
Figure 1A shows the predictive values as a function of prevalence and sensitivity for a specificity of 99%. For the different sensitivities used, the NPV for RT-qPCR remained above 92% up to a COVID-19 prevalence of 10%. However, the PPV varied widely for a prevalence between 0.1% (PPV 5-10%) and 10% (PPV 70-90%). Thus, even in the situation with a high sensitivity of 95%, the PPV varies from below 10% at a prevalence of 0.1% to approximately 90% at a prevalence of 10%. For a higher specificity of 99.98% (Figure 1B), the NPVs are largely unchanged, but the PPV improved substantially, varying from 85% at a prevalence of 0.1% to close to 100% at a prevalence of 10%.
For the latent class “COV+” (61% of cases) from the unrestricted LCA, probabilities of a positive RT-qPCR test and chest CT with characteristic COVID-19 findings were 100% and 86%, respectively (Table 2). Although the latent class “COV–” (39% of cases) had a lower conditional probability of 16% for a positive chest CT, a positive RT-qPCR test was present in 68%, suggesting poor model definition. Conditional probabilities approached expected values in the restricted LCA (Table 2), except for a lower conditional probability for positive chest CT in the “COV+” class (65%). We favoured the restricted model rather than the unrestricted because of its slightly better fit (AIC = 2,029 vs 2,031; BIC = 2,048 vs 2,055). The model estimated 99.98% specificity and 97.1% sensitivity for RT-qPCR (Table 2). Overall, these estimates are similar to those reported by the Danish Health Authority (Table 3); however, the PPV increased substantially with decreasing start value for the false-positive rate for RT-qPCR in the iterative process.
DISCUSSION
Our figures underscore the importance of the expected COVID-19 prevalence of the tested population. Early in the epidemic, primarily individuals with severe symptoms were tested and the prevalence in this population was probably high (> 10%), ensuring a high PPV (> 90% based on our main analysis). With broader population testing, e.g., testing persons with mild or no symptoms, a lower prevalence is expected. The prevalence of active COVID-19 can roughly be estimated at between 0.08% and 0.8% based on the number of new cases per day (approximately 40) times the duration of the infectious state (14 days) times the dark figure (between 8 and 80) divided by population size (5.8 million). Using these prevalence estimates and the high estimates of sensitivity (95%) and specificity (99.98%), the PPV would range between 80% and 97.5%; i.e., between 1/40 and 1/5 positive tests could be false positive. A potential high risk of false positives needs to be considered by clinicians and decision makers. If the consequence of a positive test result were quarantining, the impact of widespread testing on disease transmission would be limited. Conversely, false-positive individuals interned in multi-bed rooms or halls would be at high risk of becoming infected by truly infected co-interned individuals. A low PPV may also lead to underestimates of the true case-fatality rate, possibly leading to decreased public awareness and adherence to recommendations for reduction of virus transmission.
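The back-of-the-envelope prevalence estimate and the resulting PPV range can be retraced as follows. This is a sketch of the arithmetic described in the paragraph above; the function names are chosen for this example, and the PPV is evaluated at the rounded prevalence values quoted in the text (0.08% and 0.8%).

```python
def active_prevalence(new_cases_per_day, infectious_days, dark_figure, population):
    """Rough point prevalence of active infection, as a proportion."""
    return new_cases_per_day * infectious_days * dark_figure / population


def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Inputs as stated in the text: about 40 new cases per day, 14 infectious
# days, a dark figure between 8 and 80, and a population of 5.8 million.
low = active_prevalence(40, 14, 8, 5_800_000)
high = active_prevalence(40, 14, 80, 5_800_000)
print(f"prevalence: {low:.4%} to {high:.4%}")

# PPV at the rounded prevalences, with Se = 95% and Sp = 99.98%.
for prevalence in (0.0008, 0.008):
    ppv = positive_predictive_value(0.95, 0.9998, prevalence)
    print(f"prevalence={prevalence:.2%} PPV={ppv:.1%} "
          f"false positives among positives={1 - ppv:.1%}")
```

The share of false positives among positive tests is simply 1 − PPV, which is how the fractions 1/5 and 1/40 arise from PPVs of roughly 80% and 97.5%.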
The risk of false-negative results as a function of prevalence should also be considered. The recommendation for, e.g., healthcare personnel showing COVID-19 symptoms but being outside identified local outbreaks, is quarantining until a negative test result or until 48 hours after symptoms have resolved if testing positive. If the previous prevalence estimate of 0.08% is used, the NPV would be 99.98%. However, if healthcare personnel are tested during a local COVID-19 outbreak (e.g., in a nursing home), the prevalence may be estimated to reach 25% and the NPV would be approximately 90%. By repeating the test three times at two seven-day intervals, the risk of false-negative results would shift from 1/10 to 1/1,000, thus reducing the risk of maintaining the outbreak.
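The shift from 1/10 to 1/1,000 follows from multiplying the per-test risk across three tests. The sketch below makes that step explicit; it assumes that the repeated tests are independent, an assumption that is implicit in the arithmetic but may be optimistic if sampling problems recur in the same person. The function name is chosen for this illustration.

```python
def residual_false_negative_risk(npv, n_tests):
    """Probability that all n_tests negative results are false.

    Assumes independent tests that each have the same NPV.
    """
    return (1.0 - npv) ** n_tests


# NPV of approximately 90% during a local outbreak, as in the text.
for n_tests in (1, 2, 3):
    risk = residual_false_negative_risk(0.90, n_tests)
    print(f"{n_tests} test(s): residual risk {risk:.4f}")
```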
Estimating the prevalence of COVID-19 is also a challenge due to the enormous span in clinical symptoms between infected individuals and the concomitant uncertainty in the estimation of the dark figure; and some may find the prevalence of ≤ 0.8% and perhaps even less than 0.08% in our example above to be an underestimate. However, it should be kept in mind that it is the prevalence (or incidence proportion) of ongoing infection that is relevant for RT-qPCR tests; this is likely to be low in the current context of extended testing and decreasing rates of transmission. The situation is different for antibody tests that will detect infected (in late stage) and recovered persons (point prevalence) alike; thus, prevalence figures are higher. On 29 May, Copenhagen University Hospital, Hvidovre reported a sensitivity of 93% and a specificity of 98.3% for a SARS-CoV-2 antibody test from Wantai. In a high-prevalence population, this will yield a high PPV, thus making it suitable as a confirmatory ELISA-based test for persons with a positive RT-qPCR test. A limitation is that the antibody test may have to be delayed or repeated if the person presents early in the disease course, as seroconversion does not occur until two weeks after infection. Similar to RT-qPCR, the use of antibody tests in a low-prevalence setting (e.g., in the general population early in the epidemic) carries a higher risk of false positives. The consequence could be an overestimation of population immunity with an inadvertent negative impact on effective measures such as social distancing and hygienic measures as well as lower adherence to future vaccine recommendations.
Our estimates from the LCA share the limitations inherent in the study by Ai et al [9]. That study was not specifically designed as a diagnostic test study. Test indications were not reported, but all patients presumably had symptoms in the severe end of the disease spectrum. Furthermore, physicians interpreting the CTs were not blinded to other clinical patient data. Issues with the spectrum effect and subjective errors in categorisation likely led us to overestimate the sensitivity and specificity in the LCA, especially when applied to a broader population [14]. Indeed, we did observe very high values in our restricted model. However, this only secures conservative estimates of the derived predictive values. The spectrum effect may also have led to overestimates of sensitivity and specificity reported by the studies used for our main analysis.
Another limitation of our LCA is the low specificity of RT-qPCR using the initial unrestricted LCA. Unrestricted LCA may give erroneous results when performed on only two observable parameters (e.g., CT and PCR), which may explain the low specificity observed in that analysis. A way of improving the model could have been to include additional clinical information (e.g., symptoms, risk behaviour, other test results), but we had no such data. Instead, we placed a restriction on the false-positive rate. Although the sensitivity and specificity varied only slightly in the different restricted models, it cannot be ignored that the predictive values are dependent on model specification. It also illustrates that predictive values for RT-qPCR testing depend on, e.g., issues with primer purity, test equipment stability and procedure stringency, and such factors may have changed during the epidemic due to demands of accelerated test facilities.
CONCLUSIONS
A high risk of false-positive RT-qPCR tests should be considered when expanding the test strategy, whereas false negatives may occur during local outbreaks. A confirmatory test (e.g., demonstrating seroconversion or repeated RT-qPCR) may be warranted. LCA may be used to estimate test performance using multiple diagnostic tests when a gold standard is unavailable. Although there are limitations to LCA, the method may be useful for future epidemics, and there is a potential to expand the LCA with further clinical information and new diagnostic tests as they emerge.
CORRESPONDENCE: Henrik Frank Lorentzen. Email: lorentzen@dadlnet.dk
ACCEPTED: 14 July 2020
CONFLICTS OF INTEREST: none. Disclosure forms provided by the authors are available with the full text of this article at Ugeskriftet.dk/dmj
REFERENCES

1. SSI. COVID-19 i Danmark. Epidemiologisk overvågningsrapport. 7 April 2020. https://files.ssi.dk/COVID19overvaagningsrapport07042020wvp1 (2 Jun 2020).
2. Tang YW, Schmitz JE, Persing DH et al. Laboratory diagnosis of COVID-19: current issues and challenges. J Clin Microbiol 2020;58:e00512-20.
3. Danish Health Authority. Information om PCR test for COVID-19 til almen praksis. 14 April 2020. www.sst.dk//media/Udgivelser/2020/Corona/IRFAlmenpraksis/Kommunikationtilalmenpraksis_testafCOVID.ashx?la=da&hash=A28AC860410928AAD8864F689B6A05C9E56F3A87 (14 Apr 2020).
4. Wang W, Xu Y, Gao R et al. Detection of SARS-CoV-2 in different types of clinical specimens. JAMA 2020;323:1843-4.
5. Institut Pasteur. Protocol: real-time RT-PCR assays for the detection of SARS-CoV-2. www.who.int/docs/defaultsource/coronaviruse/realtimertpcrassaysforthedetectionofsarscov2institutpasteurparis.pdf?sfvrsn=3662fcb6_2 (23 Jun 2020).
6. McCutcheon AL. Latent class analysis. Delaware: Sage, 1987.
7. Vermunt JK. Latent class models. lEM: a general program for the analysis of categorical data 1. Tilburg: Department of Methodology and Statistics, Tilburg University, 1997.
8. Kingdom FAA, Prins N. Chapter 9 - Model comparisons. In: Kingdom FAA, Prins N, eds. Psychophysics. 2nd ed. San Diego: Academic Press, 2016:247-307.
9. Ai T, Yang Z, Hou H et al. Correlation of chest CT and RT-PCR testing in coronavirus disease 2019 (COVID-19) in China: a report of 1014 cases. Radiology 2020:200642.
10. Uebersax J. LCA frequently asked questions (FAQ) 2006. www.johnuebersax.com/stat/faq.htm (6 Apr 2020).
11. Vermunt JK. winLEM. 1 ed. Tilburg: Tilburg University, 1996.
12. MedCalc Statistical Software. 19.2.1 ed. Ostend, Belgium: MedCalc Software Ltd, 2020.
13. R Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing, 2019.
14. Usher-Smith JA, Sharp SJ, Griffin SJ. The spectrum effect in tests for risk prediction, screening, and diagnosis. BMJ 2016;353:i3139.