INTRODUCTION: Surveillance networks are often neither exhaustive nor completely complementary. In such situations, capture-recapture methods can be used for incidence estimation. However, the choice of estimator, and the robustness of each estimator to violations of the homogeneity and independence assumptions, are not well documented.
METHODS: We investigated the performance of five different capture-recapture estimators in a simulation study. Cases were detected and case information was combined under eight different scenarios, which increasingly violated the assumptions of independence of samples and homogeneity of detection probabilities. Belgian datasets on invasive pneumococcal disease (IPD) and pertussis provided motivating examples.
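To illustrate the kind of simulation described above, here is a minimal sketch in Python. It is not the authors' code: it uses Chapman's two-source estimator, which is not necessarily one of the five estimators investigated, and the population size, detection probabilities, function names, and number of runs are all assumptions chosen for illustration. It compares a scenario with independent sources against one where detection by the second source depends on detection by the first.

```python
import random

def chapman(n1, n2, m):
    """Chapman's two-source estimator of total population size.

    n1, n2: number of cases detected by source 1 and source 2
    m:      number of cases detected by both sources
    """
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1

def simulate(N, p1, p2_if_seen, p2_if_unseen, rng):
    """Simulate detection of N true cases by two sources.

    Source 2 detects a case with probability p2_if_seen when source 1
    also detected it, and p2_if_unseen otherwise. Equal values give
    independent sources; unequal values violate independence.
    """
    n1 = n2 = m = 0
    for _ in range(N):
        in1 = rng.random() < p1
        in2 = rng.random() < (p2_if_seen if in1 else p2_if_unseen)
        n1 += in1
        n2 += in2
        m += in1 and in2
    return n1, n2, m

def mean_estimate(N, p1, p2_if_seen, p2_if_unseen, runs=200, seed=1):
    """Average the Chapman estimate over repeated simulated datasets."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        total += chapman(*simulate(N, p1, p2_if_seen, p2_if_unseen, rng))
    return total / runs

# Hypothetical settings: 5000 true cases, source 1 detects half of them.
independent = mean_estimate(5000, 0.5, 0.4, 0.4)   # sources independent
positive_dep = mean_estimate(5000, 0.5, 0.6, 0.2)  # positive dependence

print(f"independent sources: {independent:.0f}")
print(f"positive dependence: {positive_dep:.0f}")
```

Under these assumed settings, the estimate with independent sources should land close to the true size of 5000, while positive dependence inflates the overlap between sources and pulls the estimate well below it.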
RESULTS: No estimator was unbiased in all scenarios. Performance of the parametric estimators depended on how much of the dependency and heterogeneity was correctly modelled. Model building was limited by parameter estimability, availability of additional information (e.g. covariates) and the possibilities inherent to the method. In the most complex scenario, methods that allowed for detection probabilities conditional on previous detections estimated the total population size within a 20-30% error range. Parametric estimators remained stable if individual data sources lost up to 50% of their data. The investigated non-parametric methods were more susceptible to data loss, and their performance was linked to the dependence between samples: they overestimated in scenarios with little dependence and underestimated in the others. Issues with parameter estimability made it impossible to model all suggested relations between samples for the IPD and pertussis datasets. For IPD, the estimates of the Belgian incidence in those aged 50 years and older ranged from 44 to 58 per 100,000 in 2010. The estimates for pertussis (all ages, Belgium, 2014) ranged from 24.2 to 30.8 per 100,000.
CONCLUSION: We encourage the use of capture-recapture methods, but epidemiologists should preferably include datasets whose underlying dependency structure is not too complex, investigate this structure a priori, compensate for it within the model, and interpret the results with the remaining unmodelled heterogeneity in mind.