INTRODUCTION
Capture–recapture analysis (CRA) is used to evaluate the completeness of reporting to surveillance systems and to refine incidence or prevalence estimates of diseases derived from these systems [1–5].
Capture–recapture methodology, originally developed to estimate the size of animal populations, was adapted for use in epidemiology in the late 1960s [6, 7]. In zoology, a sample of animals is captured, tagged, and released. Subsequently, a second sample is captured and the proportion of recaptured, tagged animals determined. This permits estimation of the total population. Underlying this method is the crucial assumption that the ratio of marked to unmarked animals in the entire population is the same as the ratio in the recaptured population due to complete mixing and independent sampling. Applied to human populations, persons are ‘captured’ by appearing on the list of one source and ‘recaptured’ by reappearing on one or more other lists of one or more other sources [8]. Persons can be marked by unique personal identifiers such as name or health insurance number or by surrogate markers such as age, sex and date of birth [9].
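The zoological estimate described above is commonly written as the Lincoln–Petersen estimator. As a sketch in notation that is not used in the source (n_1 animals tagged in the first sample, n_2 animals caught in the second sample, m of them carrying a tag, N the unknown population size), equating the tagged proportion in the second sample with the tagged proportion in the whole population gives:

```latex
\frac{m}{n_2} \approx \frac{n_1}{N}
\quad\Longrightarrow\quad
\hat{N} = \frac{n_1 \, n_2}{m}
```

In the two-list epidemiological setting, n_1 and n_2 would correspond to the number of cases on each list and m to the number of cases found on both.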
Compared with other European countries [10], the incidence of invasive meningococcal disease (IMD) in Germany is low, estimated from national surveillance data at 0·7–0·9 cases/100 000 inhabitants between 2001 and 2005 [11, 12]. As in most European countries, IMD due to serogroup B is most common, followed by disease due to serogroup C.
In Germany, IMD has been a statutorily notifiable disease since 2001 according to the Protection Against Infection Act (Infektionsschutzgesetz, IfSG). Physicians diagnosing meningococcal meningitis or sepsis as well as laboratories identifying Neisseria meningitidis from sterile sites are required to notify cases to regional health authorities. All cases fulfilling the case definition [13] are relayed to state health authority level where further quality checks take place, and from there to national level at the Robert Koch Institute (RKI). The National Reference Centre for Meningococci (NRZM), located at the Institute for Hygiene and Microbiology at the University of Würzburg in the state of Bavaria, receives patient specimens from normally sterile sites and pathogen isolates from hospitals or laboratories located throughout Germany. Laboratories are not legally obliged to send specimens to NRZM for diagnosis, but they are encouraged to do so in regular newsletters and this service is provided free of charge.
So far, no attempts have been made to estimate possible underreporting of IMD in Germany. Thus, the objectives of our study were to assess the quality of IMD data reported to the national surveillance system and the NRZM, to determine the sensitivity of both sources and to estimate serogroup-specific incidence and mortality of IMD.
MATERIAL AND METHODS
Case definition
In this study, cases were defined as patients with laboratory confirmation of N. meningitidis from a normally sterile site or as clinically compatible cases with an epidemiological link to a laboratory-confirmed case, according to the national case definition in force in Germany during 2003. Laboratory confirmation was defined as isolation of N. meningitidis by culture, microscopic detection of Gram-negative diplococci, detection of N. meningitidis nucleic acid or detection of N. meningitidis antigen (in cerebrospinal fluid only).
Description of datasets
In 2003, 779 cases fulfilling the above case definition were reported to the RKI; four of these lacked laboratory confirmation but were clinically compatible and epidemiologically linked to a confirmed case. Specimens from 565 patients were isolated and/or typed at the NRZM in 2003.
Identifying matches
In accordance with the German Data Protection Act, data were received at RKI in anonymized form. Patients' initials and the day of birth were removed from the NRZM dataset prior to analysis at RKI. Thus, the two data sources shared no common identifier and the following five variables available in both data sources were chosen to identify matching cases: date of birth (month and year), sex, county of residence, date of illness onset (day, month and year) and serogroup. Three additional less specific variables were used to exhaustively identify any remaining potentially matching records: death of case (yes/no), diagnostic material (serum, cerebrospinal fluid) and clinical picture (meningitis, sepsis). All of these variables were identically available in both datasets with one exception. While the NRZM dataset contained the full five-digit postal code of the patient's residence, the RKI dataset contained only the county of residence. Thus the NRZM postal code was converted to the corresponding county. As postal codes can correspond to more than one county, 18 NRZM data records had two possible counties.
To identify matching records, each RKI record was compared with each NRZM record by means of a difference function programmed in Microsoft® Access, which initially included the five main matching variables defined above. For each record pair, the difference (delta) for each matching variable was arbitrarily defined as 0 if the values were clearly identical, 1 if they were clearly discrepant and 0·5 if the variable was missing in one of the two data records. The delta for ‘date of illness onset’ was 0 if the dates in the two datasets differed by <7 days, 0·4 if they differed by 7 to <14 days and 1 if they differed by ⩾14 days. For cases in the NRZM database whose postal code corresponded to more than one county, we assigned a delta value of 0·1 if one of the counties matched the county in the RKI database. All data-record pairs with minimal differences between them (sum of the differences ⩽0·5) were accepted as being identical. For the remaining cases, a difference function including the three additional variables (death of case, diagnostic material, clinical picture) was implemented and all data-record pairs with a sum of the differences of ⩽2·5 were considered tentative matches. These data-record pairs were then submitted to the regional health offices after patients' initials, postal code and full date of birth had been reinstated in the dataset by the NRZM. Regional health office staff were asked to look up additional available personal patient data in order to determine whether these pairs were, in fact, concordant.
All record pairs identified as matching were manually reviewed by one of the authors (A.S.).
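The first-stage scoring rule can be illustrated with a short sketch. This is not the authors' Microsoft Access implementation: the field names (`birth`, `sex`, `county`, `counties`, `onset`, `serogroup`), the function names and the example records are hypothetical, and treating a value that is missing in both records as 0.5 is an assumption, since the text only specifies the case where it is missing in one record.

```python
from datetime import date

def field_delta(a, b):
    """0 if identical, 1 if discrepant, 0.5 if a value is missing (None)."""
    if a is None or b is None:
        return 0.5  # assumption: also applied when both values are missing
    return 0.0 if a == b else 1.0

def onset_delta(a, b):
    """Graded difference for date of illness onset."""
    if a is None or b is None:
        return 0.5
    gap = abs((a - b).days)
    if gap < 7:
        return 0.0
    if gap < 14:
        return 0.4
    return 1.0

def county_delta(rki_county, nrzm_counties):
    """nrzm_counties holds one or two counties derived from the postal code."""
    if rki_county is None or not nrzm_counties:
        return 0.5
    if rki_county in nrzm_counties:
        # ambiguous postal code with one matching county scores 0.1
        return 0.0 if len(nrzm_counties) == 1 else 0.1
    return 1.0

def difference(rki, nrzm):
    """Sum of deltas over the five main matching variables."""
    return (field_delta(rki["birth"], nrzm["birth"])
            + field_delta(rki["sex"], nrzm["sex"])
            + county_delta(rki["county"], nrzm["counties"])
            + onset_delta(rki["onset"], nrzm["onset"])
            + field_delta(rki["serogroup"], nrzm["serogroup"]))

def is_match(rki, nrzm, threshold=0.5):
    """Accept a pair as identical if the summed difference is <= threshold."""
    return difference(rki, nrzm) <= threshold + 1e-9  # tolerance for float sums

# Hypothetical records, for illustration only
rki_rec = {"birth": (5, 1999), "sex": "f", "county": "County A",
           "onset": date(2003, 3, 10), "serogroup": "B"}
nrzm_rec = {"birth": (5, 1999), "sex": "f", "counties": ("County A", "County B"),
            "onset": date(2003, 3, 12), "serogroup": "B"}
nrzm_no_sero = dict(nrzm_rec, serogroup=None)

print(difference(rki_rec, nrzm_rec), is_match(rki_rec, nrzm_rec))
print(difference(rki_rec, nrzm_no_sero), is_match(rki_rec, nrzm_no_sero))
```

In this example the first pair differs only through the ambiguous postal code and is accepted, whereas the additional missing serogroup in the second pair pushes the sum above 0·5, so that pair would pass to the second-stage comparison.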
Capture–recapture analysis
The two-sample capture–recapture method was used to estimate total IMD incidence [14–16]. Cases from a single underlying population ‘captured’ in one dataset are ‘recaptured’ if they appear in a second dataset. Certain underlying assumptions must hold with this method:
The sources should be independent; the population should be closed; all identified cases should be true cases; for each source, the probability of capture should be the same for all cases; and all true matches in the two sources must be identifiable.
A stratified analysis was carried out according to factors that might affect the probability of capture or be related to a possible positive dependency of the two systems in order to check for an influence of these factors on the overall incidence estimate of IMD [17, 18]. The CRA was stratified by age (<5 years, 5–19 years, >19 years), by serogroup [limited to the two predominant serogroups B and C due to the large number of RKI cases with missing data on serogroup (Fig.)], by region [Bavaria – where the NRZM is located – and surrounding states (Baden-Württemberg, Rhineland-Palatinate, Saarland, Hesse, Thuringia, Saxony) vs. all remaining states (North Rhine-Westphalia, Lower Saxony, Schleswig-Holstein, Mecklenburg-Western Pomerania, Brandenburg, Berlin, Bremen, Saxony-Anhalt, Hamburg)], and by vital status to check heterogeneity of capture. As information on serogroup was considered extremely reliable and was never missing in the NRZM dataset, the serogroup from the NRZM record was assigned to the matching RKI record if the serogroup was missing or discordant with NRZM findings (39 pairs). If a case was reported to have died in one system, its match in the other system was also considered a death (14 deaths from RKI source assigned to matching NRZM cases not reported as dead).
Hospital discharge data
Hospital discharge data are available on an annual basis from the Federal Statistical Office in Germany [19]. However, as these are aggregated rather than case-based data, they cannot serve as a third source for CRA. Nonetheless, the number of cases discharged with ICD-10 code A39 (meningococcal infection) in 2003 was compared with the CRA estimate obtained in our study. The ICD-10 code A39 includes the following clinical diagnoses: meningococcal meningitis (A39.0), Waterhouse–Friderichsen syndrome (A39.1), acute meningococcaemia (A39.2), chronic meningococcaemia (A39.3), meningococcaemia, unspecified (A39.4), meningococcal heart disease (A39.5), other meningococcal infections (A39.8) and meningococcal infection, unspecified (A39.9). Short-stay cases (<1 day) were excluded unless they had died, as these cases are generally due to transfer between hospitals and lead to duplicate counting of cases.
For the calculation of incidences, the size of the German population was estimated as 82 534 786 based on the number of persons registered in each state on 31 December 2003 (data reported to the RKI by the 16 State Statistical Offices).
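As a minimal sketch of how such incidences follow from case counts, the calculation below divides a case count by the population figure given above and scales to 100 000 inhabitants. The function name `incidence_per_100k` is hypothetical; the case counts are the numbers of cases in each dataset stated under ‘Description of datasets’.

```python
POPULATION_2003 = 82_534_786  # persons registered on 31 December 2003

def incidence_per_100k(cases, population=POPULATION_2003):
    """Cases per 100 000 inhabitants for a one-year period."""
    return cases / population * 100_000

rki_incidence = incidence_per_100k(779)   # cases reported to the RKI
nrzm_incidence = incidence_per_100k(565)  # patients with specimens at the NRZM

print(round(rki_incidence, 1), round(nrzm_incidence, 1))
```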
RESULTS
Comparison of datasets
The distribution of age and sex was similar in the two datasets. After exclusion of cases with no information on serogroup from the RKI dataset (22%), the serogroup distributions differed slightly but significantly (P=0·03, χ² test), due to a higher proportion of serogroup B disease and a lower proportion of serogroup C disease reported to the RKI, as well as the absence of non-typable and serogroup A strains in the NRZM dataset (three cases in RKI dataset).
The geographical distribution of cases according to federal state also differed significantly between the two data sources (P=0·005, Kolmogorov–Smirnov test). Compared to the distribution of cases reported to the national surveillance system, the NRZM received specimens more frequently from Bavaria and surrounding states.
CRA
The observed IMD incidence based on RKI data was 0·9 cases/100 000 inhabitants and based on NRZM data, 0·7 cases/100 000 inhabitants. A total of 507 IMD cases were identified as common to both the RKI and NRZM datasets, with 272 and 58 cases unique to the RKI and NRZM datasets respectively. Thus, 872 cases (95% CI 858–886) of IMD were estimated to have occurred in Germany in 2003 by CRA, corresponding to an IMD incidence of 1·1/100 000 inhabitants (Table). The estimated sensitivity of ascertainment was 65% for NRZM and 89% for RKI.
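The arithmetic behind a two-source estimate can be sketched as follows. The source does not state which estimator variant or confidence-interval formula was used, so the simple Lincoln–Petersen ratio shown here is an assumption and the function names are hypothetical. Applied to the counts quoted above it gives roughly 868 cases, close to but not identical with the reported 872; the sensitivities are therefore computed against the reported estimate.

```python
def lincoln_petersen(n1, n2, m):
    """Simple two-source estimate: (cases in source 1 * cases in source 2) / matches."""
    return n1 * n2 / m

def sensitivity(n_source, n_estimated):
    """Share of the estimated total that a single source ascertained."""
    return n_source / n_estimated

matched, rki_only, nrzm_only = 507, 272, 58
n_rki = matched + rki_only     # 779 cases in the RKI dataset
n_nrzm = matched + nrzm_only   # 565 cases in the NRZM dataset

simple_estimate = lincoln_petersen(n_rki, n_nrzm, matched)

REPORTED_ESTIMATE = 872        # CRA estimate as reported in the text
sens_rki = sensitivity(n_rki, REPORTED_ESTIMATE)
sens_nrzm = sensitivity(n_nrzm, REPORTED_ESTIMATE)

print(round(simple_estimate, 1), round(sens_rki * 100), round(sens_nrzm * 100))
```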
Table notes: RKI, Robert Koch Institute; NRZM, National Reference Centre for Meningococci (located at the Institute for Hygiene and Microbiology at the University of Würzburg, Bavaria); CI, confidence interval.
* Two cases from the NRZM had to be excluded from this analysis due to missing data on age.
† Includes only cases with serogroup B or C.
‡ Five cases from the NRZM had to be excluded from this analysis due to missing data on place of residence.
Stratified analysis
The sensitivity of ascertainment was similar in all age groups in the RKI system but slightly lower in adults compared to children and adolescents in the NRZM (Table). Estimated incidence was highest for children aged <5 years at 8·8 cases/100 000 inhabitants and lowest in adults (Table). The sum of the number of estimated cases in all age strata (876 cases, plus two cases excluded from this analysis due to missing data) was only slightly higher than the overall CRA estimate.
While the sensitivity of ascertainment was significantly higher for serogroup C (83%) than for serogroup B cases (72%) at the NRZM but not the RKI, the sum of the estimated number of serogroup B and C cases (704) differed only minimally from the total number of estimated serogroup B and C cases (700).
The estimated IMD incidence in Bavaria and surrounding states combined was 0·9/100 000 inhabitants and in the remaining states situated farther from the NRZM, 1·2/100 000 inhabitants (Table). While the estimated sensitivity of ascertainment differed only slightly between these two strata for RKI, this difference was marked for NRZM, with a sensitivity of 71% for Bavaria and surrounding states and 59% for the remaining states. The sum of the estimated number of cases in the two regions (869, plus five cases excluded from this analysis due to missing data) was also almost identical to the overall CRA estimate.
A total of 53 IMD deaths were identified as common to both the RKI and NRZM datasets, with six cases and 13 cases unique to the NRZM and RKI datasets respectively. Thus, 77 IMD deaths (95% CI 75–79) were estimated to have occurred by CRA. The sensitivity of ascertainment was higher for deaths (77%) than non-deaths (63%) at the NRZM but not the RKI (Table). The estimated mortality was thus 0·1/100 000 inhabitants and the estimated case-fatality rate was 8·8% (77/872). The sum of the estimated number of deaths and non-deaths (877) was only marginally higher than the overall estimate.
Hospital discharge statistics
According to hospital discharge data, 950 cases (1·2/100 000 inhabitants) were discharged with ICD-10 code A39 in Germany in 2003. Among these were 85 deaths (case-fatality rate 8·9%).
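For comparison, the case-fatality rates from the CRA estimate and from the hospital discharge data can be recomputed from the counts given in the text. This is a small illustrative sketch; the function name `case_fatality_percent` is hypothetical.

```python
def case_fatality_percent(deaths, cases):
    """Deaths as a percentage of cases."""
    return deaths / cases * 100

cra_cfr = case_fatality_percent(77, 872)        # CRA-estimated deaths and cases
discharge_cfr = case_fatality_percent(85, 950)  # ICD-10 A39 discharges

print(round(cra_cfr, 1), round(discharge_cfr, 1))
```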
DISCUSSION
CRA results suggest that the incidence of IMD in Germany is indeed low at 1·1 cases/100 000 inhabitants. The degree of ascertainment of IMD cases was higher in the RKI (89·4%) compared to the NRZM source (64·8%), reflecting the high number of cases unique to RKI. The incidence of IMD in 2003 estimated by CRA was 11·9% higher than that calculated from cases reported to RKI alone.
CRA using only two sources tends to underestimate the true number of cases in the population if sources are positively dependent [17]. In addition, testing for independence is only possible with more than two sources; unfortunately, however, a case-based third data source was not available in Germany. Some degree of positive dependence between the NRZM and the statutory surveillance system is probable, as laboratories sufficiently motivated to send isolates to the NRZM for further testing may also be more likely to report cases to the statutory surveillance system and vice versa [20]. According to Brenner [18], the fact that more severe cases are less likely than less severe cases to be missed by different sources often leads to positive dependence of ascertainment. Stratification for factors which may have contributed to dependency between the sources (as well as to heterogeneity of capture, see below) only marginally increased the CRA estimate of IMD incidence. Thus, while it is possible that the CRA estimate of IMD incidence is still an underestimate, it is closer to the true IMD incidence in Germany than an estimate based solely on statutory surveillance data.
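The direction of this bias can be made explicit with a short large-sample sketch. The notation is introduced here for illustration and is not taken from the source: N is the true number of cases, p_1 and p_2 are the capture probabilities of the two sources, and p_{12} is the probability of being captured by both.

```latex
n_1 \approx N p_1, \qquad n_2 \approx N p_2, \qquad m \approx N p_{12}
\quad\Longrightarrow\quad
\hat{N} = \frac{n_1 \, n_2}{m} \approx N \, \frac{p_1 \, p_2}{p_{12}}
```

Under independence p_{12} = p_1 p_2 and the estimate recovers N. Under positive dependence p_{12} > p_1 p_2, so the ratio is below 1 and the two-source estimate falls short of the true number of cases.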
IMD incidence estimated according to the number of cases classified as ICD-10 A39 in the hospital discharge statistics was slightly higher than our CRA estimate. However, ICD-10 code A39 includes a variety of different diagnoses associated with meningococcal disease. While those cases classified as A39.0, A39.1 and A39.2 probably fulfilled the case definition applied in our study (829 cases), a certain proportion of the remaining cases may not have been acute or invasive. Nonetheless, these data also suggest that there may be some degree of underreporting of IMD to the statutory surveillance system in Germany.
A rigorous case definition based on laboratory confirmation or clinical compatibility with an epidemiological link to a laboratory-confirmed case was applied in order to ensure that all identified cases in both systems were true cases, thus minimizing the risk of misclassification. Only four cases from the RKI system were not laboratory confirmed, but these were included so as not to preclude possible matching with an NRZM case, which indeed occurred in one of these cases. A small degree of misclassification of the matching variables cannot be entirely ruled out. For instance, it is possible that the RKI cases with serogroup A were wrongly diagnosed due to the use of a latex agglutination test that differentiates only between serogroup B and serogroups A, C, Y and W135. Concerning the classification of deaths, routine quality assurance at the RKI since 2004 has consistently confirmed notified deaths as true deaths (W. Hellenbrand, personal communication); thus, the validity of the classification seems high. As a combination of several identifying variables was used, the matching process can also be considered robust. We were able to reliably identify a high proportion of matching data records in the two sources. Furthermore, it was possible to verify additional data-record pairs identified as tentative matches by linking personal data back to the anonymized records at the NRZM and the responsible regional health authority. This enabled verification of 57 of the 75 (76%) tentative matches identified by our matching algorithm.
For the RKI source, the sensitivity of ascertainment did not vary substantially according to age, diagnosed serogroup or region (Table), suggesting that these factors did not markedly influence the probability of capture by the statutory surveillance system [21, 22]. Furthermore, the sensitivity of the RKI source was consistently higher than that of the NRZM in all strata analysed, a constellation that may also lead to an underestimate of total incidence [14].
In contrast, the above factors did influence the sensitivity of ascertainment by the NRZM: adult cases had a slightly lower probability of capture, suggesting that peripheral laboratories were more likely to submit samples from children and adolescents for fine typing. Not unexpectedly, cases from Bavaria – where the NRZM is located – and surrounding states were also more likely to be captured by the NRZM, probably reflecting a higher local awareness. This finding suggests that informing laboratories located farther from the NRZM might be useful. Cases with serogroup C were also more likely to be captured by the NRZM. As serogroup C disease has been observed to be more severe and have a higher case-fatality rate in Germany [23], this may reflect initiation of more detailed diagnosis at the NRZM for more severe cases [8]. Submission to the NRZM may also be related to awareness that serogroup C disease is vaccine preventable, as the identification of clusters through fine typing in the past has led to local vaccination campaigns [23]. The higher sensitivity of ascertainment for IMD deaths compared to the overall sensitivity also suggests referral to the NRZM is more likely for cases with severe disease. Although the probability of capturing a case should not vary in CRA, variable catchability contributes relatively little bias [24]. In our case, stratified analysis revealed that cases were more likely to be captured by the NRZM if they were caused by serogroup C, if they were from Bavaria or the surrounding states or if they had a fatal outcome (trap fascination) [8, 24], a pattern not seen for the RKI; nevertheless, for all cases the sum of the stratified results was only minimally higher than the overall CRA incidence estimate (Table).
Prior to this study, NRZM fine typing results were only reported to the submitting laboratory. Recognition that not all cases tested at the NRZM were reported to the statutory surveillance system led to the establishment of direct reporting of fine typing results by the NRZM to the regional health offices starting in November 2004, thereby establishing a direct link between the two data systems. This has led to an improvement in the quality of the statutory surveillance data, but also means that the two systems must be considered highly dependent in future, precluding further CRA.
Overall, our results suggest that Germany is indeed a country with a low incidence of IMD compared to other European countries [10]. The sensitivity of the statutory surveillance system in Germany for the ascertainment of IMD is high. Any underreporting might be reduced by the implementation of an electronic reporting system for clinicians and laboratories in Germany, as has been shown in Sweden [25]. However, this is not planned for the immediate future. Possible reasons for low observed incidence other than underreporting include underdiagnosis due to antibiotic therapy prior to testing and lower blood-culture rates than in other countries [26]; however, data on these factors are lacking. Finally, a true low incidence may be in part explained by a lower prevalence of major risk factors for transmission of meningococci, e.g. less frequent day-care attendance by infants and toddlers and less frequent habitation by students in dormitories compared to other European countries.
ACKNOWLEDGEMENTS
We thank the staff of the regional and state health authorities for their help in verifying potentially matching data records. The German Ministry of Health supports the National Reference Centre for Meningococci via the Robert Koch Institute (Head of Reference Centre: Matthias Frosch; grant no. ZV2-1369-237).
DECLARATION OF INTEREST
None.