Traditionally, health statistics are derived from civil and/or vital registration. Civil registration in low- to middle-income countries varies from partial coverage to essentially nothing at all. Consequently, the state of the art for public health information in these countries is data amalgamation: combining or triangulating data from different sources to produce a more complete picture across both time and space. Data sources amenable to this approach include sample surveys, sample registration systems, health and demographic surveillance systems, administrative records, census records, health facility records and others. We propose Hyak, a new statistical framework for gathering health and population data that leverages the benefits of sampling and longitudinal, prospective surveillance to create a cheap, accurate, sustainable monitoring platform. Hyak has three fundamental components:
• Data amalgamation: A sampling and surveillance component that organizes two or more data collection systems to work together: (1) data from Health and Demographic Surveillance System (HDSS) sites with frequent, intense, linked, prospective follow-up and (2) data from sample surveys conducted in large areas surrounding the HDSS sites, using informed sampling so as to capture as many events as possible;
• Cause of death: Verbal autopsy to characterize the distribution of deaths by cause at the population level; and
• Socioeconomic status (SES): Measurement of SES in order to characterize poverty and wealth.
We conduct a simulation study of the informed sampling component of Hyak based on the Agincourt HDSS site in South Africa. Compared with traditional cluster sampling, Hyak's informed sampling captures more deaths and, when combined with an estimation model that includes spatial smoothing, produces estimates of both mortality counts and mortality rates that have lower variance and small bias.
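The intuition behind informed sampling can be illustrated with a toy simulation. The sketch below is not the simulation study described above and does not use Agincourt data: the village populations, mortality rates, and function names are all hypothetical, and the selection rule (drawing villages with probability proportional to their expected deaths) is an assumption chosen only to show why a sample guided by prior mortality information tends to capture more deaths than a sample of equally likely clusters. It omits the estimation model and spatial smoothing entirely.

```python
import random


def make_villages(rng, n=50):
    """Build hypothetical villages with an expected annual death count each."""
    villages = []
    for _ in range(n):
        pop = rng.randint(200, 2000)      # assumed population size
        rate = rng.uniform(2.0, 20.0)     # assumed deaths per 1000 person-years
        villages.append({"pop": pop, "expected_deaths": pop * rate / 1000.0})
    return villages


def sample_uniform(rng, villages, k):
    """Traditional cluster sample: every village is equally likely."""
    return rng.sample(range(len(villages)), k)


def sample_informed(rng, villages, k):
    """Assumed informed rule: draw without replacement, with probability
    proportional to each remaining village's expected deaths."""
    remaining = list(range(len(villages)))
    chosen = []
    for _ in range(k):
        weights = [villages[i]["expected_deaths"] for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)
    return chosen


def mean_deaths_captured(sampler, villages, k, reps, seed):
    """Average expected deaths falling inside the sample over many draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        sample = sampler(rng, villages, k)
        total += sum(villages[i]["expected_deaths"] for i in sample)
    return total / reps


villages = make_villages(random.Random(0))
uniform_mean = mean_deaths_captured(sample_uniform, villages, k=10, reps=2000, seed=1)
informed_mean = mean_deaths_captured(sample_informed, villages, k=10, reps=2000, seed=2)
print(f"uniform cluster sample:  {uniform_mean:.1f} expected deaths captured")
print(f"informed cluster sample: {informed_mean:.1f} expected deaths captured")
```

Because the informed draw selects villages with unequal probabilities, any estimate built from it would need to account for those selection probabilities (for example through inverse-probability weights) to avoid bias; the sketch only compares how many deaths each design captures.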