INTRODUCTION
Surveillance modernization efforts emphasize the potential use of electronic health record (EHR) data to inform public health surveillance and prevention. However, EHR data streams vary widely in their completeness, accuracy, and representativeness.
METHODS
We developed a validation process for the Multi-State EHR-Based Network for Disease Surveillance (MENDS) pilot project to identify and resolve data quality issues that could affect chronic disease prevalence estimates. We examined MENDS validation processes from December 2020 through August 2023 across 5 data-contributing organizations and outlined steps to resolve data quality issues.
RESULTS
We identified gaps in the EHR databases of data contributors and in the processes to extract, map, integrate, and analyze their EHR data. Examples of source-data problems included missing race and ethnicity data and missing zip codes. Examples of data processing problems included duplicate or missing patient records, lower-than-expected volumes of data, use of multiple fields for a single data type, and implausible values.
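The kinds of checks described above can be illustrated with a minimal sketch. This is not the MENDS validation code; the field names (`patient_id`, `race`, `ethnicity`, `zip_code`, `systolic_bp`, `bmi`), the plausibility bounds, and the expected-volume threshold are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical field names and plausibility bounds, for illustration only.
REQUIRED_FIELDS = ("race", "ethnicity", "zip_code")
PLAUSIBLE_RANGES = {"systolic_bp": (50, 300), "bmi": (10, 100)}


def validate_records(records, expected_min_count):
    """Return simple data quality findings for a list of record dicts."""
    # Duplicate patient records: any patient_id that appears more than once.
    id_counts = Counter(r.get("patient_id") for r in records)
    duplicates = sorted(
        pid for pid, n in id_counts.items() if pid is not None and n > 1
    )

    # Missing source data: count records where a required field is absent or blank.
    missing = {
        field: sum(1 for r in records if r.get(field) in (None, ""))
        for field in REQUIRED_FIELDS
    }

    # Implausible values: anything outside the assumed plausible range.
    implausible = []
    for r in records:
        for field, (low, high) in PLAUSIBLE_RANGES.items():
            value = r.get(field)
            if value is not None and not (low <= value <= high):
                implausible.append((r.get("patient_id"), field, value))

    return {
        "duplicate_ids": duplicates,
        "missing_counts": missing,
        "implausible_values": implausible,
        # Lower-than-expected volume relative to an assumed threshold.
        "low_volume": len(records) < expected_min_count,
    }


# Usage with made-up records (not real patient data).
sample = [
    {"patient_id": 1, "race": "A", "ethnicity": "B", "zip_code": "00001",
     "systolic_bp": 120, "bmi": 24.0},
    {"patient_id": 1, "race": "A", "ethnicity": "B", "zip_code": "00001",
     "systolic_bp": 118, "bmi": 24.1},
    {"patient_id": 2, "race": "", "ethnicity": None, "zip_code": "00002",
     "systolic_bp": 1200, "bmi": 31.5},
]
report = validate_records(sample, expected_min_count=5)
```

In practice, the thresholds and required fields would depend on each data contributor's source system and on the surveillance measures being estimated.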
CONCLUSION
Validation protocols identified critical errors both in EHR source data and in the processes used to transform these data for analysis. Our experience highlights the importance of data validation for improving data quality and the accuracy of surveillance estimates that use EHR data. The validation process and lessons learned can be applied broadly to other EHR-based surveillance efforts.