BACKGROUND
High-grade cervical dysplasia (cervical intraepithelial neoplasia grade 2 or worse) has been widely used as a surrogate endpoint in cervical cancer screening and prevention trials.
METHODS
To identify high-grade cervical dysplasia and cervical cancer, we developed claims-based algorithms that combined diagnosis and procedure codes, using billing data from an electronic medical records database, and then assessed the validity of the algorithms in an independent administrative claims database. We calculated the positive predictive value (PPV) and 95% confidence interval (CI) of each algorithm, using a new cytologic or pathologic diagnosis of cervical intraepithelial neoplasia 2 or 3, carcinoma in situ, or cervical cancer as the gold standard.
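As an illustration of the PPV calculation described above, the following is a minimal sketch in Python. The abstract does not state which interval method was used; the Wilson score interval here is an assumption, and the function name and the example counts are hypothetical, not study data.

```python
from math import sqrt
from statistics import NormalDist


def ppv_with_wilson_ci(true_positives, flagged, level=0.95):
    """Return (ppv, lower, upper) for a claims-based algorithm.

    true_positives: flagged patients confirmed by the gold standard.
    flagged: all patients the algorithm identified.
    The Wilson score interval is an assumed choice, not the paper's stated method.
    """
    if flagged <= 0 or not 0 <= true_positives <= flagged:
        raise ValueError("need 0 <= true_positives <= flagged and flagged > 0")

    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    p = true_positives / flagged
    denom = 1 + z**2 / flagged
    center = (p + z**2 / (2 * flagged)) / denom
    half_width = z * sqrt(p * (1 - p) / flagged + z**2 / (4 * flagged**2)) / denom
    return p, center - half_width, center + half_width


# Hypothetical counts for illustration only: 80 confirmed cases among 100 flagged.
ppv, lower, upper = ppv_with_wilson_ci(80, 100)
print(f"PPV {ppv:.1%} (95% CI, {lower:.1%}-{upper:.1%})")
```

The Wilson interval is asymmetric around the point estimate when the PPV is far from 50%, which is one reason it is often preferred over the simple normal approximation for proportions.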
RESULTS
Having ≥1 diagnosis code for high-grade cervical dysplasia or cervical cancer had a PPV of 57.1% (95% CI, 54.7-59.5%). Requiring ≥2 diagnoses of high-grade cervical dysplasia or cervical cancer, separated by 7-30 days, increased the PPV to 60.2% (95% CI, 53.9-66.1%). At least two diagnoses plus a procedure code within a month of the first diagnosis date yielded a PPV of 80.7% (95% CI, 73.6-86.2%). The algorithms had higher PPVs in identifying prevalent high-grade cervical dysplasia or cervical cancer. Overall, the PPVs of these algorithms were similar or slightly lower in the external claims data than in the sample used to derive the algorithms.
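The three case definitions above can be sketched as simple date rules. This is a hedged illustration, not the authors' implementation: the function names are hypothetical, "a month" is assumed to mean 30 days, the procedure is assumed to fall on or after the first diagnosis date, and the 7-30 day separation is assumed to carry over to the third definition. The specific diagnosis and procedure codes are not given in the abstract, so the inputs are already-filtered lists of claim dates.

```python
from datetime import date


def has_one_diagnosis(dx_dates):
    """Definition 1: at least one qualifying diagnosis code."""
    return len(dx_dates) >= 1


def has_two_diagnoses(dx_dates, min_gap=7, max_gap=30):
    """Definition 2: two diagnosis dates separated by 7-30 days.

    Assumption: any pair of diagnosis dates may satisfy the gap.
    """
    days = sorted(set(dx_dates))
    return any(
        min_gap <= (later - earlier).days <= max_gap
        for i, earlier in enumerate(days)
        for later in days[i + 1:]
    )


def has_two_diagnoses_and_procedure(dx_dates, proc_dates, window=30):
    """Definition 3: definition 2 plus a procedure code within a month
    of the first diagnosis date (assumed: 0-30 days after it)."""
    if not dx_dates:
        return False
    first_dx = min(dx_dates)
    procedure_in_window = any(
        0 <= (proc - first_dx).days <= window for proc in proc_dates
    )
    return has_two_diagnoses(dx_dates) and procedure_in_window


# Hypothetical patient: two diagnoses 14 days apart, procedure 19 days after the first.
dx = [date(2010, 1, 1), date(2010, 1, 15)]
procs = [date(2010, 1, 20)]
print(has_two_diagnoses_and_procedure(dx, procs))
```

Each stricter definition flags a subset of the patients flagged by the previous one, which is consistent with the pattern reported above: PPV rises as the definition tightens, while the number of flagged patients shrinks.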
CONCLUSIONS
Use of ≥2 diagnosis codes in combination with a procedure code appears to be a valid tool for studying high-grade cervical dysplasia and cervical cancer in both electronic medical record and administrative claims databases.