Using electronic health records to identify candidates for human immunodeficiency virus pre-exposure prophylaxis: An application of super learning to risk prediction when the outcome is rare.

Human immunodeficiency virus (HIV) pre-exposure prophylaxis (PrEP) protects high-risk patients from becoming infected with HIV. Clinicians need help to identify candidates for PrEP based on information routinely collected in electronic health records (EHRs). The greatest statistical challenge in developing a risk prediction model is that HIV acquisition is extremely rare.

METHODS

Data consisted of 180 covariates (demographics, diagnoses, treatments, prescriptions) extracted from records on 399 385 patients (150 cases) seen at Atrius Health (2007-2015), a clinical network in Massachusetts. Super learner is an ensemble machine learning algorithm that uses k-fold cross-validation to evaluate and combine predictions from a collection of algorithms. We trained 42 variants of sophisticated algorithms, using different sampling schemes that more evenly balanced the ratio of cases to controls. We compared super learner's cross-validated area under the receiver operating characteristic curve (cv-AUC) with that of each individual algorithm.
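The idea of combining algorithms through k-fold cross-validation can be sketched in a few lines. The example below is an illustrative toy, not the study's implementation: the two base learners (`fit_mean`, `fit_one_feature`), the squared-error loss, the grid search over a single convex weight, and all function names are assumptions made for this sketch. A full super learner would combine many algorithms with a more general metalearner.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle row indices and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_mean(X, y):
    """Hypothetical toy learner 1: predict the training prevalence for everyone."""
    p = sum(y) / len(y)
    return lambda row: p

def fit_one_feature(X, y):
    """Hypothetical toy learner 2: predict the case rate among training rows
    that share the same value of feature 0."""
    counts = {}
    for row, label in zip(X, y):
        n, s = counts.get(row[0], (0, 0))
        counts[row[0]] = (n + 1, s + label)
    overall = sum(y) / len(y)

    def predict(row):
        if row[0] in counts:
            n, s = counts[row[0]]
            return s / n
        return overall  # fall back to prevalence for unseen values
    return predict

def super_learner_weight(X, y, learners, k=5, grid=101):
    """Return the weight w on learners[0] (1 - w on learners[1]) that minimizes
    squared error of the out-of-fold predictions. Assumes exactly two learners."""
    n = len(y)
    oof = [[0.0] * n for _ in learners]  # out-of-fold predictions per learner
    for held_out in k_fold_indices(n, k):
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        X_train = [X[i] for i in train]
        y_train = [y[i] for i in train]
        for j, fit in enumerate(learners):
            model = fit(X_train, y_train)  # fit on the other k - 1 folds
            for i in held_out:
                oof[j][i] = model(X[i])    # predict only on the held-out fold

    def loss(w):
        return sum((w * a + (1 - w) * b - t) ** 2
                   for a, b, t in zip(oof[0], oof[1], y))

    best_w = min((g / (grid - 1) for g in range(grid)), key=loss)
    return best_w, oof
```

Because the weight is chosen on predictions for rows that each model never saw during fitting, a learner that merely memorizes its training data gains no advantage in the ensemble.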

RESULTS

The least absolute shrinkage and selection operator (lasso) using a 1:20 class ratio outperformed the super learner (cv-AUC = 0.86 vs 0.84). A traditional logistic regression model restricted to 23 clinician-selected main terms was slightly inferior (cv-AUC = 0.81).
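To make the class ratio and the AUC metric concrete, here is a minimal sketch under stated assumptions. It assumes the 1:20 ratio is reached by keeping every case and randomly downsampling controls, which is only one of several possible sampling schemes; the function names are hypothetical. Under that assumption, 150 cases would be paired with 150 × 20 = 3,000 controls. The AUC is computed as the probability that a randomly chosen case receives a higher score than a randomly chosen control.

```python
import random

def downsample_controls(y, controls_per_case=20, seed=0):
    """Keep every case (label 1) and a random sample of controls (label 0)
    so that the retained rows have at most controls_per_case controls per case."""
    cases = [i for i, label in enumerate(y) if label == 1]
    controls = [i for i, label in enumerate(y) if label == 0]
    n_keep = min(len(controls), controls_per_case * len(cases))
    kept = random.Random(seed).sample(controls, n_keep)
    return sorted(cases + kept)

def auc(scores, y):
    """Area under the ROC curve via pairwise comparison:
    the share of case-control pairs in which the case scores higher
    (ties count as half a win)."""
    pos = [s for s, label in zip(scores, y) if label == 1]
    neg = [s for s, label in zip(scores, y) if label == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A cross-validated AUC applies the same calculation to predictions made on held-out folds. The pairwise loop is quadratic in the number of rows, so it suits small illustrations rather than a cohort of this size.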

CONCLUSION

Machine learning was successful at developing a model to predict 1-year risk of acquiring HIV based on a physician-curated set of predictors extracted from EHRs.

Abbreviation

Stat Med

Publication Date

2020-06-24

Pubmed ID

32578905

Medium

Print-Electronic

Full Title

Using electronic health records to identify candidates for human immunodeficiency virus pre-exposure prophylaxis: An application of super learning to risk prediction when the outcome is rare.

Authors

Gruber S, Krakower D, Menchaca JT, Hsu K, Hawrusik R, Maro JC, Cocoros NM, Kruskal BA, Wilson IB, Mayer KH, Klompas M