Privacy-protecting multivariable-adjusted distributed regression analysis for multi-center pediatric study.


BACKGROUND

Privacy-protecting analytic approaches that avoid centralized pooling of individual-level data, such as distributed regression, are particularly important for vulnerable populations, including children, but these methods have not yet been tested in multi-center pediatric studies.

METHODS

Using electronic health data from 34 healthcare institutions in the National Patient-Centered Clinical Research Network (PCORnet), we fit 12 multivariable-adjusted linear regression models to assess the associations of antibiotic use at <24 months of age with body mass index z-score at 48 to <72 months of age. We fit each model twice: once with conventional multivariable-adjusted regression on pooled individual-level data (the reference method), and once with the more privacy-protecting distributed regression technique on pooled summary-level intermediate statistics. We then compared the results from the two methods.
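
To illustrate how linear regression can be fit from summary-level intermediate statistics alone, the following is a minimal sketch of one standard approach: each site computes its own cross-product sums, and a coordinating center adds them and solves the normal equations. It is not the study's actual software. The function names (`site_summaries`, `combine_and_fit`), the three simulated sites, and the coefficient values are all hypothetical, chosen only for illustration.

```python
import numpy as np

def site_summaries(X, y):
    """Run locally at one site; only these aggregates would leave the site."""
    X = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    return {"n": len(y), "XtX": X.T @ X, "Xty": X.T @ y, "yty": float(y @ y)}

def combine_and_fit(summaries):
    """Run at the coordinating center using summary-level statistics only."""
    n = sum(s["n"] for s in summaries)
    XtX = sum(s["XtX"] for s in summaries)
    Xty = sum(s["Xty"] for s in summaries)
    yty = sum(s["yty"] for s in summaries)
    beta = np.linalg.solve(XtX, Xty)        # ordinary least squares estimates
    p = XtX.shape[0]
    rss = yty - beta @ Xty                  # residual sum of squares at the OLS solution
    sigma2 = rss / (n - p)
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(XtX)))
    return beta, se

# Simulated data for three hypothetical sites (illustrative values only).
rng = np.random.default_rng(0)
sites = []
for n_i in (200, 350, 150):
    X_i = rng.normal(size=(n_i, 2))
    y_i = 0.3 + X_i @ np.array([0.5, -0.2]) + rng.normal(size=n_i)
    sites.append((X_i, y_i))

# Distributed fit: only the per-site summaries are combined.
beta_dist, se_dist = combine_and_fit([site_summaries(X, y) for X, y in sites])

# Reference fit: conventional regression on pooled individual-level data.
X_all = np.column_stack([np.ones(700), np.vstack([X for X, _ in sites])])
y_all = np.concatenate([y for _, y in sites])
beta_ref, *_ = np.linalg.lstsq(X_all, y_all, rcond=None)
resid = y_all - X_all @ beta_ref
se_ref = np.sqrt(resid @ resid / (700 - 3) * np.diag(np.linalg.inv(X_all.T @ X_all)))

# Largest absolute discrepancy between the two methods.
max_diff = max(np.abs(beta_dist - beta_ref).max(), np.abs(se_dist - se_ref).max())
print(max_diff)
```

Because the pooled cross-products are exactly the sums of the per-site cross-products, the two fits should agree up to floating-point rounding, which is the kind of comparison the study design describes.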

RESULTS

Pooled-individual-level and distributed linear regression analyses showed virtually identical parameter estimates and standard errors. Across all 12 models, the maximum difference in any of the parameter estimates or standard errors was 4.4833 × 10.

CONCLUSIONS

We demonstrated empirically the feasibility and validity of distributed linear regression analysis using only summary-level information within a large multi-center study of children. This approach could enable expanded opportunities for multi-center pediatric research, especially when sharing of granular individual-level data is challenging.

Abbreviation
Pediatr. Res.
Publication Date
2019-10-02
Pubmed ID
31578038
Medium
Print-Electronic
Authors
Toh S, Rifas-Shiman SL, Lin PI, Bailey LC, Forrest CB, Horgan CE, Lunsford D, Moyneur E, Sturtevant JL, Young JG, Block JP