Imputing Missing Covariates in Time-to-event Analysis within Distributed Research Networks: A Simulation Study.

PURPOSE

In distributed research network (DRN) settings, multiple imputation cannot be directly implemented because pooling individual-level data is often not feasible. The performance of multiple imputation in combination with meta-analysis is not well understood within DRNs.

METHODS

To evaluate imputation of missing baseline covariate data combined with meta-analysis for time-to-event analysis within DRNs, we compared four imputation algorithms in simulation studies motivated by a real-world data set: two parametric algorithms, an approximated linear imputation model (Approx) and a nonlinear substantive-model-compatible imputation model (SMC), and two non-parametric machine learning algorithms, random forest (RF) and classification and regression trees (CART).
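
As a rough illustration of how site-level imputation can be combined with meta-analysis, the sketch below pools the logHR estimates from several imputed data sets within each site and then combines the sites. It assumes Rubin's rules for the within-site step and a fixed-effect inverse-variance meta-analysis for the between-site step; the abstract specifies neither choice. The function names and all numbers are hypothetical.

```python
import math
from statistics import mean, variance


def rubin_pool(estimates, variances):
    """Pool m imputed-data estimates of one logHR using Rubin's rules.

    estimates: point estimates, one per imputed data set
    variances: squared standard errors, one per imputed data set
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    q_bar = mean(estimates)                      # pooled point estimate
    within = mean(variances)                     # within-imputation variance
    between = variance(estimates) if m > 1 else 0.0  # between-imputation variance
    total = within + (1 + 1 / m) * between
    return q_bar, total


def fixed_effect_meta(site_estimates, site_variances):
    """Combine site-level estimates by inverse-variance weighting (assumed here)."""
    weights = [1 / v for v in site_variances]
    pooled = sum(w * e for w, e in zip(weights, site_estimates)) / sum(weights)
    pooled_var = 1 / sum(weights)
    return pooled, pooled_var


# Hypothetical example: two sites, three imputations each.
# Only these site-level summaries would leave each site.
site_a = rubin_pool([0.21, 0.18, 0.24], [0.010, 0.011, 0.009])
site_b = rubin_pool([0.30, 0.27, 0.33], [0.020, 0.019, 0.021])

log_hr, var = fixed_effect_meta([site_a[0], site_b[0]], [site_a[1], site_b[1]])
print(f"pooled logHR = {log_hr:.3f}, SE = {math.sqrt(var):.3f}")
```

Only the pooled estimate and its variance cross site boundaries in this sketch, which is what makes the approach compatible with a DRN in which individual-level data cannot be pooled.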

RESULTS

When effect sizes (log-hazard ratios, logHRs) were small and missingness mechanisms were homogeneous across sites, all imputation methods produced unbiased estimates that were more efficient than those from complete-case analysis, which could be biased and inefficient. Under heterogeneous missingness mechanisms, estimates from the RF method could have higher efficiency. Estimates from distributed imputation combined by meta-analysis were similar to those from imputation using pooled data. When logHRs were large, the SMC imputation algorithm generally performed better than the others.

CONCLUSIONS

These findings support the validity and feasibility of imputation within DRNs in the presence of missing covariate data. The performance of the four imputation algorithms varies with the effect size and the level of missingness.

Abbreviation

Pharmacoepidemiol Drug Saf

Publication Date

2023-03
