High-dimensional variable selection accounting for heterogeneity in regression coefficients across multiple data sources.

When analyzing data combined from multiple sources (e.g., hospitals, studies), the heterogeneity across different sources must be accounted for. In this paper, we consider high-dimensional linear regression models for integrative data analysis. We propose a new adaptive clustering penalty (ACP) method to simultaneously select variables and cluster source-specific regression coefficients with sub-homogeneity. We show that the estimator based on the ACP method enjoys a strong oracle property under certain regularity conditions. We also develop an efficient algorithm based on the alternating direction method of multipliers (ADMM) for parameter estimation. We conduct simulation studies to compare the performance of the proposed method to three existing methods (a fused LASSO with adjacent fusion, a pairwise fused LASSO, and a multi-directional shrinkage penalty method). Finally, we apply the proposed method to the multi-center Childhood Adenotonsillectomy Trial to identify sub-homogeneity in the treatment effects across different study sites.
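
To make the idea of fusing source-specific coefficients concrete, the following is a minimal sketch of the objective for a generic pairwise fused LASSO, one of the comparison methods named above. It is not the ACP method or the authors' ADMM algorithm. The function names, the per-source scaling of the loss, and the tuning parameters `lam_sparse` and `lam_fuse` are assumptions made for illustration only.

```python
import numpy as np


def fusion_penalty(B):
    """Sum of |B[k, j] - B[l, j]| over all source pairs k < l and all coefficients j.

    B is a (K, p) array: row k holds the coefficient vector for source k.
    """
    K = B.shape[0]
    return float(sum(np.abs(B[k] - B[l]).sum()
                     for k in range(K) for l in range(k + 1, K)))


def pairwise_fused_lasso_objective(X_list, y_list, B, lam_sparse, lam_fuse):
    """Illustrative objective: squared-error loss + sparsity penalty + pairwise fusion penalty.

    X_list[k] is the (n_k, p) design matrix and y_list[k] the (n_k,) response
    for source k. The 1 / (2 * n_k) scaling is an assumed convention.
    """
    loss = 0.0
    for k, (X, y) in enumerate(zip(X_list, y_list)):
        resid = y - X @ B[k]
        loss += float(resid @ resid) / (2 * len(y))
    sparsity = float(np.abs(B).sum())  # encourages variable selection
    return loss + lam_sparse * sparsity + lam_fuse * fusion_penalty(B)


# Toy usage: three sources, two coefficients; sources 0 and 1 share coefficients.
B = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [3.0, 0.0]])
X_list = [np.eye(2) for _ in range(3)]
y_list = [B[k].copy() for k in range(3)]  # zero residuals by construction
value = pairwise_fused_lasso_objective(X_list, y_list, B, lam_sparse=0.1, lam_fuse=0.5)
```

In this toy example the fusion term is zero for the pair of sources with identical coefficients, so only differences from the third source are penalized. That is the kind of sub-homogeneity that clustering penalties aim to recover.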

Abbreviation: Can J Stat
Publication Date: 2023-08-19
Volume: 52
Issue: 3
Page Numbers: 900-923
Pubmed ID: 39319323
Medium: Print-Electronic
Full Title: High-dimensional variable selection accounting for heterogeneity in regression coefficients across multiple data sources.
Authors: Yu T, Ye S, Wang R