OBJECTIVES
To assess the impact of different methods of calculating Sequential Organ Failure Assessment (SOFA) scores using electronic health record data on the incidence, outcomes, agreement, and predictive validity of Sepsis-3 criteria.
DESIGN
Retrospective observational study.
SETTING
Five Massachusetts hospitals.
PATIENTS
Hospitalized adults, 2015 to 2022.
INTERVENTIONS
None.
MEASUREMENTS AND MAIN RESULTS
We defined sepsis as a suspected infection (culture obtained and antibiotic administered) with a concurrent increase in SOFA score of 2 or more points (Sepsis-3 criteria). Our reference SOFA implementation strategy imputed normal values for missing data, used Pao2/Fio2 ratios for respiratory scores, and assumed normal baseline SOFA scores for community-onset sepsis. We then implemented SOFA scores using different missing data imputation strategies (averaging worst values from preceding and following days vs. carrying forward nonmissing values), imputing respiratory scores using Spo2/Fio2 ratios, and incorporating comorbidities and prehospital laboratory data into baseline SOFA scores. Among 1,064,459 hospitalizations, 297,512 (27.9%) had suspected infection and 141,052 (13.3%) had sepsis with an in-hospital mortality rate of 10.3% using the reference SOFA method. The percentage of patients missing SOFA components for at least 1 day in the infection window was highest for Pao2/Fio2 ratios (98.6%), followed by Spo2/Fio2 ratios (73.5%), bilirubin (68.5%), and Glasgow Coma Scale scores (57.2%). Different missing data imputation strategies yielded near-perfect agreement in identifying sepsis (kappa 0.99). However, using Spo2/Fio2 imputations yielded higher sepsis incidence (18.3%), lower mortality (8.1%), and slightly lower predictive validity for mortality (area under the receiver operating characteristic curve [AUROC] 0.76 vs. 0.78). For community-onset sepsis, incorporating comorbidities and historical laboratory data into baseline SOFA score estimates yielded lower sepsis incidence (6.9% vs. 11.6%), higher mortality (13.4% vs. 9.6%), and higher predictive validity (AUROC 0.79 vs. 0.75) relative to the reference SOFA implementation.
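As an illustration only, the following minimal Python sketch shows how the choice of imputation strategy and baseline can change whether the SOFA criterion (an increase of 2 or more points) is met, and how agreement between two strategies can be summarized with Cohen's kappa. It is not the study's code: the function names, the use of a single per-day score in place of a full six-component SOFA total, the defaulting of leading gaps to 0 in the carry-forward variant, and the toy data are all assumptions made for the example.

```python
from typing import Optional, Sequence


def impute_normal(scores: Sequence[Optional[int]]) -> list:
    """Treat every missing daily score as normal (0 points)."""
    return [0 if s is None else s for s in scores]


def impute_carry_forward(scores: Sequence[Optional[int]]) -> list:
    """Carry the last nonmissing score forward.

    Assumption for this sketch: gaps before the first observed
    value default to normal (0 points).
    """
    out, last = [], 0
    for s in scores:
        if s is not None:
            last = s
        out.append(last)
    return out


def meets_sofa_criterion(daily_totals: Sequence[int], baseline: int = 0) -> bool:
    """True if any daily score is at least 2 points above baseline."""
    return any(total - baseline >= 2 for total in daily_totals)


def cohen_kappa(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Cohen's kappa for two equal-length sequences of binary labels."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)
    if expected == 1:  # both raters constant and identical
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical daily scores in the infection window (None = not measured).
patients = [
    [None, 2, None],
    [1, None, None],
    [None, None, 3],
    [0, 1, None],
]

flags_normal = [meets_sofa_criterion(impute_normal(p)) for p in patients]
flags_carry = [meets_sofa_criterion(impute_carry_forward(p)) for p in patients]
kappa = cohen_kappa(flags_normal, flags_carry)

# The same observed score can meet or miss the criterion depending on baseline.
meets_with_normal_baseline = meets_sofa_criterion([3], baseline=0)
meets_with_elevated_baseline = meets_sofa_criterion([3], baseline=2)
```

In this toy data the two imputation strategies flag the same patients, whereas raising the assumed baseline from 0 to 2 changes the classification of a patient with an observed score of 3. This mirrors the direction of the findings reported above but carries no quantitative meaning.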
CONCLUSIONS
Common variations in calculating respiratory and baseline SOFA scores, but not in handling missing data, lead to substantial differences in observed incidence, mortality, agreement, and predictive validity of Sepsis-3 criteria.