OBJECTIVES
To explore the feasibility of using statistical text classification techniques to automatically categorise clinical incident reports.
METHODS
Statistical text classifiers based on Naïve Bayes and Support Vector Machine algorithms were trained and tested on incident reports submitted by public hospitals to identify two classes of clinical incidents: inadequate clinical handover and incorrect patient identification. Each classifier was trained on 600 reports (300 positives, 300 negatives) and tested on 372 reports (248 positives, 124 negatives). Performance was evaluated using standard measures: accuracy, precision, recall, F-measure, and area under the receiver operating characteristic (ROC) curve (AUC). Classifier learning rates were also evaluated by measuring classifier accuracy as a function of training set size.
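The abstract does not specify the feature representation or the exact Naïve Bayes variant, so the following is only a minimal pure-Python sketch of one common choice: a multinomial Naïve Bayes classifier over bag-of-words counts with add-one (Laplace) smoothing, together with the evaluation measures listed above. The class name `NaiveBayesText`, the whitespace tokeniser and the six training "reports" are hypothetical illustrations, not data or code from the study, and the Support Vector Machine classifiers are not shown. Label 1 marks a positive (e.g. handover) incident and label 0 a negative one.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokeniser: lower-case and split on whitespace only.
    return text.lower().split()


class NaiveBayesText:
    """Multinomial Naive Bayes over word counts with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = set()
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        doc_counts = Counter(labels)
        self.log_prior = {c: math.log(doc_counts[c] / len(labels)) for c in self.classes}
        self.total = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_scores(self, doc):
        # Log prior plus smoothed log likelihood of each known token, per class.
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            s = self.log_prior[c]
            for tok in tokenize(doc):
                if tok in self.vocab:  # words never seen in training are ignored
                    s += math.log((self.word_counts[c][tok] + 1) / (self.total[c] + v))
            scores[c] = s
        return scores

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


def confusion_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall, F-measure) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f


def roc_auc(y_true, scores, positive=1):
    """AUC as the probability that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy reports (1 = handover incident, 0 = other incident).
train_docs = [
    "handover not given to night shift nurse",
    "patient transferred without handover of medication plan",
    "incomplete handover at shift change",
    "patient fell while walking to bathroom",
    "medication dose omitted on ward round",
    "patient slipped on wet floor",
]
train_labels = [1, 1, 1, 0, 0, 0]
test_docs = ["no handover given at shift change", "patient fell on wet floor"]
test_labels = [1, 0]

model = NaiveBayesText().fit(train_docs, train_labels)
predictions = [model.predict(d) for d in test_docs]
# Ranking score for the ROC curve: log-odds of the positive class.
ranking = [model.log_scores(d)[1] - model.log_scores(d)[0] for d in test_docs]
print(predictions)                                    # [1, 0]
print(confusion_metrics(test_labels, predictions))    # (1.0, 1.0, 1.0, 1.0)
print(roc_auc(test_labels, ranking))                  # 1.0
```

The same `fit`/`predict` loop, repeated on progressively larger random subsets of the training reports, would give an accuracy-versus-training-set-size learning curve of the kind described above.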
RESULTS
All classifiers performed well in categorising clinical handover and patient identification incidents. Naïve Bayes attained the best performance on handover incidents, correctly identifying 86.29% of reporter-classified incidents (precision = 0.84, recall = 0.90, F-measure = 0.87, AUC = 0.93) and 91.53% of expert-classified incidents (precision = 0.87, recall = 0.98, F-measure = 0.92, AUC = 0.97). For patient identification incidents, the best results were obtained when a Support Vector Machine with a radial basis function (RBF) kernel was used to classify reporter-classified reports (accuracy = 97.98%, precision = 0.98, recall = 0.98, F-measure = 0.98, AUC = 1.00), and when Naïve Bayes was used on expert-classified reports (accuracy = 95.97%, precision = 0.95, recall = 0.98, F-measure = 0.96, AUC = 0.99). A relatively small training set was found to be adequate: most classifiers achieved accuracy above 80% with as few as 100 training samples.
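Assuming the F-measure reported here is the balanced F1 score (the harmonic mean of precision P and recall R), the reported values can be checked directly from the precision and recall figures above; for example, for the Naïve Bayes results:

```latex
F = \frac{2PR}{P + R}, \qquad
\begin{aligned}
\text{handover, reporter-classified:}\quad & \frac{2(0.84)(0.90)}{0.84 + 0.90} = \frac{1.512}{1.74} \approx 0.87 \\
\text{handover, expert-classified:}\quad & \frac{2(0.87)(0.98)}{0.87 + 0.98} = \frac{1.7052}{1.85} \approx 0.92 \\
\text{patient identification, expert-classified:}\quad & \frac{2(0.95)(0.98)}{0.95 + 0.98} = \frac{1.862}{1.93} \approx 0.96
\end{aligned}
```

Because F1 is a harmonic mean, it stays close to the lower of P and R, which is why the handover F-measures sit nearer to precision than to recall.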
CONCLUSIONS
This study demonstrates the feasibility of using text classification techniques to automatically categorise clinical incident reports.