Acute appendicitis (i.e. severe inflammation of the appendix) is the most common surgical emergency in children. Roughly two-thirds of children with acute appendicitis present with simple appendicitis, which is treated effectively by a short operation to remove the appendix and carries minimal complications. However, for the roughly 2,000 children in Canada who progress to perforation each year, there are multiple treatment options and no uniformly accepted management protocol, which results in significant variability in treatment and outcomes. Perforated appendicitis itself varies in severity, necessitating the grade classification shown in Figure 1.
Currently, the presence of perforation and the severity of appendicitis can only be ascertained during surgery. However, preoperative discrimination between simple and perforated appendicitis remains essential to providing personalized care and counseling to patients. In addition, preoperative discrimination between the severity grades of perforated appendicitis can allow hospitals to predict surgical outcomes and resource utilization.
Our objective was to create a machine learning (ML) pipeline capable of preoperatively predicting the severity of acute appendicitis in the pediatric population. This included creating a binary classifier for predicting the presence of perforation, and a multi-class classifier for predicting the grade of perforation.
Our retrospective dataset included 2,000 patients younger than 18 years who had undergone urgent appendectomy for acute appendicitis at the Montreal Children’s Hospital. Patient information, including history, preoperative laboratory and ultrasound data, and intra-operative findings, was collected for each patient from the local electronic medical record (EMR).
We tested two approaches to predicting the complication and grade of appendicitis. The first, or “direct”, approach used preoperative features (e.g. ultrasound reports, urinalysis results) to predict the complication or grade directly. The second, or “indirect”, approach individually predicted postoperative features, which were then combined in a deterministic equation to yield the complication or grade. These postoperative features were the presence of intra-abdominal abscess (none, single, multiple), peritonitis (none, localized, generalized) and perforation (present, not present), as recorded in the operative report (OR).
During preprocessing, features that a pediatric surgeon deemed unhelpful to the classification goal were removed. Missing values were addressed by testing several imputation strategies, and class imbalances by testing several upsampling methods.
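As an illustration of the preprocessing step described above, the following is a minimal sketch of one possible imputation strategy (column-wise median) and one possible upsampling method (random oversampling of minority classes). Both choices, and the function names `impute_median` and `random_oversample`, are assumptions made for illustration; the abstract does not state which strategies were ultimately selected. The sketch assumes numeric features with missing values encoded as `None` and at least one observed value per column.

```python
import random
from statistics import median


def impute_median(rows):
    """Replace each None with the median of the observed values in its column.

    Assumes every column has at least one observed (non-None) value.
    """
    n_cols = len(rows[0])
    medians = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        medians.append(median(observed))
    return [
        [medians[j] if r[j] is None else r[j] for j in range(n_cols)]
        for r in rows
    ]


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes
    have as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for r, y in zip(rows, labels):
        by_class.setdefault(y, []).append(r)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for r in members + extra:
            out_rows.append(r)
            out_labels.append(y)
    return out_rows, out_labels


# Toy example with made-up values (not study data).
toy_rows = [[1.0, None], [3.0, 4.0], [None, 6.0], [5.0, 8.0]]
toy_labels = [0, 0, 0, 1]
imputed = impute_median(toy_rows)
balanced_rows, balanced_labels = random_oversample(imputed, toy_labels)
```

In a setup like this, upsampling would normally be applied to the training portion only, so that duplicated rows do not leak into the data used for evaluation.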
The direct and indirect approaches were tested separately, and for each prediction target the best combination of imputation strategy, class-balancing technique and classification model was chosen. For intra-abdominal abscess and peritonitis in the indirect approach, and for grade of appendicitis in the direct approach, the optimized metric was the area under the receiver operating characteristic curve (AUROC). For perforation in the indirect approach and appendicitis complication in the direct approach, the optimized metric was a utility metric that assigned separate weights to true positives, false positives, true negatives and false negatives, reflecting the real-world consequences of a misdiagnosis in our setting. The final metrics for the direct and indirect approaches to perforated versus non-perforated prediction are shown in Figure 3. All models were validated by a pediatric surgeon. Results for the grade of perforation are forthcoming.
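A weighted utility metric of the kind described above could be sketched as follows. The weight values are purely hypothetical placeholders chosen to show the mechanism (here, a missed perforation is penalized more heavily than a false alarm); the abstract does not report the actual weights, and the function name `utility_score` is likewise an assumption.

```python
def utility_score(y_true, y_pred, weights):
    """Mean per-patient utility, where 1 = perforated and 0 = non-perforated.

    `weights` maps each outcome type ("tp", "fp", "tn", "fn") to the
    utility awarded for one prediction of that type.
    """
    total = 0.0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            key = "tp"
        elif truth == 0 and pred == 1:
            key = "fp"
        elif truth == 0 and pred == 0:
            key = "tn"
        else:
            key = "fn"
        total += weights[key]
    return total / len(y_true)


# Hypothetical weights for illustration only (not the study's values).
example_weights = {"tp": 1.0, "tn": 1.0, "fp": -0.5, "fn": -2.0}

# Toy labels: one true positive, one false negative,
# one true negative, one false positive.
score = utility_score([1, 1, 0, 0], [1, 0, 0, 1], example_weights)
```

Model selection would then pick the configuration that maximizes this score rather than plain accuracy, so the chosen model reflects the asymmetric cost of each error type.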
Negative predictive value (NPV) is the proportion of predicted negative results (i.e. non-perforated) that are truly negative, and positive predictive value (PPV) is the proportion of predicted positive results (i.e. perforated) that are truly positive. The indirect approach appears to be more clinically useful than the direct approach because of its higher NPV and accuracy. For comparison, we implemented Feng and colleagues' prediction model for complicated appendicitis [1], which previously outperformed popular appendicitis scoring methods. Our ML pipeline distinguished between complicated and simple appendicitis more effectively than this model. Moreover, Feng and colleagues' method is tailored to only a subset of the pediatric population (children younger than five years of age), rendering it inapplicable to our goal.
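The NPV and PPV definitions above translate directly into counts from a confusion matrix. The following minimal sketch shows the arithmetic on toy labels; the function name `ppv_npv` and the toy data are illustrative assumptions, not part of the study.

```python
def ppv_npv(y_true, y_pred):
    """Return (PPV, NPV) with 1 = perforated and 0 = non-perforated.

    PPV = TP / (TP + FP); NPV = TN / (TN + FN).
    A value is None when its denominator is zero.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    ppv = tp / (tp + fp) if (tp + fp) > 0 else None
    npv = tn / (tn + fn) if (tn + fn) > 0 else None
    return ppv, npv


# Toy labels (not study data): TP = 1, FN = 1, TN = 2, FP = 1.
ppv, npv = ppv_npv([1, 1, 0, 0, 0], [1, 0, 0, 0, 1])
```

A high NPV means that a patient predicted to be non-perforated is unlikely to have a perforation, which is why it matters for preoperative counseling.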
We also plan to externally validate our models, a step that no other clinical prediction model aimed at distinguishing complicated from simple acute appendicitis has undergone.
1. Feng, W., Zhao, X.-F., Li, M.-M. & Cui, H.-L. A clinical prediction model for complicated appendicitis in children younger than five years of age. BMC Pediatr. 20, 401 (2020).