Advancing Interpretability in Machine Learning: Model Summaries and Interpretable Regional Descriptors
Susanne Dandl* 1, Marc Becker2, Bernd Bischl1, Giuseppe Casalicchio2, Ludwig Bothmann1
Abstract
In machine learning (ML), transparency and interpretability are central to promoting trust and informed decision-making. To meet this need, the emerging field of Interpretable ML, or Explainable Artificial Intelligence, has developed techniques for post-hoc interpretation and explanation of trained ML models. We present two advancements in ML interpretability: model summaries and interpretable regional descriptors. For model summaries, we introduce a novel R package centered on performance measures and interpretation methods. The package draws inspiration from the summary method for (additive/generalized) linear models in R, which generates a table that encapsulates key aspects such as model performance, effect sizes and directions for individual variables, and model complexity. We extend this methodology to non-parametric ML models, creating a concise yet informative table that supports analogous conclusions. To ensure applicability across a wide spectrum of ML models and use cases, we base our package on mlr3, a rich package ecosystem for applied ML. Interpretable regional descriptors (IRDs) are a new form of model-agnostic, local explanation: IRDs are hyperboxes that describe how an observation’s feature values can be changed without affecting its prediction. They can (1) justify predictions by offering "even if" statements (semi-factual explanations), (2) identify influential features, and (3) help to detect pointwise biases or implausibilities. We formalize the search for IRDs as an optimization problem, which yields a unified framework covering desiderata, initialization techniques, and post-processing methods. A benchmark study validates multiple methods for generating IRDs and identifies strategies to enhance their performance.
Overall, with these two advancements, model summaries and IRDs, we hope to make ML models more accessible, understandable, and trustworthy.
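
To make the notion of an IRD concrete, the following minimal Python sketch checks whether a candidate hyperbox around an observation leaves the prediction unchanged. It is an illustration only, not the authors' method or package: the toy classifier `predict`, the feature names `income` and `debt`, and the helper `box_keeps_prediction` are all hypothetical, and the grid check is a heuristic that only inspects finitely many points inside the box.

```python
from itertools import product


def predict(x):
    """Hypothetical toy classifier: approve (1) when income - 2 * debt >= 10."""
    return 1 if x["income"] - 2.0 * x["debt"] >= 10.0 else 0


def grid(lo, hi, steps):
    """Evenly spaced values from lo to hi, inclusive."""
    if steps == 1 or lo == hi:
        return [lo]
    return [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]


def box_keeps_prediction(predict, x, box, steps=11):
    """Heuristic check: does every grid point inside `box` get the same
    prediction as the observation `x`? Features not listed in `box`
    keep the values they have in `x`."""
    target = predict(x)
    names = list(box)
    axes = [grid(box[n][0], box[n][1], steps) for n in names]
    for values in product(*axes):
        point = dict(x)
        point.update(zip(names, values))
        if predict(point) != target:
            return False
    return True


# Observation of interest (assumed values for illustration).
x = {"income": 40.0, "debt": 5.0}

# Candidate hyperboxes: feature -> (lower bound, upper bound).
good_box = {"income": (35.0, 60.0), "debt": (0.0, 10.0)}
bad_box = {"income": (20.0, 60.0), "debt": (0.0, 10.0)}

print(box_keeps_prediction(predict, x, good_box))
print(box_keeps_prediction(predict, x, bad_box))
```

In this toy setting, `good_box` supports a semi-factual statement of the form "even if income were as low as 35 and debt as high as 10, the prediction would stay the same", whereas `bad_box` extends too far in the income dimension and contains points where the prediction flips.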