XAI counters the "black box" tendency of machine learning, where even the AI's designers cannot explain why it arrived at a specific decision.[15] Interpretability describes the possibility of comprehending the ML model and presenting the underlying basis for decision-making in a way that is understandable to humans.[16][17][18] Explainability is a concept that is recognized as important, but a consensus definition is not yet available;[15] one possibility is "the collection of features of the interpretable domain that have contributed, for a given example, to producing a decision (e.g., classification or regression)". Explainability is especially important in domains like medicine, defense, finance, and law, where it is crucial to understand decisions and build trust in the algorithms.[11]

Many researchers argue that, at least for supervised machine learning, the way forward is symbolic regression, where the algorithm searches the space of mathematical expressions to find the model that best fits a given dataset (a toy illustration is sketched below). A human can audit rules in an XAI to get an idea of how likely the system is to generalize to future real-world data outside the test set.[32][33] One transparency project, the DARPA XAI program, aims to produce "glass box" models that are explainable to a "human-in-the-loop" without greatly sacrificing AI performance. Such transparency tools aim to ensure that the system operates in accordance with ethical and legal standards, and that its decision-making processes are transparent and accountable.[44]

Scholars sometimes use the term "mechanistic interpretability" to refer to the process of reverse-engineering artificial neural networks to understand their internal decision-making mechanisms and components, similar to how one might analyze a complex machine or computer program.[47] Studying the interpretability of the most advanced foundation models often involves searching for an automated way to identify "features" in generative pretrained transformers.

MYCIN, developed in the early 1970s as a research prototype for diagnosing bacteremia infections of the bloodstream, could explain[56] which of its hand-coded rules contributed to a diagnosis in a specific case. Similarly, SOPHIE could explain the qualitative reasoning behind its electronics troubleshooting, even though it ultimately relied on the SPICE circuit simulator.[58]: 164–165 By the 1990s, researchers had begun studying whether it is possible to meaningfully extract the non-hand-coded rules generated by opaque trained neural networks.[59] Researchers building neural network-powered decision support for clinicians sought to develop dynamic explanations that allow these technologies to be more trusted and trustworthy in practice.[9]

In the 2010s, public concerns about racial and other bias in the use of AI for criminal sentencing decisions and findings of creditworthiness may have led to increased demand for transparent artificial intelligence.[63][17][16][64][65][66] Among the resulting techniques is layer-wise relevance propagation (LRP), which determines which features in a particular input vector contribute most strongly to a neural network's output (see the sketch below).[44] Several groups found that neurons can be aggregated into circuits that perform human-comprehensible functions, some of which reliably arise across different networks trained independently.[88]
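As a toy illustration of symbolic regression as a search over mathematical expressions, the following sketch scores a small hand-enumerated pool of candidate formulas against synthetic data and keeps the best fit. The data, candidate set, and names are invented for this example; real symbolic-regression systems grow and mutate expression trees automatically rather than enumerating a fixed pool.

```python
# Minimal illustration of symbolic regression as search over expressions.
# This is a toy sketch, not any specific published algorithm: it scores a
# small, hand-enumerated pool of candidate formulas and keeps the best fit.
import itertools
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(200, 2))
y = 3 * x[:, 0] + x[:, 1] ** 2          # hidden ground-truth relationship

# Candidate building blocks: (human-readable form, callable). A real symbolic
# regressor would construct these expression trees automatically.
unary = {"x0": lambda a, b: a, "x1": lambda a, b: b, "x0^2": lambda a, b: a**2,
         "x1^2": lambda a, b: b**2, "sin(x0)": lambda a, b: np.sin(a)}
candidates = []
for (n1, f1), (n2, f2) in itertools.product(unary.items(), repeat=2):
    for c in (1.0, 2.0, 3.0):
        candidates.append((f"{c}*{n1} + {n2}",
                           lambda a, b, c=c, f1=f1, f2=f2: c * f1(a, b) + f2(a, b)))

def mse(f):
    return np.mean((f(x[:, 0], x[:, 1]) - y) ** 2)

best_form, best_f = min(candidates, key=lambda cf: mse(cf[1]))
print("best expression:", best_form, "mse:", mse(best_f))
# Here the search recovers "3.0*x0 + x1^2", which is itself the explanation:
# a closed-form formula a human can read and audit.
```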
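The following is a minimal sketch of the redistribution idea behind layer-wise relevance propagation, applying the basic epsilon-rule to a tiny two-layer ReLU network with hand-set weights. The network, weights, and helper function are assumptions of this illustration, not code from any particular LRP library.

```python
# Toy layer-wise relevance propagation (LRP, epsilon-rule) on a small
# two-layer ReLU network with fixed weights: a sketch of the idea only.
import numpy as np

def lrp_linear(a, W, b, R_out, eps=1e-6):
    """Redistribute relevance R_out from a layer's outputs to its inputs.
    a: input activations (n_in,), W: weights (n_in, n_out), b: bias (n_out,).
    Implements R_in_j = a_j * sum_k W[j, k] * R_out_k / (z_k + eps*sign(z_k))."""
    z = a @ W + b                                  # pre-activations
    denom = z + eps * np.where(z >= 0, 1.0, -1.0)  # stabilized denominator
    s = R_out / denom
    return a * (W @ s)

# Hand-set network: 3 inputs -> 2 hidden (ReLU) -> 1 output.
W1 = np.array([[1.0, -0.5], [0.5, 1.0], [-1.0, 0.3]]); b1 = np.zeros(2)
W2 = np.array([[1.5], [-1.0]]);                        b2 = np.zeros(1)

x = np.array([1.0, 2.0, 0.5])
h = np.maximum(0.0, x @ W1 + b1)                   # forward pass
out = h @ W2 + b2

R_out = out                                        # start: relevance = output score
R_hidden = lrp_linear(h, W2, b2, R_out)            # back through layer 2
R_input = lrp_linear(x, W1, b1, R_hidden)          # back through layer 1
print("relevance per input feature:", R_input)     # larger value = stronger contribution
```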
As regulators, official bodies, and general users come to depend on AI-based dynamic systems, clearer accountability will be required for automated decision-making processes to ensure trust and transparency.[90][91] The European Union introduced a right to explanation in the General Data Protection Regulation (GDPR) to address potential problems stemming from the rising importance of algorithms.[92] In France, the Loi pour une République numérique (Digital Republic Act) grants subjects the right to request and receive information pertaining to the implementation of algorithms that process data about them.[100][101]

Critiques of XAI rely on developed concepts of mechanistic and empiric reasoning from evidence-based medicine to suggest that AI technologies can be clinically validated even when their function cannot be understood by their operators. It has also been argued that the goals of XAI amount to a form of lossy compression that will become less effective as AI models grow in their number of parameters.

Peters, Procaccia, Psomas and Zhou[105] present an algorithm for explaining the outcomes of the Borda rule using O(m²) explanations, and prove that this is tight in the worst case. Another algorithm computes explanations for the Shapley value: given a coalitional game, it decomposes the game into sub-games, for which it is easy to generate verbal explanations based on the axioms characterizing the Shapley value.
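As background for the social-choice explanation methods above, the sketch below computes exact Shapley values for a toy three-player coalitional game directly from the standard permutation definition. The game values are invented, and this is the underlying definition rather than the decomposition algorithm from the cited work.

```python
# Exact Shapley values for a small coalitional game, computed from the
# standard permutation definition. Background for the explanation methods
# discussed above, not the cited decomposition algorithm itself.
from itertools import permutations

players = ["A", "B", "C"]

def v(coalition):
    """Characteristic function: value created by a set of players (toy numbers)."""
    values = {frozenset(): 0, frozenset("A"): 10, frozenset("B"): 20,
              frozenset("C"): 30, frozenset("AB"): 50, frozenset("AC"): 60,
              frozenset("BC"): 70, frozenset("ABC"): 100}
    return values[frozenset(coalition)]

def shapley(players, v):
    """Average each player's marginal contribution over all join orders."""
    phi = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = set()
        for p in order:
            phi[p] += v(coalition | {p}) - v(coalition)
            coalition.add(p)
    return {p: phi[p] / len(orders) for p in players}

print(shapley(players, v))   # contributions sum to v({A,B,C}) = 100 (efficiency axiom)
```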
Grokking is an example of a phenomenon studied in interpretability: a model initially memorizes all the training answers (overfitting) but later adopts an algorithm that generalizes to unseen data.[45]
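A minimal sketch of the kind of setup in which grokking has been reported: a small network trained on modular addition with strong weight decay, with train and held-out accuracy tracked over long training. The architecture, hyperparameters, and whether delayed generalization actually appears here are assumptions of this illustration rather than claims from the cited study.

```python
# Sketch of a grokking-style experiment: modular addition with a small MLP,
# long training with weight decay, and train/test accuracy tracked over time.
# Hyperparameters are illustrative; delayed generalization is not guaranteed.
import torch, torch.nn as nn

P = 97                                             # modulus for (a + b) mod P
pairs = torch.cartesian_prod(torch.arange(P), torch.arange(P))
labels = (pairs[:, 0] + pairs[:, 1]) % P
perm = torch.randperm(len(pairs))
split = len(pairs) // 2                            # train on half the addition table
train_idx, test_idx = perm[:split], perm[split:]

embed = nn.Embedding(P, 64)
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, P))
params = list(embed.parameters()) + list(model.parameters())
opt = torch.optim.AdamW(params, lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

def accuracy(idx):
    with torch.no_grad():
        x = torch.cat([embed(pairs[idx, 0]), embed(pairs[idx, 1])], dim=1)
        return (model(x).argmax(dim=1) == labels[idx]).float().mean().item()

for step in range(20000):
    x = torch.cat([embed(pairs[train_idx, 0]), embed(pairs[train_idx, 1])], dim=1)
    loss = loss_fn(model(x), labels[train_idx])
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 1000 == 0:
        # Memorization shows up as high train / low test accuracy; grokking
        # would appear as a much later jump in test accuracy.
        print(step, "train acc", accuracy(train_idx), "test acc", accuracy(test_idx))
```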