
Andreas Madsen

Alumni

Publications

Are self-explanations from Large Language Models faithful?
Interpretability Needs a New Paradigm
Himabindu Lakkaraju
A. Chandar
Faithfulness Measurable Masked Language Models
A common approach to explaining NLP models is to use importance measures that express which tokens are important for a prediction. Unfortunately, such explanations are often wrong despite being persuasive. Therefore, it is essential to measure their faithfulness. One such metric is that if tokens are truly important, then masking them should result in worse model performance. However, token masking introduces out-of-distribution issues, and existing solutions that address this are computationally expensive and employ proxy models. Furthermore, other metrics are very limited in scope. This work proposes an inherently faithfulness measurable model that addresses these challenges. This is achieved using a novel fine-tuning method that incorporates masking, such that masking tokens become in-distribution by design. This differs from existing approaches, which are completely model-agnostic but are inapplicable in practice. We demonstrate the generality of our approach by applying it to 16 different datasets and validate it using statistical in-distribution tests. The faithfulness is then measured with 9 different importance measures. Because masking is in-distribution, importance measures that themselves use masking become consistently more faithful. Additionally, because the model makes faithfulness cheap to measure, we can optimize explanations towards maximal faithfulness; thus, our model becomes indirectly inherently explainable.
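To make the masking-based evaluation concrete, here is a minimal sketch of the underlying test: mask the allegedly most important tokens and record how quickly accuracy degrades. This is an illustration only, not the paper's implementation; `predict_fn`, `importance_fn`, and the `[MASK]` placeholder are assumed stand-ins for the fine-tuned model and an importance measure. A faithful importance measure should produce a curve that decays faster than one obtained by masking random tokens.

```python
# Sketch of a masking-based faithfulness test (illustration, not the paper's code).
import numpy as np

MASK = "[MASK]"  # hypothetical mask token

def mask_top_tokens(tokens, scores, ratio):
    """Replace the `ratio` fraction of highest-scoring tokens with MASK."""
    k = int(round(ratio * len(tokens)))
    top = set(np.argsort(scores)[::-1][:k])
    return [MASK if i in top else t for i, t in enumerate(tokens)]

def faithfulness_curve(predict_fn, importance_fn, examples, labels, ratios):
    """Accuracy after masking allegedly important tokens, for each masking ratio.
    If the importance measure is faithful, accuracy should drop quickly."""
    curve = []
    for r in ratios:
        correct = 0
        for tokens, label in zip(examples, labels):
            scores = importance_fn(tokens)               # per-token importance
            masked = mask_top_tokens(tokens, scores, r)  # mask top-r fraction
            correct += int(predict_fn(masked) == label)
        curve.append(correct / len(labels))
    return curve

# Toy usage with dummy stand-ins for the model and the importance measure.
examples = [["a", "great", "movie"], ["a", "terrible", "movie"]]
labels = [1, 0]
predict_fn = lambda toks: int("great" in toks)       # dummy classifier
importance_fn = lambda toks: [len(t) for t in toks]  # dummy importance scores
print(faithfulness_curve(predict_fn, importance_fn, examples, labels,
                         ratios=[0.0, 0.5, 1.0]))
```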
Post-hoc Interpretability for Neural NLP: A Survey
Evaluating the Faithfulness of Importance Measures in NLP by Recursively Masking Allegedly Important Tokens and Retraining
To explain NLP models, a popular approach is to use importance measures, such as attention, which indicate which input tokens are important for making a prediction. However, an open question is how well these explanations accurately reflect a model's logic, a property called faithfulness. To answer this question, we propose Recursive ROAR, a new faithfulness metric. This works by recursively masking allegedly important tokens and then retraining the model. The principle is that this should result in worse model performance compared to masking random tokens. The result is a performance curve as a function of the masking ratio. Furthermore, we propose a summarizing metric using relative area-between-curves (RACU), which allows for easy comparison across papers, models, and tasks. We evaluate 4 different importance measures on 8 different datasets, using both LSTM-attention models and RoBERTa models. We find that the faithfulness of importance measures is both model-dependent and task-dependent. This conclusion contradicts previous evaluations in both the computer-vision and the faithfulness-of-attention literature.
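The area-between-curves summary can be sketched roughly as follows. The idea is to compare the performance curve obtained by masking allegedly important tokens against the curve obtained by masking random tokens; the exact normalization used in the paper may differ, and the curves below are made-up placeholders.

```python
# Rough sketch of an area-between-curves summary in the spirit of RACU
# (illustration only; the paper's exact definition may differ).
import numpy as np

def area_between_curves(ratios, perf_importance, perf_random):
    """Area between the random-masking and importance-masking curves.
    Positive values mean masking allegedly important tokens hurts performance
    more than masking random tokens, i.e. the explanation looks faithful."""
    gap = np.asarray(perf_random) - np.asarray(perf_importance)
    return np.trapz(gap, ratios)

def relative_area(ratios, perf_importance, perf_random, perf_unmasked):
    """Normalize by the performance lost under random masking relative to the
    unmasked model, so scores are comparable across tasks (an assumption)."""
    baseline = np.trapz(perf_unmasked - np.asarray(perf_random), ratios)
    return area_between_curves(ratios, perf_importance, perf_random) / baseline

# Toy usage with made-up performance curves.
ratios          = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
perf_random     = [0.90, 0.88, 0.85, 0.80, 0.74, 0.60]
perf_importance = [0.90, 0.70, 0.62, 0.58, 0.56, 0.55]
print(relative_area(ratios, perf_importance, perf_random, perf_unmasked=0.90))
```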