Andreas Madsen

Sarath Chandar Anbil Parthipan

2024-05-01

ICML.cc/2024/Conference (spotlight)

Sarath Chandar Anbil Parthipan

openreview.net

Are self-explanations from Large Language Models faithful?

Andreas Madsen

2024-01-15

ArXiv (prépublication)

Post-hoc Interpretability for Neural NLP: A Survey

Andreas Madsen

Sarath Chandar Anbil Parthipan

2022-12-23

ACM Computing Surveys (publié)

Evaluating the Faithfulness of Importance Measures in NLP by Recursively Masking Allegedly Important Tokens and Retraining

Andreas Madsen

Nicholas Meade

Vaibhav Adlakha

To explain NLP models a popular approach is to use importance measures, such as attention, which inform input tokens are important for makin… (voir plus)g a prediction. However, an open question is how well these explanations accurately reflect a model's logic, a property called faithfulness. To answer this question, we propose Recursive ROAR, a new faithfulness metric. This works by recursively masking allegedly important tokens and then retraining the model. The principle is that this should result in worse model performance compared to masking random tokens. The result is a performance curve given a masking-ratio. Furthermore, we propose a summarizing metric using relative area-between-curves (RACU), which allows for easy comparison across papers, models, and tasks. We evaluate 4 different importance measures on 8 different datasets, using both LSTM-attention models and RoBERTa models. We find that the faithfulness of importance measures is both model-dependent and task-dependent. This conclusion contradicts previous evaluations in both computer vision and faithfulness of attention literature.

2022-12-01

Findings of the Association for Computational Linguistics: EMNLP 2022 (publié)

Evaluating the Faithfulness of Importance Measures in NLP by Recursively Masking Allegedly Important Tokens and Retraining

Andreas Madsen

Nicholas Meade

Vaibhav Adlakha

To explain NLP models a popular approach is to use importance measures, such as attention, which inform input tokens are important for makin… (voir plus)g a prediction. However, an open question is how well these explanations accurately reflect a model's logic, a property called faithfulness. To answer this question, we propose Recursive ROAR, a new faithfulness metric. This works by recursively masking allegedly important tokens and then retraining the model. The principle is that this should result in worse model performance compared to masking random tokens. The result is a performance curve given a masking-ratio. Furthermore, we propose a summarizing metric using relative area-between-curves (RACU), which allows for easy comparison across papers, models, and tasks. We evaluate 4 different importance measures on 8 different datasets, using both LSTM-attention models and RoBERTa models. We find that the faithfulness of importance measures is both model-dependent and task-dependent. This conclusion contradicts previous evaluations in both computer vision and faithfulness of attention literature.

2021-10-15

ArXiv (preprint)