Revealing the Parametric Knowledge of Language Models: A Unified Framework for Attribution Methods

This repository accompanies the paper Revealing the Parametric Knowledge of Language Models: A Unified Framework for Attribution Methods, accepted at ACL 2024.

Abstract

Our study introduces a novel evaluation framework to quantify and compare the knowledge revealed by Instance Attribution (IA) and Neuron Attribution (NA). To align the results of the two methods, we introduce the attribution method NA-Instances, which applies NA to retrieve influential training instances, and IA-Neurons, which discovers the important neurons of the influential instances found by IA. We further propose a comprehensive list of faithfulness tests to evaluate the comprehensiveness and sufficiency of the explanations provided by both methods. Through extensive experiments and analysis, we demonstrate that NA generally reveals more diverse and comprehensive information about the LM's parametric knowledge than IA does. Nevertheless, IA provides unique and valuable insights into the LM's parametric knowledge that NA does not reveal. Our findings further suggest the potential of a synergistic approach that combines the diverse findings of IA and NA for a more holistic understanding of an LM's parametric knowledge.

Fine-tuning

to be added

Neuron Attribution

to be added

Instance Attribution

to be added

Neuron Attribution Faithfulness Test

to be added
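
As an illustration of the comprehensiveness and sufficiency notions mentioned in the abstract, the following is a minimal sketch on a toy model. It is not the repository's implementation. It assumes the commonly used definitions (comprehensiveness as the drop in target probability when the attributed neurons are zeroed out, sufficiency as the drop when only those neurons are kept), and the function names, the linear toy "model", and the zero-ablation choice are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(activations, weights):
    """Toy stand-in for an LM output layer (assumption): linear map + softmax."""
    logits = [sum(w * a for w, a in zip(row, activations)) for row in weights]
    return softmax(logits)


def mask(activations, neurons, keep):
    """keep=True: zero every neuron except `neurons`; keep=False: zero only `neurons`."""
    chosen = set(neurons)
    return [a if ((i in chosen) == keep) else 0.0 for i, a in enumerate(activations)]


def comprehensiveness(activations, weights, neurons, target):
    """Drop in target probability after ablating the attributed neurons."""
    full = predict(activations, weights)[target]
    ablated = predict(mask(activations, neurons, keep=False), weights)[target]
    return full - ablated


def sufficiency(activations, weights, neurons, target):
    """Drop in target probability when only the attributed neurons are kept."""
    full = predict(activations, weights)[target]
    only = predict(mask(activations, neurons, keep=True), weights)[target]
    return full - only


# Hypothetical example: neuron 0 drives class 0, neurons 1 and 2 drive class 1.
activations = [2.0, 0.1, 0.1]
weights = [[1.5, 0.0, 0.0], [0.0, 1.0, 1.0]]
comp = comprehensiveness(activations, weights, neurons=[0], target=0)
suff = sufficiency(activations, weights, neurons=[0], target=0)
print(f"comprehensiveness={comp:.3f} sufficiency={suff:.3f}")
```

Under these assumed definitions, a faithful attribution yields a high comprehensiveness score (removing the neurons hurts the prediction) and a sufficiency score close to zero (the neurons alone nearly reproduce the prediction).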

