Inferring Sensitive Attributes from Model Explanations

Code for the paper titled "Inferring Sensitive Attributes from Model Explanations" published in ACM CIKM 2022.

Requirements

You need conda. Create a virtual environment and install requirements:

conda env create -f environment.yml

To activate:

conda activate attinf-explanations

To update the env:

conda env update --name attinf-explanations --file environment.yml

or

conda activate attinf-explanations
conda env update --file environment.yml

Dataset

Link to datasets: https://drive.google.com/drive/folders/1bUH02Y9I6_NVrfo5_8PwWtdklk15rXPJ

Usage

Evaluate attribute inference attacks against explanations

python -m src.attribute_inference --dataset {LAW,MEPS,CENSUS,CREDIT,COMPAS} --explanations {IntegratedGradients,smoothgrad,DeepLift,GradientShap} --attfeature {both,expl}

--attfeature selects what the attack is evaluated on: explanations only (expl), or both model predictions and explanations (both).
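
As a rough illustration of the two --attfeature modes, the sketch below builds one attack-input row per record, either from the explanation vector alone or from the explanation followed by the model's prediction. The helper name `build_attack_features`, the list-based layout, and the concatenation order are assumptions made for illustration; they are not taken from `src.attribute_inference`.

```python
from typing import List, Sequence


def build_attack_features(
    explanations: Sequence[Sequence[float]],
    predictions: Sequence[Sequence[float]],
    attfeature: str = "expl",
) -> List[List[float]]:
    """Assemble attack inputs (hypothetical helper, not the repository's code)."""
    if attfeature not in ("expl", "both"):
        raise ValueError("attfeature must be 'expl' or 'both'")
    if len(explanations) != len(predictions):
        raise ValueError("explanations and predictions must have the same length")

    rows = []
    for expl, pred in zip(explanations, predictions):
        row = list(expl)          # 'expl': explanation values only
        if attfeature == "both":  # 'both': append the model's prediction
            row.extend(pred)
        rows.append(row)
    return rows
```

With `attfeature="both"` each row is simply wider, so the same attack model can be trained on either feature set.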

Attacking using entire explanations for both sensitive and non-sensitive attributes

python -m src.attribute_inference --dataset {LAW,MEPS,CENSUS,CREDIT,COMPAS} --explanations {IntegratedGradients,smoothgrad,DeepLift,GradientShap} --attfeature expl --with_sattr True

Attacking using only explanations corresponding to sensitive attributes

python -m src.infer_s_from_phis --dataset {LAW,MEPS,CENSUS,CREDIT,COMPAS} --explanations {IntegratedGradients,smoothgrad,DeepLift,GradientShap}
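
The idea of using only the explanation entries that correspond to sensitive attributes can be sketched as a column selection over the attribution matrix. The helper name `select_sensitive_attributions` and the column indices in the usage lines are hypothetical; the actual positions of the sensitive attributes depend on the dataset and on how `src.infer_s_from_phis` loads it.

```python
from typing import List, Sequence


def select_sensitive_attributions(
    explanations: Sequence[Sequence[float]],
    sensitive_idx: Sequence[int],
) -> List[List[float]]:
    """Keep only the attribution values at the sensitive-attribute columns.

    Hypothetical helper for illustration; not the repository's code.
    """
    return [[row[j] for j in sensitive_idx] for row in explanations]


# Assumed layout: columns 1 and 3 hold the sensitive attributes.
explanations = [
    [0.10, -0.30, 0.05, 0.20],
    [0.00, 0.40, -0.10, -0.25],
]
phi_s = select_sensitive_attributions(explanations, sensitive_idx=[1, 3])
```

The attack model would then be trained on `phi_s` alone instead of the full explanation vector.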

Update (2024): Bug Fix

There was a bug in one of the parameters used to generate explanations: "target" was initially set to 0, but it has to be set to the class for the input. This has been fixed. Because the gradients are now computed with respect to the correct class, the attack accuracies differ from those reported in the paper and are in some cases better. The paper's conclusion that model explanations leak sensitive attributes remains valid.
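
To see why the choice of "target" matters, the toy sketch below computes gradient-times-input attributions for a two-class linear model, once with the target fixed to 0 and once with the target set to the class of the input. This is a simplified stand-in and not the repository's code: the explanation methods listed above are more involved, the weights and input are made up, and "the class for the input" is taken here to be the predicted class, which is an assumption.

```python
from typing import List


def logits(weights: List[List[float]], x: List[float]) -> List[float]:
    """Class scores of a bias-free linear model: one dot product per class."""
    return [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in weights]


def grad_times_input(weights: List[List[float]], x: List[float], target: int) -> List[float]:
    """Gradient-times-input attribution for the chosen target class.

    For a linear model, d logit[target] / d x[j] equals weights[target][j].
    """
    return [weights[target][j] * x[j] for j in range(len(x))]


# Made-up model and input, for illustration only.
weights = [[1.0, -2.0], [-1.0, 3.0]]
x = [1.0, 2.0]

scores = logits(weights, x)
predicted = max(range(len(scores)), key=scores.__getitem__)

attr_fixed = grad_times_input(weights, x, target=0)         # target hard-coded to 0
attr_class = grad_times_input(weights, x, target=predicted)  # target = class of the input
```

In this toy case the input is assigned to class 1, and the two attribution vectors have opposite signs, so an attack trained on explanations computed with the wrong target sees a different signal.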
