cohortshapley/uniquenessshapley

Uniqueness Shapley

Uniqueness Shapley is an exploratory data analysis (EDA) tool based on the feature importance method Cohort Shapley. It quantifies the extent to which different features in a dataset make the subjects in that dataset more identifiable. Using the code in this repository, one can calculate the Uniqueness Shapley value of each feature for each subject (instance) in the dataset. These values can also be aggregated to answer questions about subpopulations of interest. For more details on the method and how to interpret the results, please see the paper:

Seiler, B., Mase, M., & Owen, A. B. (2023). "What makes you unique?" Electronic Journal of Statistics, 17(1), 1-18.
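
As a rough orientation, the quantity can be written in standard Shapley form. The notation here is an assumption made for illustration and may differ from the paper's: $n$ is the number of subjects, $d$ the number of features, $C_t(u)$ the cohort of subjects that match subject $t$ on every feature in the set $u$, and the value of revealing $u$ is taken to be the reduction in log cohort size (the base-2 logarithm is an assumed choice).

$$
\phi_j(t) = \sum_{u \subseteq \{1,\dots,d\} \setminus \{j\}} \frac{|u|!\,(d-|u|-1)!}{d!} \Bigl( v_t(u \cup \{j\}) - v_t(u) \Bigr),
\qquad
v_t(u) = -\log_2 \frac{|C_t(u)|}{n}
$$

Under this sketch, the values for a subject sum to $v_t(\{1,\dots,d\})$, the total information needed to narrow the dataset down to the subjects identical to $t$ on all features.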

Install

Clone the repository and install the package locally with pip:

git clone https://github.com/cohortshapley/uniquenessshapley
pip install -e uniquenessshapley

Prerequisites

This code has been tested with:

  • Python 3.8.8
  • NumPy 1.20.1
  • Pandas 1.2.4
  • SciPy 1.6.2
  • requests 2.25.1

The example notebooks additionally require:

  • jupyter 1.0.0

Getting Started

See the example Jupyter notebook in this repository.

Usage

This implementation, described in Section 4 of the paper, uses ADTrees. Its runtime is linear in the number of rows but exponential in the number of features.
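
To illustrate where the exponential cost in the number of features comes from, here is a minimal brute-force sketch in plain Python. It is not the ADTree-based implementation in this repository and does not use its API: the function names `cohort_value` and `uniqueness_shapley_bruteforce` are hypothetical, and the value of a feature subset is assumed to be the negative base-2 log of the fraction of rows matching the subject on that subset.

```python
from itertools import combinations
from math import factorial, log2


def cohort_value(rows, t, subset):
    """Assumed value function: -log2 of the fraction of rows that
    match row t on every feature index in `subset`."""
    target = rows[t]
    matches = sum(all(r[j] == target[j] for j in subset) for r in rows)
    # matches >= 1 because row t always matches itself
    return -log2(matches / len(rows))


def uniqueness_shapley_bruteforce(rows, t):
    """Exact Shapley values for subject t by enumerating all feature subsets.
    Hypothetical helper for illustration; cost grows as 2**d in the number
    of features d, and each subset requires a pass over all rows."""
    d = len(rows[t])
    phi = [0.0] * d
    for j in range(d):
        others = [k for k in range(d) if k != j]
        for size in range(d):
            # Standard Shapley weight for subsets of this size
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for u in combinations(others, size):
                gain = cohort_value(rows, t, u + (j,)) - cohort_value(rows, t, u)
                phi[j] += weight * gain
    return phi


# Toy data: two binary features, fully crossed, so each feature
# halves the cohort regardless of what else has been revealed.
rows = [("A", "x"), ("A", "y"), ("B", "x"), ("B", "y")]
phi = uniqueness_shapley_bruteforce(rows, 0)
print(phi)
```

In this toy dataset the two features are symmetric, so each receives the same share, and the shares sum to the value of revealing all features. This naive version rescans the rows for every subset; it is only meant to make the subset enumeration explicit.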

Future Additions

We plan to add an approximate method for handling larger numbers of features.

Sources

The files ArrayRecord.py, IteratedTreeContingencyTable.py, and SparseADTree.py are from uraplutonium and used under their license. No changes have been made to these files except to include references to their source at the top.

The dataset.py script lets you download the data used for the examples in the paper: the voter registration data from the North Carolina State Board of Elections and the solar flare data from the UCI Machine Learning Repository.
