Merge pull request #47 from chanind/packaging

feat: Setting up Python packaging and autodeploy with Semantic Release

adamkarvonen authored Jan 9, 2025
2 parents 9bbfdc5 + 9bc22a4 commit e52a418
Showing 159 changed files with 6,025 additions and 48,893 deletions.
90 changes: 90 additions & 0 deletions .github/workflows/build.yml
@@ -0,0 +1,90 @@
name: build

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.10", "3.11", "3.12"]

    steps:
      - uses: actions/checkout@v4
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - name: Cache Huggingface assets
        uses: actions/cache@v4
        with:
          key: huggingface-0-${{ runner.os }}-${{ matrix.python-version }}-${{ hashFiles('**/pyproject.toml') }}
          path: ~/.cache/huggingface
          restore-keys: |
            huggingface-0-${{ runner.os }}-${{ matrix.python-version }}-
      - name: Load cached Poetry installation
        id: cached-poetry
        uses: actions/cache@v4
        with:
          path: ~/.local # the path depends on the OS
          key: poetry-${{ runner.os }}-${{ matrix.python-version }}-1 # increment to reset cache
      - name: Install Poetry
        if: steps.cached-poetry.outputs.cache-hit != 'true'
        uses: snok/install-poetry@v1
        with:
          virtualenvs-create: true
          virtualenvs-in-project: true
          installer-parallel: true
      - name: Load cached venv
        id: cached-poetry-dependencies
        uses: actions/cache@v4
        with:
          path: .venv
          key: venv-0-${{ runner.os }}-${{ matrix.python-version }}-${{ hashFiles('**/pyproject.toml') }}
          restore-keys: |
            venv-0-${{ runner.os }}-${{ matrix.python-version }}-
      - name: Install dependencies
        if: steps.cached-poetry-dependencies.outputs.cache-hit != 'true'
        run: poetry install --no-interaction
      # TODO: Add linting, formatting, type checking to CI
      - name: Run Unit Tests
        run: poetry run pytest tests/unit

  release:
    needs: build
    permissions:
      contents: write
      id-token: write
    # https://github.community/t/how-do-i-specify-job-dependency-running-in-another-workflow/16482
    if: github.event_name == 'push' && github.ref == 'refs/heads/main' && !contains(github.event.head_commit.message, 'chore(release):')
    runs-on: ubuntu-latest
    concurrency: release
    environment:
      name: pypi

    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Semantic Release
        id: release
        uses: python-semantic-release/[email protected]
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
      - name: Publish package distributions to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
        if: steps.release.outputs.released == 'true'
      - name: Publish package distributions to GitHub Releases
        uses: python-semantic-release/upload-to-gh-release@main
        if: steps.release.outputs.released == 'true'
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
3 changes: 2 additions & 1 deletion .gitignore
@@ -43,4 +43,5 @@ evals/ravel/models/
**/eval_results/**
*eval_results*
**/dev/
openai_api_key.txt
dist/
45 changes: 41 additions & 4 deletions README.md
@@ -1,6 +1,7 @@
# SAE Bench

## Table of Contents

- [Overview](#overview)
- [Installation](#installation)
- [Running Evaluations](#running-evaluations)
@@ -10,10 +11,10 @@

CURRENT REPO STATUS: SAE Bench is currently a beta release. This repo is still under development as we clean up some of the rough edges left over from the research process. However, it is usable in its current state for both SAE Lens SAEs and custom SAEs.


## Overview

SAE Bench is a comprehensive suite of 8 evaluations for Sparse Autoencoder (SAE) models:

- **[Feature Absorption](https://arxiv.org/abs/2409.14507)**
- **[AutoInterp](https://blog.eleuther.ai/autointerp/)**
- **L0 / Loss Recovered**
@@ -26,11 +27,13 @@ SAE Bench is a comprehensive suite of 8 evaluations for Sparse Autoencoder (SAE)
For more information, refer to our [blog post](https://www.neuronpedia.org/sae-bench/info).

### Supported Models and SAEs

- **SAE Lens Pretrained SAEs**: Supports evaluations on any SAE Lens SAE.
- **dictionary_learning SAEs**: We support evaluations on any SAE trained with the [dictionary_learning repo](https://github.com/saprmarks/dictionary_learning) (see [Custom SAE Usage](#custom-sae-usage)).
- **Custom SAEs**: Supports any general SAE object with `encode()` and `decode()` methods (see [Custom SAE Usage](#custom-sae-usage) and the sketch below).
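
For illustration, the minimal interface a custom SAE needs looks roughly like this (a sketch with hypothetical names and shapes, not a class from this repo):

```
import torch
import torch.nn as nn


# Hypothetical minimal SAE -- not part of this repo. SAE Bench only assumes
# an object exposing encode() and decode() over activation tensors.
class TinySAE(nn.Module):
    def __init__(self, d_model: int, d_sae: int):
        super().__init__()
        self.W_enc = nn.Parameter(torch.randn(d_model, d_sae) * 0.01)
        self.b_enc = nn.Parameter(torch.zeros(d_sae))
        self.W_dec = nn.Parameter(torch.randn(d_sae, d_model) * 0.01)
        self.b_dec = nn.Parameter(torch.zeros(d_model))

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        # Map model activations (..., d_model) to sparse features (..., d_sae).
        return torch.relu((x - self.b_dec) @ self.W_enc + self.b_enc)

    def decode(self, f: torch.Tensor) -> torch.Tensor:
        # Reconstruct model activations from feature activations.
        return f @ self.W_dec + self.b_dec
```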

### Installation

Set up a virtual environment with Python >= 3.10.

```
@@ -39,6 +42,12 @@ cd SAEBench
pip install -e .
```

Alternatively, you can install from PyPI:

```
pip install sae-bench
```

All evals can be run with the current batch sizes on Gemma-2-2B on a 24GB VRAM GPU (e.g., an RTX 3090). By default, some evals cache LLM activations, which can require up to 100 GB of disk space; this caching can be disabled.

Autointerp requires the creation of `openai_api_key.txt`. Unlearning requires requesting access to the WMDP bio dataset (refer to `unlearning/README.md`).
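
For example, one way to create this file (a minimal sketch; the key string is a placeholder for your own key):

```
# Write your OpenAI API key to the file autointerp reads.
# "sk-<your-key>" is a placeholder, not a real key.
from pathlib import Path

Path("openai_api_key.txt").write_text("sk-<your-key>")
```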
@@ -48,10 +57,11 @@ Autointerp requires the creation of `openai_api_key.txt`. Unlearning requires re
We recommend getting started by going through the `sae_bench_demo.ipynb` notebook. In this notebook, we load both a custom SAE and an SAE Lens SAE, run both on multiple evaluations, and plot graphs of the results.

## Running Evaluations

Each evaluation has an example command located in its respective `main.py` file. To run all evaluations on a selection of SAE Lens SAEs, refer to `shell_scripts/README.md`. Here's an example of how to run a sparse probing evaluation on a single SAE Bench Pythia-70M SAE:

```
python -m sae_bench.evals.sparse_probing.main \
--sae_regex_pattern "sae_bench_pythia70m_sweep_standard_ctx128_0712" \
--sae_block_pattern "blocks.4.hook_resid_post__trainer_10" \
--model_name pythia-70m-deduped
@@ -73,7 +83,8 @@ If your SAEs are trained with the [dictionary_learning repo](https://github.com/

There are two ways to evaluate custom SAEs:

1. **Using Evaluation Templates**:

- Use the secondary `if __name__ == "__main__"` block in each `main.py`
- Results are saved in SAE Bench format for easy visualization
- Compatible with provided plotting tools
@@ -91,4 +102,30 @@ You can deterministically replicate the training of our SAEs using scripts provi

## Graphing Results

If evaluating your own SAEs, we recommend using the graphing cells in `sae_bench_demo.ipynb`. To replicate all SAE Bench plots, refer to `graphing.ipynb`. In this notebook, we download all SAE Bench data and create a variety of plots.

## Development

This project uses [Poetry](https://python-poetry.org/) for dependency management and packaging.

To install the development dependencies, run:

```
poetry install
```

Unit tests can be run with:

```
poetry run pytest tests/unit
```

These tests will be run automatically on every PR in CI.

There are also acceptance tests that can be run with:

```
poetry run pytest tests/acceptance
```

These tests are expensive and will not be run automatically in CI, but are worth running manually before large changes.