
Commit

More LIT documentation updates.
PiperOrigin-RevId: 646649859
bdu91 authored and LIT team committed Jun 25, 2024
1 parent 48b029c commit 2e9d267
Showing 11 changed files with 32 additions and 91 deletions.
7 changes: 4 additions & 3 deletions website/sphinx_src/api.md
@@ -808,9 +808,10 @@ _See the [examples](https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples)
The full set of `LitType`s is defined in
[types.py](https://github.com/PAIR-code/lit/blob/main/lit_nlp/api/types.py). Numeric types
such as `Integer` and `Scalar` have predefined ranges that can be overridden
-using corresponding `min_val` and `max_val` attributes as seen
-[here](https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/penguin/data.py;l=19-22;rcl=639554825).
-The different types available in LIT are summarized in the table below.
+using corresponding `min_val` and `max_val` attributes as seen in
+[penguin data](https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/penguin/data.py)
+`INPUT_SPEC`. The different types available in LIT are summarized in the table
+below.

Note: Bracket syntax, such as `<float>[num_tokens]`, refers to the shapes of
NumPy arrays where each element inside the brackets is an integer.
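
For illustration, a minimal sketch of such an override in a dataset spec, assuming the standard `lit_nlp.api.types` API (field names and bounds are illustrative placeholders, not values copied from the penguin example):

```python
from lit_nlp.api import types as lit_types

INPUT_SPEC = {
    # Override the default Scalar range per feature; bounds are placeholders.
    'body_mass_g': lit_types.Scalar(min_val=2700, max_val=6300),
    'culmen_length_mm': lit_types.Scalar(min_val=30, max_val=60),
    'species': lit_types.CategoryLabel(vocab=['Adelie', 'Chinstrap', 'Gentoo']),
}
```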
18 changes: 9 additions & 9 deletions website/sphinx_src/components.md
@@ -256,7 +256,7 @@ regression (`RegressionScore`) and generation (`GeneratedText` or
### Gradient Norm

This is a simple method, in which salience scores are proportional to the L2
-norm of the gradient, i.e. the score for token $$i$$ is:
+norm of the gradient, i.e. the score for token $i$ is:

$$S(i) \propto ||\nabla_{x_i} \hat{y}||_2$$

@@ -268,25 +268,25 @@ To enable this method, your model should, as part of the
* Return a `TokenGradients` field with the `align` attribute pointing to the
name of the `Tokens` field (i.e. `align="tokens"`). Values should be arrays
of shape `<float>[num_tokens, emb_dim]` representing the gradient
-$$\nabla_{x} \hat{y}$$ of the embeddings with respect to the prediction
-$$\hat{y}$$.
+$\nabla_{x} \hat{y}$ of the embeddings with respect to the prediction
+$\hat{y}$.
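
A hedged sketch of what those two bullets look like in an `output_spec()`, assuming the standard `lit_nlp.api.types` helpers (field names here are placeholders):

```python
from lit_nlp.api import types as lit_types

def output_spec(self):
  # 'tokens' carries the tokenized input; 'token_grads' aligns to it so LIT
  # can map each gradient row back to a token.
  return {
      'tokens': lit_types.Tokens(parent='text'),
      'token_grads': lit_types.TokenGradients(align='tokens'),
  }
```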

Because LIT is framework-agnostic, the model code is responsible for performing
the gradient computation and returning the result as a NumPy array. The choice
-of $$\hat{y}$$ is up to the developer; typically for regression/scoring this is
+of $\hat{y}$ is up to the developer; typically for regression/scoring this is
the raw score and for classification this is the score of the predicted (argmax)
class.
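
As a minimal sketch (one common convention, not necessarily LIT's exact `GradientNorm` implementation), the computation reduces to an L2 norm over the embedding axis:

```python
import numpy as np

def grad_norm_salience(token_grads: np.ndarray) -> np.ndarray:
  """Salience scores from gradients of shape [num_tokens, emb_dim]."""
  scores = np.linalg.norm(token_grads, axis=-1)  # ||grad_i||_2 per token
  return scores / scores.sum()  # normalize so scores sum to 1
```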

### Gradient-dot-Input

In this method, salience scores are proportional to the dot product of the input
-embeddings and their gradients, i.e. for token $$i$$ we compute:
+embeddings and their gradients, i.e. for token $i$ we compute:

$$S(i) \propto x_i \cdot \nabla_{x_i} \hat{y}$$

Compared to grad-norm, this gives directional scores: a positive score can be
interpreted as that token having a positive influence on the prediction
-$$\hat{y}$$, while a negative score suggests that the prediction would be
+$\hat{y}$, while a negative score suggests that the prediction would be
stronger if that token was removed.

To enable this method, your model should, as part of the
@@ -295,13 +295,13 @@ To enable this method, your model should, as part of the
* Return a `Tokens` field with values (as `list[str]`) containing the
tokenized input.
* Return a `TokenEmbeddings` field with values as arrays of shape
-`<float>[num_tokens, emb_dim]` containing the input embeddings $$x$$.
+`<float>[num_tokens, emb_dim]` containing the input embeddings $x$.
* Return a `TokenGradients` field with the `align` attribute pointing to the
name of the `Tokens` field (i.e. `align="tokens"`), and the `grad_for`
attribute pointing to the name of the `TokenEmbeddings` field. Values should
be arrays of shape `<float>[num_tokens, emb_dim]` representing the gradient
-$$\nabla_{x} \hat{y}$$ of the embeddings with respect to the prediction
-$$\hat{y}$$.
+$\nabla_{x} \hat{y}$ of the embeddings with respect to the prediction
+$\hat{y}$.

As with grad-norm, the model should return embeddings and gradients as NumPy
arrays. The LIT `GradientDotInput` component will compute the dot products and
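
A similar hedged sketch of the dot-product step, under the same shape assumptions (the exact normalization in `GradientDotInput` may differ):

```python
import numpy as np

def grad_dot_input_salience(token_embs: np.ndarray,
                            token_grads: np.ndarray) -> np.ndarray:
  """Signed salience; both inputs have shape [num_tokens, emb_dim]."""
  scores = np.sum(token_embs * token_grads, axis=-1)  # x_i . grad_i
  return scores / np.abs(scores).sum()  # keep sign, normalize magnitude
```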
10 changes: 4 additions & 6 deletions website/sphinx_src/docker.md
@@ -13,10 +13,8 @@ LIT can be run as a containerized app using [Docker](https://www.docker.com/) or
your preferred engine. This is how we run our
[hosted demos](https://pair-code.github.io/lit/demos/).

-We provide a basic
-[`Dockerfile`](https://github.com/PAIR-code/lit/blob/main/lit_nlp/Dockerfile) that you can
-use to build and run any of the demos in the `lit_nlp/examples` directory. The
-`Dockerfile` installs all necessary dependencies for LIT and builds the
+We provide a basic Dockerfile https://github.com/PAIR-code/lit/blob/main/Dockerfile that you can use to build and run any of the demos in the `lit_nlp/examples` directory.
+The `Dockerfile` installs all necessary dependencies for LIT and builds the
front-end code from source. Then it runs [gunicorn](https://gunicorn.org/) as
the HTTP server, invoking the `get_wsgi_app()` method from our demo file to get
the WSGI app to serve. The options provided to gunicorn for our use-case can be
@@ -26,7 +24,7 @@ You can find a reference implementation in
[`glue/demo.py`](https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/glue/demo.py).

Use the following shell
-[.github/workflows/ci.ymlcommands](https://github.com/PAIR-code/lit/blob/main/lit_nlp/.github/workflows/ci.ymlcommands) to build the
+https://github.com/PAIR-code/lit/blob/main/.github/workflows/ci.yml commands to build the
default Docker image for LIT from the provided `Dockerfile`, and then run a
container from that image. Comments are provided in-line to help explain what
each step does.
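
In outline, the flow looks like the sketch below; the `docker run` line echoes the command in the next hunk, and the image tag `lit-nlp` is whatever you choose at build time:

```shell
# Build the LIT image from the repo root using the provided Dockerfile.
docker build -t lit-nlp .

# Serve a demo: DEMO_NAME selects the demo module, DEMO_PORT the port
# gunicorn binds inside the container.
docker run -d -p 2345:2345 -e DEMO_NAME=tydi -e DEMO_PORT=2345 lit-nlp
```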
@@ -72,7 +70,7 @@ docker run -d -p 2345:2345 -e DEMO_NAME=tydi -e DEMO_PORT=2345 lit-nlp

Many LIT users create their own custom LIT server script to demo or serve, which
involves creating an executable Python module with a `main()` method, as
-described in the [Python API docs](https://pair-code.github.io/lit/documentation/api.md#adding-models-and-data).
+described in the [Python API docs](api.md#adding-models-and-data).
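
A hedged sketch of such a module, loosely patterned on the repo's demos (treat the `dev_server.Server` usage as an assumption; `glue/demo.py` is the canonical reference):

```python
from lit_nlp import dev_server
from lit_nlp import server_flags

def get_wsgi_app():
  """Returns a WSGI app for gunicorn, as described above."""
  models = {}    # e.g. {'my_model': MyModel()} -- your Model subclass
  datasets = {}  # e.g. {'sst_dev': MyDataset()} -- your Dataset subclass
  lit_demo = dev_server.Server(models, datasets, **server_flags.get_flags())
  return lit_demo.serve()
```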

These custom server scripts can be easily integrated with LIT's default image as
long as your server meets two requirements:
Binary file removed website/sphinx_src/images/lit-s2s-journey.png
32 changes: 8 additions & 24 deletions website/sphinx_src/ui_guide.md
@@ -513,27 +513,11 @@ model.

![Sentiment analysis](./images/lit-sentiment-analysis.png "Sentiment analysis")

-### Debugging Text Generation
-
-<!-- TODO(lit-dev): T5 no longer makes the mistake documented below. Find a
-example that fits better with the text generation debugging story -->
-
-Does the training data explain a particular error in text generation? We analyze
-an older T5 model on the CNN-DM summarization task. LIT’s *Scalars* module
-allows us to look at per-example ROUGE scores, and quickly select an example
-with middling performance (screenshot section (a)). We find the generated text
-(screenshot section (b)) contains an erroneous constituent: “alastair cook was
-replaced as captain by former captain ...”. We can dig deeper, using LIT’s
-language modeling module (screenshot section (c)) to see that the token “by” is
-predicted with high probability (28.7%).
-
-To find out how T5 arrived at this prediction, we utilize the “similarity
-searcher” component through the datapoint generator (screenshot section (d)).
-This performs a fast approximate nearest-neighbor lookup from a pre-built index
-over the training corpus, using embeddings from the T5 decoder. With one click,
-we can retrieve 25 nearest neighbors and add them to the LIT UI for inspection.
-We see that the words “captain” and “former” appear 34 and 16 times in these
-examples–along with 3 occurrences of “replaced by” (screenshot section (e)),
-suggesting a strong prior toward our erroneous phrase.
-
-![LIT sequence-to-sequence analysis](./images/lit-s2s-journey.png "LIT sequence-to-sequence analysis"){w=500px align=center}
+### Sequence salience
+
+Sequence salience generalizes token-based salience to text-to-text models,
+allowing you to explain the impact of the prompt tokens on parts of the model
+output.
+
+Check out [here](components.md#sequence-salience) for more details on how to
+navigate the Sequence Salience UI module.
40 changes: 0 additions & 40 deletions website/src/demos.md
@@ -19,14 +19,6 @@ color: "#49596c"
tags: "tabular, binary classification",
external:"true" %}

-{% include partials/demo-card,
-c-title: "Image classification",
-link: "/demos/images.html",
-c-data-source: "Imagenette",
-c-copy: "Analyze an image classification model with LIT, including multiple image salience techniques.",
-tags: "images, multiclass classification",
-external:"true" %}
-
{% include partials/demo-card,
c-title: "Classification and regression models",
link: "/demos/glue.html",
@@ -42,37 +34,5 @@ color: "#49596c"
c-copy: "Use LIT directly inside a Colab notebook. Explore binary classification for sentiment analysis using SST2 from the General Language Understanding Evaluation (GLUE) benchmark suite.",
tags: "BERT, binary classification, notebooks",
external:"true" %}
-
-{% include partials/demo-card,
-c-title: "Gender bias in coreference systems",
-link: "/demos/coref.html",
-c-data-source: "Winogender schemas",
-c-copy: "Use LIT to explore gendered associations in a coreference system, which matches pronouns to their antecedents. This demo highlights how LIT can work with structured prediction models (edge classification), and its capability for disaggregated analysis.",
-tags: "BERT, coreference, fairness, Winogender",
-external:"true" %}
-
-{% include partials/demo-card,
-c-title: "Fill in the blanks",
-link: "/demos/lm.html",
-c-data-source: "Stanford Sentiment Treebank, Movie Reviews",
-c-copy: "Explore a BERT-based masked-language model. See what tokens the model predicts should fill in the blank when any token from an example sentence is masked out.",
-tags: "BERT, masked language model",
-external:"true" %}
-
-{% include partials/demo-card,
-c-title: "Text generation",
-link: "/demos/t5.html",
-c-data-source: "CNN / Daily Mail",
-c-copy: "Use a T5 model to summarize text. For any example of interest, quickly find similar examples from the training set, using an approximate nearest-neighbors index.",
-tags: "T5, generation",
-external:"true" %}
-
-{% include partials/demo-card,
-c-title: "Evaluating input salience methods",
-link: "/demos/is_eval.html",
-c-data-source: "Stanford Sentiment Treebank, Toxicity",
-c-copy: "Explore the faithfulness of input salience methods on a BERT-base model across different datasets and artificial shortcuts.",
-tags: "BERT, salience, evaluation",
-external:"true" %}
</div>
</div>
4 changes: 1 addition & 3 deletions website/src/index.md
@@ -44,8 +44,6 @@ LIT can be run as a standalone server, or inside of python notebook environments

Salience maps

-Attention visualization
-
Metrics calculations

Counterfactual generation
@@ -105,7 +103,7 @@ And more...
<div class="mdl-grid no-padding">
{% include partials/home-card image: '/assets/images/LIT_Updates.png',
action: 'UPDATES',
-title: 'Version 1.1',
+title: 'Version 1.2',
desc: 'Input salience for text-to-text LLMs, with wrappers for HuggingFace Transformers and KerasNLP models.',
cta-text:"See release notes",
link: 'https://github.com/PAIR-code/lit/blob/main/RELEASE.md'
2 changes: 1 addition & 1 deletion website/src/tutorials/sentiment.md
@@ -18,7 +18,7 @@ takeaways: "Learn about how the metrics table and saliency maps assisted an anal

{% include partials/link-out link: "../../demos/glue.html", text: "Explore this demo yourself." %}

-Or, run your own with [`examples/glue_demo.py`](https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/glue_demo.py)
+Or, run your own with [`examples/glue/demo.py`](https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/glue/demo.py)

How well does a sentiment classifier handle negation? We can use LIT to interactively ask this question and get answers. We loaded up LIT with the development set of the Stanford Sentiment Treebank (SST), which contains sentences from movie reviews that have been human-labeled as having a negative sentiment (0) or a positive sentiment (1). For a model, we are using a BERT-based binary classifier that has been trained to classify sentiment.

6 changes: 3 additions & 3 deletions website/src/tutorials/sequence-salience.md
@@ -20,7 +20,7 @@ takeaways: "Learn to use LIT's Sequence Salience module for prompt debugging."
link: "https://colab.research.google.com/github/google/generative-ai-docs/blob/main/site/en/gemma/docs/lit_gemma.ipynb",
text: "Follow along in Google Colab." %}

-Or, run this locally with [`examples/lm_salience_demo.py`](https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/lm_salience_demo.py)
+Or, run this locally with [`examples/prompt_debugging/server.py`](https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/prompt_debugging/server.py)

Large language models (LLMs), such as [Gemini][gemini] and [GPT-4][gpt4], have
become ubiquitous. Recent releases of "open weights" models, including
@@ -470,9 +470,9 @@ helpful guides that can help you develop better prompts, including:
[howitworks_icl]: https://par.nsf.gov/servlets/purl/10462310
[lit_1_1_release_notes]:https://github.com/PAIR-code/lit/blob/main/RELEASE.md#release-11
[lit_colab]: https://colab.research.google.com/github/google/generative-ai-docs/blob/main/site/en/gemma/docs/lit_gemma.ipynb
-[lit_hf]: https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/models/pretrained_lms.py
+[lit_hf]: https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/prompt_debugging/transformers_lms.py
[lit_issues]: https://github.com/PAIR-code/lit/issues
-[lit_keras]: https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/models/instrumented_keras_lms.py
+[lit_keras]: https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/prompt_debugging/keras_lms.py
[lit_sxs]: ../../documentation/ui_guide.html#comparing-datapoints
[llama]: https://llama.meta.com/
[main_toolbar]: ../../documentation/ui_guide.html#main-toolbar
2 changes: 1 addition & 1 deletion website/src/tutorials/tab-feat-attr.md
@@ -21,7 +21,7 @@ takeaways: "Learn how to use the Kernel SHAP based Tabular Feature Attribution m
text: "Explore this demo yourself." %}

Or, run your own with
-[`examples/penguin_demo.py`](https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/penguin_demo.py)
+[`examples/penguin/demo.py`](https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/penguin/demo.py)

LIT supports many techniques like salience maps and counterfactual generators
for text data. But what if you have a tabular dataset? You might want to find
2 changes: 1 addition & 1 deletion website/src/tutorials/text-salience.md
@@ -20,7 +20,7 @@ takeaways: "Learn how to use salience maps for text data in LIT."
link: "../../demos/glue.html",
text: "Explore this demo yourself." %}

-Or, run your own with [`examples/glue_demo.py`](https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/glue_demo.py)
+Or, run your own with [`examples/glue/demo.py`](https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/glue/demo.py)

LIT enables users to analyze individual predictions for text input using
salience maps, for which gradient-based and/or blackbox methods are available.
