
Commit

More LIT documentation updates.
PiperOrigin-RevId: 646649859
bdu91 authored and LIT team committed Jun 25, 2024
1 parent 48b029c commit 2e9d267
Showing 11 changed files with 32 additions and 91 deletions.
7 changes: 4 additions & 3 deletions website/sphinx_src/api.md
@@ -808,9 +808,10 @@ _See the [examples](https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples)
The full set of `LitType`s is defined in
[types.py](https://github.com/PAIR-code/lit/blob/main/lit_nlp/api/types.py). Numeric types
such as `Integer` and `Scalar` have predefined ranges that can be overridden
-using corresponding `min_val` and `max_val` attributes as seen
-[here](https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/penguin/data.py;l=19-22;rcl=639554825).
-The different types available in LIT are summarized in the table below.
+using corresponding `min_val` and `max_val` attributes as seen in
+[penguin data](https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/penguin/data.py)
+`INPUT_SPEC`. The different types available in LIT are summarized in the table
+below.

Note: Bracket syntax, such as `<float>[num_tokens]`, refers to the shapes of
NumPy arrays where each element inside the brackets is an integer.
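
For illustration, a minimal sketch of such an override in a dataset spec, assuming the standard `lit_nlp.api.types` API (field names and bounds are illustrative placeholders, not values copied from the penguin example):

```python
from lit_nlp.api import types as lit_types

INPUT_SPEC = {
    # Override the default Scalar range per feature; bounds are placeholders.
    'body_mass_g': lit_types.Scalar(min_val=2700, max_val=6300),
    'culmen_length_mm': lit_types.Scalar(min_val=30, max_val=60),
    'species': lit_types.CategoryLabel(vocab=['Adelie', 'Chinstrap', 'Gentoo']),
}
```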
18 changes: 9 additions & 9 deletions website/sphinx_src/components.md
@@ -256,7 +256,7 @@ regression (`RegressionScore`) and generation (`GeneratedText` or
### Gradient Norm

This is a simple method, in which salience scores are proportional to the L2
-norm of the gradient, i.e. the score for token $$i$$ is:
+norm of the gradient, i.e. the score for token $i$ is:

$$S(i) \propto ||\nabla_{x_i} \hat{y}||_2$$

@@ -268,25 +268,25 @@ To enable this method, your model should, as part of the
* Return a `TokenGradients` field with the `align` attribute pointing to the
name of the `Tokens` field (i.e. `align="tokens"`). Values should be arrays
of shape `<float>[num_tokens, emb_dim]` representing the gradient
-$$\nabla_{x} \hat{y}$$ of the embeddings with respect to the prediction
-$$\hat{y}$$.
+$\nabla_{x} \hat{y}$ of the embeddings with respect to the prediction
+$\hat{y}$.
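
A hedged sketch of what those two bullets look like in an `output_spec()`, assuming the standard `lit_nlp.api.types` helpers (field names here are placeholders):

```python
from lit_nlp.api import types as lit_types

def output_spec(self):
  # 'tokens' carries the tokenized input; 'token_grads' aligns to it so LIT
  # can map each gradient row back to a token.
  return {
      'tokens': lit_types.Tokens(parent='text'),
      'token_grads': lit_types.TokenGradients(align='tokens'),
  }
```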

Because LIT is framework-agnostic, the model code is responsible for performing
the gradient computation and returning the result as a NumPy array. The choice
-of $$\hat{y}$$ is up to the developer; typically for regression/scoring this is
+of $\hat{y}$ is up to the developer; typically for regression/scoring this is
the raw score and for classification this is the score of the predicted (argmax)
class.
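
As a minimal sketch (one common convention, not necessarily LIT's exact `GradientNorm` implementation), the computation reduces to an L2 norm over the embedding axis:

```python
import numpy as np

def grad_norm_salience(token_grads: np.ndarray) -> np.ndarray:
  """Salience scores from gradients of shape [num_tokens, emb_dim]."""
  scores = np.linalg.norm(token_grads, axis=-1)  # ||grad_i||_2 per token
  return scores / scores.sum()  # normalize so scores sum to 1
```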

### Gradient-dot-Input

In this method, salience scores are proportional to the dot product of the input
-embeddings and their gradients, i.e. for token $$i$$ we compute:
+embeddings and their gradients, i.e. for token $i$ we compute:

$$S(i) \propto x_i \cdot \nabla_{x_i} \hat{y}$$

Compared to grad-norm, this gives directional scores: a positive score can be
interpreted as that token having a positive influence on the prediction
-$$\hat{y}$$, while a negative score suggests that the prediction would be
+$\hat{y}$, while a negative score suggests that the prediction would be
stronger if that token was removed.

To enable this method, your model should, as part of the
@@ -295,13 +295,13 @@ To enable this method, your model should, as part of the
* Return a `Tokens` field with values (as `list[str]`) containing the
tokenized input.
* Return a `TokenEmbeddings` field with values as arrays of shape
-`<float>[num_tokens, emb_dim]` containing the input embeddings $$x$$.
+`<float>[num_tokens, emb_dim]` containing the input embeddings $x$.
* Return a `TokenGradients` field with the `align` attribute pointing to the
name of the `Tokens` field (i.e. `align="tokens"`), and the `grad_for`
attribute pointing to the name of the `TokenEmbeddings` field. Values should
be arrays of shape `<float>[num_tokens, emb_dim]` representing the gradient
-$$\nabla_{x} \hat{y}$$ of the embeddings with respect to the prediction
-$$\hat{y}$$.
+$\nabla_{x} \hat{y}$ of the embeddings with respect to the prediction
+$\hat{y}$.

As with grad-norm, the model should return embeddings and gradients as NumPy
arrays. The LIT `GradientDotInput` component will compute the dot products and
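
A similar hedged sketch of the dot-product step, under the same shape assumptions (the exact normalization in `GradientDotInput` may differ):

```python
import numpy as np

def grad_dot_input_salience(token_embs: np.ndarray,
                            token_grads: np.ndarray) -> np.ndarray:
  """Signed salience; both inputs have shape [num_tokens, emb_dim]."""
  scores = np.sum(token_embs * token_grads, axis=-1)  # x_i . grad_i
  return scores / np.abs(scores).sum()  # keep sign, normalize magnitude
```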
10 changes: 4 additions & 6 deletions website/sphinx_src/docker.md
@@ -13,10 +13,8 @@ LIT can be run as a containerized app using [Docker](https://www.docker.com/) or
your preferred engine. This is how we run our
[hosted demos](https://pair-code.github.io/lit/demos/).

-We provide a basic
-[`Dockerfile`](https://github.com/PAIR-code/lit/blob/main/lit_nlp/Dockerfile) that you can
-use to build and run any of the demos in the `lit_nlp/examples` directory. The
-`Dockerfile` installs all necessary dependencies for LIT and builds the
+We provide a basic Dockerfile https://github.com/PAIR-code/lit/blob/main/Dockerfile that you can use to build and run any of the demos in the `lit_nlp/examples` directory.
+The `Dockerfile` installs all necessary dependencies for LIT and builds the
front-end code from source. Then it runs [gunicorn](https://gunicorn.org/) as
the HTTP server, invoking the `get_wsgi_app()` method from our demo file to get
the WSGI app to serve. The options provided to gunicorn for our use-case can be
@@ -26,7 +24,7 @@ You can find a reference implementation in
[`glue/demo.py`](https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/glue/demo.py).

Use the following shell
-[.github/workflows/ci.ymlcommands](https://github.com/PAIR-code/lit/blob/main/lit_nlp/.github/workflows/ci.ymlcommands) to build the
+https://github.com/PAIR-code/lit/blob/main/.github/workflows/ci.yml commands to build the
default Docker image for LIT from the provided `Dockerfile`, and then run a
container from that image. Comments are provided in-line to help explain what
each step does.
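
In outline, the flow looks like the sketch below; the `docker run` line echoes the command in the next hunk, and the image tag `lit-nlp` is whatever you choose at build time:

```shell
# Build the LIT image from the repo root using the provided Dockerfile.
docker build -t lit-nlp .

# Serve a demo: DEMO_NAME selects the demo module, DEMO_PORT the port
# gunicorn binds inside the container.
docker run -d -p 2345:2345 -e DEMO_NAME=tydi -e DEMO_PORT=2345 lit-nlp
```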
@@ -72,7 +70,7 @@ docker run -d -p 2345:2345 -e DEMO_NAME=tydi -e DEMO_PORT=2345 lit-nlp

Many LIT users create their own custom LIT server script to demo or serve, which
involves creating an executable Python module with a `main()` method, as
-described in the [Python API docs](https://pair-code.github.io/lit/documentation/api.md#adding-models-and-data).
+described in the [Python API docs](api.md#adding-models-and-data).
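
A hedged sketch of such a module, loosely patterned on the repo's demos (treat the `dev_server.Server` usage as an assumption; `glue/demo.py` is the canonical reference):

```python
from lit_nlp import dev_server
from lit_nlp import server_flags

def get_wsgi_app():
  """Returns a WSGI app for gunicorn, as described above."""
  models = {}    # e.g. {'my_model': MyModel()} -- your Model subclass
  datasets = {}  # e.g. {'sst_dev': MyDataset()} -- your Dataset subclass
  lit_demo = dev_server.Server(models, datasets, **server_flags.get_flags())
  return lit_demo.serve()
```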

These custom server scripts can be easily integrated with LIT's default image as
long as your server meets two requirements:
Binary file removed website/sphinx_src/images/lit-s2s-journey.png
32 changes: 8 additions & 24 deletions website/sphinx_src/ui_guide.md
@@ -513,27 +513,11 @@ model.

![Sentiment analysis](./images/lit-sentiment-analysis.png "Sentiment analysis")

-### Debugging Text Generation
-
-<!-- TODO(lit-dev): T5 no longer makes the mistake documented below. Find a
-example that fits better with the text generation debugging story -->
-
-Does the training data explain a particular error in text generation? We analyze
-an older T5 model on the CNN-DM summarization task. LIT’s *Scalars* module
-allows us to look at per-example ROUGE scores, and quickly select an example
-with middling performance (screenshot section (a)). We find the generated text
-(screenshot section (b)) contains an erroneous constituent: “alastair cook was
-replaced as captain by former captain ...”. We can dig deeper, using LIT’s
-language modeling module (screenshot section (c)) to see that the token “by” is
-predicted with high probability (28.7%).
-
-To find out how T5 arrived at this prediction, we utilize the “similarity
-searcher” component through the datapoint generator (screenshot section (d)).
-This performs a fast approximate nearest-neighbor lookup from a pre-built index
-over the training corpus, using embeddings from the T5 decoder. With one click,
-we can retrieve 25 nearest neighbors and add them to the LIT UI for inspection.
-We see that the words “captain” and “former” appear 34 and 16 times in these
-examples–along with 3 occurrences of “replaced by” (screenshot section (e)),
-suggesting a strong prior toward our erroneous phrase.
-
-![LIT sequence-to-sequence analysis](./images/lit-s2s-journey.png "LIT sequence-to-sequence analysis"){w=500px align=center}
+### Sequence salience
+
+Sequence salience generalizes token-based salience to text-to-text models,
+allowing you to explain the impact of the prompt tokens on parts of the model
+output.
+
+Check out [here](components.md#sequence-salience) for more details on how to
+navigate the Sequence Salience UI module.
40 changes: 0 additions & 40 deletions website/src/demos.md
@@ -19,14 +19,6 @@ color: "#49596c"
tags: "tabular, binary classification",
external:"true" %}

-{% include partials/demo-card,
-c-title: "Image classification",
-link: "/demos/images.html",
-c-data-source: "Imagenette",
-c-copy: "Analyze an image classification model with LIT, including multiple image salience techniques.",
-tags: "images, multiclass classification",
-external:"true" %}
-
{% include partials/demo-card,
c-title: "Classification and regression models",
link: "/demos/glue.html",
@@ -42,37 +34,5 @@ color: "#49596c"
c-copy: "Use LIT directly inside a Colab notebook. Explore binary classification for sentiment analysis using SST2 from the General Language Understanding Evaluation (GLUE) benchmark suite.",
tags: "BERT, binary classification, notebooks",
external:"true" %}
-
-{% include partials/demo-card,
-c-title: "Gender bias in coreference systems",
-link: "/demos/coref.html",
-c-data-source: "Winogender schemas",
-c-copy: "Use LIT to explore gendered associations in a coreference system, which matches pronouns to their antecedents. This demo highlights how LIT can work with structured prediction models (edge classification), and its capability for disaggregated analysis.",
-tags: "BERT, coreference, fairness, Winogender",
-external:"true" %}
-
-{% include partials/demo-card,
-c-title: "Fill in the blanks",
-link: "/demos/lm.html",
-c-data-source: "Stanford Sentiment Treebank, Movie Reviews",
-c-copy: "Explore a BERT-based masked-language model. See what tokens the model predicts should fill in the blank when any token from an example sentence is masked out.",
-tags: "BERT, masked language model",
-external:"true" %}
-
-{% include partials/demo-card,
-c-title: "Text generation",
-link: "/demos/t5.html",
-c-data-source: "CNN / Daily Mail",
-c-copy: "Use a T5 model to summarize text. For any example of interest, quickly find similar examples from the training set, using an approximate nearest-neighbors index.",
-tags: "T5, generation",
-external:"true" %}
-
-{% include partials/demo-card,
-c-title: "Evaluating input salience methods",
-link: "/demos/is_eval.html",
-c-data-source: "Stanford Sentiment Treebank, Toxicity",
-c-copy: "Explore the faithfulness of input salience methods on a BERT-base model across different datasets and artificial shortcuts.",
-tags: "BERT, salience, evaluation",
-external:"true" %}
</div>
</div>
4 changes: 1 addition & 3 deletions website/src/index.md
@@ -44,8 +44,6 @@ LIT can be run as a standalone server, or inside of python notebook environments

Salience maps

-Attention visualization
-
Metrics calculations

Counterfactual generation
@@ -105,7 +103,7 @@ And more...
<div class="mdl-grid no-padding">
{% include partials/home-card image: '/assets/images/LIT_Updates.png',
action: 'UPDATES',
-title: 'Version 1.1',
+title: 'Version 1.2',
desc: 'Input salience for text-to-text LLMs, with wrappers for HuggingFace Transformers and KerasNLP models.',
cta-text:"See release notes",
link: 'https://github.com/PAIR-code/lit/blob/main/RELEASE.md'
2 changes: 1 addition & 1 deletion website/src/tutorials/sentiment.md
@@ -18,7 +18,7 @@ takeaways: "Learn about how the metrics table and saliency maps assisted an anal

{% include partials/link-out link: "../../demos/glue.html", text: "Explore this demo yourself." %}

-Or, run your own with [`examples/glue_demo.py`](https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/glue_demo.py)
+Or, run your own with [`examples/glue/demo.py`](https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/glue/demo.py)

How well does a sentiment classifier handle negation? We can use LIT to interactively ask this question and get answers. We loaded up LIT with the development set of the Stanford Sentiment Treebank (SST), which contains sentences from movie reviews that have been human-labeled as having a negative sentiment (0) or a positive sentiment (1). For a model, we are using a BERT-based binary classifier that has been trained to classify sentiment.

6 changes: 3 additions & 3 deletions website/src/tutorials/sequence-salience.md
@@ -20,7 +20,7 @@ takeaways: "Learn to use LIT's Sequence Salience module for prompt debugging."
link: "https://colab.research.google.com/github/google/generative-ai-docs/blob/main/site/en/gemma/docs/lit_gemma.ipynb",
text: "Follow along in Google Colab." %}

-Or, run this locally with [`examples/lm_salience_demo.py`](https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/lm_salience_demo.py)
+Or, run this locally with [`examples/prompt_debugging/server.py`](https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/prompt_debugging/server.py)

Large language models (LLMs), such as [Gemini][gemini] and [GPT-4][gpt4], have
become ubiquitous. Recent releases of "open weights" models, including
@@ -470,9 +470,9 @@ helpful guides that can help you develop better prompts, including:
[howitworks_icl]: https://par.nsf.gov/servlets/purl/10462310
[lit_1_1_release_notes]:https://github.com/PAIR-code/lit/blob/main/RELEASE.md#release-11
[lit_colab]: https://colab.research.google.com/github/google/generative-ai-docs/blob/main/site/en/gemma/docs/lit_gemma.ipynb
-[lit_hf]: https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/models/pretrained_lms.py
+[lit_hf]: https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/prompt_debugging/transformers_lms.py
[lit_issues]: https://github.com/PAIR-code/lit/issues
-[lit_keras]: https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/models/instrumented_keras_lms.py
+[lit_keras]: https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/prompt_debugging/keras_lms.py
[lit_sxs]: ../../documentation/ui_guide.html#comparing-datapoints
[llama]: https://llama.meta.com/
[main_toolbar]: ../../documentation/ui_guide.html#main-toolbar
2 changes: 1 addition & 1 deletion website/src/tutorials/tab-feat-attr.md
@@ -21,7 +21,7 @@ takeaways: "Learn how to use the Kernel SHAP based Tabular Feature Attribution m
text: "Explore this demo yourself." %}

Or, run your own with
-[`examples/penguin_demo.py`](https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/penguin_demo.py)
+[`examples/penguin/demo.py`](https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/penguin/demo.py)

LIT supports many techniques like salience maps and counterfactual generators
for text data. But what if you have a tabular dataset? You might want to find
2 changes: 1 addition & 1 deletion website/src/tutorials/text-salience.md
@@ -20,7 +20,7 @@ takeaways: "Learn how to use salience maps for text data in LIT."
link: "../../demos/glue.html",
text: "Explore this demo yourself." %}

-Or, run your own with [`examples/glue_demo.py`](https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/glue_demo.py)
+Or, run your own with [`examples/glue/demo.py`](https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/glue/demo.py)

LIT enables users to analyze individual predictions for text input using
salience maps, for which gradient-based and/or blackbox methods are available.
