diff --git a/website/sphinx_src/api.md b/website/sphinx_src/api.md
index f89fed37..97ea5b64 100644
--- a/website/sphinx_src/api.md
+++ b/website/sphinx_src/api.md
@@ -808,9 +808,10 @@ _See the [examples](https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples)
 The full set of `LitType`s is defined in
 [types.py](https://github.com/PAIR-code/lit/blob/main/lit_nlp/api/types.py).
 Numeric types such as `Integer` and `Scalar` have predefined ranges that can be overridden
-using corresponding `min_val` and `max_val` attributes as seen
-[here](https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/penguin/data.py;l=19-22;rcl=639554825).
-The different types available in LIT are summarized in the table below.
+using corresponding `min_val` and `max_val` attributes, as seen in the
+[penguin dataset's](https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/penguin/data.py)
+`INPUT_SPEC`. The different types available in LIT are summarized in the table
+below.
 
 Note: Bracket syntax, such as `[num_tokens]`, refers to the shapes of NumPy
 arrays where each element inside the brackets is an integer.
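For context on the `INPUT_SPEC` reference in the api.md hunk above, a minimal sketch of a dataset spec that overrides the default numeric ranges might look like the following. The feature names and ranges here are illustrative assumptions, not the actual values in `penguin/data.py`:

```python
# Sketch of a LIT dataset input spec overriding predefined numeric ranges.
# Feature names and ranges are illustrative, not the real penguin values.
from lit_nlp.api import types as lit_types

INPUT_SPEC = {
    # Scalar and Integer ship with default ranges; min_val/max_val override
    # them so UI widgets reflect the actual spread of the feature.
    'bill_length_mm': lit_types.Scalar(min_val=30, max_val=60),
    'body_mass_g': lit_types.Integer(min_val=2700, max_val=6300),
    # A categorical input for contrast; no numeric range involved.
    'island': lit_types.CategoryLabel(vocab=['Biscoe', 'Dream', 'Torgersen']),
}
```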
diff --git a/website/sphinx_src/components.md b/website/sphinx_src/components.md
index 9b9189d1..90b99dbd 100644
--- a/website/sphinx_src/components.md
+++ b/website/sphinx_src/components.md
@@ -256,7 +256,7 @@ regression (`RegressionScore`) and generation (`GeneratedText` or
 ### Gradient Norm
 
 This is a simple method, in which salience scores are proportional to the L2
-norm of the gradient, i.e. the score for token $$i$$ is:
+norm of the gradient, i.e. the score for token $i$ is:
 
 $$S(i) \propto ||\nabla_{x_i} \hat{y}||_2$$
 
@@ -268,25 +268,25 @@ To enable this method, your model should, as part of the
 * Return a `TokenGradients` field with the `align` attribute pointing to the
   name of the `Tokens` field (i.e. `align="tokens"`). Values should be arrays
   of shape `[num_tokens, emb_dim]` representing the gradient
-  $$\nabla_{x} \hat{y}$$ of the embeddings with respect to the prediction
-  $$\hat{y}$$.
+  $\nabla_{x} \hat{y}$ of the embeddings with respect to the prediction
+  $\hat{y}$.
 
 Because LIT is framework-agnostic, the model code is responsible for performing
 the gradient computation and returning the result as a NumPy array. The choice
-of $$\hat{y}$$ is up to the developer; typically for regression/scoring this is
+of $\hat{y}$ is up to the developer; typically for regression/scoring this is
 the raw score and for classification this is the score of the predicted
 (argmax) class.
 
 ### Gradient-dot-Input
 
 In this method, salience scores are proportional to the dot product of the input
-embeddings and their gradients, i.e. for token $$i$$ we compute:
+embeddings and their gradients, i.e. for token $i$ we compute:
 
 $$S(i) \propto x_i \cdot \nabla_{x_i} \hat{y}$$
 
-Compared to grad-norm, this gives directional scores: a positive score is can be
-interpreted as that token having a positive influence on the prediction
-$$\hat{y}$$, while a negative score suggests that the prediction would be
+Compared to grad-norm, this gives directional scores: a positive score can be
+interpreted as that token having a positive influence on the prediction
+$\hat{y}$, while a negative score suggests that the prediction would be
 stronger if that token was removed.
 
 To enable this method, your model should, as part of the
@@ -295,13 +295,13 @@ To enable this method, your model should, as part of the
 * Return a `Tokens` field with values (as `list[str]`) containing the
   tokenized input.
 * Return a `TokenEmbeddings` field with values as arrays of shape
-  `[num_tokens, emb_dim]` containing the input embeddings $$x$$.
+  `[num_tokens, emb_dim]` containing the input embeddings $x$.
 * Return a `TokenGradients` field with the `align` attribute pointing to the
   name of the `Tokens` field (i.e. `align="tokens"`), and the `grad_for`
   attribute pointing to the name of the `TokenEmbeddings` field. Values should
   be arrays of shape `[num_tokens, emb_dim]` representing the gradient
-  $$\nabla_{x} \hat{y}$$ of the embeddings with respect to the prediction
-  $$\hat{y}$$.
+  $\nabla_{x} \hat{y}$ of the embeddings with respect to the prediction
+  $\hat{y}$.
 
 As with grad-norm, the model should return embeddings and gradients as NumPy
 arrays. The LIT `GradientDotInput` component will compute the dot products and
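To make the grad-norm and gradient-dot-input recipes in the hunks above concrete, here is a minimal NumPy sketch of the computation LIT performs from a model's returned fields. The field names and values are illustrative assumptions; only the shapes follow the spec described above:

```python
import numpy as np

# One prediction from a hypothetical model: `tokens` is list[str], and the
# embeddings/gradients are [num_tokens, emb_dim] arrays over those tokens.
num_tokens, emb_dim = 4, 8
rng = np.random.default_rng(0)
preds = {
    "tokens": ["the", "movie", "was", "great"],
    "token_embeddings": rng.normal(size=(num_tokens, emb_dim)),  # x
    "token_grads": rng.normal(size=(num_tokens, emb_dim)),       # d y_hat / d x
}

# Gradient Norm: S(i) proportional to ||grad_i||_2, normalized here so the
# per-token scores sum to 1.
grad_l2 = np.linalg.norm(preds["token_grads"], axis=1)  # shape [num_tokens]
grad_norm_scores = grad_l2 / grad_l2.sum()

# Gradient-dot-Input: S(i) proportional to x_i . grad_i, computed per token.
grad_dot_input_scores = np.einsum(
    "te,te->t", preds["token_embeddings"], preds["token_grads"])

for tok, s1, s2 in zip(preds["tokens"], grad_norm_scores,
                       grad_dot_input_scores):
    print(f"{tok:>8}  grad_norm={s1:.3f}  grad_dot_input={s2:+.3f}")
```

Note that the grad-norm scores are non-negative while the grad-dot-input scores are signed, which is why the text above describes the latter as directional.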
diff --git a/website/sphinx_src/docker.md b/website/sphinx_src/docker.md
index b58279c6..ebf7eb7c 100644
--- a/website/sphinx_src/docker.md
+++ b/website/sphinx_src/docker.md
@@ -13,10 +13,8 @@ LIT can be run as a containerized app using [Docker](https://www.docker.com/) or
 your preferred engine. This is how we run our
 [hosted demos](https://pair-code.github.io/lit/demos/).
 
-We provide a basic
-[`Dockerfile`](https://github.com/PAIR-code/lit/blob/main/lit_nlp/Dockerfile) that you can
-use to build and run any of the demos in the `lit_nlp/examples` directory. The
-`Dockerfile` installs all necessary dependencies for LIT and builds the
+We provide a basic [`Dockerfile`](https://github.com/PAIR-code/lit/blob/main/Dockerfile) that you can use to build and run any of the demos in the `lit_nlp/examples` directory.
+The `Dockerfile` installs all necessary dependencies for LIT and builds the
 front-end code from source. Then it runs [gunicorn](https://gunicorn.org/) as
 the HTTP server, invoking the `get_wsgi_app()` method from our demo file to get
 the WSGI app to serve. The options provided to gunicorn for our use-case can be
@@ -26,7 +24,7 @@ You can find a reference implementation in
 [`glue/demo.py`](https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/glue/demo.py).
 
 Use the following shell
-[.github/workflows/ci.ymlcommands](https://github.com/PAIR-code/lit/blob/main/lit_nlp/.github/workflows/ci.ymlcommands) to build the
+[commands](https://github.com/PAIR-code/lit/blob/main/.github/workflows/ci.yml) to build the
 default Docker image for LIT from the provided `Dockerfile`, and then run a
 container from that image. Comments are provided in-line to help explain what
 each step does.
@@ -72,7 +70,7 @@ docker run -d -p 2345:2345 -e DEMO_NAME=tydi -e DEMO_PORT=2345 lit-nlp
 
 Many LIT users create their own custom LIT server script to demo or serve,
 which involves creating an executable Python module with a `main()` method, as
-described in the [Python API docs](https://pair-code.github.io/lit/documentation/api.md#adding-models-and-data).
+described in the [Python API docs](api.md#adding-models-and-data).
 
 These custom server scripts can be easily integrated with LIT's default image
 as long as your server meets two requirements:
diff --git a/website/sphinx_src/images/lit-s2s-journey.png b/website/sphinx_src/images/lit-s2s-journey.png
deleted file mode 100644
index 7e0728c6..00000000
Binary files a/website/sphinx_src/images/lit-s2s-journey.png and /dev/null differ
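As a companion to the docker.md changes above, here is a hedged sketch of what such a custom server module might look like, with the `main()` entry point and `get_wsgi_app()` hook that the default image expects. `ToyDataset` and `ToyModel` are hypothetical stand-ins, and the flag handling only approximates the real reference implementation in `glue/demo.py`:

```python
# Sketch of a custom LIT server module (hypothetical; see glue/demo.py for
# the real reference implementation).
from absl import app
from lit_nlp import dev_server
from lit_nlp import server_flags
from lit_nlp.api import dataset as lit_dataset
from lit_nlp.api import model as lit_model
from lit_nlp.api import types as lit_types


class ToyDataset(lit_dataset.Dataset):
  """Two hard-coded examples, just enough for the UI to load."""

  def spec(self):
    return {"text": lit_types.TextSegment()}

  @property
  def examples(self):
    return [{"text": "a great movie"}, {"text": "a terrible movie"}]


class ToyModel(lit_model.Model):
  """Scores inputs with a trivial keyword heuristic."""

  def input_spec(self):
    return {"text": lit_types.TextSegment()}

  def output_spec(self):
    return {"score": lit_types.RegressionScore()}

  def predict(self, inputs):
    return [{"score": 1.0 if "great" in ex["text"] else 0.0}
            for ex in inputs]


def get_wsgi_app():
  # Gunicorn, as configured by the default Dockerfile, imports this module
  # and calls get_wsgi_app() to obtain the WSGI app to serve. Real demos also
  # set the server_type flag to "external" here so that serve() returns the
  # app instead of blocking; see glue/demo.py for the exact incantation.
  lit_demo = dev_server.Server(
      models={"toy": ToyModel()},
      datasets={"toy": ToyDataset()},
      **server_flags.get_flags())
  return lit_demo.serve()


def main(_):
  # When run directly (outside Docker), serve() with the default flags
  # starts LIT's built-in development server.
  get_wsgi_app()


if __name__ == "__main__":
  app.run(main)
```

A module like this can then be baked into the default image and launched with the same `docker run ... -e DEMO_PORT=2345 lit-nlp` pattern shown in the docker.md hunk above.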
diff --git a/website/sphinx_src/ui_guide.md b/website/sphinx_src/ui_guide.md
index 9014ce02..bb5028ae 100644
--- a/website/sphinx_src/ui_guide.md
+++ b/website/sphinx_src/ui_guide.md
@@ -513,27 +513,11 @@ model.
 
 ![Sentiment analysis](./images/lit-sentiment-analysis.png "Sentiment analysis")
 
-### Debugging Text Generation
-
-
-
-Does the training data explain a particular error in text generation? We analyze
-an older T5 model on the CNN-DM summarization task. LIT’s *Scalars* module
-allows us to look at per-example ROUGE scores, and quickly select an example
-with middling performance (screenshot section (a)). We find the generated text
-(screenshot section (b)) contains an erroneous constituent: “alastair cook was
-replaced as captain by former captain ...”. We can dig deeper, using LIT’s
-language modeling module (screenshot section (c)) to see that the token “by” is
-predicted with high probability (28.7%).
-
-To find out how T5 arrived at this prediction, we utilize the “similarity
-searcher” component through the datapoint generator (screenshot section (d)).
-This performs a fast approximate nearest-neighbor lookup from a pre-built index
-over the training corpus, using embeddings from the T5 decoder. With one click,
-we can retrieve 25 nearest neighbors and add them to the LIT UI for inspection.
-We see that the words “captain” and “former” appear 34 and 16 times in these
-examples–along with 3 occurrences of “replaced by” (screenshot section (e)),
-suggesting a strong prior toward our erroneous phrase.
-
-![LIT sequence-to-sequence analysis](./images/lit-s2s-journey.png "LIT sequence-to-sequence analysis"){w=500px align=center}
+### Sequence salience
+
+Sequence salience generalizes token-based salience to text-to-text models,
+allowing you to explain the impact of the prompt tokens on parts of the model
+output.
+
+See the [components guide](components.md#sequence-salience) for more details
+on how to navigate the Sequence Salience UI module.
diff --git a/website/src/demos.md b/website/src/demos.md
index e26af521..42568f92 100644
--- a/website/src/demos.md
+++ b/website/src/demos.md
@@ -19,14 +19,6 @@ color: "#49596c"
   tags: "tabular, binary classification",
   external:"true" %}
 
-  {% include partials/demo-card,
-  c-title: "Image classification",
-  link: "/demos/images.html",
-  c-data-source: "Imagenette",
-  c-copy: "Analyze an image classification model with LIT, including multiple image salience techniques.",
-  tags: "images, multiclass classification",
-  external:"true" %}
-
   {% include partials/demo-card,
   c-title: "Classification and regression models",
   link: "/demos/glue.html",
@@ -42,37 +34,5 @@ color: "#49596c"
   c-copy: "Use LIT directly inside a Colab notebook. Explore binary classification for sentiment analysis using SST2 from the General Language Understanding Evaluation (GLUE) benchmark suite.",
   tags: "BERT, binary classification, notebooks",
   external:"true" %}
-
-  {% include partials/demo-card,
-  c-title: "Gender bias in coreference systems",
-  link: "/demos/coref.html",
-  c-data-source: "Winogender schemas",
-  c-copy: "Use LIT to explore gendered associations in a coreference system, which matches pronouns to their antecedents. This demo highlights how LIT can work with structured prediction models (edge classification), and its capability for disaggregated analysis.",
-  tags: "BERT, coreference, fairness, Winogender",
-  external:"true" %}
-
-  {% include partials/demo-card,
-  c-title: "Fill in the blanks",
-  link: "/demos/lm.html",
-  c-data-source: "Stanford Sentiment Treebank, Movie Reviews",
-  c-copy: "Explore a BERT-based masked-language model. See what tokens the model predicts should fill in the blank when any token from an example sentence is masked out.",
-  tags: "BERT, masked language model",
-  external:"true" %}
-
-  {% include partials/demo-card,
-  c-title: "Text generation",
-  link: "/demos/t5.html",
-  c-data-source: "CNN / Daily Mail",
-  c-copy: "Use a T5 model to summarize text. For any example of interest, quickly find similar examples from the training set, using an approximate nearest-neighbors index.",
-  tags: "T5, generation",
-  external:"true" %}
-
-  {% include partials/demo-card,
-  c-title: "Evaluating input salience methods",
-  link: "/demos/is_eval.html",
-  c-data-source: "Stanford Sentiment Treebank, Toxicity",
-  c-copy: "Explore the faithfulness of input salience methods on a BERT-base model across different datasets and artificial shortcuts.",
-  tags: "BERT, salience, evaluation",
-  external:"true" %}
diff --git a/website/src/index.md b/website/src/index.md
index bea47e24..690c463a 100644
--- a/website/src/index.md
+++ b/website/src/index.md
@@ -44,8 +44,6 @@ LIT can be run as a standalone server, or inside of python notebook environments
 Salience maps
 
-Attention visualization
-
 Metrics calculations
 
 Counterfactual generation
 
@@ -105,7 +103,7 @@ And more...
 {% include partials/home-card
   image: '/assets/images/LIT_Updates.png',
   action: 'UPDATES',
-  title: 'Version 1.1',
+  title: 'Version 1.2',
   desc: 'Input salience for text-to-text LLMs, with wrappers for HuggingFace
   Transformers and KerasNLP models.',
   cta-text:"See release notes",
   link: 'https://github.com/PAIR-code/lit/blob/main/RELEASE.md'
diff --git a/website/src/tutorials/sentiment.md b/website/src/tutorials/sentiment.md
index 16dd5dfc..d706e557 100644
--- a/website/src/tutorials/sentiment.md
+++ b/website/src/tutorials/sentiment.md
@@ -18,7 +18,7 @@ takeaways: "Learn about how the metrics table and saliency maps assisted an anal
 {% include partials/link-out link: "../../demos/glue.html", text: "Explore this demo yourself." %}
 
-Or, run your own with [`examples/glue_demo.py`](https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/glue_demo.py)
+Or, run your own with [`examples/glue/demo.py`](https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/glue/demo.py)
 
 How well does a sentiment classifier handle negation? We can use LIT to interactively ask this question and get answers. We loaded LIT with the development set of the Stanford Sentiment Treebank (SST), which contains sentences from movie reviews that have been human-labeled as having a negative sentiment (0), or a positive sentiment (1). For a model, we are using a BERT-based binary classifier that has been trained to classify sentiment.
diff --git a/website/src/tutorials/sequence-salience.md b/website/src/tutorials/sequence-salience.md
index 29a99d34..3f5ad91b 100644
--- a/website/src/tutorials/sequence-salience.md
+++ b/website/src/tutorials/sequence-salience.md
@@ -20,7 +20,7 @@ takeaways: "Learn to use LIT's Sequence Salience module for prompt debugging."
 link: "https://colab.research.google.com/github/google/generative-ai-docs/blob/main/site/en/gemma/docs/lit_gemma.ipynb",
 text: "Follow along in Google Colab." %}
 
-Or, run this locally with [`examples/lm_salience_demo.py`](https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/lm_salience_demo.py)
+Or, run this locally with [`examples/prompt_debugging/server.py`](https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/prompt_debugging/server.py)
 
 Large language models (LLMs), such as [Gemini][gemini] and [GPT-4][gpt4],
 have become ubiquitous.
Recent releases of "open weights" models, including @@ -470,9 +470,9 @@ helpful guides that can help you develop better prompts, including: [howitworks_icl]: https://par.nsf.gov/servlets/purl/10462310 [lit_1_1_release_notes]:https://github.com/PAIR-code/lit/blob/main/RELEASE.md#release-11 [lit_colab]: https://colab.research.google.com/github/google/generative-ai-docs/blob/main/site/en/gemma/docs/lit_gemma.ipynb -[lit_hf]: https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/models/pretrained_lms.py +[lit_hf]: https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/prompt_debugging/transformers_lms.py [lit_issues]: https://github.com/PAIR-code/lit/issues -[lit_keras]: https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/models/instrumented_keras_lms.py +[lit_keras]: https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/prompt_debugging/keras_lms.py [lit_sxs]: ../../documentation/ui_guide.html#comparing-datapoints [llama]: https://llama.meta.com/ [main_toolbar]: ../../documentation/ui_guide.html#main-toolbar diff --git a/website/src/tutorials/tab-feat-attr.md b/website/src/tutorials/tab-feat-attr.md index 8a36c5ab..709b72d4 100644 --- a/website/src/tutorials/tab-feat-attr.md +++ b/website/src/tutorials/tab-feat-attr.md @@ -21,7 +21,7 @@ takeaways: "Learn how to use the Kernel SHAP based Tabular Feature Attribution m text: "Explore this demo yourself." %} Or, run your own with -[`examples/penguin_demo.py`](https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/penguin_demo.py) +[`examples/penguin/demo.py`](https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/penguin/demo.py) LIT supports many techniques like salience maps and counterfactual generators for text data. But what if you have a tabular dataset? You might want to find diff --git a/website/src/tutorials/text-salience.md b/website/src/tutorials/text-salience.md index 598a1a8b..ecc6c10e 100644 --- a/website/src/tutorials/text-salience.md +++ b/website/src/tutorials/text-salience.md @@ -20,7 +20,7 @@ takeaways: "Learn how to use salience maps for text data in LIT." link: "../../demos/glue.html", text: "Explore this demo yourself." %} -Or, run your own with [`examples/glue_demo.py`](https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/glue_demo.py) +Or, run your own with [`examples/glue/demo.py`](https://github.com/PAIR-code/lit/blob/main/lit_nlp/examples/glue/demo.py) LIT enables users to analyze individual predictions for text input using salience maps, for which gradient-based and/or blackbox methods are available.