diff --git a/_freeze/docs/src/how_to_guides/llm/execute-results/md.json b/_freeze/docs/src/how_to_guides/llm/execute-results/md.json index e4ceef4..be0e1ac 100644 --- a/_freeze/docs/src/how_to_guides/llm/execute-results/md.json +++ b/_freeze/docs/src/how_to_guides/llm/execute-results/md.json @@ -1,7 +1,7 @@ { - "hash": "4cd8fe8848c80c5bac5854a245c2032f", + "hash": "12bba4600128d624236d219cf6307b1d", "result": { - "markdown": "---\ntitle: How to Build a Conformal Chatbot\n---\n\n\n\n\n\nLarge Language Models are all the buzz right now. They are used for a variety of tasks, including text classification, question answering, and text generation. In this tutorial, we will show how to conformalize a transformer language model for text classification. We will use the [Banking77](https://arxiv.org/abs/2003.04807) dataset [@casanueva2020efficient], which consists of 13,083 queries from 77 intents. On the model side, we will use the [DistilRoBERTa](https://huggingface.co/mrm8488/distilroberta-finetuned-banking77) model, which is a distilled version of [RoBERTa](https://arxiv.org/abs/1907.11692) [@liu2019roberta] finetuned on the Banking77 dataset.\n\n## Data\n\nThe data was downloaded from [HuggingFace](https://huggingface.co/datasets/PolyAI/banking77) 🤗 (HF) and split into a proper training, calibration, and test set. All that's left to do is to load the data and preprocess it. We add 1 to the labels to make them 1-indexed (sorry Pythonistas 😜)\n\n\n::: {.cell execution_count=2}\n``` {.julia .cell-code}\n# Get labels:\ndf_labels = CSV.read(\"dev/artifacts/data/banking77/labels.csv\", DataFrame, drop=[1])\nlabels = df_labels[:,1]\n\n# Get data:\ndf_train = CSV.read(\"dev/artifacts/data/banking77/train.csv\", DataFrame, drop=[1])\ndf_cal = CSV.read(\"dev/artifacts/data/banking77/calibration.csv\", DataFrame, drop=[1])\ndf_full_train = vcat(df_train, df_cal)\ntrain_ratio = round(nrow(df_train)/nrow(df_full_train), digits=2)\ndf_test = CSV.read(\"dev/artifacts/data/banking77/test.csv\", DataFrame, drop=[1])\n\n# Preprocess data:\nqueries_train, y_train = collect(df_train.text), categorical(df_train.labels .+ 1)\nqueries_cal, y_cal = collect(df_cal.text), categorical(df_cal.labels .+ 1)\nqueries, y = collect(df_full_train.text), categorical(df_full_train.labels .+ 1)\nqueries_test, y_test = collect(df_test.text), categorical(df_test.labels .+ 1)\n```\n:::\n\n\n## HuggingFace Model\n\nThe model can be loaded from HF straight into our running Julia session using the [`Transformers.jl`](https://github.com/chengchingwen/Transformers.jl/tree/master) package. Below we load the tokenizer `tkr` and the model `mod`. The tokenizer is used to convert the text into a sequence of integers, which is then fed into the model. The model outputs a hidden state, which is then fed into a classifier to get the logits for each class. Finally, the logits are then passed through a softmax function to get the corresponding predicted probabilities. 
Below we run a few queries through the model to see how it performs.\n\n::: {.cell execution_count=3}\n``` {.julia .cell-code}\n# Load model from HF 🤗:\ntkr = hgf\"mrm8488/distilroberta-finetuned-banking77:tokenizer\"\nmod = hgf\"mrm8488/distilroberta-finetuned-banking77:ForSequenceClassification\"\n\n# Test model:\nquery = [\n \"What is the base of the exchange rates?\",\n \"Why is my card not working?\",\n \"My Apple Pay is not working, what should I do?\",\n]\na = encode(tkr, query)\nb = mod.model(a)\nc = mod.cls(b.hidden_state)\nd = softmax(c.logit)\n[labels[i] for i in Flux.onecold(d)]\n```\n\n::: {.cell-output .cell-output-display execution_count=4}\n```\n3-element Vector{String}:\n \"exchange_rate\"\n \"card_not_working\"\n \"apple_pay_or_google_pay\"\n```\n:::\n:::\n\n\n## `MLJ` Interface\n\nSince our package is interfaced to [`MLJ.jl`](https://alan-turing-institute.github.io/MLJ.jl/dev/), we need to define a wrapper model that conforms to the `MLJ` interface. In order to add the model for general use, we would probably go through [`MLJFlux.jl`](https://github.com/FluxML/MLJFlux.jl), but for this tutorial, we will make our life easy and simply overload the `MLJBase.fit` and `MLJBase.predict` methods. Since the model from HF is already pre-trained and we are not interested in further fine-tuning, we will simply return the model object in the `MLJBase.fit` method. The `MLJBase.predict` method will then take the model object and the query and return the predicted probabilities. We also need to define the `MLJBase.target_scitype` and `MLJBase.predict_mode` methods. The former tells `MLJ` what the output type of the model is, and the latter can be used to retrieve the label with the highest predicted probability.\n\n::: {.cell execution_count=4}\n``` {.julia .cell-code}\nstruct IntentClassifier <: MLJBase.Probabilistic\n tkr::TextEncoders.AbstractTransformerTextEncoder\n mod::HuggingFace.HGFRobertaForSequenceClassification\nend\n\nfunction IntentClassifier(;\n tokenizer::TextEncoders.AbstractTransformerTextEncoder, \n model::HuggingFace.HGFRobertaForSequenceClassification,\n)\n IntentClassifier(tkr, mod)\nend\n\nfunction get_hidden_state(clf::IntentClassifier, query::Union{AbstractString, Vector{<:AbstractString}})\n token = encode(clf.tkr, query)\n hidden_state = clf.mod.model(token).hidden_state\n return hidden_state\nend\n\n# This doesn't actually retrain the model, but it retrieves the classifier object\nfunction MLJBase.fit(clf::IntentClassifier, verbosity, X, y)\n cache=nothing\n report=nothing\n fitresult = (clf = clf.mod.cls, labels = levels(y))\n return fitresult, cache, report\nend\n\nfunction MLJBase.predict(clf::IntentClassifier, fitresult, Xnew)\n output = fitresult.clf(get_hidden_state(clf, Xnew))\n p̂ = UnivariateFinite(fitresult.labels,softmax(output.logit)',pool=missing)\n return p̂\nend\n\nMLJBase.target_scitype(clf::IntentClassifier) = AbstractVector{<:Finite}\n\nMLJBase.predict_mode(clf::IntentClassifier, fitresult, Xnew) = mode.(MLJBase.predict(clf, fitresult, Xnew))\n```\n:::\n\n\nTo test that everything is working as expected, we fit the model and generated predictions for a subset of the test data:\n\n::: {.cell execution_count=5}\n``` {.julia .cell-code}\nclf = IntentClassifier(tkr, mod)\ntop_n = 10\nfitresult, _, _ = MLJBase.fit(clf, 1, nothing, y_test[1:top_n])\n@time ŷ = MLJBase.predict(clf, fitresult, queries_test[1:top_n]);\n```\n:::\n\n\n## Conformal Chatbot\n\nTo turn the wrapped, pre-trained model into a conformal intent classifier, we can now rely on 
standard API calls. We first wrap our atomic model where we also specify the desired coverage rate and method. Since even simple forward passes are computationally expensive for our (small) LLM, we rely on Simple Inductive Conformal Classification.\n\n```{.julia}\n#| eval: false\n\nconf_model = conformal_model(clf; coverage=0.95, method=:simple_inductive, train_ratio=train_ratio)\nmach = machine(conf_model, queries, y)\n@time fit!(mach)\nSerialization.serialize(\"dev/artifacts/models/banking77/simple_inductive.jls\", mach)\n```\n\nFinally, we use our conformal LLM to build a simple and yet powerful chatbot that runs directly in the Julia REPL. Without dwelling on the details too much, the `conformal_chatbot` works as follows:\n\n1. Prompt user to explain their intent.\n2. Feed user input through conformal LLM and present the output to the user.\n3. If the conformal prediction sets includes more than one label, prompt the user to either refine their input or choose one of the options included in the set.\n\n::: {.cell execution_count=6}\n``` {.julia .cell-code}\nmach = Serialization.deserialize(\"dev/artifacts/models/banking77/simple_inductive.jls\")\n\nfunction prediction_set(mach, query::String)\n p̂ = MLJBase.predict(mach, query)[1]\n probs = pdf.(p̂, collect(1:77))\n in_set = findall(probs .!= 0)\n labels_in_set = labels[in_set]\n probs_in_set = probs[in_set]\n _order = sortperm(-probs_in_set)\n plt = UnicodePlots.barplot(labels_in_set[_order], probs_in_set[_order], title=\"Possible Intents\")\n return labels_in_set, plt\nend\n\nfunction conformal_chatbot()\n println(\"👋 Hi, I'm a Julia, your conformal chatbot. I'm here to help you with your banking query. Ask me anything or type 'exit' to exit ...\\n\")\n completed = false\n queries = \"\"\n while !completed\n query = readline()\n queries = queries * \",\" * query\n labels, plt = prediction_set(mach, queries)\n if length(labels) > 1\n println(\"🤔 Hmmm ... I can think of several options here. If any of these applies, simply type the corresponding number (e.g. '1' for the first option). Otherwise, can you refine your question, please?\\n\")\n println(plt)\n else\n println(\"🥳 I think you mean $(labels[1]). Correct?\")\n end\n\n # Exit:\n if query == \"exit\"\n println(\"👋 Bye!\")\n break\n end\n if query ∈ string.(collect(1:77))\n println(\"👍 Great! You've chosen '$(labels[parse(Int64, query)])'. I'm glad I could help you. Have a nice day!\")\n completed = true\n end\n end\nend\n```\n:::\n\n\nBelow we show the output for two example queries. The first one is very ambiguous. As expected, the size of the prediction set is therefore large. 
\n\n::: {.cell execution_count=7}\n``` {.julia .cell-code}\nambiguous_query = \"transfer mondey?\"\nprediction_set(mach, ambiguous_query)[2]\n```\n\n::: {.cell-output .cell-output-display execution_count=8}\n```\n Possible Intents \n ┌ ┐ \n beneficiary_not_allowed ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 0.150517 \n balance_not_updated_after_bank_transfer ┤■■■■■■■■■■■■■■■■■■■■■■ 0.111409 \n transfer_into_account ┤■■■■■■■■■■■■■■■■■■■ 0.0939535 \n transfer_not_received_by_recipient ┤■■■■■■■■■■■■■■■■■■ 0.091163 \n top_up_by_bank_transfer_charge ┤■■■■■■■■■■■■■■■■■■ 0.089306 \n failed_transfer ┤■■■■■■■■■■■■■■■■■■ 0.0888322 \n transfer_timing ┤■■■■■■■■■■■■■ 0.0641952 \n transfer_fee_charged ┤■■■■■■■ 0.0361131 \n pending_transfer ┤■■■■■ 0.0270795 \n receiving_money ┤■■■■■ 0.0252126 \n declined_transfer ┤■■■ 0.0164443 \n cancel_transfer ┤■■■ 0.0150444 \n └ ┘ \n```\n:::\n:::\n\n\nThe more refined version of the prompt yields a smaller prediction set: less ambiguous prompts result in lower predictive uncertainty. \n\n::: {.cell execution_count=8}\n``` {.julia .cell-code}\nrefined_query = \"I tried to transfer money to my friend, but it failed.\"\nprediction_set(mach, refined_query)[2]\n```\n\n::: {.cell-output .cell-output-display execution_count=9}\n```\n Possible Intents \n ┌ ┐ \n failed_transfer ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 0.59042 \n beneficiary_not_allowed ┤■■■■■■■ 0.139806 \n transfer_not_received_by_recipient ┤■■ 0.0449783 \n balance_not_updated_after_bank_transfer ┤■■ 0.037894 \n declined_transfer ┤■ 0.0232856 \n transfer_into_account ┤■ 0.0108771 \n cancel_transfer ┤ 0.00876369 \n └ ┘ \n```\n:::\n:::\n\n\nBelow we include a short demo video that shows the REPL-based chatbot in action.\n\n![](/docs/src/www/demo_llm.gif)\n\n## Final Remarks\n\nThis work was done in collaboration with colleagues at ING as part of the ING Analytics 2023 Experiment Week. Our team demonstrated that Conformal Prediction provides a powerful and principled alternative to top-*K* intent classification. We won the first prize by popular vote.\n\n## References\n\n", + "markdown": "---\ntitle: How to Build a Conformal Chatbot\n---\n\n\n\n\n\nLarge Language Models are all the buzz right now. They are used for a variety of tasks, including text classification, question answering, and text generation. In this tutorial, we will show how to conformalize a transformer language model for text classification. We will use the [Banking77](https://arxiv.org/abs/2003.04807) dataset [@casanueva2020efficient], which consists of 13,083 queries from 77 intents. On the model side, we will use the [DistilRoBERTa](https://huggingface.co/mrm8488/distilroberta-finetuned-banking77) model, which is a distilled version of [RoBERTa](https://arxiv.org/abs/1907.11692) [@liu2019roberta] finetuned on the Banking77 dataset.\n\n## Data\n\nThe data was downloaded from [HuggingFace](https://huggingface.co/datasets/PolyAI/banking77) 🤗 (HF) and split into a proper training, calibration, and test set. All that's left to do is to load the data and preprocess it. 
We add 1 to the labels to make them 1-indexed (sorry Pythonistas 😜)\n\n\n::: {.cell execution_count=2}\n``` {.julia .cell-code}\n# Get labels:\ndf_labels = CSV.read(\"dev/artifacts/data/banking77/labels.csv\", DataFrame, drop=[1])\nlabels = df_labels[:,1]\n\n# Get data:\ndf_train = CSV.read(\"dev/artifacts/data/banking77/train.csv\", DataFrame, drop=[1])\ndf_cal = CSV.read(\"dev/artifacts/data/banking77/calibration.csv\", DataFrame, drop=[1])\ndf_full_train = vcat(df_train, df_cal)\ntrain_ratio = round(nrow(df_train)/nrow(df_full_train), digits=2)\ndf_test = CSV.read(\"dev/artifacts/data/banking77/test.csv\", DataFrame, drop=[1])\n\n# Preprocess data:\nqueries_train, y_train = collect(df_train.text), categorical(df_train.labels .+ 1)\nqueries_cal, y_cal = collect(df_cal.text), categorical(df_cal.labels .+ 1)\nqueries, y = collect(df_full_train.text), categorical(df_full_train.labels .+ 1)\nqueries_test, y_test = collect(df_test.text), categorical(df_test.labels .+ 1)\n```\n:::\n\n\n## HuggingFace Model\n\nThe model can be loaded from HF straight into our running Julia session using the [`Transformers.jl`](https://github.com/chengchingwen/Transformers.jl/tree/master) package. Below we load the tokenizer `tkr` and the model `mod`. The tokenizer is used to convert the text into a sequence of integers, which is then fed into the model. The model outputs a hidden state, which is then fed into a classifier to get the logits for each class. Finally, the logits are then passed through a softmax function to get the corresponding predicted probabilities. Below we run a few queries through the model to see how it performs.\n\n::: {.cell execution_count=3}\n``` {.julia .cell-code}\n# Load model from HF 🤗:\ntkr = hgf\"mrm8488/distilroberta-finetuned-banking77:tokenizer\"\nmod = hgf\"mrm8488/distilroberta-finetuned-banking77:ForSequenceClassification\"\n\n# Test model:\nquery = [\n \"What is the base of the exchange rates?\",\n \"Why is my card not working?\",\n \"My Apple Pay is not working, what should I do?\",\n]\na = encode(tkr, query)\nb = mod.model(a)\nc = mod.cls(b.hidden_state)\nd = softmax(c.logit)\n[labels[i] for i in Flux.onecold(d)]\n```\n\n::: {.cell-output .cell-output-display execution_count=4}\n```\n3-element Vector{String}:\n \"exchange_rate\"\n \"card_not_working\"\n \"apple_pay_or_google_pay\"\n```\n:::\n:::\n\n\n## `MLJ` Interface\n\nSince our package is interfaced to [`MLJ.jl`](https://alan-turing-institute.github.io/MLJ.jl/dev/), we need to define a wrapper model that conforms to the `MLJ` interface. In order to add the model for general use, we would probably go through [`MLJFlux.jl`](https://github.com/FluxML/MLJFlux.jl), but for this tutorial, we will make our life easy and simply overload the `MLJBase.fit` and `MLJBase.predict` methods. Since the model from HF is already pre-trained and we are not interested in further fine-tuning, we will simply return the model object in the `MLJBase.fit` method. The `MLJBase.predict` method will then take the model object and the query and return the predicted probabilities. We also need to define the `MLJBase.target_scitype` and `MLJBase.predict_mode` methods. 
The former tells `MLJ` what the output type of the model is, and the latter can be used to retrieve the label with the highest predicted probability.\n\n::: {.cell execution_count=4}\n``` {.julia .cell-code}\nstruct IntentClassifier <: MLJBase.Probabilistic\n    tkr::TextEncoders.AbstractTransformerTextEncoder\n    mod::HuggingFace.HGFRobertaForSequenceClassification\nend\n\nfunction IntentClassifier(;\n    tokenizer::TextEncoders.AbstractTransformerTextEncoder, \n    model::HuggingFace.HGFRobertaForSequenceClassification,\n)\n    IntentClassifier(tokenizer, model)\nend\n\nfunction get_hidden_state(clf::IntentClassifier, query::Union{AbstractString, Vector{<:AbstractString}})\n    token = encode(clf.tkr, query)\n    hidden_state = clf.mod.model(token).hidden_state\n    return hidden_state\nend\n\n# This doesn't actually retrain the model, but it retrieves the classifier object\nfunction MLJBase.fit(clf::IntentClassifier, verbosity, X, y)\n    cache=nothing\n    report=nothing\n    fitresult = (clf = clf.mod.cls, labels = levels(y))\n    return fitresult, cache, report\nend\n\nfunction MLJBase.predict(clf::IntentClassifier, fitresult, Xnew)\n    output = fitresult.clf(get_hidden_state(clf, Xnew))\n    p̂ = UnivariateFinite(fitresult.labels,softmax(output.logit)',pool=missing)\n    return p̂\nend\n\nMLJBase.target_scitype(clf::IntentClassifier) = AbstractVector{<:Finite}\n\nMLJBase.predict_mode(clf::IntentClassifier, fitresult, Xnew) = mode.(MLJBase.predict(clf, fitresult, Xnew))\n```\n:::\n\n\nTo test that everything is working as expected, we fit the model and generate predictions for a subset of the test data:\n\n::: {.cell execution_count=5}\n``` {.julia .cell-code}\nclf = IntentClassifier(tkr, mod)\ntop_n = 10\nfitresult, _, _ = MLJBase.fit(clf, 1, nothing, y_test[1:top_n])\n@time ŷ = MLJBase.predict(clf, fitresult, queries_test[1:top_n]);\n```\n:::\n\n\n## Conformal Chatbot\n\nTo turn the wrapped, pre-trained model into a conformal intent classifier, we can now rely on standard API calls. We first wrap our atomic model, specifying the desired coverage rate and method. Since even simple forward passes are computationally expensive for our (small) LLM, we rely on Simple Inductive Conformal Classification.\n\n```{.julia}\n#| eval: false\n\nconf_model = conformal_model(clf; coverage=0.95, method=:simple_inductive, train_ratio=train_ratio)\nmach = machine(conf_model, queries, y)\n@time fit!(mach)\nSerialization.serialize(\"dev/artifacts/models/banking77/simple_inductive.jls\", mach)\n```\n\nFinally, we use our conformal LLM to build a simple yet powerful chatbot that runs directly in the Julia REPL. Without dwelling on the details too much, the `conformal_chatbot` works as follows:\n\n1. Prompt the user to explain their intent.\n2. Feed the user input through the conformal LLM and present the output to the user.\n3. 
If the conformal prediction set includes more than one label, prompt the user to either refine their input or choose one of the options included in the set.\n\n::: {.cell execution_count=6}\n``` {.julia .cell-code}\nmach = Serialization.deserialize(\"dev/artifacts/models/banking77/simple_inductive.jls\")\n\nfunction prediction_set(mach, query::String)\n    p̂ = MLJBase.predict(mach, query)[1]\n    probs = pdf.(p̂, collect(1:77))\n    in_set = findall(probs .!= 0)\n    labels_in_set = labels[in_set]\n    probs_in_set = probs[in_set]\n    _order = sortperm(-probs_in_set)\n    plt = UnicodePlots.barplot(labels_in_set[_order], probs_in_set[_order], title=\"Possible Intents\")\n    return labels_in_set, plt\nend\n\nfunction conformal_chatbot()\n    println(\"👋 Hi, I'm Julia, your conformal chatbot. I'm here to help you with your banking query. Ask me anything or type 'exit' to exit ...\\n\")\n    completed = false\n    queries = \"\"\n    while !completed\n        query = readline()\n        queries = queries * \",\" * query\n        labels, plt = prediction_set(mach, queries)\n        if length(labels) > 1\n            println(\"🤔 Hmmm ... I can think of several options here. If any of these applies, simply type the corresponding number (e.g. '1' for the first option). Otherwise, can you refine your question, please?\\n\")\n            println(plt)\n        else\n            println(\"🥳 I think you mean $(labels[1]). Correct?\")\n        end\n\n        # Exit:\n        if query == \"exit\"\n            println(\"👋 Bye!\")\n            break\n        end\n        if query ∈ string.(collect(1:77))\n            println(\"👍 Great! You've chosen '$(labels[parse(Int64, query)])'. I'm glad I could help you. Have a nice day!\")\n            completed = true\n        end\n    end\nend\n```\n:::\n\n\nBelow we show the output for two example queries. The first one is very ambiguous. As expected, the prediction set is therefore large. \n\n::: {.cell execution_count=7}\n``` {.julia .cell-code}\nambiguous_query = \"transfer mondey?\"\nprediction_set(mach, ambiguous_query)[2]\n```\n\n::: {.cell-output .cell-output-display execution_count=8}\n```\n                     Possible Intents                      \n                                          ┌                                        ┐ \n                 beneficiary_not_allowed ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 0.150517   \n balance_not_updated_after_bank_transfer ┤■■■■■■■■■■■■■■■■■■■■■■ 0.111409   \n                   transfer_into_account ┤■■■■■■■■■■■■■■■■■■■ 0.0939535   \n      transfer_not_received_by_recipient ┤■■■■■■■■■■■■■■■■■■ 0.091163   \n          top_up_by_bank_transfer_charge ┤■■■■■■■■■■■■■■■■■■ 0.089306   \n                         failed_transfer ┤■■■■■■■■■■■■■■■■■■ 0.0888322   \n                         transfer_timing ┤■■■■■■■■■■■■■ 0.0641952   \n                    transfer_fee_charged ┤■■■■■■■ 0.0361131   \n                        pending_transfer ┤■■■■■ 0.0270795   \n                         receiving_money ┤■■■■■ 0.0252126   \n                       declined_transfer ┤■■■ 0.0164443   \n                         cancel_transfer ┤■■■ 0.0150444   \n                                          └                                        ┘ \n```\n:::\n:::\n\n\nThe more refined version of the prompt yields a smaller prediction set: less ambiguous prompts result in lower predictive uncertainty. 
\n\n::: {.cell execution_count=8}\n``` {.julia .cell-code}\nrefined_query = \"I tried to transfer money to my friend, but it failed.\"\nprediction_set(mach, refined_query)[2]\n```\n\n::: {.cell-output .cell-output-display execution_count=9}\n```\n Possible Intents \n ┌ ┐ \n failed_transfer ┤■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 0.59042 \n beneficiary_not_allowed ┤■■■■■■■ 0.139806 \n transfer_not_received_by_recipient ┤■■ 0.0449783 \n balance_not_updated_after_bank_transfer ┤■■ 0.037894 \n declined_transfer ┤■ 0.0232856 \n transfer_into_account ┤■ 0.0108771 \n cancel_transfer ┤ 0.00876369 \n └ ┘ \n```\n:::\n:::\n\n\nBelow we include a short demo video that shows the REPL-based chatbot in action.\n\n![](../www/demo_llm.gif)\n\n## Final Remarks\n\nThis work was done in collaboration with colleagues at ING as part of the ING Analytics 2023 Experiment Week. Our team demonstrated that Conformal Prediction provides a powerful and principled alternative to top-*K* intent classification. We won the first prize by popular vote.\n\n## References\n\n", "supporting": [ "llm_files" ], diff --git a/_freeze/docs/src/tutorials/classification/execute-results/md.json b/_freeze/docs/src/tutorials/classification/execute-results/md.json index 0c10473..28a123e 100644 --- a/_freeze/docs/src/tutorials/classification/execute-results/md.json +++ b/_freeze/docs/src/tutorials/classification/execute-results/md.json @@ -1,7 +1,7 @@ { - "hash": "ebae046d39c2311af6bca8d99160621c", + "hash": "17adfe36e063080b84720452af265259", "result": { - "markdown": "---\ntitle: Classification\n---\n\n\n\n\n\n\nThis tutorial is based in parts on this [blog post](https://www.paltmeyer.com/blog/posts/conformal-prediction/).\n\n## Split Conformal Classification {#sec-scp}\n\nWe consider a simple binary classification problem. Let $(X_i, Y_i), \\ i=1,...,n$ denote our feature-label pairs and let $\\mu: \\mathcal{X} \\mapsto \\mathcal{Y}$ denote the mapping from features to labels. For illustration purposes we will use the moons dataset 🌙. Using [`MLJ.jl`](https://alan-turing-institute.github.io/MLJ.jl/v0.18/) we first generate the data and split into into a training and test set:\n\n::: {.cell execution_count=2}\n``` {.julia .cell-code}\nusing MLJ\nusing Random\nRandom.seed!(123)\n\n# Data:\nX, y = make_moons(500; noise=0.15)\nX = MLJ.table(convert.(Float32, MLJ.matrix(X)))\ntrain, test = partition(eachindex(y), 0.8, shuffle=true)\n```\n:::\n\n\nHere we will use a specific case of CP called *split conformal prediction* which can then be summarized as follows:^[In other places split conformal prediction is sometimes referred to as *inductive* conformal prediction.]\n\n1. Partition the training into a proper training set and a separate calibration set: $\\mathcal{D}_n=\\mathcal{D}^{\\text{train}} \\cup \\mathcal{D}^{\\text{cali}}$.\n2. Train the machine learning model on the proper training set: $\\hat\\mu_{i \\in \\mathcal{D}^{\\text{train}}}(X_i,Y_i)$.\n3. Compute nonconformity scores, $\\mathcal{S}$, using the calibration data $\\mathcal{D}^{\\text{cali}}$ and the fitted model $\\hat\\mu_{i \\in \\mathcal{D}^{\\text{train}}}$. \n4. For a user-specified desired coverage ratio $(1-\\alpha)$ compute the corresponding quantile, $\\hat{q}$, of the empirical distribution of nonconformity scores, $\\mathcal{S}$.\n5. 
For the given quantile and test sample $X_{\\text{test}}$, form the corresponding conformal prediction set: \n\n$$\nC(X_{\\text{test}})=\\{y:s(X_{\\text{test}},y) \\le \\hat{q}\\}\n$$ {#eq-set}\n\nThis is the default procedure used for classification and regression in [`ConformalPrediction.jl`](https://github.com/juliatrustworthyai/ConformalPrediction.jl). \n\nNow let's take this to our 🌙 data. To illustrate the package functionality we will demonstrate the envisioned workflow. We first define our atomic machine learning model following standard [`MLJ.jl`](https://alan-turing-institute.github.io/MLJ.jl/v0.18/) conventions. Using [`ConformalPrediction.jl`](https://github.com/juliatrustworthyai/ConformalPrediction.jl) we then wrap our atomic model in a conformal model using the standard API call `conformal_model(model::Supervised; kwargs...)`. To train and predict from our conformal model we can then rely on the conventional [`MLJ.jl`](https://alan-turing-institute.github.io/MLJ.jl/v0.18/) procedure again. In particular, we wrap our conformal model in data (turning it into a machine) and then fit it to the training data. Finally, we use our machine to predict the label for a new test sample `Xtest`:\n\n::: {.cell execution_count=3}\n``` {.julia .cell-code}\n# Model:\nKNNClassifier = @load KNNClassifier pkg=NearestNeighborModels\nmodel = KNNClassifier(;K=50) \n\n# Training:\nusing ConformalPrediction\nconf_model = conformal_model(model; coverage=.9)\nmach = machine(conf_model, X, y)\nfit!(mach, rows=train)\n\n# Conformal Prediction:\nXtest = selectrows(X, test)\nytest = y[test]\nŷ = predict(mach, Xtest)\nŷ[1]\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nimport NearestNeighborModels ✔\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=4}\n```\nUnivariateFinite{Multiclass{2}}(0=>0.94)\n```\n:::\n:::\n\n\nThe final predictions are set-valued. While the softmax output remains unchanged for the `SimpleInductiveClassifier`, the size of the prediction set depends on the chosen coverage rate, $(1-\\alpha)$. \n\n::: {.cell execution_count=4}\n\n::: {.cell-output .cell-output-display execution_count=5}\nWhen specifying a coverage rate very close to one, the prediction set will typically include many (in some cases all) of the possible labels. Below, for example, both classes are included in the prediction set when setting the coverage rate equal to $(1-\\alpha)$=1.0. This is intuitive, since high coverage quite literally requires that the true label is covered by the prediction set with high probability.\n\n:::\n:::\n\n\n::: {.cell execution_count=5}\n``` {.julia .cell-code}\nconf_model = conformal_model(model; coverage=coverage, method=:simple_inductive)\nmach = machine(conf_model, X, y)\nfit!(mach, rows=train)\n\n# Conformal Prediction:\nXtest = (x1=[1],x2=[0])\npredict(mach, Xtest)[1]\n```\n\n::: {.cell-output .cell-output-display execution_count=6}\n```\nUnivariateFinite{Multiclass{2}}(0=>0.5, 1=>0.5)\n```\n:::\n:::\n\n\n::: {.cell execution_count=6}\n\n::: {.cell-output .cell-output-display execution_count=7}\nConversely, for low coverage rates, prediction sets can also be empty. For a choice of $(1-\\alpha)$=0.1, for example, the prediction set for our test sample is empty. This is a bit difficult to think about intuitively and I have not yet come across a satisfactory, intuitive interpretation.^[Any thoughts/comments welcome!] 
When the prediction set is empty, the `predict` call currently returns `missing`:\n\n:::\n:::\n\n\n::: {.cell execution_count=7}\n``` {.julia .cell-code}\nconf_model = conformal_model(model; coverage=coverage)\nmach = machine(conf_model, X, y)\nfit!(mach, rows=train)\n\n# Conformal Prediction:\npredict(mach, Xtest)[1]\n```\n\n::: {.cell-output .cell-output-display execution_count=8}\n```\nmissing\n```\n:::\n:::\n\n\n::: {.cell execution_count=8}\n``` {.julia .cell-code}\ncov_ = .95\nconf_model = conformal_model(model; coverage=cov_)\nmach = machine(conf_model, X, y)\nfit!(mach, rows=train)\nMarkdown.parse(\"\"\"\nThe following chart shows the resulting predicted probabilities for ``y=1`` (left) and set size (right) for a choice of ``(1-\\\\alpha)``=$cov_.\n\"\"\")\n```\n\n::: {.cell-output .cell-output-display execution_count=9}\nThe following chart shows the resulting predicted probabilities for $y=1$ (left) and set size (right) for a choice of $(1-\\alpha)$=0.95.\n\n:::\n:::\n\n\n::: {.cell execution_count=9}\n``` {.julia .cell-code}\nusing Plots\np_proba = contourf(mach.model, mach.fitresult, X, y)\np_set_size = contourf(mach.model, mach.fitresult, X, y; plot_set_size=true)\nplot(p_proba, p_set_size, size=(800,250))\n```\n\n::: {.cell-output .cell-output-display execution_count=10}\n![](classification_files/figure-commonmark/cell-10-output-1.svg){}\n:::\n:::\n\n\n\n\nThe animation below should provide some more intuition as to what exactly is happening here. It illustrates the effect of the chosen coverage rate on the predicted softmax output and the set size in the two-dimensional feature space. Contours are overlayed with the moon data points (including test data). The two samples highlighted in red, $X_1$ and $X_2$, have been manually added for illustration purposes. Let's look at these one by one.\n\nFirstly, note that $X_1$ (red cross) falls into a region of the domain that is characterized by high predictive uncertainty. It sits right at the bottom-right corner of our class-zero moon 🌜 (orange), a region that is almost entirely enveloped by our class-one moon 🌛 (green). For low coverage rates the prediction set for $X_1$ is empty: on the left-hand side this is indicated by the missing contour for the softmax probability; on the right-hand side we can observe that the corresponding set size is indeed zero. For high coverage rates the prediction set includes both $y=0$ and $y=1$, indicative of the fact that the conformal classifier is uncertain about the true label.\n\nWith respect to $X_2$, we observe that while also sitting on the fringe of our class-zero moon, this sample populates a region that is not fully enveloped by data points from the opposite class. In this region, the underlying atomic classifier can be expected to be more certain about its predictions, but still not highly confident. How is this reflected by our corresponding conformal prediction sets? \n\n::: {.cell execution_count=11}\n``` {.julia .cell-code code-fold=\"true\"}\nXtest_2 = (x1=[-0.5],x2=[0.25])\np̂_2 = pdf(predict(mach, Xtest_2)[1], 0)\n```\n:::\n\n\n::: {.cell execution_count=12}\n\n::: {.cell-output .cell-output-display execution_count=13}\nWell, for low coverage rates (roughly $<0.9$) the conformal prediction set does not include $y=0$: the set size is zero (right panel). Only for higher coverage rates do we have $C(X_2)=\\{0\\}$: the coverage rate is high enough to include $y=0$, but the corresponding softmax probability is still fairly low. 
For example, for $(1-\\alpha)=0.95$ we have $\\hat{p}(y=0|X_2)=0.72.$\n\n:::\n:::\n\n\nThese two examples illustrate an interesting point: for regions characterized by high predictive uncertainty, conformal prediction sets are typically empty (for low coverage) or large (for high coverage). While set-valued predictions may be something to get used to, this notion is overall intuitive. \n\n::: {.cell execution_count=13}\n``` {.julia .cell-code}\n# Setup\ncoverages = range(0.75,1.0,length=5)\nn = 100\nx1_range = range(extrema(X.x1)...,length=n)\nx2_range = range(extrema(X.x2)...,length=n)\n\nanim = @animate for coverage in coverages\n conf_model = conformal_model(model; coverage=coverage)\n mach = machine(conf_model, X, y)\n fit!(mach, rows=train)\n # Probabilities:\n p1 = contourf(mach.model, mach.fitresult, X, y)\n scatter!(p1, Xtest.x1, Xtest.x2, ms=6, c=:red, label=\"X₁\", shape=:cross, msw=6)\n scatter!(p1, Xtest_2.x1, Xtest_2.x2, ms=6, c=:red, label=\"X₂\", shape=:diamond, msw=6)\n p2 = contourf(mach.model, mach.fitresult, X, y; plot_set_size=true)\n scatter!(p2, Xtest.x1, Xtest.x2, ms=6, c=:red, label=\"X₁\", shape=:cross, msw=6)\n scatter!(p2, Xtest_2.x1, Xtest_2.x2, ms=6, c=:red, label=\"X₂\", shape=:diamond, msw=6)\n plot(p1, p2, plot_title=\"(1-α)=$(round(coverage,digits=2))\", size=(800,300))\nend\n\ngif(anim, joinpath(www_path,\"classification.gif\"), fps=1)\n```\n\n::: {#fig-anim .cell-output .cell-output-display execution_count=14}\n```{=html}\n\n```\n\nThe effect of the coverage rate on the conformal prediction set. Softmax probabilities are shown on the left. The size of the prediction set is shown on the right.\n:::\n:::\n\n\n![](../www/classification.gif)\n\n## Adaptive Sets\n\nInstead of using the simple approach, we can use adaptive prediction sets [@angelopoulos2021gentle]:\n\n::: {.cell execution_count=14}\n``` {.julia .cell-code}\nconf_model = conformal_model(model; coverage=cov_, method=:adaptive_inductive)\nmach = machine(conf_model, X, y)\nfit!(mach, rows=train)\nresults[:adaptive_inductive] = mach\n```\n:::\n\n\n::: {.cell execution_count=15}\n``` {.julia .cell-code}\nusing Plots\np_proba = contourf(mach.model, mach.fitresult, X, y)\np_set_size = contourf(mach.model, mach.fitresult, X, y; plot_set_size=true)\nplot(p_proba, p_set_size, size=(800,250))\n```\n:::\n\n\n## Evaluation\n\nFor evaluation of conformal predictors we follow @angelopoulos2021gentle (Section 3). As a first step towards adaptiveness (adaptivity), the authors recommend to inspect the set size of conformal predictions. The chart below shows the interval width for the different methods along with the ground truth interval width:\n\n::: {.cell execution_count=16}\n``` {.julia .cell-code}\nplt_list = []\nfor (_mod, mach) in results\n push!(plt_list, bar(mach.model, mach.fitresult, X; title=String(_mod)))\nend\nplot(plt_list..., size=(800,300))\n```\n\n::: {.cell-output .cell-output-display execution_count=17}\n![Prediction interval width.](classification_files/figure-commonmark/fig-setsize-output-1.svg){#fig-setsize}\n:::\n:::\n\n\nWe can also use specific metrics like **empirical coverage** and **size-stratified coverage** to check for correctness and adaptiveness, respectively. To this end, the package provides custom measures that are compatible with `MLJ.jl`. In other words, we can evaluate model performance in true `MLJ.jl` fashion (see [here](https://alan-turing-institute.github.io/MLJ.jl/dev/evaluating_model_performance/)). 
\n\nThe code below runs the evaluation with respect to both metrics, `emp_coverage` and `ssc` for a single conformal machine: \n\n::: {.cell execution_count=17}\n``` {.julia .cell-code}\n_mod, mach = first(results)\n_eval = evaluate!(\n    mach,\n    operation=predict,\n    measure=[emp_coverage, ssc]\n)\n# display(_eval)\nprintln(\"Empirical coverage for $(_mod): $(round(_eval.measurement[1], digits=3))\")\nprintln(\"SSC for $(_mod): $(round(_eval.measurement[2], digits=3))\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nEmpirical coverage for adaptive_inductive: 0.962\nSSC for adaptive_inductive: 0.962\n```\n:::\n:::\n\n\n## References\n\n",
+    "markdown": "---\ntitle: Classification\n---\n\n\n\n\n\n\nThis tutorial is based in parts on this [blog post](https://www.paltmeyer.com/blog/posts/conformal-prediction/).\n\n## Split Conformal Classification {#sec-scp}\n\nWe consider a simple binary classification problem. Let $(X_i, Y_i), \\ i=1,...,n$ denote our feature-label pairs and let $\\mu: \\mathcal{X} \\mapsto \\mathcal{Y}$ denote the mapping from features to labels. For illustration purposes we will use the moons dataset 🌙. Using [`MLJ.jl`](https://alan-turing-institute.github.io/MLJ.jl/v0.18/) we first generate the data and split it into a training and test set:\n\n::: {.cell execution_count=2}\n``` {.julia .cell-code}\nusing MLJ\nusing Random\nRandom.seed!(123)\n\n# Data:\nX, y = make_moons(500; noise=0.15)\nX = MLJ.table(convert.(Float32, MLJ.matrix(X)))\ntrain, test = partition(eachindex(y), 0.8, shuffle=true)\n```\n:::\n\n\nHere we will use a specific case of CP called *split conformal prediction* which can then be summarized as follows:^[In other places split conformal prediction is sometimes referred to as *inductive* conformal prediction.]\n\n1. Partition the training data into a proper training set and a separate calibration set: $\\mathcal{D}_n=\\mathcal{D}^{\\text{train}} \\cup \\mathcal{D}^{\\text{cali}}$.\n2. Train the machine learning model on the proper training set: $\\hat\\mu_{i \\in \\mathcal{D}^{\\text{train}}}(X_i,Y_i)$.\n3. Compute nonconformity scores, $\\mathcal{S}$, using the calibration data $\\mathcal{D}^{\\text{cali}}$ and the fitted model $\\hat\\mu_{i \\in \\mathcal{D}^{\\text{train}}}$. \n4. For a user-specified desired coverage ratio $(1-\\alpha)$ compute the corresponding quantile, $\\hat{q}$, of the empirical distribution of nonconformity scores, $\\mathcal{S}$.\n5. For the given quantile and test sample $X_{\\text{test}}$, form the corresponding conformal prediction set: \n\n$$\nC(X_{\\text{test}})=\\{y:s(X_{\\text{test}},y) \\le \\hat{q}\\}\n$$ {#eq-set}\n\nThis is the default procedure used for classification and regression in [`ConformalPrediction.jl`](https://github.com/juliatrustworthyai/ConformalPrediction.jl). \n\nNow let's take this to our 🌙 data. To illustrate the package functionality we will demonstrate the envisioned workflow. We first define our atomic machine learning model following standard [`MLJ.jl`](https://alan-turing-institute.github.io/MLJ.jl/v0.18/) conventions. Using [`ConformalPrediction.jl`](https://github.com/juliatrustworthyai/ConformalPrediction.jl) we then wrap our atomic model in a conformal model using the standard API call `conformal_model(model::Supervised; kwargs...)`. To train and predict from our conformal model we can then rely on the conventional [`MLJ.jl`](https://alan-turing-institute.github.io/MLJ.jl/v0.18/) procedure again. 
In particular, we wrap our conformal model in data (turning it into a machine) and then fit it to the training data. Finally, we use our machine to predict the label for a new test sample `Xtest`:\n\n::: {.cell execution_count=3}\n``` {.julia .cell-code}\n# Model:\nKNNClassifier = @load KNNClassifier pkg=NearestNeighborModels\nmodel = KNNClassifier(;K=50) \n\n# Training:\nusing ConformalPrediction\nconf_model = conformal_model(model; coverage=.9)\nmach = machine(conf_model, X, y)\nfit!(mach, rows=train)\n\n# Conformal Prediction:\nXtest = selectrows(X, test)\nytest = y[test]\nŷ = predict(mach, Xtest)\nŷ[1]\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nimport NearestNeighborModels ✔\n```\n:::\n\n::: {.cell-output .cell-output-display execution_count=4}\n```\nUnivariateFinite{Multiclass{2}}(0=>0.94)\n```\n:::\n:::\n\n\nThe final predictions are set-valued. While the softmax output remains unchanged for the `SimpleInductiveClassifier`, the size of the prediction set depends on the chosen coverage rate, $(1-\\alpha)$. \n\n::: {.cell execution_count=4}\n\n::: {.cell-output .cell-output-display execution_count=5}\nWhen specifying a coverage rate very close to one, the prediction set will typically include many (in some cases all) of the possible labels. Below, for example, both classes are included in the prediction set when setting the coverage rate equal to $(1-\\alpha)$=1.0. This is intuitive, since high coverage quite literally requires that the true label is covered by the prediction set with high probability.\n\n:::\n:::\n\n\n::: {.cell execution_count=5}\n``` {.julia .cell-code}\nconf_model = conformal_model(model; coverage=coverage, method=:simple_inductive)\nmach = machine(conf_model, X, y)\nfit!(mach, rows=train)\n\n# Conformal Prediction:\nXtest = (x1=[1],x2=[0])\npredict(mach, Xtest)[1]\n```\n\n::: {.cell-output .cell-output-display execution_count=6}\n```\nUnivariateFinite{Multiclass{2}}(0=>0.5, 1=>0.5)\n```\n:::\n:::\n\n\n::: {.cell execution_count=6}\n\n::: {.cell-output .cell-output-display execution_count=7}\nConversely, for low coverage rates, prediction sets can also be empty. For a choice of $(1-\\alpha)$=0.1, for example, the prediction set for our test sample is empty. This is a bit difficult to think about intuitively and I have not yet come across a satisfactory, intuitive interpretation.^[Any thoughts/comments welcome!] 
When the prediction set is empty, the `predict` call currently returns `missing`:\n\n:::\n:::\n\n\n::: {.cell execution_count=7}\n``` {.julia .cell-code}\nconf_model = conformal_model(model; coverage=coverage)\nmach = machine(conf_model, X, y)\nfit!(mach, rows=train)\n\n# Conformal Prediction:\npredict(mach, Xtest)[1]\n```\n\n::: {.cell-output .cell-output-display execution_count=8}\n```\nmissing\n```\n:::\n:::\n\n\n::: {.cell execution_count=8}\n``` {.julia .cell-code}\ncov_ = .95\nconf_model = conformal_model(model; coverage=cov_)\nmach = machine(conf_model, X, y)\nfit!(mach, rows=train)\nMarkdown.parse(\"\"\"\nThe following chart shows the resulting predicted probabilities for ``y=1`` (left) and set size (right) for a choice of ``(1-\\\\alpha)``=$cov_.\n\"\"\")\n```\n\n::: {.cell-output .cell-output-display execution_count=9}\nThe following chart shows the resulting predicted probabilities for $y=1$ (left) and set size (right) for a choice of $(1-\\alpha)$=0.95.\n\n:::\n:::\n\n\n::: {.cell execution_count=9}\n``` {.julia .cell-code}\nusing Plots\np_proba = contourf(mach.model, mach.fitresult, X, y)\np_set_size = contourf(mach.model, mach.fitresult, X, y; plot_set_size=true)\nplot(p_proba, p_set_size, size=(800,250))\n```\n\n::: {.cell-output .cell-output-display execution_count=10}\n![](classification_files/figure-commonmark/cell-10-output-1.svg){}\n:::\n:::\n\n\n\n\nThe animation below should provide some more intuition as to what exactly is happening here. It illustrates the effect of the chosen coverage rate on the predicted softmax output and the set size in the two-dimensional feature space. Contours are overlayed with the moon data points (including test data). The two samples highlighted in red, $X_1$ and $X_2$, have been manually added for illustration purposes. Let's look at these one by one.\n\nFirstly, note that $X_1$ (red cross) falls into a region of the domain that is characterized by high predictive uncertainty. It sits right at the bottom-right corner of our class-zero moon 🌜 (orange), a region that is almost entirely enveloped by our class-one moon 🌛 (green). For low coverage rates the prediction set for $X_1$ is empty: on the left-hand side this is indicated by the missing contour for the softmax probability; on the right-hand side we can observe that the corresponding set size is indeed zero. For high coverage rates the prediction set includes both $y=0$ and $y=1$, indicative of the fact that the conformal classifier is uncertain about the true label.\n\nWith respect to $X_2$, we observe that while also sitting on the fringe of our class-zero moon, this sample populates a region that is not fully enveloped by data points from the opposite class. In this region, the underlying atomic classifier can be expected to be more certain about its predictions, but still not highly confident. How is this reflected by our corresponding conformal prediction sets? \n\n::: {.cell execution_count=11}\n``` {.julia .cell-code code-fold=\"true\"}\nXtest_2 = (x1=[-0.5],x2=[0.25])\np̂_2 = pdf(predict(mach, Xtest_2)[1], 0)\n```\n:::\n\n\n::: {.cell execution_count=12}\n\n::: {.cell-output .cell-output-display execution_count=13}\nWell, for low coverage rates (roughly $<0.9$) the conformal prediction set does not include $y=0$: the set size is zero (right panel). Only for higher coverage rates do we have $C(X_2)=\\{0\\}$: the coverage rate is high enough to include $y=0$, but the corresponding softmax probability is still fairly low. 
For example, for $(1-\\alpha)=0.95$ we have $\\hat{p}(y=0|X_2)=0.72.$\n\n:::\n:::\n\n\nThese two examples illustrate an interesting point: for regions characterized by high predictive uncertainty, conformal prediction sets are typically empty (for low coverage) or large (for high coverage). While set-valued predictions may be something to get used to, this notion is overall intuitive. \n\n::: {#fig-anim .cell execution_count=13}\n``` {.julia .cell-code}\n# Setup\ncoverages = range(0.75,1.0,length=5)\nn = 100\nx1_range = range(extrema(X.x1)...,length=n)\nx2_range = range(extrema(X.x2)...,length=n)\n\nanim = @animate for coverage in coverages\n    conf_model = conformal_model(model; coverage=coverage)\n    mach = machine(conf_model, X, y)\n    fit!(mach, rows=train)\n    # Probabilities:\n    p1 = contourf(mach.model, mach.fitresult, X, y)\n    scatter!(p1, Xtest.x1, Xtest.x2, ms=6, c=:red, label=\"X₁\", shape=:cross, msw=6)\n    scatter!(p1, Xtest_2.x1, Xtest_2.x2, ms=6, c=:red, label=\"X₂\", shape=:diamond, msw=6)\n    p2 = contourf(mach.model, mach.fitresult, X, y; plot_set_size=true)\n    scatter!(p2, Xtest.x1, Xtest.x2, ms=6, c=:red, label=\"X₁\", shape=:cross, msw=6)\n    scatter!(p2, Xtest_2.x1, Xtest_2.x2, ms=6, c=:red, label=\"X₂\", shape=:diamond, msw=6)\n    plot(p1, p2, plot_title=\"(1-α)=$(round(coverage,digits=2))\", size=(800,300))\nend\n\ngif(anim, joinpath(www_path,\"classification.gif\"), fps=1)\n```\n:::\n\n\n![](../www/classification.gif)\n\n## Adaptive Sets\n\nInstead of using the simple approach, we can use adaptive prediction sets [@angelopoulos2021gentle]:\n\n::: {.cell execution_count=14}\n``` {.julia .cell-code}\nconf_model = conformal_model(model; coverage=cov_, method=:adaptive_inductive)\nmach = machine(conf_model, X, y)\nfit!(mach, rows=train)\nresults[:adaptive_inductive] = mach\n```\n:::\n\n\n::: {.cell execution_count=15}\n``` {.julia .cell-code}\nusing Plots\np_proba = contourf(mach.model, mach.fitresult, X, y)\np_set_size = contourf(mach.model, mach.fitresult, X, y; plot_set_size=true)\nplot(p_proba, p_set_size, size=(800,250))\n```\n:::\n\n\n## Evaluation\n\nFor evaluation of conformal predictors we follow @angelopoulos2021gentle (Section 3). As a first step towards adaptiveness, the authors recommend inspecting the set size of conformal predictions. The chart below shows the set sizes for the different methods:\n\n::: {.cell execution_count=16}\n``` {.julia .cell-code}\nplt_list = []\nfor (_mod, mach) in results\n    push!(plt_list, bar(mach.model, mach.fitresult, X; title=String(_mod)))\nend\nplot(plt_list..., size=(800,300))\n```\n\n::: {.cell-output .cell-output-display execution_count=17}\n![Set sizes.](classification_files/figure-commonmark/fig-setsize-output-1.svg){#fig-setsize}\n:::\n:::\n\n\nWe can also use specific metrics like **empirical coverage** and **size-stratified coverage** to check for correctness and adaptiveness, respectively. To this end, the package provides custom measures that are compatible with `MLJ.jl`. In other words, we can evaluate model performance in true `MLJ.jl` fashion (see [here](https://alan-turing-institute.github.io/MLJ.jl/dev/evaluating_model_performance/)). 
\n\nThe code below runs the evaluation with respect to both metrics, `emp_coverage` and `ssc`, for a single conformal machine: \n\n::: {.cell execution_count=17}\n``` {.julia .cell-code}\n_mod, mach = first(results)\n_eval = evaluate!(\n    mach,\n    operation=predict,\n    measure=[emp_coverage, ssc]\n)\n# display(_eval)\nprintln(\"Empirical coverage for $(_mod): $(round(_eval.measurement[1], digits=3))\")\nprintln(\"SSC for $(_mod): $(round(_eval.measurement[2], digits=3))\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nEmpirical coverage for adaptive_inductive: 0.962\nSSC for adaptive_inductive: 0.962\n```\n:::\n:::\n\n\n## References\n\n",
     "supporting": [
       "classification_files/figure-commonmark"
     ],
diff --git a/_freeze/docs/src/tutorials/classification/figure-commonmark/cell-10-output-1.svg b/_freeze/docs/src/tutorials/classification/figure-commonmark/cell-10-output-1.svg
index 03ba7a0..5d026c7 100644
--- a/_freeze/docs/src/tutorials/classification/figure-commonmark/cell-10-output-1.svg
+++ b/_freeze/docs/src/tutorials/classification/figure-commonmark/cell-10-output-1.svg
@@ -1,1185 +1,1185 @@
[diff of regenerated SVG plot markup omitted; content not recoverable]
diff --git a/_freeze/docs/src/tutorials/classification/figure-commonmark/fig-setsize-output-1.svg b/_freeze/docs/src/tutorials/classification/figure-commonmark/fig-setsize-output-1.svg
index 1a4aca5..90f9541 100644
--- a/_freeze/docs/src/tutorials/classification/figure-commonmark/fig-setsize-output-1.svg
+++ b/_freeze/docs/src/tutorials/classification/figure-commonmark/fig-setsize-output-1.svg
@@ -1,69 +1,69 @@
[diff of regenerated SVG plot markup omitted; content not recoverable]
diff --git a/docs/src/how_to_guides/llm.md b/docs/src/how_to_guides/llm.md
index d6f0ea9..728cdba 100644
--- a/docs/src/how_to_guides/llm.md
+++ b/docs/src/how_to_guides/llm.md
@@ -207,7 +207,7 @@ prediction_set(mach, refined_query)[2]
 
 Below we include a short demo video that shows the REPL-based chatbot in action.
 
-![](../../../docs/src/www/demo_llm.gif)
+![](../www/demo_llm.gif)
 
 ## Final Remarks
diff --git a/docs/src/how_to_guides/llm.qmd b/docs/src/how_to_guides/llm.qmd
index da21de6..ec210df 100644
--- a/docs/src/how_to_guides/llm.qmd
+++ b/docs/src/how_to_guides/llm.qmd
@@ -188,7 +188,7 @@ prediction_set(mach, refined_query)[2]
 
 Below we include a short demo video that shows the REPL-based chatbot in action.
-![](/docs/src/www/demo_llm.gif)
+![](../www/demo_llm.gif)
 
 ## Final Remarks
diff --git a/docs/src/tutorials/classification.md b/docs/src/tutorials/classification.md
index 2ed4daa..a2e2b3a 100644
--- a/docs/src/tutorials/classification.md
+++ b/docs/src/tutorials/classification.md
@@ -142,10 +142,6 @@ end
 gif(anim, joinpath(www_path,"classification.gif"), fps=1)
 ```
-
-
-The effect of the coverage rate on the conformal prediction set. Softmax probabilities are shown on the left. The size of the prediction set is shown on the right.
-
 ![](../www/classification.gif)
 
 ## Adaptive Sets
diff --git a/docs/src/tutorials/classification.qmd b/docs/src/tutorials/classification.qmd
index 8471574..36af02b 100644
--- a/docs/src/tutorials/classification.qmd
+++ b/docs/src/tutorials/classification.qmd
@@ -164,7 +164,7 @@ Well, for low coverage rates (roughly ``<0.9``) the conformal prediction set doe
 These two examples illustrate an interesting point: for regions characterized by high predictive uncertainty, conformal prediction sets are typically empty (for low coverage) or large (for high coverage). While set-valued predictions may be something to get used to, this notion is overall intuitive. 
 
 ```{julia}
-#| output: true
+#| output: false
 #| label: fig-anim
 #| fig-cap: "The effect of the coverage rate on the conformal prediction set. Softmax probabilities are shown on the left. The size of the prediction set is shown on the right."
diff --git a/docs/src/tutorials/classification_files/figure-commonmark/cell-10-output-1.svg b/docs/src/tutorials/classification_files/figure-commonmark/cell-10-output-1.svg
index 03ba7a0..5d026c7 100644
--- a/docs/src/tutorials/classification_files/figure-commonmark/cell-10-output-1.svg
+++ b/docs/src/tutorials/classification_files/figure-commonmark/cell-10-output-1.svg
@@ -1,1185 +1,1185 @@
[diff of regenerated SVG plot markup omitted; content not recoverable]
diff --git a/docs/src/tutorials/classification_files/figure-commonmark/fig-setsize-output-1.svg b/docs/src/tutorials/classification_files/figure-commonmark/fig-setsize-output-1.svg
index 1a4aca5..90f9541 100644
--- a/docs/src/tutorials/classification_files/figure-commonmark/fig-setsize-output-1.svg
+++ b/docs/src/tutorials/classification_files/figure-commonmark/fig-setsize-output-1.svg
@@ -1,69 +1,69 @@
[diff of regenerated SVG plot markup omitted; content not recoverable]