docs: Add page for AstraDB self retriever (#16077)
Showing 1 changed file with 322 additions and 0 deletions.
docs/docs/integrations/retrievers/self_query/astradb.ipynb
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,322 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Astra DB\n", | ||
"\n", | ||
"DataStax [Astra DB](https://docs.datastax.com/en/astra/home/astra.html) is a serverless vector-capable database built on Cassandra and made conveniently available through an easy-to-use JSON API.\n", | ||
"\n", | ||
"In this walkthrough, we'll demo the `SelfQueryRetriever` with an Astra DB vector store." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Creating an Astra DB vector store\n", | ||
"First, we'll create an Astra DB vector store and seed it with some data. We've prepared a small demo set of documents containing movie summaries.\n", | ||
"\n", | ||
"NOTE: The self-query retriever requires you to have `lark` installed (`pip install lark`). We also need the `astrapy` package." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 5, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%pip install --upgrade --quiet lark astrapy langchain-openai" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"We'll use `OpenAIEmbeddings`, which requires an OpenAI API key." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import os\n", | ||
"from getpass import getpass\n", | ||
"\n", | ||
"from langchain_openai.embeddings import OpenAIEmbeddings\n", | ||
"\n", | ||
"os.environ[\"OPENAI_API_KEY\"] = getpass(\"OpenAI API Key:\")\n", | ||
"\n", | ||
"embeddings = OpenAIEmbeddings()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"collapsed": false, | ||
"pycharm": { | ||
"name": "#%% md\n" | ||
} | ||
}, | ||
"source": [ | ||
"Create the Astra DB VectorStore:\n", | ||
"\n", | ||
"- the API Endpoint looks like `https://01234567-89ab-cdef-0123-456789abcdef-us-east1.apps.astra.datastax.com`\n", | ||
"- the Token looks like `AstraCS:6gBhNmsk135....`" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"ASTRA_DB_API_ENDPOINT = input(\"ASTRA_DB_API_ENDPOINT = \")\n", | ||
"ASTRA_DB_APPLICATION_TOKEN = getpass(\"ASTRA_DB_APPLICATION_TOKEN = \")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from langchain.schema import Document\n", | ||
"from langchain.vectorstores import AstraDB\n", | ||
"\n", | ||
"docs = [\n", | ||
" Document(\n", | ||
" page_content=\"A bunch of scientists bring back dinosaurs and mayhem breaks loose\",\n", | ||
" metadata={\"year\": 1993, \"rating\": 7.7, \"genre\": \"science fiction\"},\n", | ||
" ),\n", | ||
" Document(\n", | ||
" page_content=\"Leo DiCaprio gets lost in a dream within a dream within a dream within a ...\",\n", | ||
" metadata={\"year\": 2010, \"director\": \"Christopher Nolan\", \"rating\": 8.2},\n", | ||
" ),\n", | ||
" Document(\n", | ||
" page_content=\"A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea\",\n", | ||
" metadata={\"year\": 2006, \"director\": \"Satoshi Kon\", \"rating\": 8.6},\n", | ||
" ),\n", | ||
" Document(\n", | ||
" page_content=\"A bunch of normal-sized women are supremely wholesome and some men pine after them\",\n", | ||
" metadata={\"year\": 2019, \"director\": \"Greta Gerwig\", \"rating\": 8.3},\n", | ||
" ),\n", | ||
" Document(\n", | ||
" page_content=\"Toys come alive and have a blast doing so\",\n", | ||
" metadata={\"year\": 1995, \"genre\": \"animated\"},\n", | ||
" ),\n", | ||
" Document(\n", | ||
" page_content=\"Three men walk into the Zone, three men walk out of the Zone\",\n", | ||
" metadata={\n", | ||
" \"year\": 1979,\n", | ||
" \"director\": \"Andrei Tarkovsky\",\n", | ||
" \"genre\": \"science fiction\",\n", | ||
" \"rating\": 9.9,\n", | ||
" },\n", | ||
" ),\n", | ||
"]\n", | ||
"\n", | ||
"vectorstore = AstraDB.from_documents(\n", | ||
" docs,\n", | ||
" embeddings,\n", | ||
" collection_name=\"astra_self_query_demo\",\n", | ||
" api_endpoint=ASTRA_DB_API_ENDPOINT,\n", | ||
" token=ASTRA_DB_APPLICATION_TOKEN,\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Creating our self-querying retriever\n", | ||
"Now we can instantiate our retriever. To do this we'll need to provide some information upfront about the metadata fields that our documents support and a short description of the document contents." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from langchain.chains.query_constructor.base import AttributeInfo\n", | ||
"from langchain.retrievers.self_query.base import SelfQueryRetriever\n", | ||
"from langchain_openai import OpenAI\n", | ||
"\n", | ||
"metadata_field_info = [\n", | ||
" AttributeInfo(\n", | ||
" name=\"genre\",\n", | ||
" description=\"The genre of the movie\",\n", | ||
" type=\"string or list[string]\",\n", | ||
" ),\n", | ||
" AttributeInfo(\n", | ||
" name=\"year\",\n", | ||
" description=\"The year the movie was released\",\n", | ||
" type=\"integer\",\n", | ||
" ),\n", | ||
" AttributeInfo(\n", | ||
" name=\"director\",\n", | ||
" description=\"The name of the movie director\",\n", | ||
" type=\"string\",\n", | ||
" ),\n", | ||
" AttributeInfo(\n", | ||
" name=\"rating\", description=\"A 1-10 rating for the movie\", type=\"float\"\n", | ||
" ),\n", | ||
"]\n", | ||
"document_content_description = \"Brief summary of a movie\"\n", | ||
"llm = OpenAI(temperature=0)\n", | ||
"\n", | ||
"retriever = SelfQueryRetriever.from_llm(\n", | ||
" llm, vectorstore, document_content_description, metadata_field_info, verbose=True\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Testing it out\n", | ||
"And now we can try actually using our retriever!" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# This example only specifies a relevant query\n", | ||
"retriever.get_relevant_documents(\"What are some movies about dinosaurs?\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# This example specifies a filter\n", | ||
"retriever.get_relevant_documents(\"I want to watch a movie rated higher than 8.5\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# This example specifies a query and a filter\n", | ||
"retriever.get_relevant_documents(\"Has Greta Gerwig directed any movies about women\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# This example specifies a composite filter\n", | ||
"retriever.get_relevant_documents(\n", | ||
" \"What's a highly rated (above 8.5) science fiction movie?\"\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# This example specifies a query and composite filter\n", | ||
"retriever.get_relevant_documents(\n", | ||
" \"What's a movie about toys after 1990 but before 2005, and is animated\"\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Filter k\n", | ||
"\n", | ||
"We can also use the self-query retriever to specify `k`, the number of documents to fetch.\n", | ||
"\n", | ||
"We can do this by passing `enable_limit=True` to `SelfQueryRetriever.from_llm`." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"retriever = SelfQueryRetriever.from_llm(\n", | ||
" llm,\n", | ||
" vectorstore,\n", | ||
" document_content_description,\n", | ||
" metadata_field_info,\n", | ||
" verbose=True,\n", | ||
" enable_limit=True,\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# This example specifies a query and a limit on the number of documents\n", | ||
"retriever.get_relevant_documents(\"What are two movies about dinosaurs?\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"source": [ | ||
"## Cleanup\n", | ||
"\n", | ||
"If you want to completely delete the collection from your Astra DB instance, run this.\n", | ||
"\n", | ||
"_(You will lose the data you stored in it.)_" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"collapsed": false | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"vectorstore.delete_collection()" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": ".venv", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.11.5" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
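
Reviewer note: the notebook's "composite filter" cells rely on the retriever turning a natural-language question into a structured metadata filter. The sketch below is a plain-Python illustration of what such a filter selects from the demo metadata, so the expected hits can be checked without an Astra DB instance or an OpenAI key. It is not the LangChain translator or the Astra DB API: the `matches` helper, the `OPS` table, and the Mongo-style operator spelling (`$and`, `$gt`, and so on) are assumptions made for this example, and it treats a document that lacks a field as a non-match.

```python
# Illustrative sketch only: evaluates Mongo-style metadata filters in memory.
# It does not call LangChain or Astra DB; `matches` and `OPS` are hypothetical helpers.
import operator

# Metadata copied from the demo documents in the notebook.
DOCS = [
    {"year": 1993, "rating": 7.7, "genre": "science fiction"},
    {"year": 2010, "director": "Christopher Nolan", "rating": 8.2},
    {"year": 2006, "director": "Satoshi Kon", "rating": 8.6},
    {"year": 2019, "director": "Greta Gerwig", "rating": 8.3},
    {"year": 1995, "genre": "animated"},
    {
        "year": 1979,
        "director": "Andrei Tarkovsky",
        "genre": "science fiction",
        "rating": 9.9,
    },
]

# Comparison operators assumed for this sketch.
OPS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def matches(metadata, flt):
    """Return True if `metadata` satisfies every clause in `flt`."""
    for key, cond in flt.items():
        if key == "$and":
            ok = all(matches(metadata, sub) for sub in cond)
        elif key == "$or":
            ok = any(matches(metadata, sub) for sub in cond)
        elif isinstance(cond, dict):
            # Comparison clause such as {"rating": {"$gt": 8.5}};
            # a document without the field never matches.
            ok = key in metadata and all(
                OPS[op](metadata[key], operand) for op, operand in cond.items()
            )
        else:
            # A bare value means equality.
            ok = metadata.get(key) == cond
        if not ok:
            return False
    return True


# "What's a highly rated (above 8.5) science fiction movie?"
composite = {"$and": [{"rating": {"$gt": 8.5}}, {"genre": "science fiction"}]}
print([d["year"] for d in DOCS if matches(d, composite)])
```

With the demo data, the only document that is both science fiction and rated above 8.5 is the 1979 entry, which is what the composite-filter cell should return if the LLM builds the filter as intended.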