diff --git a/docs/kg-demo.gif b/docs/kg-demo.gif
new file mode 100644
index 00000000..6d77c7c9
Binary files /dev/null and b/docs/kg-demo.gif differ
diff --git a/docs/vignette-kg.md b/docs/vignette-kg.md
new file mode 100644
index 00000000..d2cfe8fd
--- /dev/null
+++ b/docs/vignette-kg.md
@@ -0,0 +1,85 @@
+# Vignette: Knowledge Graph RAG
+
+This vignette demonstrates the KG module of BioChatter as used by the
+BioChatter Next application. We connect to a BioCypher knowledge graph (KG) to
+retrieve relevant information for a given question. We then use the retrieved
+information to generate a response to the question. The application can connect
+to any real-world BioCypher KG by providing the connection details in the `KG
+Settings` dialog.
+
+## Background
+
+For the demonstration purposes of this vignette, we include a demo KG based on
+an open-source dataset of crime statistics in Manchester, because it allows us
+to redistribute the KG due to its small size and public domain licence, and
+because it is easily understood. This is the schema of the KG:
+
+```mermaid
+graph LR;
+    Person(:Person) -- KNOWS --> Person
+    Person -- FAMILY_REL --> Person
+    Person -- LIVES_AT --> Location(:Location)
+    Person -- PARTY_TO --> Crime(:Crime)
+    Person -- MADE_CALL --> PhoneCall(:PhoneCall)
+    Person -- RECEIVED_CALL --> PhoneCall
+    Crime -- INVESTIGATED_BY --> Officer(:Officer)
+    Crime -- OCCURRED_AT --> Location
+    Object(:Object) -- INVOLVED_IN --> Crime
+```
+
+The KG is adapted from a [Neo4j
+tutorial](https://github.com/neo4j-graph-examples/pole), and is available as a
+BioCypher adapter including a BioChatter Light integration
+[here](https://github.com/biocypher/pole). We also include it in an optional
+BioChatter Next Docker Compose
+[configuration](https://github.com/biocypher/biochatter-next/blob/main/biochatter-next/docker-compose-incl-kg.yml)
+to allow trying it out locally.
+
+## Usage
+
+In BioChatter Next, we first activate the KG functionality by clicking on the
+`KG Settings` button in the sidebar. In the settings dialog, we can activate the
+KG functionality and select how many results we want to retrieve. Returning to
+the conversation and enabling the KG functionality for the current chat
+(directly above the send button), we can then ask the model about the KG. The
+conversation is pasted below for convenience.
+
+![KG Demo](kg-demo.gif)
+
+In the background, the RagAgent module of BioChatter receives the question and
+generates a query to retrieve the desired information. This is then passed back
+to the primary model, which includes it in its answer generation.
+
+## Conclusion
+
+The native integration of BioCypher KGs into the BioChatter framework allows for
+a seamless integration of KGs into the conversational AI. This in turn
+facilitates knowledge accessibility in a wide range of application domains.
+
+## Conversation
+
+### Message From You
+How many crimes happened in total?
+
+### Message From ChatGPT
+According to the provided statement, the total number of crimes is 28,762. This information suggests that there have been 28,762 reported crimes in the given context. However, without additional information or context, it is difficult to determine the specific time period, location, or nature of these crimes. It's important to note that crime statistics can vary depending on the source, methodology, and reporting practices. Therefore, it is always advisable to consult official crime reports or local law enforcement agencies for the most accurate and up-to-date information on crime statistics.
+
+### Message From You
+Where happened most crimes?
+
+### Message From ChatGPT
+Based on the provided statement, it appears that the location with the highest number of crimes is a parking area, with a total of 811 reported crimes. However, without additional context or information, it is difficult to determine the consistency or inconsistency of this statement with other available information. It is important to note that crime rates and locations can vary significantly depending on various factors such as the region, city, neighborhood, and specific time period being considered. To provide a more comprehensive analysis, it would be helpful to have access to a broader dataset or additional information about the specific location and its crime statistics.
+
+### Message From You
+Who were the top 3 offenders?
+
+### Message From ChatGPT
+Based on the provided information, the top three offenders are:
+
+1. Phillip Williamson: According to the information given, Phillip Williamson has committed 5 crimes.
+
+2. Jessica Kelly: The information states that Jessica Kelly has also committed 5 crimes.
+
+3. Brian Morales: The data indicates that Brian Morales has committed 4 crimes.
+
+These three individuals, Phillip Williamson, Jessica Kelly, and Brian Morales, are the top offenders based on the number of crimes they have committed.
\ No newline at end of file
diff --git a/docs/vignette-rag.md b/docs/vignette-rag.md
index ff8dd9ed..edbda71b 100644
--- a/docs/vignette-rag.md
+++ b/docs/vignette-rag.md
@@ -1,4 +1,4 @@
-# Vignette: Retrieval-Augmented Generation
+# Vignette: Retrieval-Augmented Generation (RAG)
 
 This vignette demonstrates the RAG module of BioChatter as used by the
 BioChatter Next application. This basic use case involves an LLM manuscript
@@ -76,19 +76,19 @@ database instance, free of charge at
 
 ### BioChatter RAG
 
-#### Contextual Prompt:
+#### Contextual Prompt
 You are an assistant to a biomedical researcher.
 
-#### Contextual Prompt:
+#### Contextual Prompt
 You have access to text fragments from one or multiple scientific manuscripts, which were retrieved by their semantic similarity to the user's question.
 
-#### Contextual Prompt:
+#### Contextual Prompt
 Your task is to help the user understand the manuscripts and their relevance to their questions. Be concise in your explanations, provide quotes from the manuscripts where appropriate, and be open about limitations in cases where the text does not contain pertinent information.
 
-#### Message From You:
+#### Message From You
 What is known about the CFTR protein in resistance to ovarian cancer?
 
-#### Message From BioChatter:
+#### Message From BioChatter
 Based on the provided information, there are several studies that suggest a potential role of the CFTR (Cystic Fibrosis Transmembrane Conductance Regulator) protein in resistance to ovarian cancer. Here are some key findings from the referenced manuscripts:
 
 1. Gene expression profiling of ovarian tumors before and after chemotherapy revealed molecular signatures of chemoresistant tumors (M, et al., 2006). This study did not specifically mention CFTR, but it provides insights into the molecular changes associated with chemoresistance.
@@ -107,10 +107,10 @@ Based on the provided information, there are several studies that suggest a pote
 
 It is important to note that while these studies suggest a potential association between CFTR and resistance to ovarian cancer, the exact mechanisms and clinical implications are still being investigated. Further research is needed to fully understand the role of CFTR in ovarian cancer resistance.
 
-#### Message From You:
+#### Message From You
 Tell me more about the studies of aggression and knockdown of CFTR in vivo.
 
-#### Message From BioChatter:
+#### Message From BioChatter
 Based on the provided information, there are two relevant studies regarding the aggression and knockdown of CFTR in ovarian cancer:
 
 1. The study by Wu et al. (2013) investigated the constitutive activation of nuclear factor κB (NF-κB) and its contribution to CFTR expression, cancer progression, and poor prognosis in cervical cancer. Although this study is not specific to ovarian cancer, it provides insights into the role of CFTR and NF-κB in promoting cancer progression.
diff --git a/mkdocs.yml b/mkdocs.yml
index 26510ee9..58054864 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -57,7 +57,8 @@ nav:
       - Overview: benchmark-overview.md
       - All Results: benchmark-results.md
   - Vignettes:
-      - Retrieval-Augmented Generation: vignette-rag.md
+      - Retrieval-Augmented Generation (RAG): vignette-rag.md
+      - Knowledge Graph RAG: vignette-kg.md
   - API Reference:
       - LLM Connectivity: llm_connect-reference.md
       - Vectorstore: vectorstore-reference.md
@@ -82,6 +83,11 @@ markdown_extensions:
       slugify: !!python/object/apply:pymdownx.slugs.slugify
         kwds:
           case: lower
+  - pymdownx.superfences:
+      custom_fences:
+        - name: mermaid
+          class: mermaid
+          format: !!python/name:pymdownx.superfences.fence_code_format
 extra_javascript:
   - https://unpkg.com/tablesort@5.3.0/dist/tablesort.min.js
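
Note on the new KG vignette: its Usage section says that a query is generated from the question and the result is handed back to the primary model. The sketch below is only an illustration of the kind of aggregation such a query would perform over the `PARTY_TO` and `OCCURRED_AT` relationships from the schema. It is plain Python over a hypothetical in-memory edge list, not BioChatter's or BioCypher's API, and the names `PARTY_TO`, `OCCURRED_AT`, `total_crimes`, `top_location`, and `top_offenders` as well as all data values are assumptions made up for this example.

```python
from collections import Counter

# Hypothetical toy edges that follow the vignette's schema:
# (:Person)-[:PARTY_TO]->(:Crime) and (:Crime)-[:OCCURRED_AT]->(:Location).
# The values are invented for illustration, not taken from the demo KG.
PARTY_TO = [
    ("alice", "crime-1"),
    ("alice", "crime-2"),
    ("alice", "crime-3"),
    ("bob", "crime-2"),
    ("bob", "crime-4"),
    ("carol", "crime-4"),
]
OCCURRED_AT = [
    ("crime-1", "parking area"),
    ("crime-2", "parking area"),
    ("crime-3", "shop"),
    ("crime-4", "parking area"),
]


def total_crimes(occurred_at):
    """Count distinct crimes that have a location edge."""
    return len({crime for crime, _ in occurred_at})


def top_location(occurred_at):
    """Return the (location, crime_count) pair with the most crimes."""
    return Counter(location for _, location in occurred_at).most_common(1)[0]


def top_offenders(party_to, n=3):
    """Return the n persons linked to the most crimes, with their counts."""
    return Counter(person for person, _ in party_to).most_common(n)


if __name__ == "__main__":
    print(total_crimes(OCCURRED_AT))
    print(top_location(OCCURRED_AT))
    print(top_offenders(PARTY_TO))
```

The three functions mirror the three questions in the pasted conversation (total crimes, location with most crimes, top 3 offenders). In the application itself the counting would happen inside the graph database, and only the result would be passed to the model.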