- A Neo4j database stores the network graph.
- There are 3 types of nodes: Movie, Genre, and Person.
  - A Person can be an Actor, Director, Writer, or any combination of these.
- The relationships are:
  - Person - [ WROTE / ACTED_IN / DIRECTED ] -> Movie
  - Movie - [ BELONGS_TO ] -> Genre
- LangChain and Gemini are used for the pipeline, which:
  - Processes the natural language prompt
  - Generates a Cypher query
  - Queries the Neo4j database with the generated query and gets the result back as JSON
  - Parses the JSON and responds in natural language
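
The pipeline steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repository's actual code: `generate_cypher`, `run_query`, and `summarize` are hypothetical callables standing in for the Gemini/LangChain calls and the Neo4j driver, the `title` and `name` properties are assumed, and the schema string simply restates the graph described above.

```python
import json
from typing import Callable

# Schema text restating the graph described above. It would be passed to the
# LLM so that the generated Cypher only uses known labels and relationships.
GRAPH_SCHEMA = """
Nodes: Movie, Genre, Person
Relationships:
  (Person)-[:WROTE|ACTED_IN|DIRECTED]->(Movie)
  (Movie)-[:BELONGS_TO]->(Genre)
""".strip()


def answer_question(
    question: str,
    generate_cypher: Callable[[str, str], str],
    run_query: Callable[[str], str],
    summarize: Callable[[str, list], str],
) -> str:
    """Run the four pipeline steps with injected (hypothetical) components."""
    # Steps 1-2: process the prompt and turn it into a Cypher query.
    cypher = generate_cypher(question, GRAPH_SCHEMA)
    # Step 3: query the database; the result is assumed to be a JSON string.
    raw_json = run_query(cypher)
    # Step 4: parse the JSON and phrase the answer in natural language.
    rows = json.loads(raw_json)
    return summarize(question, rows)


# Stand-in components so the sketch runs without an LLM or a database.
# The property names `title` and `name` are assumptions for illustration.
def fake_generate_cypher(question: str, schema: str) -> str:
    return (
        "MATCH (p:Person)-[:DIRECTED]->(m:Movie {title: 'Inception'}) "
        "RETURN p.name AS name"
    )


def fake_run_query(cypher: str) -> str:
    return json.dumps([{"name": "Christopher Nolan"}])


def fake_summarize(question: str, rows: list) -> str:
    names = ", ".join(row["name"] for row in rows)
    return f"Answer: {names}"
```

Calling `answer_question("Who directed Inception?", fake_generate_cypher, fake_run_query, fake_summarize)` exercises all four steps. Passing the components in as arguments keeps the control flow testable without an API key or a running Neo4j server.
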
- Clone the repo and navigate to the directory.
- Download the dataset, rename it to `imdb.csv`, and move it to the `/data` directory.
- Create a virtual environment with Python 3.10.14 and install the requirements from `requirements.txt`. For Conda: `$ conda create -c conda-forge --name <env> --file requirements.txt`
- Recommended: create a new Neo4j database (for Community edition).
- Start the Neo4j server.
- Fill in the Neo4j credentials and Gemini API key in `.env_template` and rename it to `.env`.
- First, create the network graph by running the Jupyter Notebook `./src/Knowledge_Graph.ipynb`.
- Run the Jupyter Notebook `./src/Graph_RAG.ipynb`.
- Run the Streamlit web app with `streamlit run ./src/App.py`.
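
Before running the notebooks, it can help to confirm that the setup steps above were completed. The following is a small optional sketch, not part of the repository: the key names in `REQUIRED_KEYS` are hypothetical (use whatever keys `.env_template` actually defines), and the dataset is assumed to live at `data/imdb.csv` relative to the repo root.

```python
from pathlib import Path

# Hypothetical key names; replace with the keys defined in .env_template.
REQUIRED_KEYS = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD", "GEMINI_API_KEY")


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_setup(repo_root: Path, required_keys=REQUIRED_KEYS) -> list:
    """Return a list of setup problems; an empty list means nothing is missing."""
    problems = []
    if not (repo_root / "data" / "imdb.csv").is_file():
        problems.append("data/imdb.csv not found")
    env_file = repo_root / ".env"
    if not env_file.is_file():
        problems.append(".env not found")
    else:
        env = parse_env(env_file.read_text())
        problems += [
            f"{key} is empty or missing in .env"
            for key in required_keys
            if not env.get(key)
        ]
    return problems


if __name__ == "__main__":
    for problem in missing_setup(Path(".")):
        print("Setup issue:", problem)
```

Run from the repo root, the script prints one line per missing item and nothing if the dataset and credentials are in place. It does not check that the Neo4j server is running or that the credentials are valid.
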