flowchart TD
A[Start: Auth] --> B{User Action}
B --> |Document Upload| C1[Document Uploaded]
B --> |Text prompt| D1[User Prompt]
subgraph Atlas High Level Architecture
subgraph Feature1 [ ]
C1 --> C2[Document Storage]
C2 --> C3{{Document Parsing}}
C3 --> C4{{Chunk Embedding}}
C4 --> C5[(Chunk Indexing)]
end
subgraph Feature2 [ ]
D1 --> D2{{Document Embedding}}
D2 --> D3[(Semantic Search)]
D2 --> D4[(Keyword Search)]
D2 --> D5{{Web Crawl}}
D3 --> D6{{Re-ranking Results}}
D4 --> D6
D5 --> D6
D6 --> D7{{LLM Inference}}
end
end
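The query path in the flowchart above fans out to several retrieval sources and then funnels their results through a re-ranking step before LLM inference. The following is a minimal sketch of that fan-out and fan-in shape only. Every name in it (`Hit`, `Retriever`, `Scorer`, `mergeAndRank`, `retrieve`) is hypothetical and not taken from the Atlas codebase, and the `threshold` and `topK` parameters are illustrative. In the real project the retrievers and the scorer would presumably be backed by external services such as Pinecone and Cohere.

```typescript
// Hypothetical types for illustration; none of these names come from Atlas.
interface Hit {
  id: string;   // identifier of a chunk or web page
  text: string; // retrieved passage
}

interface RankedHit extends Hit {
  score: number; // relevance assigned by the scorer (higher is better)
}

type Retriever = (query: string) => Promise<Hit[]>;
type Scorer = (query: string, hit: Hit) => number;

// Merge hits from several sources, drop duplicates by id, score each hit,
// discard hits below the threshold, and keep the topK best.
function mergeAndRank(
  query: string,
  batches: Hit[][],
  score: Scorer,
  threshold: number,
  topK: number,
): RankedHit[] {
  const unique = new Map<string, Hit>();
  for (const batch of batches) {
    for (const hit of batch) {
      if (!unique.has(hit.id)) unique.set(hit.id, hit);
    }
  }
  return Array.from(unique.values())
    .map((hit) => ({ ...hit, score: score(query, hit) }))
    .filter((hit) => hit.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

// Fan out to every retriever in parallel, then fan in through mergeAndRank.
async function retrieve(
  query: string,
  retrievers: Retriever[],
  score: Scorer,
  threshold: number,
  topK: number,
): Promise<RankedHit[]> {
  const batches = await Promise.all(retrievers.map((r) => r(query)));
  return mergeAndRank(query, batches, score, threshold, topK);
}
```

Keeping `mergeAndRank` synchronous and separate from the network calls makes the ranking logic easy to test with fixed inputs, while `retrieve` stays a thin wrapper around the parallel lookups.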
This project is a web-based application built with Next.js that integrates several external APIs, including Cohere, OpenAI, and Pinecone. The steps below cover setting up the project, configuring environment variables, and running it locally.
Before you begin, ensure you have the following installed:
- Node.js (14.x or higher)
- npm or yarn (package managers)
- Git
- MongoDB (local or cloud instance)
- Vercel CLI (optional, but recommended for environment variable management)
- Clone the repository from your fork:

      git clone https://github.com/athrael-soju/atlas-v0.1.git
      cd atlas-v0.1

- Install dependencies:

      npm install # or yarn install

- Set up the environment variables (see below for required variables).
- Make requests to the API endpoints to create the assistants and update the database. You can use Postman or curl to make these requests. The following are the endpoints to use:
The project requires several environment variables to be set in a .env file. Below is a description of the variables you need to configure:

- AUTH_SECRET: A secret key for authentication.
- NEXTAUTH_SECRET: A secret used for NextAuth.js.
- CARTESIA_API_KEY: The API key for Cartesia services.
- COHERE_API_KEY: The API key for Cohere language models.
- GROQ_API_KEY: The API key for Groq services.
- OPENAI_API_KEY: The API key for OpenAI GPT models.
- PINECONE_API: The API key for Pinecone vector database.
- UNSTRUCTURED_API: The API key for Unstructured parsing services.
- COHERE_API_MODEL: The Cohere model to be used (e.g., rerank-multilingual-v3.0).
- COHERE_RELEVANCE_THRESHOLD: Threshold value for Cohere's relevance.
- PINECONE_INDEX: The Pinecone index to be used.
- NEXT_PUBLIC_PINECONE_TOPK: The top K results returned by Pinecone queries.
- MONGODB_URI: The connection string for MongoDB, either local or remote.
- NEXTAUTH_URL: The base URL for the application (e.g., http://localhost:3000 or your deployed URL).
- OPENAI_API_MODEL: The OpenAI model (e.g., gpt-3.5-turbo).
- OPENAI_API_EMBEDDING_MODEL: The embedding model to be used for OpenAI (e.g., text-embedding-3-large).
- NODE_ENV: Should be set to production for production environments or development for local development.
- FILESYSTEM_PROVIDER: The filesystem provider used by the application (e.g., local).
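A missing or empty variable tends to surface only when the first API call fails, so it can help to check the configuration at startup. The following is a minimal sketch, not part of the project: the helper names `findMissingEnvVars` and `parseNumericEnvVar` are hypothetical, and treating every variable listed above as required and the two numeric settings as plain numbers are assumptions.

```typescript
// Variable names come from the list above. Treating all of them as
// required is an assumption of this sketch.
const REQUIRED_ENV_VARS: string[] = [
  "AUTH_SECRET",
  "NEXTAUTH_SECRET",
  "CARTESIA_API_KEY",
  "COHERE_API_KEY",
  "GROQ_API_KEY",
  "OPENAI_API_KEY",
  "PINECONE_API",
  "UNSTRUCTURED_API",
  "COHERE_API_MODEL",
  "COHERE_RELEVANCE_THRESHOLD",
  "PINECONE_INDEX",
  "NEXT_PUBLIC_PINECONE_TOPK",
  "MONGODB_URI",
  "NEXTAUTH_URL",
  "OPENAI_API_MODEL",
  "OPENAI_API_EMBEDDING_MODEL",
  "NODE_ENV",
  "FILESYSTEM_PROVIDER",
];

type Env = Record<string, string | undefined>;

// Return the names of required variables that are unset or blank.
function findMissingEnvVars(
  env: Env,
  required: string[] = REQUIRED_ENV_VARS,
): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Parse a numeric setting such as NEXT_PUBLIC_PINECONE_TOPK or
// COHERE_RELEVANCE_THRESHOLD, throwing if it is unset, blank, or not a number.
function parseNumericEnvVar(env: Env, name: string): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") {
    throw new Error(`Environment variable ${name} is not set`);
  }
  const parsed = Number(raw);
  if (Number.isNaN(parsed)) {
    throw new Error(`Environment variable ${name} is not a number: ${raw}`);
  }
  return parsed;
}
```

In a Node.js process, `findMissingEnvVars(process.env)` would return the names still to be filled in. The functions take the environment as a parameter rather than reading `process.env` directly so they can be exercised with a plain object.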
- Ensure your .env file is properly configured with all required variables.
- Run the development server:

      npm run dev # or yarn dev

The app will be running at http://localhost:3000.
You can easily deploy this project to Vercel. After setting up your Vercel account:
- Push your code to your GitHub repository.
- Link the repository to Vercel.
- Ensure the necessary environment variables are set in the Vercel dashboard under Settings > Environment Variables.
- Deploy your project.
For more details, visit the Vercel documentation.