
Pipeline: Ingestion pipeline #96

Merged: 181 commits, Jun 21, 2024
Changes from 180 commits

Commits (181)
403f118
Add new pipeline DTOs
MichaelOwenDyer Feb 14, 2024
e7c74f2
Apply autoformatter
MichaelOwenDyer Feb 14, 2024
26e86ac
Have DTOs extend BaseModel
MichaelOwenDyer Feb 15, 2024
6997315
Add data package
MichaelOwenDyer Feb 15, 2024
128ea40
update retrieval interface and requirements
yassinsws Feb 19, 2024
0818109
Merge branch 'main' into feature/datastore
yassinsws Feb 20, 2024
70ed83f
Use cloud cluster for weaviate for now for the hackathon
yassinsws Feb 21, 2024
7acf809
Merge remote-tracking branch 'origin/feature/datastore' into feature/…
yassinsws Feb 21, 2024
2c0793a
fix splitting function.
yassinsws Feb 21, 2024
b4cb05d
Add content_service, data ingester and vector repository subsystems
yassinsws Feb 22, 2024
2f3882f
Merge branch 'main' into feature/datastore
yassinsws Feb 22, 2024
05490f2
fix lintin
yassinsws Feb 22, 2024
e08aac6
Merge remote-tracking branch 'origin/feature/datastore' into feature/…
yassinsws Feb 22, 2024
a29a44b
add a return statement to unzip
yassinsws Feb 22, 2024
0c96395
Add image recognition for Ollama, GPT4V and image generation for Dall-E
Hialus Mar 6, 2024
e9874b9
Solved requirements problem ( removed olama for now as weaviate needs…
yassinsws Mar 17, 2024
3a186c9
fixed requirements file
yassinsws Mar 17, 2024
379550b
fixed message interpretation function in the llm class
yassinsws Mar 17, 2024
0f6e576
Added detail parameter to image_interpretation model
yassinsws Mar 18, 2024
a4186c3
renamed pyris_image to iris_image
yassinsws Mar 18, 2024
9f2848e
Update app/content_service/Ingestion/lectures_ingestion.py
yassinsws Mar 18, 2024
224a701
Update app/content_service/Retrieval/abstract_retrieval.py
yassinsws Mar 18, 2024
93a2f44
Update app/content_service/Ingestion/repository_ingestion.py
yassinsws Mar 18, 2024
bca6377
Update app/content_service/Ingestion/lectures_ingestion.py
yassinsws Mar 18, 2024
6e9525d
Update app/content_service/Retrieval/lecture_retrieval.py
yassinsws Mar 18, 2024
bc97236
erase old lecture download files
yassinsws Mar 18, 2024
b0291b1
Add a function to get lectures from Artemis
yassinsws Mar 18, 2024
7211386
Update app/content_service/get_lecture_from_artemis.py
yassinsws Mar 24, 2024
738e7a0
refractor tutor pipeline
yassinsws Mar 24, 2024
303f6d4
Lecture content first draft ready for review
yassinsws Mar 25, 2024
1dfd03a
Black
yassinsws Mar 25, 2024
f3b3c93
Black prompt and update database link
yassinsws Mar 29, 2024
34bc567
Lecture chat pipeline works just fine
yassinsws Mar 31, 2024
67d6bd0
Black
yassinsws Mar 31, 2024
2fe64ab
flake8
yassinsws Mar 31, 2024
4ef1672
Add Image support to our llm
yassinsws Mar 31, 2024
d15c6e6
flake8
yassinsws Mar 31, 2024
0f57336
black
yassinsws Mar 31, 2024
bcc54c2
black
yassinsws Mar 31, 2024
22a96ab
Added method to delete objects and collections from the data base, ad…
yassinsws Apr 1, 2024
981a453
Initial commit for the ingestion pipeline
yassinsws Apr 1, 2024
a0b1676
Merge branch 'main' into feature/llm/image_support
Hialus Apr 2, 2024
1b372ec
Merge remote-tracking branch 'origin/feature/llm/image_support' into …
yassinsws Apr 3, 2024
7c48731
black
yassinsws Apr 3, 2024
5c94f8d
Integrate Lecture Pipeline and Tutor chat Pipeline
yassinsws Apr 6, 2024
5261e94
Requirements cannot work with ollama ( version too old )
yassinsws Apr 6, 2024
67226d2
Merge branch 'main' of https://github.com/ls1intum/Pyris into feature…
yassinsws Apr 6, 2024
deaebf0
Merge branch 'feature/Lecture-Pipeline' of https://github.com/ls1intu…
yassinsws Apr 7, 2024
ea7291c
Save Work, ingestion is implemented
yassinsws Apr 7, 2024
0cbc8ca
lecture ingestion Pipeline implemented and ready for review
yassinsws Apr 7, 2024
fa4e705
there is no image support in completion
yassinsws Apr 7, 2024
53edf86
Fix Linters
yassinsws Apr 7, 2024
57b0d72
Update app/content_service/Retrieval/abstract_retrieval.py
yassinsws Apr 7, 2024
f567809
fixed issues with ingestion pipeline
yassinsws Apr 22, 2024
f8aef7e
Fix linter
yassinsws Apr 22, 2024
7a6270b
Fix linter
yassinsws Apr 22, 2024
bc69e96
Fix Ingestion Pipeline, ready for review
yassinsws Apr 22, 2024
bea9fcf
Ingestion Pipeline tested with a new instance of the vector database
yassinsws Apr 23, 2024
12b33f9
change the database
yassinsws Apr 24, 2024
d46f7e7
Merge remote-tracking branch 'origin/main' into feature/Ingestion_pip…
yassinsws Apr 24, 2024
b4acb1d
Solve merge Conflict and update Pr
yassinsws Apr 25, 2024
412d5a7
Update Image Support
yassinsws Apr 25, 2024
2ac1d48
Merge branch 'main' into feature/llm/image_support
yassinsws Apr 25, 2024
58ac585
Fix Requirements, ollama should be deleted because it's using an old …
yassinsws Apr 25, 2024
892bd6f
Merge With Latest version of main
yassinsws Apr 25, 2024
e05206c
Merge With Latest version of main
yassinsws Apr 25, 2024
1ca6b8e
Merge remote-tracking branch 'origin/main' into feature/Ingestion_pip…
yassinsws Apr 25, 2024
c72f8ad
Merge remote-tracking branch 'origin/feature/llm/image_support' into …
yassinsws Apr 25, 2024
5d29c93
Fix Warning
yassinsws Apr 25, 2024
2e09692
Readjusted the image generation and recognition PR
yassinsws Apr 26, 2024
fe76c80
Image interpretation tested works fine
yassinsws Apr 26, 2024
ec964c3
Black
yassinsws Apr 26, 2024
f961717
Merge remote-tracking branch 'origin/feature/llm/image_support' into …
yassinsws Apr 27, 2024
1171b25
Update app/content_service/Retrieval/repositories_retrieval.py
yassinsws Apr 27, 2024
ab73df5
Update app/llm/external/openai_dalle.py
yassinsws Apr 27, 2024
c7518ee
Added status update and delete data from database
yassinsws Apr 28, 2024
1bd1b85
black
yassinsws Apr 28, 2024
764931e
Skip was not working when the Stages are done
yassinsws May 1, 2024
6c60225
Update code
yassinsws May 3, 2024
a9c77c1
Update code
yassinsws May 3, 2024
69c791a
Flake8
yassinsws May 3, 2024
c7f53ee
Update and merge main with datastore PR
yassinsws May 3, 2024
aa247b8
Erase drafts of lecture_ingestion and repository_ingestion, because i…
yassinsws May 3, 2024
4dd3b3d
refractor code
yassinsws May 3, 2024
f06e884
refractor code
yassinsws May 3, 2024
008a9e5
refractor code
yassinsws May 3, 2024
b4d0120
Merge
yassinsws May 3, 2024
42ce267
Black Flake8
yassinsws May 3, 2024
fcbddc5
return get client
yassinsws May 3, 2024
4bd9cd2
implement request changes
yassinsws May 4, 2024
7021ba5
implement request changes
yassinsws May 5, 2024
bc75592
modify lecute_unit_dto
yassinsws May 5, 2024
0ac2712
make class into enum
yassinsws May 5, 2024
b50ea25
make class into enum
yassinsws May 5, 2024
dfa3063
merge datastore pr changes
yassinsws May 5, 2024
eb5f446
merge datastore pr changes
yassinsws May 5, 2024
586aa1e
Merge branch 'main' into feature/datastore
yassinsws May 5, 2024
5133adc
Clean PR
yassinsws May 5, 2024
5abd811
Clean PR
yassinsws May 5, 2024
3e77483
add typed dict
yassinsws May 5, 2024
1da1d5e
add typed dict
yassinsws May 5, 2024
4699fed
Erase content_service
yassinsws May 5, 2024
ea32c7b
Erase content_service
yassinsws May 5, 2024
6bc383a
fix lecture_schema
yassinsws May 5, 2024
19e2c6c
fix lecture_schema
yassinsws May 5, 2024
b0e6f1d
fix lecture_schema
yassinsws May 5, 2024
f56c288
fix status update bug
yassinsws May 6, 2024
2e25f9d
fix status update bug
yassinsws May 6, 2024
db9d67a
MERGE MAIN AND DATASTORE PIPELINE
yassinsws May 6, 2024
dfdb5e5
MERGE MAIN AND DATASTORE PIPELINE
yassinsws May 6, 2024
719aec2
MERGE MAIN AND DATASTORE PIPELINE
yassinsws May 6, 2024
27c91d7
Add an exponential Backoff window for the embeddings, to get past the…
yassinsws May 6, 2024
9a50a79
Add an exponential Backoff window for the embeddings, to get past the…
yassinsws May 6, 2024
d9dd8bc
Add an exponential Backoff window for the embeddings, to get past the…
yassinsws May 6, 2024
9a1679d
Add an exponential Backoff window for the embeddings, to get past the…
yassinsws May 6, 2024
53b13d8
Add classes used in code
yassinsws May 7, 2024
1b477ff
replace import all classes only with the classes needed
yassinsws May 7, 2024
dca1493
replace import all classes only with the classes needed
yassinsws May 7, 2024
fd11add
Merge branch 'main' into feature/datastore
yassinsws May 7, 2024
3930890
Update requirements.txt
yassinsws May 7, 2024
5a70b5c
rename db to database
yassinsws May 7, 2024
ef1b4ef
Merge remote-tracking branch 'origin/feature/datastore' into feature/…
yassinsws May 7, 2024
c9587c5
Make batch import done by only one thread.
yassinsws May 7, 2024
7b40f7c
Change the way we add image interpretation to the ingestion
yassinsws May 10, 2024
8e3d710
Change the way we add image interpretation to the ingestion
yassinsws May 10, 2024
a8a3c4f
Merge remote-tracking branch 'origin/main' into feature/Ingestion_pip…
yassinsws May 20, 2024
b410cf3
Minor Changes in ingestion pipeline
yassinsws May 26, 2024
32f33be
Minor Changes in ingestion pipeline
yassinsws May 26, 2024
6062f39
return the Basic request handler because needed for ingestion
yassinsws May 26, 2024
67c3ee6
fix wrong use of prompt param
bassner May 26, 2024
d1e9874
fix formatting
bassner May 26, 2024
0eadd32
use gpt 3 for lang detection
bassner May 26, 2024
d37851a
remove redundant alias
bassner May 26, 2024
f4c8c3c
merge and update course language method
yassinsws May 26, 2024
46e157a
merge and update course language method
yassinsws May 26, 2024
b89703e
minor changes on message_content_dto and image generation
yassinsws May 26, 2024
730305f
Correct change llm to 3.5
yassinsws May 26, 2024
0f354ae
Apply suggestions from code review
bassner May 27, 2024
190587c
Add futur Todos
yassinsws May 27, 2024
4751070
Merge remote-tracking branch 'origin/feature/Ingestion_pipeline' into…
yassinsws May 27, 2024
dd9fed4
Black
yassinsws May 27, 2024
dbaef87
Black
yassinsws May 27, 2024
e620fcb
requirements.txt merge
yassinsws May 27, 2024
f5f1738
remove print statements
yassinsws May 27, 2024
d9c458e
Add local weaviate database with docker file
yassinsws May 27, 2024
ed7b2e1
Linters
yassinsws May 27, 2024
dd21a1a
Add base_url field to ingestion
yassinsws May 30, 2024
ca35123
Add timor Review
yassinsws May 30, 2024
50bd537
Linter
yassinsws May 30, 2024
abdbc21
change 8000 to 8001
yassinsws May 31, 2024
7728d03
TIMOR REVIEW
yassinsws May 31, 2024
753cdbb
env variable from application.yml
yassinsws May 31, 2024
7a80e5f
timor last comment, make volume configurable with an environment vari…
yassinsws Jun 1, 2024
f854fd4
timor last comment, make volume configurable with an environment vari…
yassinsws Jun 1, 2024
6192932
Linter
yassinsws Jun 1, 2024
8b1345f
make fields that can be null on artemis optional
yassinsws Jun 3, 2024
ed51977
black
yassinsws Jun 3, 2024
ccbffbb
Get weaviate production ready
yassinsws Jun 4, 2024
b89a58f
Get weaviate production ready
yassinsws Jun 4, 2024
5f6b8b7
Get weaviate production ready
yassinsws Jun 4, 2024
b0df9f2
Merge branch 'main' into feature/Ingestion_pipeline
yassinsws Jun 7, 2024
58258c9
Light-Weight Ingestion Pipeline (#119)
yassinsws Jun 7, 2024
092a0ec
Update example_application.yml
kaancayli Jun 10, 2024
d22c856
Max chunks get extended
yassinsws Jun 11, 2024
ca11b9e
Merge branch 'main' into feature/Ingestion_pipeline
yassinsws Jun 11, 2024
16e0e49
Merge with ingestion Pipeline
yassinsws Jun 11, 2024
930050d
Fix typo in pyris-dev
yassinsws Jun 11, 2024
d503ba2
flake8
yassinsws Jun 11, 2024
e4c7575
Ingestion with images
yassinsws Jun 13, 2024
a6b70ac
Ingestion with images
yassinsws Jun 13, 2024
e5b34f0
keep error handling
yassinsws Jun 13, 2024
c60921b
keep error handling
yassinsws Jun 13, 2024
9e34fc2
erase full ingestion checks for now
yassinsws Jun 13, 2024
28d99cf
erase full ingestion checks for now
yassinsws Jun 13, 2024
4843299
Add log when course finished
yassinsws Jun 13, 2024
dc205ed
Add log when course finished
yassinsws Jun 13, 2024
847913d
erase unsused import
yassinsws Jun 13, 2024
692a10a
Merge branch 'main' into feature/Ingestion_pipeline
bassner Jun 13, 2024
d5d926e
Add check if course was ingested and only use the lecture pipeline if…
yassinsws Jun 15, 2024
5732e7c
make chunks size 512
yassinsws Jun 18, 2024
cba0561
give only 5 best chunks to tutor pipeline
yassinsws Jun 18, 2024
5 changes: 5 additions & 0 deletions .gitignore
@@ -4,6 +4,11 @@
application.local.yml
llm_config.local.yml

######################
# Docker
######################
/docker/.docker-data/artemis-data/*
!/docker/.docker-data/artemis-data/.gitkeep

########################
# Auto-generated rules #
7 changes: 7 additions & 0 deletions app/config.py
@@ -8,9 +8,16 @@ class APIKeyConfig(BaseModel):
token: str


class WeaviateSettings(BaseModel):
host: str
port: int
grpc_port: int


class Settings(BaseModel):
api_keys: list[APIKeyConfig]
env_vars: dict[str, str]
weaviate: WeaviateSettings

@classmethod
def get_settings(cls):
7 changes: 4 additions & 3 deletions app/domain/data/image_message_content_dto.py
@@ -1,7 +1,8 @@
from pydantic import BaseModel
from pydantic import BaseModel, Field, ConfigDict
from typing import Optional


class ImageMessageContentDTO(BaseModel):
base64: str
prompt: Optional[str]
base64: str = Field(..., alias="pdfFile")
prompt: Optional[str] = None
model_config = ConfigDict(populate_by_name=True)
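
The alias-plus-`populate_by_name` pattern in this DTO means the model accepts both the wire name Artemis sends and the Python field name, and can serialize back to the wire shape. A minimal sketch mirroring the diff (assumes pydantic v2):

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict, Field


class ImageMessageContentDTO(BaseModel):
    base64: str = Field(..., alias="pdfFile")  # wire name used by Artemis
    prompt: Optional[str] = None
    model_config = ConfigDict(populate_by_name=True)


# Both spellings populate the same field:
from_wire = ImageMessageContentDTO(pdfFile="iVBOR...")
from_code = ImageMessageContentDTO(base64="iVBOR...")
assert from_wire.base64 == from_code.base64

# Dumping by alias restores the wire shape:
assert from_wire.model_dump(by_alias=True)["pdfFile"] == "iVBOR..."
```

Without `populate_by_name=True`, the `from_code` construction above would raise a validation error, since only the alias would be accepted.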
11 changes: 6 additions & 5 deletions app/domain/data/lecture_unit_dto.py
@@ -3,11 +3,12 @@

class LectureUnitDTO(BaseModel):
to_update: bool = Field(alias="toUpdate")
pdf_file_base64: str = Field(alias="pdfFile")
base_url: str = Field(alias="artemisBaseUrl")
pdf_file_base64: str = Field(default="", alias="pdfFile")
lecture_unit_id: int = Field(alias="lectureUnitId")
lecture_unit_name: str = Field(alias="lectureUnitName")
lecture_unit_name: str = Field(default="", alias="lectureUnitName")
lecture_id: int = Field(alias="lectureId")
lecture_name: str = Field(alias="lectureName")
lecture_name: str = Field(default="", alias="lectureName")
course_id: int = Field(alias="courseId")
course_name: str = Field(alias="courseName")
course_description: str = Field(alias="courseDescription")
course_name: str = Field(default="", alias="courseName")
course_description: str = Field(default="", alias="courseDescription")
Empty file.
12 changes: 12 additions & 0 deletions app/domain/ingestion/ingestion_pipeline_execution_dto.py
@@ -0,0 +1,12 @@
from typing import List

from pydantic import Field

from app.domain import PipelineExecutionDTO
from app.domain.data.lecture_unit_dto import LectureUnitDTO


class IngestionPipelineExecutionDto(PipelineExecutionDTO):
lecture_units: List[LectureUnitDTO] = Field(
..., alias="pyrisLectureUnitWebhookDTOS"
)
7 changes: 7 additions & 0 deletions app/domain/ingestion/ingestion_status_update_dto.py
@@ -0,0 +1,7 @@
from typing import Optional

from ...domain.status.status_update_dto import StatusUpdateDTO


class IngestionStatusUpdateDTO(StatusUpdateDTO):
result: Optional[str] = None
4 changes: 3 additions & 1 deletion app/domain/pipeline_execution_settings_dto.py
@@ -5,5 +5,7 @@

class PipelineExecutionSettingsDTO(BaseModel):
authentication_token: str = Field(alias="authenticationToken")
allowed_model_identifiers: List[str] = Field(alias="allowedModelIdentifiers")
allowed_model_identifiers: List[str] = Field(
default=[], alias="allowedModelIdentifiers"
)
artemis_base_url: str = Field(alias="artemisBaseUrl")
14 changes: 0 additions & 14 deletions app/ingestion/abstract_ingestion.py
@@ -13,17 +13,3 @@ def chunk_data(self, path: str) -> List[Dict[str, str]]:
Abstract method to chunk code files in the root directory.
"""
pass

@abstractmethod
def ingest(self, path: str) -> bool:
"""
Abstract method to ingest repositories into the database.
"""
pass

@abstractmethod
def update(self, path: str):
"""
Abstract method to update a repository in the database.
"""
pass
40 changes: 30 additions & 10 deletions app/llm/external/openai_chat.py
@@ -1,3 +1,5 @@
import logging
import time
from datetime import datetime
from typing import Literal, Any

@@ -78,16 +80,34 @@ def chat(
self, messages: list[PyrisMessage], arguments: CompletionArguments
) -> PyrisMessage:
# noinspection PyTypeChecker
response = self._client.chat.completions.create(
model=self.model,
messages=convert_to_open_ai_messages(messages),
temperature=arguments.temperature,
max_tokens=arguments.max_tokens,
response_format=ResponseFormat(
type=("json_object" if arguments.response_format == "JSON" else "text")
),
)
return convert_to_iris_message(response.choices[0].message)
retries = 10
backoff_factor = 2
initial_delay = 1

for attempt in range(retries):
try:
if arguments.response_format == "JSON":
response = self._client.chat.completions.create(
model=self.model,
messages=convert_to_open_ai_messages(messages),
temperature=arguments.temperature,
max_tokens=arguments.max_tokens,
response_format=ResponseFormat(type="json_object"),
)
else:
response = self._client.chat.completions.create(
model=self.model,
messages=convert_to_open_ai_messages(messages),
temperature=arguments.temperature,
max_tokens=arguments.max_tokens,
)
return convert_to_iris_message(response.choices[0].message)
except Exception as e:
wait_time = initial_delay * (backoff_factor**attempt)
logging.warning(f"Exception on attempt {attempt + 1}: {e}")
logging.info(f"Retrying in {wait_time} seconds...")
time.sleep(wait_time)
logging.error("Failed to interpret image after several attempts.")


class DirectOpenAIChatModel(OpenAIChatModel):
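
The retry loops added in this PR (in `chat` above and in `embed` in openai_embeddings.py) share the same shape, and both fall through silently after the last attempt (`chat` returns `None` implicitly, `embed` returns `[]`). A reusable helper that re-raises on the final attempt makes exhaustion explicit; this is a hedged sketch, not the PR's code:

```python
import logging
import time


def call_with_backoff(fn, retries=10, initial_delay=1.0, backoff_factor=2.0):
    """Call fn() with exponential backoff; re-raise after the final attempt."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception as e:
            if attempt == retries - 1:
                raise  # surface the failure instead of returning None/[]
            wait_time = initial_delay * (backoff_factor ** attempt)
            logging.warning("Attempt %d failed: %s", attempt + 1, e)
            logging.info("Retrying in %.1f seconds...", wait_time)
            time.sleep(wait_time)
```

The two call sites would then become `call_with_backoff(lambda: self._client.chat.completions.create(...))` and the like, removing the duplicated loop bodies.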
27 changes: 22 additions & 5 deletions app/llm/external/openai_embeddings.py
@@ -1,8 +1,10 @@
import logging
from typing import Literal, Any
from openai import OpenAI
from openai.lib.azure import AzureOpenAI

from ...llm.external.model import EmbeddingModel
import time


class OpenAIEmbeddingModel(EmbeddingModel):
@@ -11,12 +13,27 @@ class OpenAIEmbeddingModel(EmbeddingModel):
_client: OpenAI

def embed(self, text: str) -> list[float]:
response = self._client.embeddings.create(
model=self.model,
input=text,
encoding_format="float",
retries = 10
backoff_factor = 2
initial_delay = 1

for attempt in range(retries):
try:
response = self._client.embeddings.create(
model=self.model,
input=text,
encoding_format="float",
)
return response.data[0].embedding
except Exception as e:
wait_time = initial_delay * (backoff_factor**attempt)
logging.warning(f"Rate limit exceeded on attempt {attempt + 1}: {e}")
logging.info(f"Retrying in {wait_time} seconds...")
time.sleep(wait_time)
logging.error(
"Failed to get embedding after several attempts due to rate limit."
)
return response.data[0].embedding
return []


class DirectOpenAIEmbeddingModel(OpenAIEmbeddingModel):
1 change: 1 addition & 0 deletions app/llm/request_handler/__init__.py
@@ -1,5 +1,6 @@
from ..request_handler.request_handler_interface import RequestHandler
from ..request_handler.basic_request_handler import BasicRequestHandler

from ..request_handler.capability_request_handler import (
CapabilityRequestHandler,
CapabilityRequestHandlerSelectionMode,
14 changes: 11 additions & 3 deletions app/pipeline/chat/lecture_chat_pipeline.py
@@ -8,6 +8,7 @@
)
from langchain_core.runnables import Runnable

from ..shared.citation_pipeline import CitationPipeline
from ...common import convert_iris_message_to_langchain_message
from ...domain import PyrisMessage
from ...llm import CapabilityRequestHandler, RequirementList
@@ -42,7 +43,8 @@ def lecture_initial_prompt():
questions about the lectures. To answer them the best way, relevant lecture content is provided to you with the
student's question. If the context provided to you is not enough to formulate an answer to the student question
you can simply ask the student to elaborate more on his question. Use only the parts of the context provided for
you that is relevant to the student's question. """
you that is relevant to the student's question. If the user greets you greet him back, and ask him how you can help
"""


class LectureChatPipeline(Pipeline):
@@ -60,14 +62,15 @@ def __init__(self):
privacy_compliance=True,
)
)
completion_args = CompletionArguments(temperature=0.2, max_tokens=2000)
completion_args = CompletionArguments(temperature=0, max_tokens=2000)
self.llm = IrisLangchainChatModel(
request_handler=request_handler, completion_args=completion_args
)
# Create the pipelines
self.db = VectorDatabase()
self.retriever = LectureRetrieval(self.db.client)
self.pipeline = self.llm | StrOutputParser()
self.citation_pipeline = CitationPipeline()

def __repr__(self):
return f"{self.__class__.__name__}(llm={self.llm})"
@@ -98,15 +101,20 @@ def __call__(self, dto: LectureChatPipelineExecutionDTO):
student_query=query.contents[0].text_content,
result_limit=10,
course_name=dto.course.name,
course_id=dto.course.id,
base_url=dto.settings.artemis_base_url,
)

self._add_relevant_chunks_to_prompt(retrieved_lecture_chunks)
prompt_val = self.prompt.format_messages()
self.prompt = ChatPromptTemplate.from_messages(prompt_val)
try:
response = (self.prompt | self.pipeline).invoke({})
response_with_citation = self.citation_pipeline(
retrieved_lecture_chunks, response
)
logger.info(f"Response from lecture chat pipeline: {response}")
return response
return response_with_citation
Reviewer comment (Contributor): Consider specifying exception types to improve error diagnostics and handling.

- except Exception as e:
+ except (WeaviateException, IOError) as e:

(Committable suggestion was skipped due to low confidence.)
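
The reviewer's point, in a generic sketch: catching only the failure modes you expect means programming errors still propagate instead of being swallowed. The file name and default below are illustrative, not from the PR:

```python
def read_config(path):
    # Catch only the failures we expect (missing or unreadable file); any
    # other exception is a bug and should propagate to the caller.
    try:
        with open(path) as f:
            return f.read()
    except (FileNotFoundError, PermissionError):
        return ""


print(read_config("/nonexistent/config.yml"))  # prints an empty line
```

A bare `except Exception` here would also hide, say, a `TypeError` from a bad `path` argument, which is exactly the diagnostic loss the comment warns about.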

except Exception as e:
raise e

Expand Down
80 changes: 56 additions & 24 deletions app/pipeline/chat/tutor_chat_pipeline.py
@@ -12,6 +12,7 @@
PromptTemplate,
)
from langchain_core.runnables import Runnable
from weaviate.collections.classes.filters import Filter

from .lecture_chat_pipeline import LectureChatPipeline
from .output_models.output_models.selected_paragraphs import SelectedParagraphs
@@ -86,19 +87,27 @@ def __call__(self, dto: TutorChatPipelineExecutionDTO, **kwargs):
:param dto: execution data transfer object
:param kwargs: The keyword arguments
"""
execution_dto = LectureChatPipelineExecutionDTO(
settings=dto.settings, course=dto.course, chatHistory=dto.chat_history
)
lecture_chat_thread = threading.Thread(
target=self._run_lecture_chat_pipeline(execution_dto), args=(dto,)
)
tutor_chat_thread = threading.Thread(
target=self._run_tutor_chat_pipeline(dto), args=(dto,)
)
lecture_chat_thread.start()
tutor_chat_thread.start()

try:
should_execute_lecture_pipeline = self.should_execute_lecture_pipeline(
dto.course.id
)
self.lecture_chat_response = ""
if should_execute_lecture_pipeline:
execution_dto = LectureChatPipelineExecutionDTO(
settings=dto.settings,
course=dto.course,
chatHistory=dto.chat_history,
)
lecture_chat_thread = threading.Thread(
target=self._run_lecture_chat_pipeline(execution_dto), args=(dto,)
)
lecture_chat_thread.start()

tutor_chat_thread = threading.Thread(
target=self._run_tutor_chat_pipeline(dto),
args=(dto, should_execute_lecture_pipeline),
)
tutor_chat_thread.start()
Reviewer comment (Contributor) on lines +91 to +110: Ensure proper thread management and exception handling in threading logic.

The threading logic within the __call__ method does not appear to handle exceptions or race conditions effectively. Consider implementing thread joining or using a thread pool to manage threads more safely. Additionally, ensure that shared resources are accessed in a thread-safe manner. Here is a proposed change:

- lecture_chat_thread = threading.Thread(
-     target=self._run_lecture_chat_pipeline(execution_dto), args=(dto,)
- )
- lecture_chat_thread.start()
+ with ThreadPoolExecutor() as executor:
+     executor.submit(self._run_lecture_chat_pipeline, execution_dto, dto)

(Committable suggestion included; review carefully before applying.)
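
Worth noting alongside this review comment: `threading.Thread(target=self._run_lecture_chat_pipeline(execution_dto), args=(dto,))` *calls* the pipeline immediately on the main thread and passes its return value as `target`, so the two pipelines do not actually run concurrently as written. A `ThreadPoolExecutor` sketch with stand-in functions (the pipeline names are placeholders for the methods in the diff):

```python
from concurrent.futures import ThreadPoolExecutor


def run_lecture_pipeline(query):
    # Stand-in for self._run_lecture_chat_pipeline
    return f"lecture answer to {query}"


def run_tutor_pipeline(query):
    # Stand-in for self._run_tutor_chat_pipeline
    return f"tutor answer to {query}"


with ThreadPoolExecutor(max_workers=2) as executor:
    # submit() takes the callable and its arguments separately, so nothing
    # executes before a worker thread picks the task up.
    lecture = executor.submit(run_lecture_pipeline, "q")
    tutor = executor.submit(run_tutor_pipeline, "q")
    responses = [lecture.result(), tutor.result()]  # .result() joins implicitly
```

`Future.result()` also re-raises any exception from the worker, addressing the comment's concern that thread failures currently go unobserved.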

response = self.choose_best_response(
[self.tutor_chat_response, self.lecture_chat_response],
dto.chat_history[-1].contents[0].text_content,
@@ -107,7 +116,6 @@ def __call__(self, dto: TutorChatPipelineExecutionDTO, **kwargs):
logger.info(f"Response from tutor chat pipeline: {response}")
self.callback.done("Generated response", final_result=response)
except Exception as e:
print(e)
self.callback.error(f"Failed to generate response: {e}")

def choose_best_response(
@@ -152,7 +160,11 @@ def _run_lecture_chat_pipeline(self, dto: LectureChatPipelineExecutionDTO):
pipeline = LectureChatPipeline()
self.lecture_chat_response = pipeline(dto=dto)

def _run_tutor_chat_pipeline(self, dto: TutorChatPipelineExecutionDTO):
def _run_tutor_chat_pipeline(
self,
dto: TutorChatPipelineExecutionDTO,
should_execute_lecture_pipeline: bool = False,
):
"""
Runs the pipeline
:param dto: execution data transfer object
@@ -208,16 +220,18 @@ def _run_tutor_chat_pipeline(self, dto: TutorChatPipelineExecutionDTO):
submission,
selected_files,
)

retrieved_lecture_chunks = self.retriever(
chat_history=history,
student_query=query.contents[0].text_content,
result_limit=10,
course_name=dto.course.name,
problem_statement=problem_statement,
exercise_title=exercise_title,
)
self._add_relevant_chunks_to_prompt(retrieved_lecture_chunks)
if should_execute_lecture_pipeline:
retrieved_lecture_chunks = self.retriever(
chat_history=history,
student_query=query.contents[0].text_content,
result_limit=10,
course_name=dto.course.name,
problem_statement=problem_statement,
exercise_title=exercise_title,
course_id=dto.course.id,
base_url=dto.settings.artemis_base_url,
)
self._add_relevant_chunks_to_prompt(retrieved_lecture_chunks)

self.callback.in_progress("Generating response...")

@@ -360,3 +374,21 @@ def _add_relevant_chunks_to_prompt(self, retrieved_lecture_chunks: List[dict]):
self.prompt += SystemMessagePromptTemplate.from_template(
"USE ONLY THE CONTENT YOU NEED TO ANSWER THE QUESTION:\n"
)

def should_execute_lecture_pipeline(self, course_id: int) -> bool:
"""
Checks if the lecture pipeline should be executed
:param course_id: The course ID
:return: True if the lecture pipeline should be executed
"""
if course_id:
# Fetch the first object that matches the course ID with the language property
result = self.db.lectures.query.fetch_objects(
filters=Filter.by_property(LectureSchema.COURSE_ID.value).equal(
course_id
),
limit=1,
return_properties=[LectureSchema.COURSE_NAME.value],
)
return len(result.objects) > 0
return False
Reviewer comment (Contributor) on lines +378 to +394: Optimize the method should_execute_lecture_pipeline.

The method should_execute_lecture_pipeline can be optimized by directly returning the condition. This change makes the function more concise and easier to read, as previously suggested:

- if course_id:
-     result = self.db.lectures.query.fetch_objects(
-         filters=Filter.by_property(LectureSchema.COURSE_ID.value).equal(course_id),
-         limit=1,
-         return_properties=[LectureSchema.COURSE_NAME.value],
-     )
-     return len(result.objects) > 0
- return False
+ return course_id and len(self.db.lectures.query.fetch_objects(
+     filters=Filter.by_property(LectureSchema.COURSE_ID.value).equal(course_id),
+     limit=1,
+     return_properties=[LectureSchema.COURSE_NAME.value],
+ ).objects) > 0

(Committable suggestion included; review carefully before applying.)
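
One caveat with that one-line refactor: `course_id and len(...) > 0` returns the falsy `course_id` itself (`None` or `0`) rather than `False`, leaking a non-bool out of a function annotated `-> bool`. A sketch of the pitfall, with a stand-in for the Weaviate query (which needs a live database):

```python
def should_execute(course_id, fetch_count):
    # Direct translation of the suggested one-liner; fetch_count stands in
    # for the Weaviate fetch_objects call.
    return course_id and fetch_count(course_id) > 0


def should_execute_strict(course_id, fetch_count):
    # bool() keeps the declared "-> bool" contract even when course_id is falsy.
    return bool(course_id) and fetch_count(course_id) > 0


count = lambda cid: 1  # pretend one lecture chunk exists for any course

print(should_execute(0, count))         # prints 0, not False
print(should_execute_strict(0, count))  # prints False
```

Callers doing truthiness checks won't notice the difference, but anything comparing `is False` or serializing the value will, so `bool()` (or keeping the original early-return form) is the safer refactor.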
