Releases: deepset-ai/haystack
v2.10.0
⭐️ Highlights
Improved Pipeline.run() Logic
The new Pipeline.run() logic fixes common pipeline issues, including exceptions, incorrect component execution, missing intermediate outputs, and premature execution of lazy variadic components. While most pipelines should remain unaffected, we recommend carefully reviewing your pipeline executions if you are using cyclic pipelines or pipelines with lazy variadic components to ensure their behavior has not changed. You can use this tool to compare the execution traces of your pipeline under the old and new logic.
AsyncPipeline for Async Execution
Together with the new Pipeline.run() logic, AsyncPipeline enables asynchronous execution, allowing pipeline components to run concurrently whenever possible. This leads to significant speed improvements, especially for pipelines that process data in parallel branches, such as hybrid retrieval setups.
Hybrid Retrieval
import asyncio
from haystack import AsyncPipeline
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.components.joiners import DocumentJoiner
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever, InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

document_store = InMemoryDocumentStore()
query = "..."

hybrid_rag_retrieval = AsyncPipeline()
hybrid_rag_retrieval.add_component("text_embedder", SentenceTransformersTextEmbedder())
hybrid_rag_retrieval.add_component("embedding_retriever", InMemoryEmbeddingRetriever(document_store=document_store))
hybrid_rag_retrieval.add_component("bm25_retriever", InMemoryBM25Retriever(document_store=document_store))
hybrid_rag_retrieval.add_component("document_joiner", DocumentJoiner())  # joins the results of both retrievers
hybrid_rag_retrieval.connect("text_embedder", "embedding_retriever")
hybrid_rag_retrieval.connect("bm25_retriever", "document_joiner")
hybrid_rag_retrieval.connect("embedding_retriever", "document_joiner")

async def run_inner():
    # run_async executes independent branches concurrently
    return await hybrid_rag_retrieval.run_async({
        "text_embedder": {"text": query},
        "bm25_retriever": {"query": query}
    })

results = asyncio.run(run_inner())
Parallel Translation Pipeline
from haystack import AsyncPipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage

# Create prompt builders with templates at initialization
# (ChatPromptBuilder expects its template as a list of ChatMessage objects)
spanish_prompt_builder = ChatPromptBuilder(template=[ChatMessage.from_user("Translate this message to Spanish: {{user_message}}")])
turkish_prompt_builder = ChatPromptBuilder(template=[ChatMessage.from_user("Translate this message to Turkish: {{user_message}}")])
thai_prompt_builder = ChatPromptBuilder(template=[ChatMessage.from_user("Translate this message to Thai: {{user_message}}")])

# Create LLM instances (reads the OPENAI_API_KEY environment variable by default)
spanish_llm = OpenAIChatGenerator()
turkish_llm = OpenAIChatGenerator()
thai_llm = OpenAIChatGenerator()

# Create and configure the pipeline
pipe = AsyncPipeline()

# Add components
pipe.add_component("spanish_prompt_builder", spanish_prompt_builder)
pipe.add_component("turkish_prompt_builder", turkish_prompt_builder)
pipe.add_component("thai_prompt_builder", thai_prompt_builder)
pipe.add_component("spanish_llm", spanish_llm)
pipe.add_component("turkish_llm", turkish_llm)
pipe.add_component("thai_llm", thai_llm)

# Connect components
pipe.connect("spanish_prompt_builder.prompt", "spanish_llm.messages")
pipe.connect("turkish_prompt_builder.prompt", "turkish_llm.messages")
pipe.connect("thai_prompt_builder.prompt", "thai_llm.messages")

user_message = """
In computer programming, the async/await pattern is a syntactic feature of many programming languages that
allows an asynchronous, non-blocking function to be structured in a way similar to an ordinary synchronous function.
It is semantically related to the concept of a coroutine and is often implemented using similar techniques,
and is primarily intended to provide opportunities for the program to execute other code while waiting
for a long-running, asynchronous task to complete, usually represented by promises or similar data structures.
"""

# Run the pipeline with simplified input; the value is routed to every
# component with a matching "user_message" input
res = pipe.run(data={"user_message": user_message})

# Print results (chat generators return their output under the "replies" key)
print("Spanish translation:", res["spanish_llm"]["replies"][0].text)
print("Turkish translation:", res["turkish_llm"]["replies"][0].text)
print("Thai translation:", res["thai_llm"]["replies"][0].text)
Tool Calling Support Everywhere
Tool calling is now universally supported across all chat generators, making it easier than ever for developers to port tools across different platforms. Simply switch the chat generator used, and tooling will work seamlessly without any additional configuration. This update applies across AzureOpenAIChatGenerator, HuggingFaceLocalChatGenerator, and all core integrations, including AnthropicChatGenerator, CohereChatGenerator, AmazonBedrockChatGenerator, and VertexAIGeminiChatGenerator. With this enhancement, tool usage becomes a native capability across the ecosystem, enabling more advanced and interactive agentic applications.
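For example, once a Tool is defined (see the v2.9.0 highlights below for how), the same object can be passed to any chat generator. A minimal sketch, assuming the Anthropic integration is installed (anthropic-haystack) and uses the standard haystack_integrations import path, and that a tool variable is already defined:

from haystack.components.generators.chat import OpenAIChatGenerator
from haystack_integrations.components.generators.anthropic import AnthropicChatGenerator

# The same Tool object works with either generator, with no other changes:
llm = OpenAIChatGenerator(model="gpt-4o-mini", tools=[tool])
llm = AnthropicChatGenerator(model="claude-3-5-sonnet-20240620", tools=[tool])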
Visualize Your Pipelines Locally
Pipeline visualization is now more flexible, allowing users to render pipeline graphs locally without requiring an internet connection or sending data to an external service. By running a local Mermaid server with Docker, you can generate visual representations of your pipelines using draw() or show(). Learn more in Visualizing Pipelines.
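For instance, a minimal sketch, assuming a Mermaid renderer is running locally on port 3000 and that draw() accepts a server_url parameter pointing to it (see the Visualizing Pipelines docs for the exact setup and the recommended Docker image):

from pathlib import Path

# pipe is any Haystack Pipeline or AsyncPipeline
pipe.draw(path=Path("pipeline.png"), server_url="http://localhost:3000")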
New Components for Smarter Document Processing
This release introduces new components that enhance document processing capabilities. CSVDocumentSplitter and CSVDocumentCleaner make handling CSV files more efficient. LLMMetadataExtractor leverages an LLM to analyze documents and enrich them with relevant metadata, improving searchability and retrieval accuracy.
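As a rough sketch of where LLMMetadataExtractor fits in an indexing flow (the constructor arguments prompt and expected_keys below are assumptions based on the description above, not the verified signature; check the component's API reference):

from haystack import Document
from haystack.components.extractors import LLMMetadataExtractor

# Hypothetical arguments, for illustration only
extractor = LLMMetadataExtractor(
    prompt='Extract the main topic of this text as JSON, e.g. {"topic": "..."}: {{ document.content }}',
    expected_keys=["topic"],
)
docs = extractor.run(documents=[Document(content="...")])["documents"]
print(docs[0].meta)  # the LLM output is merged into each document's metadata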
⬆️ Upgrade Notes
- The DOCXToDocument converter now returns a Document object with DOCX metadata stored in the meta field as a dictionary under the key docx. Previously, the metadata was represented as a DOCXMetadata dataclass. This change does not impact reading from or writing to a Document Store.
- Removed the deprecated NLTKDocumentSplitter; its functionality is now supported by the DocumentSplitter.
- The deprecated FUNCTION role has been removed from the ChatRole enum. Use TOOL instead. The deprecated class method ChatMessage.from_function has been removed. Use ChatMessage.from_tool instead.
🚀 New Features
- Added a new component, ListJoiner, which joins lists of values from different components into a single list.
- Introduced the OpenAPIConnector component, enabling direct invocation of REST endpoints as specified in an OpenAPI specification. This component is designed for direct REST endpoint invocation without LLM-generated payloads; users need to pass the run parameters explicitly. Example:
  from haystack.utils import Secret
  from haystack.components.connectors.openapi import OpenAPIConnector

  connector = OpenAPIConnector(
      openapi_spec="https://bit.ly/serperdev_openapi",
      credentials=Secret.from_env_var("SERPERDEV_API_KEY"),
  )
  response = connector.run(
      operation_id="search",
      parameters={"q": "Who was Nikola Tesla?"},
  )
- Added a new component, LLMMetadataExtractor, which can be used in an indexing pipeline to extract metadata from documents based on a user-given prompt. It returns the documents with a metadata field containing the output of the LLM.
- Introduced the CSVDocumentCleaner component for cleaning CSV documents:
  - Removes empty rows and columns, while preserving specified ignored rows and columns.
  - Customizable number of rows and columns to ignore during processing.
- Introduced CSVDocumentSplitter: it splits CSV documents into structured sub-tables by recursively splitting on empty rows and columns larger than a specified threshold. This is particularly useful when converting Excel files, which can often have multiple tables within one sheet (see the sketch after this list).
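A minimal sketch of how the two CSV components can be chained (the parameter names ignore_rows, ignore_columns, and row_split_threshold are assumptions based on the descriptions above; consult the component docs for the exact arguments):

from haystack import Document
from haystack.components.preprocessors import CSVDocumentCleaner, CSVDocumentSplitter

csv_doc = Document(content="col1,col2\n1,2\n,\n3,4\n")
cleaner = CSVDocumentCleaner(ignore_rows=0, ignore_columns=0)  # assumed parameters
splitter = CSVDocumentSplitter(row_split_threshold=2)          # assumed parameter
cleaned = cleaner.run(documents=[csv_doc])["documents"]
sub_tables = splitter.run(documents=cleaned)["documents"]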
⚡️ Enhancement Notes
- Enhanced SentenceTransformersDocumentEmbedder and SentenceTransformersTextEmbedder to accept an additional parameter, which is passed directly to the underlying SentenceTransformer.encode method for greater flexibility in embedding customization.
- Added completion_start_time metadata to track time-to-first-token (TTFT) in streaming responses from the Hugging Face API and OpenAI (Azure).
- Enhancements to date filtering in MetadataRouter:
  - Improved date parsing in filter utilities by introducing _parse_date, which first attempts datetime.fromisoformat(value) for backward compatibility and then falls back to dateutil.parser.parse() for broader ISO 8601 support.
  - Resolved a common issue where comparing naive and timezone-aware datetimes resulted in a TypeError. Added _ensure_both_dates_naive_or_aware, which ensures both datetimes are either naive or aware. If one is missing a timezone, it is assigned the timezone of the other for consistency.
- When Pipeline.from_dict receives an invalid type (e.g. an empty string), an informative PipelineError is now raised.
- Added the jsonschema library as a core dependency. It is used in Tool and JsonSc...
v2.9.0
⭐️ Highlights
Tool Calling Support
We are introducing the Tool, a simple and unified abstraction for representing tools in Haystack, and the ToolInvoker, which executes tool calls prepared by LLMs. These features make it easy to integrate tool calling into your Haystack pipelines, enabling seamless interaction with tools when used with components like OpenAIChatGenerator and HuggingFaceAPIChatGenerator. Here's how you can use them:
from haystack import Pipeline
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.components.tools.tool_invoker import ToolInvoker
from haystack.dataclasses import ChatMessage
from haystack.tools import Tool

def dummy_weather_function(city: str):
    return f"The weather in {city} is 20 degrees."

tool = Tool(
    name="weather_tool",
    description="A tool to get the weather",
    function=dummy_weather_function,
    parameters={
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
)

pipeline = Pipeline()
pipeline.add_component("llm", OpenAIChatGenerator(model="gpt-4o-mini", tools=[tool]))
pipeline.add_component("tool_invoker", ToolInvoker(tools=[tool]))
pipeline.connect("llm.replies", "tool_invoker.messages")

message = ChatMessage.from_user("How is the weather in Berlin today?")
result = pipeline.run({"llm": {"messages": [message]}})
Use Components as Tools
As an abstraction of Tool, ComponentTool allows LLMs to interact directly with components like web search, document processing, or custom user components. It simplifies schema generation and type conversion, making it easy to expose complex component functionality to LLMs.
from haystack.components.websearch import SerperDevWebSearch
from haystack.tools import ComponentTool
from haystack.utils import Secret

# Create a tool from the component
tool = ComponentTool(
    component=SerperDevWebSearch(api_key=Secret.from_env_var("SERPERDEV_API_KEY"), top_k=3),
    name="web_search",  # Optional: defaults to "serper_dev_web_search"
    description="Search the web for current information on any topic",  # Optional: defaults to component docstring
)
New Splitting Method: RecursiveDocumentSplitter
RecursiveDocumentSplitter introduces a smarter way to split text. It uses a set of separators to divide text recursively, starting with the first separator. If chunks are still larger than the specified size, the splitter moves to the next separator in the list. This approach ensures efficient and granular text splitting for improved processing.
from haystack import Document
from haystack.components.preprocessors import RecursiveDocumentSplitter

splitter = RecursiveDocumentSplitter(split_length=260, split_overlap=0, separators=["\n\n", "\n", ".", " "])
doc_chunks = splitter.run(documents=[Document(content="...")])
⚠️ Refactored ChatMessage dataclass
The ChatMessage dataclass has been refactored to improve flexibility and compatibility. As part of this update, the content attribute has been removed and replaced with a new text property for accessing the ChatMessage's textual value. This change ensures future-proofing and better support for features like tool calls and their results. For details on the new API and migration steps, see the ChatMessage documentation. If you have any questions about this refactoring, feel free to let us know in this GitHub discussion.
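In practice, the migration mostly amounts to switching from the removed attribute to the new property and the class-method constructors:

from haystack.dataclasses import ChatMessage

message = ChatMessage.from_user("Hello!")  # instead of the plain constructor with content=...
print(message.text)                        # instead of message.content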
⬆️ Upgrade Notes
- The refactoring of the ChatMessage data class includes some breaking changes involving ChatMessage creation and attribute access. If you have a Pipeline containing a ChatPromptBuilder that was serialized with haystack-ai <= 2.9.0, deserialization may break. For detailed information about the changes and how to migrate, see the ChatMessage documentation.
- Removed the deprecated converter init argument from PyPDFToDocument. Use other init arguments instead, or create a custom component.
- The SentenceWindowRetriever output key context_documents now outputs a List[Document] containing the retrieved documents and the context windows, ordered by split_idx_start.
- Updated the default value of store_full_path to False in converters.
🚀 New Features
- Introduced the ComponentTool, a new tool that wraps Haystack components, allowing them to be used as tools by LLMs (various ChatGenerators). ComponentTool supports automatic tool schema generation and input type conversion, and works with components whose run methods have the following input types:
  - Basic types (str, int, float, bool, dict)
  - Dataclasses (both simple and nested structures)
  - Lists of basic types (e.g., List[str])
  - Lists of dataclasses (e.g., List[Document])
  - Parameters with mixed types (e.g., List[Document], str, etc.)

  Example usage:
  from haystack import Pipeline
  from haystack.tools import ComponentTool
  from haystack.components.websearch import SerperDevWebSearch
  from haystack.utils import Secret
  from haystack.components.tools.tool_invoker import ToolInvoker
  from haystack.components.generators.chat import OpenAIChatGenerator
  from haystack.dataclasses import ChatMessage

  # Create a SerperDev search component
  search = SerperDevWebSearch(api_key=Secret.from_env_var("SERPERDEV_API_KEY"), top_k=3)

  # Create a tool from the component
  tool = ComponentTool(
      component=search,
      name="web_search",  # Optional: defaults to "serper_dev_web_search"
      description="Search the web for current information on any topic",  # Optional: defaults to component docstring
  )

  # Create a pipeline with OpenAIChatGenerator and ToolInvoker
  pipeline = Pipeline()
  pipeline.add_component("llm", OpenAIChatGenerator(model="gpt-4o-mini", tools=[tool]))
  pipeline.add_component("tool_invoker", ToolInvoker(tools=[tool]))

  # Connect components
  pipeline.connect("llm.replies", "tool_invoker.messages")

  message = ChatMessage.from_user("Use the web search tool to find information about Nikola Tesla")

  # Run the pipeline
  result = pipeline.run({"llm": {"messages": [message]}})
  print(result)
- Added an XLSXToDocument converter that loads an Excel file using Pandas + openpyxl and, by default, converts each sheet into a separate Document in CSV format.
- Added a new store_full_path parameter to the __init__ methods of PyPDFToDocument and AzureOCRDocumentConverter. The default value is True, which stores the full file path in the metadata of the output documents. When set to False, only the file name is stored.
- Added a new experimental component, ToolInvoker. This component invokes tools based on tool calls prepared by Language Models and returns the results as a list of ChatMessage objects with the tool role.
- Added a RecursiveDocumentSplitter, which uses a set of separators to split text recursively. It attempts to divide the text using the first separator, and if the resulting chunks are still larger than the specified size, it moves to the next separator in the list.
- Added a create_tool_from_function function to create a Tool instance from a function, with automatic generation of name, description, and parameters. Added a tool decorator to achieve the same result (see the sketch after this list).
- Added support for Tools in the Hugging Face API Chat Generator.
- Changed the ChatMessage dataclass to support different types of content, including tool calls and tool call results.
- Added support for Tools in the OpenAI Chat Generator.
- Added a new Tool dataclass to represent a tool for which Language Models can prepare calls.
- Added the StringJoiner component to join strings from different components into a list of strings.
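A minimal sketch of the two function-to-tool paths mentioned above (assuming both helpers are importable from the haystack.tools package, like Tool and ComponentTool):

from haystack.tools import create_tool_from_function, tool

@tool
def get_weather(city: str) -> str:
    """A tool to get the weather for a city."""
    return f"The weather in {city} is 20 degrees."

# Equivalent, without the decorator:
def get_time(timezone: str) -> str:
    """A tool to get the current time in a timezone."""
    return "12:00"

time_tool = create_tool_from_function(get_time)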
⚡️ Enhancement Notes
- Added a default_headers parameter to AzureOpenAIDocumentEmbedder and AzureOpenAITextEmbedder.
- Added a token argument to NamedEntityExtractor to allow usage of private Hugging Face models.
- Added the from_openai_dict_format class method to the ChatMessage class. It allows you to create a ChatMessage from a dictionary in the format that OpenAI's Chat API expects (see the example after this list).
- Added a testing job to check that all packages can be imported successfully. This should help detect issues such as forgetting to use a forward reference for a type hint coming from a lazy import.
- The DocumentJoiner methods _concatenate() and _distribution_based_rank_fusion() were converted to static methods.
- Improved serialization and deserialization of callables. We now allow serialization of class methods and static methods and explicitly prohibit serialization of instance methods, lambdas, and nested functions.
- Added new initialization parameters to the PyPDFToDocument component to customize the text extraction process from PDF files.
- Reorganized the document store test suite to isolate dataframe filter tests. This change prepares for potential future deprecation of the Document class's dataframe field.
- Moved Tool to a new dedicated tools package. Refactored Tool serialization and deserialization to make it more flexible and include type information.
- The NLTKDocumentSplitter was merged into the DocumentSplitter, which now provides the same functionality as the NLTKDocumentSplitter. split_by="sentence" now uses custom sentence boundary detection based on the nltk library; the previous sentence behavior can still be achieved with split_by="period".
- Improved deserialization of callables by using importlib instead of sys.modules. This change allows importing local functions and classes that are not in sys.modules when deserializing a callable.
- Changed OpenAIDocumentEmbedder to keep running if a batch fails embedding. Now, if OpenAI returns an error, we log that error and keep processing the following batches.
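As referenced in the list above, a quick example of from_openai_dict_format:

from haystack.dataclasses import ChatMessage

message = ChatMessage.from_openai_dict_format({"role": "user", "content": "Hello, how are you?"})
print(message.text)  # "Hello, how are you?"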
v2.8.1
Release Notes
Bug Fixes
- Pin the OpenAI client to >=1.56.1 to avoid issues related to changes in the httpx library.
- PyPDFToDocument now creates documents with an id based on both the converted text and the metadata. Previously, the metadata was not taken into account.
- Fixed issues with deserialization of components in multi-threaded environments.
v2.8.0
Release Notes
⬆️ Upgrade Notes
- Removed the deprecated is_greedy argument from the @component decorator. Change the Variadic input of your Component to GreedyVariadic instead (see the sketch below).
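A minimal sketch of the migration, assuming GreedyVariadic is importable from haystack.core.component.types alongside Variadic:

from typing import List
from haystack import component
from haystack.core.component.types import GreedyVariadic

@component
class GreedyJoiner:
    @component.output_types(merged=List[str])
    def run(self, inputs: GreedyVariadic[str]):
        # A greedy variadic input fires as soon as any upstream value arrives;
        # at runtime `inputs` is a list of the values received on this socket.
        return {"merged": list(inputs)}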
🚀 New Features
- We've added a new DALLEImageGenerator component, bringing image generation with OpenAI's DALL-E to Haystack.
  - Easy to use: just a few lines of code to get started:
    from haystack.components.generators import DALLEImageGenerator

    image_generator = DALLEImageGenerator()
    response = image_generator.run("Show me a picture of a black cat.")
    print(response)
- Added warning logs to PDFMinerToDocument and PyPDFToDocument to indicate when a processed PDF file has no content. This can happen if the PDF file is a scanned image. Also added an explicit check and warning message to the DocumentSplitter that warns the user that empty Documents are skipped. This behavior was already occurring, but now it's clearer through logs that this is happening.
- We have added a new MetaFieldGroupingRanker component that reorders documents by grouping them based on metadata keys. This can be useful for pre-processing Documents before feeding them to an LLM.
- Added a new store_full_path parameter to the __init__ methods of the following converters: JSONConverter, CSVToDocument, DOCXToDocument, HTMLToDocument, MarkdownToDocument, PDFMinerToDocument, PPTXToDocument, TikaDocumentConverter, PyPDFToDocument, AzureOCRDocumentConverter, and TextFileToDocument. The default value is True, which stores the full file path in the metadata of the output documents. When set to False, only the file name is stored.
- When making function calls via OpenAPI, allow both switching SSL verification off and specifying a certificate authority to use for it.
- Added TTFT (Time-to-First-Token) support for OpenAI generators. This captures the time taken to generate the first token from the model and can be used to analyze the latency of the application.
- Added a new option to the required_variables parameter of PromptBuilder and ChatPromptBuilder. By passing required_variables="*" you can automatically set all variables in the prompt to be required (see the example after this list).
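For example, with required_variables="*" every template variable becomes a mandatory input, so missing values surface as errors instead of rendering as empty strings:

from haystack.components.builders import PromptBuilder

builder = PromptBuilder(template="Translate {{ text }} to {{ language }}.", required_variables="*")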
⚡️ Enhancement Notes
- Across the Haystack codebase, we have replaced the use of the ChatMessage dataclass constructor with specific class methods (ChatMessage.from_user, ChatMessage.from_assistant, etc.).
- Added the Maximal Marginal Relevance (MMR) strategy to the SentenceTransformersDiversityRanker. MMR scores are calculated for each document based on their relevance to the query and diversity from already selected documents.
- Introduced optional parameters in the ConditionalRouter component, enabling default/fallback routing behavior when certain inputs are not provided at runtime. This enhancement allows for more flexible pipeline configurations with graceful handling of missing parameters.
- Added split by line to DocumentSplitter, which will split the document on "\n".
- Changed OpenAIDocumentEmbedder to keep running if a batch fails embedding. Now, if OpenAI returns an error, we log that error and keep processing the following batches.
- Added new initialization parameters to the PyPDFToDocument component to customize the text extraction process from PDF files.
- Replaced usage of ChatMessage.content with ChatMessage.text across the codebase. This is done in preparation for the removal of content in Haystack 2.9.0.
⚠️ Deprecation Notes
- The default value of the store_full_path parameter in converters will change to False in Haystack 2.9.0 to enhance privacy.
- In Haystack 2.9.0, the ChatMessage data class will be refactored to make it more flexible and future-proof. As part of this change, the content attribute will be removed. A new text property has been introduced to provide access to the textual value of the ChatMessage. To ensure a smooth transition, start using the text property now in place of content.
- The converter parameter in the PyPDFToDocument component is deprecated and will be removed in Haystack 2.9.0. For in-depth customization of the conversion process, consider implementing a custom component. Additional high-level customization options will be added in the future.
- The output of context_documents in SentenceWindowRetriever will change in the next release. Instead of a List[List[Document]], the output will be a List[Document], where the documents are ordered by split_idx_start.
🐛 Bug Fixes
- Fixed DocumentCleaner not preserving all Document fields when run.
- Fixed DocumentJoiner failing when run with an empty list of Documents.
- For the NLTKDocumentSplitter, we updated how chunks are made when splitting by word while respecting sentence boundaries. Namely, to avoid fully subsuming the previous chunk into the next one, we ignore the first sentence from that chunk when calculating sentence overlap; i.e., we want to avoid cases of Doc1 = [s1, s2], Doc2 = [s1, s2, s3].
- Finished adding function support for the NLTKDocumentSplitter by updating the _split_into_units function and adding the splitting_function init parameter.
- Added a specific to_dict method to overwrite the underlying one from DocumentSplitter. This is needed to properly save the settings of the component to YAML.
- Fixed OpenAIChatGenerator and OpenAIGenerator crashing when using a streaming_callback and generation_kwargs contain {"stream_options": {"include_usage": True}}.
- Fixed tracing of a Pipeline with cycles to correctly track component execution.
- When meta is passed into AnswerBuilder.run(), it is now merged into GeneratedAnswer meta.
- Fixed DocumentSplitter to handle a custom splitting_function without requiring split_length. Previously, the provided splitting_function would not override other settings (see the sketch below).
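A minimal sketch of the custom splitting_function from the last item (selecting it via split_by="function" is an assumption; check the DocumentSplitter docs):

from haystack import Document
from haystack.components.preprocessors import DocumentSplitter

# Hypothetical splitting function that breaks text on "###" markers
splitter = DocumentSplitter(split_by="function", splitting_function=lambda text: text.split("###"))
result = splitter.run(documents=[Document(content="part one###part two")])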