Releases: deepset-ai/haystack
v1.15.1
v1.15.1-rc1
v1.15.0
⭐ Highlights
Build Agents Yourself with Open Source
Exciting news! Say hello to LLM-based Agents, the new decision makers for your NLP applications! These agents have the power to answer complex questions by creating a dynamic action plan and using a variety of Tools in a loop. Picture this: your Agent decides to tackle a multi-hop question by retrieving pieces of information through a web search engine again and again. That's just one of the many feats these Agents can accomplish. Excited about the recent ChatGPT plugins? Agents allow you to build similar experiences in an open source way: your own environment, full control and transparency.
But how do you get started? First, wrap your Haystack Pipeline in a Tool and give your Agent a description of what that Tool can do. Then, initialize your Agent with a list of Tools and a PromptNode that decides when to use each Tool.
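# Import paths as of Haystack 1.15. Assumes web_retriever, web_qa_pn,
# agent_pn and prompt_template are already defined.
from haystack.agents import Agent, Tool
from haystack.pipelines import WebQAPipeline

# Wrap a pipeline in a Tool; the description tells the Agent what the Tool can do.
web_qa_tool = Tool(
    name="Search",
    pipeline_or_node=WebQAPipeline(retriever=web_retriever, prompt_node=web_qa_pn),
    description="useful for when you need to Google questions.",
    output_variable="results",
)

# final_answer_pattern is the regex used to pull the final answer out of the Agent's text output.
agent = Agent(
    prompt_node=agent_pn,
    prompt_template=prompt_template,
    tools=[web_qa_tool],
    final_answer_pattern=r"Final Answer\s*:\s*(.*)",
)

agent.run(query="<Your question here!>")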
Check out the full example, a stand-alone WebQAPipeline, our new tutorials and the documentation!
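To make the final_answer_pattern argument more concrete, here is a minimal sketch of how a regular expression like the one above can pull a final answer out of raw LLM text. It uses only Python's re module. The extract_final_answer helper and the sample output are hypothetical illustrations, not Haystack's internal implementation.

```python
import re
from typing import Optional

# Same pattern as in the Agent example above.
FINAL_ANSWER_PATTERN = r"Final Answer\s*:\s*(.*)"


def extract_final_answer(llm_output: str, pattern: str = FINAL_ANSWER_PATTERN) -> Optional[str]:
    """Return the text captured after 'Final Answer:', or None if it is absent."""
    match = re.search(pattern, llm_output)
    if match is None:
        return None
    # "." does not match newlines, so only the rest of that line is captured.
    return match.group(1).strip()


# Hypothetical model output, for illustration only.
sample_output = (
    "Thought: I now know the answer.\n"
    "Final Answer: The Los Angeles Dodgers"
)

print(extract_final_answer(sample_output))
print(extract_final_answer("Thought: I should search again."))
```

When the pattern does not match, the helper returns None, which is a natural signal that the loop should continue with another Tool call.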
Flexible PromptTemplates
Get ready to take your Pipelines to the next level with the revamped PromptNode. Now you have more flexibility when it comes to shaping the PromptNode outputs and inputs to work seamlessly with other nodes. But wait, there's more! You can now apply functions right within prompt_text. Want to concatenate the content of input documents? No problem! It's all possible with the PromptNode. And that's not all! The output_parser converts output into Haystack Document, Answer, or Label formats. Check out the AnswerParser in action, fully loaded and ready to use:
# Import path as of Haystack 1.15.
from haystack.nodes import AnswerParser, PromptTemplate

PromptTemplate(
    name="question-answering",
    prompt_text="Given the context please answer the question.\n"
    "Context: {join(documents)}\n"
    "Question: {query}\n"
    "Answer: ",
    output_parser=AnswerParser(),
)
More details here.
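As a rough mental model of what {join(documents)} does in the template above, the sketch below concatenates document contents and fills the prompt with plain Python. The Doc class, join_contents, and fill_prompt are hypothetical stand-ins written for this illustration; they are not Haystack classes or the actual PromptTemplate logic.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Doc:
    """Hypothetical stand-in for a Haystack Document; only `content` is modeled."""
    content: str


def join_contents(documents: List[Doc], delimiter: str = " ") -> str:
    """Concatenate the content of all documents into one string."""
    return delimiter.join(doc.content for doc in documents)


def fill_prompt(query: str, documents: List[Doc]) -> str:
    """Fill a question-answering prompt with the joined documents and the query."""
    template = (
        "Given the context please answer the question.\n"
        "Context: {context}\n"
        "Question: {query}\n"
        "Answer: "
    )
    return template.format(context=join_contents(documents), query=query)


docs = [
    Doc("Haystack is an open source NLP framework."),
    Doc("It is maintained by deepset."),
]
prompt = fill_prompt("Who maintains Haystack?", docs)
print(prompt)
```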
Using ChatGPT through PromptModel
A few lines of code are all you need to start chatting with ChatGPT through Haystack! The simple message format distinguishes instructions, user questions, and assistant responses. And with the chat functionality you can ask follow-up questions as in this example:
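# Import path as of Haystack 1.15; api_key is assumed to hold your OpenAI API key.
from haystack.nodes import PromptModel, PromptNode

prompt_model = PromptModel("gpt-3.5-turbo", api_key=api_key)
prompt_node = PromptNode(prompt_model)

# Earlier turns are passed along so the model can resolve the follow-up question.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"},
    {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
    {"role": "user", "content": "Where was it played?"},
]
result = prompt_node(messages)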
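Follow-up questions work because each request carries the whole conversation so far. The sketch below shows one way to grow the message list between turns, in plain Python and without calling any API. The extend_conversation helper and the placeholder reply are assumptions made for illustration, not part of Haystack.

```python
from typing import Dict, List

Message = Dict[str, str]


def extend_conversation(messages: List[Message], assistant_reply: str, follow_up: str) -> List[Message]:
    """Return a new message list with the reply and the next user question appended."""
    return messages + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": follow_up},
    ]


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who won the world series in 2020?"},
]

# Placeholder standing in for the text returned by the model.
reply = "<assistant reply>"
next_messages = extend_conversation(messages, reply, "Where was it played?")

for message in next_messages:
    print(f'{message["role"]}: {message["content"]}')
```

The helper returns a new list instead of mutating its input, so earlier states of the conversation stay available for logging or retries.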
Haystack Extras
We now have another repo, haystack-extras, with extra Haystack components such as the audio nodes AnswerToSpeech and DocumentToSpeech. For example, these two can be installed via:
pip install farm-haystack-text2speech
What's Changed
Breaking Changes
- feat!: Increase Crawler standardization regarding Pipelines by @danielbichuetti in #4122
- feat: Enable PDFToTextConverter multiprocessing, increase general performance and simplify installation by @danielbichuetti in #4226
- build: Use uvicorn instead of gunicorn as server in REST API's Dockerfile by @bogdankostic in #4304
- chore!: remove deprecated OpenDistroElasticsearchDocumentStore by @masci in #4361
- refactor: Remove AnswerToSpeech and DocumentToSpeech nodes by @silvanocerza in #4391
- fix: Fix debug on PromptNode by @recrudesce in #4483
- feat: PromptTemplate extensions by @tstadel in #4378
Pipeline
- feat: Add JsonConverter node by @bglearning in #4130
- fix: Shaper store all outputs from function by @sjrl in #4223
- refactor: Isolate PDF OCR converter from PDF text converter by @danielbichuetti in #4193
- fix: add option to not override results by Shaper by @tstadel in #4231
- feat: reduce and focus telemetry by @ZanSara in #4087
- refactor: Remove deprecated nodes EvalDocuments and EvalAnswers by @anakin87 in #4194
- refact: mark unit tests under the test/nodes/** path by @masci in #4235
- fix: FARMReader produces Answers with negative start and end position by @julian-risch in #4248
- test: replace ElasticsearchDS with InMemoryDS when it makes sense; support scale_score in InMemoryDS by @anakin87 in #4283
- test: mock all Translator tests and move one to e2e by @ZanSara in #4290
- fix: Prevent going past token limit in OpenAI calls in PromptNode by @sjrl in #4179
- feat: Add Azure OpenAI embeddings support by @danielbichuetti in #4332
- test: move tests on standard pipelines in e2e/ by @ZanSara in #4309
- fix: EvalResult load migration by @tstadel in #4289
- feat: Report execution time for pipeline components in _debug by @zoltan-fedor in #4197
- refactor: Use TableQuestionAnsweringPipeline from transformers by @sjrl in #4303
- fix: hf-tiny-roberta model loading from disk and mypy errors by @mayankjobanputra in #4363
- docs: TransformersImageToText - inform about supported models, better exception handling by @anakin87 in #4310
- fix: check that answer is not None before accessing it in table.py by @culms in #4376
- feat: add automatic OCR detection mechanism and improve performance by @danielbichuetti in #4329
- Add Whisper node by @vblagoje in #4335
- tests: Mark Crawler tests correctly by @silvanocerza in #4435
- test: Skip flaky test_multimodal_retriever_query by @silvanocerza in #4444
- fix: issue evaluation check for content type by @ju-gu in #4181
- feat: break retry loop for 401 unauthorized errors in promptnode by @FHardow in #4389
- refactor: Remove retry_with_exponential_backoff in favor of tenacity by @silvanocerza in #4460
- refactor: Remove ElasticsearchRetriever and ElasticsearchFilterOnlyRetriever by @silvanocerza in #4499
- refactor: Deprecate BaseKnowledgeGraph, GraphDBKnowledgeGraph, InMemoryKnowledgeGraph and Text2SparqlRetriever by @silvanocerza in #4500
- refactor: remove telemetry v1 by @ZanSara in #4496
- feat: expose prompts to Answer and EvaluationResult by @tstadel in #4341
- feat: Add agent tools by @vblagoje in #4437
- refactor: reduce telemetry events count by @ZanSara in #4501
DocumentStores
- fix: OpenSearchDocumentStore.delete_index doesn't raise by @tstadel in #4295
- fix: increase MetaDocumentORM value length in SQLDocumentStore by @anakin87 in #4333
- fix: when using IVF* indexing, ensure the index is trained first by @kaixuanliu in #4311
- refactor: Mark MilvusDocumentStore as deprecated by @silvanocerza in #4498
Documentation
- feat: add top_k to PromptNode by @tstadel in #4159
- feat: Add Agent by @julian-risch in #4148
- ci: Automate OpenAPI specs upload to Readme.io by @silvanocerza in #4228
- ci: Refactor docs config and generation by @silvanocerza in #4280
- feat: Add Azure as OpenAI endpoint by @vblagoje in #4170
- refactor: Allow flexible document id generation by @danielbichuetti in https://github.com/deepset-a...
v1.15.0-rc2
v1.15.0-rc1
v1.14.0
⭐ Highlights
PromptNode enhancements
PromptNode just rolled out prompt logging (pipeline debug), run_batch, and model_kwargs support. More updates to PromptNode and PromptTemplates coming soon!
Shaper
We're introducing the Shaper, PromptNode's helper. Shaper unlocks the full potential of PromptNode and ensures its seamless integration with Haystack. But Shaper's scope and functionality are not limited to PromptNode; you can also use it independently, opening up a whole new world of possibilities.
IVF and Product Quantization support for OpenSearchDocumentStore
We've added support for IVF and IVF with Product Quantization to OpenSearchDocumentStore. You can train the IVF index by calling the train_index method (same as in FAISSDocumentStore) or by setting ivf_train_size when initializing OpenSearchDocumentStore, and take your search to the next level.
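For readers unfamiliar with IVF (inverted file) indexing, here is a conceptual sketch in pure Python of why such an index needs a training step: centroids are learned from sample vectors, each vector is assigned to its nearest centroid, and a query scans only the closest lists. All names below are hypothetical, Product Quantization is left out, and this is a toy illustration of the general idea, not how OpenSearch or Haystack implement it.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def l2(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(centroids: List[Vector], v: Vector) -> int:
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda i: l2(centroids[i], v))


def train_centroids(sample: List[Vector], n_lists: int, iterations: int = 10) -> List[Vector]:
    """Toy k-means (Lloyd's algorithm) with a deterministic, evenly spaced initialization."""
    step = len(sample) // n_lists
    centroids: List[Vector] = [list(sample[i * step]) for i in range(n_lists)]
    for _ in range(iterations):
        buckets: List[List[Vector]] = [[] for _ in centroids]
        for v in sample:
            buckets[nearest(centroids, v)].append(v)
        for i, bucket in enumerate(buckets):
            if bucket:  # keep the old centroid if no vector was assigned to it
                centroids[i] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids


def build_index(centroids: List[Vector], vectors: List[Vector]) -> Dict[int, List[int]]:
    """Assign every vector id to the inverted list of its nearest centroid."""
    lists: Dict[int, List[int]] = {i: [] for i in range(len(centroids))}
    for vec_id, v in enumerate(vectors):
        lists[nearest(centroids, v)].append(vec_id)
    return lists


def search(query: Vector, centroids: List[Vector], lists: Dict[int, List[int]],
           vectors: List[Vector], n_probe: int = 1, top_k: int = 1) -> List[int]:
    """Scan only the n_probe lists whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda i: l2(centroids[i], query))[:n_probe]
    candidates = [vec_id for i in probe for vec_id in lists[i]]
    return sorted(candidates, key=lambda vec_id: l2(vectors[vec_id], query))[:top_k]


vectors = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (0.2, 0.1), (4.8, 5.1)]
centroids = train_centroids(vectors, n_lists=2)  # the "training" step
ivf_lists = build_index(centroids, vectors)
print(ivf_lists)
print(search((5.1, 5.0), centroids, ivf_lists, vectors, n_probe=1, top_k=2))
```

Probing fewer lists makes a query cheaper but can miss neighbors that landed in an unprobed list, which is the usual speed versus recall trade-off of IVF.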
What's Changed
Breaking Changes
- refactor: Updated rest_api schema for tables to be consistent with Document.to_dict by @sjrl in #3872
- feat: Support multiple document_ids in Answer object (for generative QA) by @tstadel in #4062
- feat: Update OpenAIAnswerGenerator defaults and with learnings from PromptNode by @sjrl in #4038
- build: cache nltk models into the docker image by @mayankjobanputra in #4118
- feat: Add IVF and Product Quantization support for OpenSearchDocumentStore by @bogdankostic in #3850
Pipeline
- feat: add frontmatter to meta in MarkdownConverter by @TuanaCelik in #3953
- fix: removing code block in MarkdownConverter by @TuanaCelik in #3960
- feat: Add page range support to PDF converters. by @danielbichuetti in #3965
- fix: Update telemetry to not serialize Pipeline if disabled. by @sjrl in #4000
- feat: add Shaper by @ZanSara in #3880
- fix: Event sending for RayPipeline crashing Haystack by @zoltan-fedor in #3971
- fix: document retrieval metrics for non-document_id document_relevance_criteria by @tstadel in #3885
- fix: make the crawler more robust on Windows by @anakin87 in #4049
- fix: use correct count of outgoing edges in RayPipeline by @zoltan-fedor in #4066
- feat: Allow all training options for training a SentenceTransformers EmbeddingRetriever by @sjrl in #4026
- refactor: replace mutable default arguments by @julian-risch in #4070
- feat: Support multiple RayPipelines by @zoltan-fedor in #4078
- Remove double batching in retrieve_batch by @sjrl in #4014
- style: Update black by @silvanocerza in #4101
- fix: Fix TableTextRetriever for input consisting of tables only by @jackapbutler in #4048
- fix: Deduplicate same Documents in isolated evaluation of Reader by @bogdankostic in #4114
- Docs: Fix code block formatting by @agnieszka-m in #4162
- refactor: Remove the pin from the espnet module and fix the audio node tests. by @danielbichuetti in #4128
- fix: change tiktoken fallback mechanism to support Windows amd64 by @danielbichuetti in #4175
- feat: Add OpenAIError to retry mechanism by @sjrl in #4178
DocumentStores
- refactor: use weaviate client to build BM25 query by @hsm207 in #3939
- fix: fixed InMemoryDocumentStore.get_embedding_count to return correct number by @sjrl in #3980
- fix: Add inner query for mysql compatibility by @julian-risch in #4068
- feat: add support for custom headers by @hsm207 in #4040
- feat: Add BM25 support for tables in InMemoryDocumentStore by @bogdankostic in #4090
- refactor: InMemoryDocumentStore - manage documents without embedding & fix mypy errors by @anakin87 in #4113
- refactor: complete the document stores test refactoring by @masci in #4125
- feat: include testing facilities into haystack package by @masci in #4182
Documentation
- Align with the docs install guide + correct lg by @agnieszka-m in #3950
- docs: Update Crawler docstring for correct usage in Google colab by @silvanocerza in #3979
- Docs: Update docstrings by @agnieszka-m in #4119
- docs: Update Annotation Tool README.md by @bogdankostic in #4123
- feat: Add model_kwargs option to PromptNode by @sjrl in #4151
- fix: Remove logging statement of setting ID manually in Document by @bogdankostic in #4129
- chore: Fixing PromptNode .prompt() docstring to include the PromptTemplate object as an option by @TuanaCelik in #4135
- chore: de-couple the telemetry events for each tutorial from the dataset on AWS that is used by @TuanaCelik in #4155
- feat: Implement run_batch for PromptNode by @sjrl in #4072
Other Changes
- fix: add option to not override results by Shaper #4231
- fix: Shaper store all outputs from function #4223
- fix: allowing file-upload api to write files to disk #4221
- fix: Fix bug in prompt template check of OpenAIAnswerGenerator #4220
- feat: add top_k to PromptNode #4159
- feat: Add JsonConverter node #4130
- feat: adding secure loading of models by default for haystack by @mayankjobanputra in #3901
- fix: add tiktoken fallback mechanism. by @danielbichuetti in #3929
- fix: change model in distillation test by @ZanSara in #3944
- feat: Expose output_variable in PromptNode result, adjust unit tests by @vblagoje in #3892
- fix: Fix type in FARMReader's save_to_remote by @bogdankostic in #3952
- refactor: Remove PromptNode hash and equality functions by @vblagoje in #3923
- ci: Remove mypy deps install step in python_cache action by @silvanocerza in #3956
- fix: overwrite params with environment variables even if there are no params in the pipeline definition; make mypy ignore REST API tests by @anakin87 in #3930
- Docs: Update ImageToText docstrings by @agnieszka-m in #3963
- Docs: Add TransformersImageToText API doc by @agnieszka-m in #3966
- ci: Add Docker images testing by @silvanocerza in #3943
- feat: Allow users to set a timeout for remote APIs by @danielbichuetti in #3949
- ci: Fix docker image testing on release by @silvanocerza in #3976
- Fix: Fix quotation marks by @agnieszka-m in #3973
- fix: PromptNode doesn't have run_batch support (yet) by @vblagoje in #3972
- chore: increased timeout for loading pipelines through API by @mayankjobanputra in #3977
- Missing import for TransformersImageToText by @ZanSara in #3984
- test: CI on py3.8 by @ZanSara in #3926
- Simplifies and fix docker images tests on release by @silvanocerza in #3982
- feat: Add use_prefiltering parameter to DeepsetCloudDocumentStore by @bogdankostic in #3969
- ci: Delete Docker images after testing to prevent workflow failure by @silvanocerza in #4004
- fix: Add a verbose option to PromptNode to let users understand the prompts being used #2 by @zoltan-fedor in #3898
- fix: prevent posthog from sending errors to stderr by @julian-risch in #4008
- fix: extend schema for prompt node results by @tstadel in #3891
- proposal: TableCell by @sjrl in #3875
- refactor: In PromptNode reuse tokenizer instead of loading new one for stop words by @sjrl in #4016
- ci: Automate release on PyPi by @silvanocerza in https://github.co...
v1.14.0rc2
What's Changed
- fix: add option to not override results by Shaper #4231
- fix: Shaper store all outputs from function #4223
- fix: allowing file-upload api to write files to disk #4221
- fix: Fix bug in prompt template check of OpenAIAnswerGenerator #4220
- feat: add top_k to PromptNode #4159
- feat: Add JsonConverter node #4130
v1.14.0rc1
⭐ Highlights
PromptNode enhancements
PromptNode just rolled out prompt logging (pipeline debug), run_batch, and model_kwargs support. More updates to PromptNode and PromptTemplates coming soon!
Shaper
We're introducing the Shaper, PromptNode's helper. Shaper unlocks the full potential of PromptNode and ensures its seamless integration with Haystack. But Shaper's scope and functionality are not limited to PromptNode; you can also use it independently, opening up a whole new world of possibilities.
IVF and Product Quantization support for OpenSearchDocumentStore
We've added support for IVF and IVF with Product Quantization to OpenSearchDocumentStore. You can train the IVF index by calling the train_index method (same as in FAISSDocumentStore) or by setting ivf_train_size when initializing OpenSearchDocumentStore, and take your search to the next level.
What's Changed
Breaking Changes
- refactor: Updated rest_api schema for tables to be consistent with Document.to_dict by @sjrl in #3872
- feat: Support multiple document_ids in Answer object (for generative QA) by @tstadel in #4062
- feat: Update OpenAIAnswerGenerator defaults and with learnings from PromptNode by @sjrl in #4038
- build: cache nltk models into the docker image by @mayankjobanputra in #4118
- feat: Add IVF and Product Quantization support for OpenSearchDocumentStore by @bogdankostic in #3850
Pipeline
- feat: add frontmatter to meta in MarkdownConverter by @TuanaCelik in #3953
- fix: removing code block in MarkdownConverter by @TuanaCelik in #3960
- feat: Add page range support to PDF converters. by @danielbichuetti in #3965
- fix: Update telemetry to not serialize Pipeline if disabled. by @sjrl in #4000
- feat: add Shaper by @ZanSara in #3880
- fix: Event sending for RayPipeline crashing Haystack by @zoltan-fedor in #3971
- fix: document retrieval metrics for non-document_id document_relevance_criteria by @tstadel in #3885
- fix: make the crawler more robust on Windows by @anakin87 in #4049
- fix: use correct count of outgoing edges in RayPipeline by @zoltan-fedor in #4066
- feat: Allow all training options for training a SentenceTransformers EmbeddingRetriever by @sjrl in #4026
- refactor: replace mutable default arguments by @julian-risch in #4070
- feat: Support multiple RayPipelines by @zoltan-fedor in #4078
- Remove double batching in retrieve_batch by @sjrl in #4014
- style: Update black by @silvanocerza in #4101
- fix: Fix TableTextRetriever for input consisting of tables only by @jackapbutler in #4048
- fix: Deduplicate same Documents in isolated evaluation of Reader by @bogdankostic in #4114
- Docs: Fix code block formatting by @agnieszka-m in #4162
- refactor: Remove the pin from the espnet module and fix the audio node tests. by @danielbichuetti in #4128
- fix: change tiktoken fallback mechanism to support Windows amd64 by @danielbichuetti in #4175
- feat: Add OpenAIError to retry mechanism by @sjrl in #4178
DocumentStores
- refactor: use weaviate client to build BM25 query by @hsm207 in #3939
- fix: fixed InMemoryDocumentStore.get_embedding_count to return correct number by @sjrl in #3980
- fix: Add inner query for mysql compatibility by @julian-risch in #4068
- feat: add support for custom headers by @hsm207 in #4040
- feat: Add BM25 support for tables in InMemoryDocumentStore by @bogdankostic in #4090
- refactor: InMemoryDocumentStore - manage documents without embedding & fix mypy errors by @anakin87 in #4113
- refactor: complete the document stores test refactoring by @masci in #4125
- feat: include testing facilities into haystack package by @masci in #4182
Documentation
- Align with the docs install guide + correct lg by @agnieszka-m in #3950
- docs: Update Crawler docstring for correct usage in Google colab by @silvanocerza in #3979
- Docs: Update docstrings by @agnieszka-m in #4119
- docs: Update Annotation Tool README.md by @bogdankostic in #4123
- feat: Add model_kwargs option to PromptNode by @sjrl in #4151
- fix: Remove logging statement of setting ID manually in Document by @bogdankostic in #4129
- chore: Fixing PromptNode .prompt() docstring to include the PromptTemplate object as an option by @TuanaCelik in #4135
- chore: de-couple the telemetry events for each tutorial from the dataset on AWS that is used by @TuanaCelik in #4155
- Prompt node/run batch by @sjrl in #4072
Other Changes
- feat: adding secure loading of models by default for haystack by @mayankjobanputra in #3901
- fix: add tiktoken fallback mechanism. by @danielbichuetti in #3929
- fix: change model in distillation test by @ZanSara in #3944
- feat: Expose output_variable in PromptNode result, adjust unit tests by @vblagoje in #3892
- fix: Fix type in FARMReader's save_to_remote by @bogdankostic in #3952
- refactor: Remove PromptNode hash and equality functions by @vblagoje in #3923
- ci: Remove mypy deps install step in python_cache action by @silvanocerza in #3956
- fix: overwrite params with environment variables even if there are no params in the pipeline definition; make mypy ignore REST API tests by @anakin87 in #3930
- Docs: Update ImageToText docstrings by @agnieszka-m in #3963
- Docs: Add TransformersImageToText API doc by @agnieszka-m in #3966
- ci: Add Docker images testing by @silvanocerza in #3943
- feat: Allow users to set a timeout for remote APIs by @danielbichuetti in #3949
- ci: Fix docker image testing on release by @silvanocerza in #3976
- Fix: Fix quotation marks by @agnieszka-m in #3973
- fix: PromptNode doesn't have run_batch support (yet) by @vblagoje in #3972
- chore: increased timeout for loading pipelines through API by @mayankjobanputra in #3977
- Missing import for TransformersImageToText by @ZanSara in #3984
- test: CI on py3.8 by @ZanSara in #3926
- Simplifies and fix docker images tests on release by @silvanocerza in #3982
- feat: Add use_prefiltering parameter to DeepsetCloudDocumentStore by @bogdankostic in #3969
- ci: Delete Docker images after testing to prevent workflow failure by @silvanocerza in #4004
- fix: Add a verbose option to PromptNode to let users understand the prompts being used #2 by @zoltan-fedor in #3898
- fix: prevent posthog from sending errors to stderr by @julian-risch in #4008
- fix: extend schema for prompt node results by @tstadel in #3891
- proposal: TableCell by @sjrl in #3875
- refactor: In PromptNode reuse tokenizer instead of loading new one for stop words by @sjrl in #4016
- ci: Automate release on PyPi by @silvanocerza in #4015
- ci: Fix PyPi release workflow by @silvanocerza in #4029
- ci: Bump act10ns/slack from v1 to v2 by @silvanocerza in #4031
- ci: latest version of pylint is failing, ignore new errors by @masci in https://github.com/deep...
v1.13.2
What's Changed
Pipelines
- fix: fix torchaudio version by @mayankjobanputra in #4102
- feat: Add use_prefiltering parameter to DeepsetCloudDocumentStore by @bogdankostic in #3969
Documentation
- Add shaper api by @agnieszka-m in #4082
- Update imgtotext api by @agnieszka-m in #4074
Full Changelog: v1.13.1...v1.13.2
v1.13.1
What's Changed
- fix: document retrieval metrics for non-document_id document_relevance_criteria (#3885)
- Update pyproject.toml (#4035)
- feat: add Shaper (#3880)
- fix: extend schema for prompt node results (#3891)
- fix: removing code block in MarkdownConverter (#3960)
- feat: add frontmatter to meta in MarkdownConverter (#3953)
Full Changelog: v1.13.0...v1.13.1