refactor: Remove AnswerToSpeech and DocumentToSpeech nodes #4391

silvanocerza · 2023-03-13T11:10:31Z

Proposed Changes:

This PR completely removes the AnswerToSpeech and DocumentToSpeech audio nodes and their related tests.

We decided to remove them outright, since our internal metrics show that the nodes are not used.

These nodes will still be usable from the Haystack extras.

How did you test it?

I ran the tests locally.

Notes for the reviewer

Hold merging this until we publish the extra packages.

julian-risch · 2023-03-13T11:21:10Z

@agnieszka-m @silvanocerza What's the plan with the documentation? For example with this page: https://docs.haystack.deepset.ai/v1.15-unstable/docs/answer_to_speech
I'd say all the nodes in extras should get a link to the extras repo and a brief explanation that it's not part of Haystack core and how to install. Or will the docs be restructured with an extra section about extras?

ZanSara · 2023-03-13T11:23:54Z

@silvanocerza don't forget to remove Speech primitives:

https://github.com/deepset-ai/haystack/blob/main/haystack/schema.py#L283
https://github.com/deepset-ai/haystack/blob/main/haystack/schema.py#L481

and audio dependencies:

https://github.com/deepset-ai/haystack/blob/main/pyproject.toml#L149
https://github.com/deepset-ai/haystack/blob/main/pyproject.toml#L229
https://github.com/deepset-ai/haystack/blob/main/pyproject.toml#L233
https://github.com/deepset-ai/haystack/blob/main/pyproject.toml#L67

We also don't need libsndfile1 in CI anymore (like here https://github.com/deepset-ai/haystack/blob/main/.github/workflows/tests.yml#L461); however, we do still need ffmpeg for the Whisper node.

vblagoje · 2023-03-13T12:26:07Z

Yes, good points @ZanSara. In the meantime, I'll make some minor changes to #4335. Let's integrate this PR first and then #4335.

vblagoje · 2023-03-14T09:33:02Z

pyproject.toml

-  "pydub",
-  "protobuf<=3.20.1",
-  "soundfile< 0.12.0",
-  "numpy<1.24",  # Keep compatibility with latest numba
  "openai-whisper"


Noice @silvanocerza

agnieszka-m · 2023-03-14T09:44:15Z

@julian-risch @silvanocerza so what's the plan for this "extras" repo? what's going to be there?

we definitely need some documentation around it.

But I assume the nodes moved to extras will still work as described in the current docs? just the installation is going to differ?
Maybe we should introduce Haystack core and Haystack extras in the haystack concepts section and then update the installation instructions.

Is this going to be in 1.15?

silvanocerza · 2023-03-14T13:39:37Z

> But I assume the nodes moved to extras will still work as described in the current docs? just the installation is going to differ? Maybe we should introduce Haystack core and Haystack extras in the haystack concepts section and then update the installation instructions.

As of now they'll only be moved to the deepset-ai/haystack-extras repo and published as separate packages. Their logic will be left untouched, but it's possible it might change in the future. 🤷

I like the idea of introducing the concepts of Haystack core and extras in the installation instructions.

> Is this going to be in 1.15?

Yes, they'll be removed already in the next version.

julian-risch

Looks very good to me already, just a few tiny things that can also be removed:

AudioNodeError in haystack/errors.py is unused now and should be removed.
AnswerToSpeech and DocumentToSpeech safe_imports should be removed from haystack/nodes/__init__.py
Searching for "audio" in the code base gives ContentTypes = Literal["text", "table", "image", "audio"] in haystack/schema.py and also some results in Embedder. Maybe we should add a link to the extras repo there, because extras relies on it (for example here); looking just at Haystack, it's hard to understand why we still have it.
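
The kind of leftover search described above could be scripted roughly as follows. This is only a minimal sketch and not part of the PR: the helper name `find_leftovers` and the regular expression are illustrative assumptions, and the pattern only covers class names that are mentioned in this conversation.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Class names mentioned in this PR discussion. The pattern is an
# illustrative assumption; extend it to whatever else was removed.
LEFTOVER_PATTERN = re.compile(r"AnswerToSpeech|DocumentToSpeech|AudioNodeError")


def find_leftovers(
    root: Path, pattern: "re.Pattern[str]" = LEFTOVER_PATTERN
) -> List[Tuple[str, int, str]]:
    """Return (relative path, line number, stripped line) for every match.

    Hypothetical helper: it walks all *.py files below `root` and reports
    each line that still mentions one of the removed names.
    """
    hits: List[Tuple[str, int, str]] = []
    for path in sorted(root.rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((str(path.relative_to(root)), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Run from the repository root; prints nothing when no leftovers remain.
    for rel_path, lineno, line in find_leftovers(Path(".")):
        print(f"{rel_path}:{lineno}: {line}")
```

Searching for the plain word "audio" instead would also surface the ContentTypes literal and the Embedder results mentioned above, so a narrower pattern like this one is better suited to confirming that the removed classes are gone.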

julian-risch

LGTM! 👍

silvanocerza · 2023-03-15T15:31:40Z

Rebased to fix conflicts.

silvanocerza added the breaking change label Mar 13, 2023

silvanocerza self-assigned this Mar 13, 2023

silvanocerza requested a review from a team as a code owner March 13, 2023 11:10

silvanocerza requested review from julian-risch and removed request for a team March 13, 2023 11:10

github-actions bot added topic:audio topic:tests labels Mar 13, 2023

silvanocerza force-pushed the audio-nodes-removal branch from c2a11ab to f023d23 Compare March 13, 2023 17:31

github-actions bot added topic:build/distribution topic:CI topic:dependencies labels Mar 13, 2023

vblagoje reviewed Mar 14, 2023

agnieszka-m added the type:documentation Improvements on the docs label Mar 14, 2023

silvanocerza mentioned this pull request Mar 15, 2023

test: Fix audio tests failing #4418

Merged

julian-risch requested changes Mar 15, 2023

julian-risch approved these changes Mar 15, 2023

silvanocerza mentioned this pull request Mar 15, 2023

Audio is a supported content type but never used in the core codebase #4424

Closed

silvanocerza added 4 commits March 15, 2023 16:26

Remove AnswerToSpeech and DocumentToSpeech nodes

48b90ae

Remove unused dataclasses

09a286d

Remove unnecessary dependencies

265789f

Remove unused error class and imports

63daea8

silvanocerza force-pushed the audio-nodes-removal branch from e801fef to 63daea8 Compare March 15, 2023 15:31

silvanocerza merged commit b59cf76 into main Mar 15, 2023

silvanocerza deleted the audio-nodes-removal branch March 15, 2023 18:31
