Audio query functionality to multimodal backend #8

Merged: 102 commits into mmqna-phase2 on Dec 2, 2024

Conversation

okhleif-IL (Collaborator) commented Nov 4, 2024

Description

This PR lets the backend accept audio as a query. The gateway expects the audio as a base64-encoded (b64) string.
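
As an illustration of the input described above, here is a minimal sketch of how a client might turn WAV audio into a base64 string and wrap it in a request body. The payload shape mirrors the curl commands in the Tests section; the helper names `make_silent_wav` and `build_audio_query` are hypothetical and not part of this PR, and generating a silent clip with the standard `wave` module is only a stand-in for real recorded audio.

```python
import base64
import io
import json
import wave


def make_silent_wav(num_frames: int = 1, sample_rate: int = 44100) -> bytes:
    """Return a tiny mono 16-bit PCM WAV file as raw bytes (hypothetical helper)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * num_frames)
    return buffer.getvalue()


def build_audio_query(wav_bytes: bytes) -> dict:
    """Wrap base64-encoded audio in a single-turn user message (hypothetical helper)."""
    audio_b64 = base64.b64encode(wav_bytes).decode("ascii")
    return {
        "messages": [
            {"role": "user", "content": [{"type": "audio", "audio": audio_b64}]}
        ]
    }


payload = build_audio_query(make_silent_wav())
body = json.dumps(payload)  # JSON text suitable for the request body
```

The string stored under `"audio"` is plain ASCII, so it can be embedded directly in JSON without further escaping.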

Issues

Part of RFC --> https://github.com/opea-project/docs/blob/main/community/rfcs/24-10-02-GenAIExamples-001-Image_and_Audio_Support_in_MultimodalQnA.md

Type of change

List the type of change like below. Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds new functionality)
  • Breaking change (fix or feature that would break existing design and interface)
  • Others (enhancement, documentation, validation, etc.)

Dependencies

N/A

Tests

Verify with the following curl commands (single-turn audio query, then a multi-turn conversation):

curl http://${host_ip}:8888/v1/multimodalqna -H "Content-Type: application/json" -d '{"messages": [{"role": "user", "content": [{"type": "audio", "audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}]}]}'

curl http://${host_ip}:8888/v1/multimodalqna -H "Content-Type: application/json" -d '{"messages": [{"role": "user", "content": [{"type": "audio", "audio": "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"}]}, {"role": "assistant", "content": "How may I help?"}, {"role": "user", "content": [{"type": "text", "text": "what pronoun did I tell you?"}]}]}'
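
The multi-turn request can also be assembled programmatically, which avoids quoting mistakes in a hand-written JSON string. The following is a sketch only: `build_conversation` and `make_request` are hypothetical names, the host is assumed to be reachable at the same port and path as in the curl commands, and the request is built but deliberately not sent.

```python
import json
import urllib.request

# Sample base64 WAV string taken from the curl commands in this PR.
SAMPLE_AUDIO_B64 = "UklGRigAAABXQVZFZm10IBIAAAABAAEARKwAAIhYAQACABAAAABkYXRhAgAAAAEA"


def build_conversation(audio_b64: str, follow_up_text: str) -> dict:
    """Build the three-turn payload: audio query, assistant reply, text follow-up."""
    return {
        "messages": [
            {"role": "user", "content": [{"type": "audio", "audio": audio_b64}]},
            {"role": "assistant", "content": "How may I help?"},
            {"role": "user", "content": [{"type": "text", "text": follow_up_text}]},
        ]
    }


def make_request(host_ip: str, payload: dict, port: int = 8888) -> urllib.request.Request:
    """Create a POST request for the multimodalqna endpoint without sending it."""
    return urllib.request.Request(
        url=f"http://{host_ip}:{port}/v1/multimodalqna",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


conversation = build_conversation(SAMPLE_AUDIO_B64, "what pronoun did I tell you?")
request = make_request("localhost", conversation)  # "localhost" is a placeholder host
```

To actually send it, pass `request` to `urllib.request.urlopen` against a running service; the response format is not described in this PR, so it is not parsed here.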

An automated test still needs to be added.
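
As a starting point for such a test, the sketch below checks that every audio part in a payload is valid base64 and decodes to something that looks like a WAV file. This is not the gateway's own validation: `count_valid_audio_parts` is a hypothetical helper, and the WAV-only check is an assumption based on the sample string used in the curl commands.

```python
import base64
import binascii


def count_valid_audio_parts(payload: dict) -> int:
    """Count audio parts; raise ValueError if one is not valid base64 WAV data."""
    count = 0
    for message in payload.get("messages", []):
        content = message.get("content")
        # Assistant turns in the examples carry a plain string, not a list of parts.
        if not isinstance(content, list):
            continue
        for part in content:
            if part.get("type") != "audio":
                continue
            try:
                # validate=True rejects characters outside the base64 alphabet.
                raw = base64.b64decode(part["audio"], validate=True)
            except binascii.Error as exc:
                raise ValueError(f"audio part is not valid base64: {exc}") from exc
            # Assumption: audio is WAV, so it starts with a RIFF/WAVE header.
            if raw[:4] != b"RIFF" or raw[8:12] != b"WAVE":
                raise ValueError("audio part does not look like a WAV file")
            count += 1
    return count
```

A check like this catches malformed test input, such as stray punctuation inside the base64 string, before a request reaches the service.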

mhbuehler and others added 30 commits October 14, 2024 16:33
Signed-off-by: Melanie Buehler <[email protected]>
Signed-off-by: Melanie Buehler <[email protected]>
Signed-off-by: Melanie Buehler <[email protected]>
Signed-off-by: Melanie Buehler <[email protected]>
Signed-off-by: Melanie Buehler <[email protected]>
Signed-off-by: Melanie Buehler <[email protected]>
…hbuehler/GenAIComps into melanie/combined_image_video_ingestion
Signed-off-by: okhleif-IL <[email protected]>
* Add support for audio files multimodal data ingestion

Signed-off-by: dmsuehir <[email protected]>

* Update function name

Signed-off-by: dmsuehir <[email protected]>

---------

Signed-off-by: dmsuehir <[email protected]>
Signed-off-by: Melanie Buehler <[email protected]>
Signed-off-by: okhleif-IL <[email protected]>
Signed-off-by: okhleif-IL <[email protected]>
Signed-off-by: Melanie Buehler <[email protected]>
Signed-off-by: okhleif-IL <[email protected]>
dmsuehir (Collaborator) left a comment

LGTM, thanks

okhleif-IL merged commit ae5437a into mmqna-phase2 on Dec 2, 2024
dmsuehir added a commit that referenced this pull request Dec 16, 2024
* Backend enhancements for image query capabilities for MultimodalQnA

* Fix model name var

Signed-off-by: dmsuehir <[email protected]>

* Remove space at end of prompt

Signed-off-by: dmsuehir <[email protected]>

* Add env var for the max number of images sent to the LVM

Signed-off-by: dmsuehir <[email protected]>

* README update for the MAX_IMAGES env var

Signed-off-by: dmsuehir <[email protected]>

* Remove prints

Signed-off-by: dmsuehir <[email protected]>

* Audio query functionality to multimodal backend (#8)

Signed-off-by: okhleif-IL <[email protected]>

* added in audio dict creation

Signed-off-by: okhleif-IL <[email protected]>

* separated audio from prompt

Signed-off-by: okhleif-IL <[email protected]>

* added ASR endpoint

Signed-off-by: okhleif-IL <[email protected]>

* removed ASR endpoints from mm embedding

Signed-off-by: okhleif-IL <[email protected]>

* edited return logic, fixed function call

Signed-off-by: okhleif-IL <[email protected]>

* added megaservice to elif

Signed-off-by: okhleif-IL <[email protected]>

* reworked helper func

Signed-off-by: okhleif-IL <[email protected]>

* Append audio to prompt

Signed-off-by: okhleif-IL <[email protected]>

* Reworked handle messages, added metadata

Signed-off-by: okhleif-IL <[email protected]>

* Moved dictionary logic to right place

Signed-off-by: okhleif-IL <[email protected]>

* changed logic to rely on message len

Signed-off-by: okhleif-IL <[email protected]>

* list --> empty str

Signed-off-by: okhleif-IL <[email protected]>
---------

Signed-off-by: Melanie Buehler <[email protected]>
Signed-off-by: okhleif-IL <[email protected]>
Signed-off-by: dmsuehir <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixed role bug where i never was > 0

Signed-off-by: okhleif-IL <[email protected]>

* Fix after merge

Signed-off-by: dmsuehir <[email protected]>

* removed whitespace

Signed-off-by: okhleif-IL <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix call to get role labels

Signed-off-by: dmsuehir <[email protected]>

* Gateway test updates images within the conversation

Signed-off-by: dmsuehir <[email protected]>

* Adds unit test coverage for audio query

Signed-off-by: Melanie Buehler <[email protected]>

* Update test to check the returned b64 types

Signed-off-by: dmsuehir <[email protected]>

* Update test since we don't expect images from the assistant

Signed-off-by: dmsuehir <[email protected]>

* Port number fix

Signed-off-by: Melanie Buehler <[email protected]>

* Formatting

Signed-off-by: Melanie Buehler <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed place where port number is set

Signed-off-by: Melanie Buehler <[email protected]>

* Remove old comment and added more accurate description

Signed-off-by: dmsuehir <[email protected]>

* add comment in code about MAX_IMAGES

Signed-off-by: dmsuehir <[email protected]>

* Add Gaudi support for image query

Signed-off-by: dmsuehir <[email protected]>

* Fix to pass the retrieved image last

Signed-off-by: dmsuehir <[email protected]>

* Revert out gateway and gateway test code, due to its move to GenAIExamples

Signed-off-by: dmsuehir <[email protected]>

* Fix retriever test for checking for b64_img_str in the result

Signed-off-by: dmsuehir <[email protected]>

---------

Signed-off-by: dmsuehir <[email protected]>
Signed-off-by: Melanie Buehler <[email protected]>
Signed-off-by: okhleif-IL <[email protected]>
Co-authored-by: Omar Khleif <[email protected]>
Co-authored-by: Melanie Hart Buehler <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Abolfazl Shahbazi <[email protected]>
4 participants