Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds audio querying to MultimodalQ&A gateway #974

Closed
wants to merge 15 commits into from

Conversation

mhbuehler
Copy link
Contributor

@mhbuehler mhbuehler commented Dec 4, 2024

Description

Adds ASR endpoint, speech audio processing, prompt construction, and return of decoded audio in response metadata. This goes with GenAIExamples PR: opea-project/GenAIExamples#1225.

Issues

Part of the MultimodalQnA Audio & Image Enhancements RFC

Type of change

  • New feature (non-breaking change which adds new functionality)

Dependencies

N/A

Tests

Automated tests were added to GenAIExamples.

Signed-off-by: okhleif-IL <[email protected]>

* added in audio dict creation

Signed-off-by: okhleif-IL <[email protected]>

* separated audio from prompt

Signed-off-by: okhleif-IL <[email protected]>

* added ASR endpoint

Signed-off-by: okhleif-IL <[email protected]>

* removed ASR endpoints from mm embedding

Signed-off-by: okhleif-IL <[email protected]>

* edited return logic, fixed function call

Signed-off-by: okhleif-IL <[email protected]>

* added megaservice to elif

Signed-off-by: okhleif-IL <[email protected]>

* reworked helper func

Signed-off-by: okhleif-IL <[email protected]>

* Append audio to prompt

Signed-off-by: okhleif-IL <[email protected]>

* Reworked handle messages, added metadata

Signed-off-by: okhleif-IL <[email protected]>

* Moved dictionary logic to right place

Signed-off-by: okhleif-IL <[email protected]>

* changed logic to rely on message len

Signed-off-by: okhleif-IL <[email protected]>

* list --> empty str

Signed-off-by: okhleif-IL <[email protected]>
---------

Signed-off-by: Melanie Buehler <[email protected]>
Signed-off-by: okhleif-IL <[email protected]>
Signed-off-by: dmsuehir <[email protected]>
okhleif-IL and others added 3 commits December 4, 2024 15:14
Copy link

codecov bot commented Dec 4, 2024

Codecov Report

Attention: Patch coverage is 66.66667% with 25 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
comps/cores/mega/gateway.py 66.66% 25 Missing ⚠️
Files with missing lines Coverage Δ
comps/cores/mega/gateway.py 31.43% <66.66%> (+3.29%) ⬆️

@ashahba ashahba added this to the v1.2 milestone Dec 4, 2024
Copy link
Collaborator

@ashahba ashahba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@ashahba ashahba removed the WIP label Dec 6, 2024
Copy link
Collaborator

@mkbhanda mkbhanda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

else:
return prompt

def convert_audio_to_text(self, audio):
# translate audio to text by passing in dictionary to ASR
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment quirky! dictionary is a data type here but can get mixed with the English word dictionary (word meanings)

else:
input_dict = {"byte_str": audio[0]}

response = requests.post(self.asr_endpoint, data=json.dumps(input_dict), proxies={"http": None})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should proxies be read from some environment variable for a more general solution?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this is setting proxies in the first place, shouldn't those be set well before this point?

import unittest
from typing import Union

import requests
from fastapi import Request

os.environ["ASR_SERVICE_PORT"] = "8086"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this overrides environment, instead of taking the value from environment?

@ashahba
Copy link
Collaborator

ashahba commented Dec 10, 2024

Labeling this as WIP since we may not need it after all.

@ashahba
Copy link
Collaborator

ashahba commented Dec 10, 2024

Looking at new changes in GenAIExamples, it seems like we don't this PR after all since this PR opea-project/GenAIExamples#1225 is self contained.

@ashahba ashahba closed this Dec 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants