audio: add items to AudioResponseFormat enum #382

Merged (2 commits) on Jun 16, 2023
audio.go: 26 changes (22 additions, 4 deletions)

```diff
@@ -20,9 +20,11 @@ const (
 type AudioResponseFormat string
 
 const (
-	AudioResponseFormatJSON AudioResponseFormat = "json"
-	AudioResponseFormatSRT  AudioResponseFormat = "srt"
-	AudioResponseFormatVTT  AudioResponseFormat = "vtt"
+	AudioResponseFormatJSON        AudioResponseFormat = "json"
+	AudioResponseFormatText        AudioResponseFormat = "text"
+	AudioResponseFormatSRT         AudioResponseFormat = "srt"
+	AudioResponseFormatVerboseJSON AudioResponseFormat = "verbose_json"
```
Collaborator

@romazu What format is verbose_json? If it is JSON, you may need to add it to the condition in the function below.

```go
// HasJSONResponse returns true if the response format is JSON.
func (r AudioRequest) HasJSONResponse() bool {
	return r.Format == "" || r.Format == AudioResponseFormatJSON
}
```

Refs:
https://github.com/sashabaranov/go-openai/blob/master/audio.go#L97-L100
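To make the reviewer's point concrete: the check matters presumably because the client uses `HasJSONResponse` to decide whether to parse the response body as JSON (that usage is an assumption here, not quoted from the library). The sketch copies `AudioResponseFormat` and the pre-change `HasJSONResponse`, trims `AudioRequest` to a single `Format` field, and routes a body through a hypothetical `decode` helper. With the old check, a `verbose_json` body falls through to the plain-text branch.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type AudioResponseFormat string

const (
	AudioResponseFormatJSON        AudioResponseFormat = "json"
	AudioResponseFormatVerboseJSON AudioResponseFormat = "verbose_json"
)

// AudioRequest is trimmed to the one field this sketch needs (assumption).
type AudioRequest struct {
	Format AudioResponseFormat
}

// HasJSONResponse is the pre-change check quoted in the review comment.
func (r AudioRequest) HasJSONResponse() bool {
	return r.Format == "" || r.Format == AudioResponseFormatJSON
}

// decode is a hypothetical dispatcher: JSON formats are unmarshaled and the
// transcript text extracted; anything else is returned verbatim as plain text.
func decode(r AudioRequest, body []byte) (string, bool, error) {
	if r.HasJSONResponse() {
		var resp struct {
			Text string `json:"text"`
		}
		if err := json.Unmarshal(body, &resp); err != nil {
			return "", false, err
		}
		return resp.Text, true, nil
	}
	return string(body), false, nil
}

func main() {
	body := []byte(`{"task":"transcribe","text":"Chapter 1 Looming"}`)
	text, asJSON, err := decode(AudioRequest{Format: AudioResponseFormatVerboseJSON}, body)
	if err != nil {
		panic(err)
	}
	// Old check: verbose_json is not recognized as JSON, so the caller
	// receives the raw JSON document instead of the transcript.
	fmt.Println(asJSON)
	fmt.Println(text)
}
```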

Contributor Author

Yes, you are right; it's JSON in the following format:

```json
{
  "task": "transcribe",
  "language": "english",
  "duration": 10.0,
  "segments": [
    {
      "id": 0,
      "seek": 0,
      "start": 0.0,
      "end": 10.16,
      "text": " Chapter 1 Looming",
      "tokens": [
        50364,
        18874,
        502,
        6130,
        10539,
        50872
      ],
      "temperature": 0.0,
      "avg_logprob": -0.6817335401262555,
      "compression_ratio": 0.68,
      "no_speech_prob": 0.01032273843884468,
      "transient": false
    }
  ],
  "text": "Chapter 1 Looming"
}
```

I also expanded the AudioResponse struct to accommodate this.

```diff
+	AudioResponseFormatVTT         AudioResponseFormat = "vtt"
 )
 
 // AudioRequest represents a request structure for audio API.
```
```diff
@@ -44,6 +46,22 @@ type AudioRequest struct {
 
 // AudioResponse represents a response structure for audio API.
 type AudioResponse struct {
+	Task     string  `json:"task"`
+	Language string  `json:"language"`
+	Duration float64 `json:"duration"`
+	Segments []struct {
+		ID               int     `json:"id"`
+		Seek             int     `json:"seek"`
+		Start            float64 `json:"start"`
+		End              float64 `json:"end"`
+		Text             string  `json:"text"`
+		Tokens           []int   `json:"tokens"`
+		Temperature      float64 `json:"temperature"`
+		AvgLogprob       float64 `json:"avg_logprob"`
+		CompressionRatio float64 `json:"compression_ratio"`
+		NoSpeechProb     float64 `json:"no_speech_prob"`
+		Transient        bool    `json:"transient"`
+	} `json:"segments"`
 	Text string `json:"text"`
 }
 
```

```diff
@@ -96,7 +114,7 @@ func (c *Client) callAudioAPI(
 
 // HasJSONResponse returns true if the response format is JSON.
 func (r AudioRequest) HasJSONResponse() bool {
-	return r.Format == "" || r.Format == AudioResponseFormatJSON
+	return r.Format == "" || r.Format == AudioResponseFormatJSON || r.Format == AudioResponseFormatVerboseJSON
 }
 
 // audioMultipartForm creates a form with audio file contents and the name of the model to use for
```
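A table-driven check makes the effect of this one-line change easy to see. The sketch copies the enum and both versions of the condition, written here as plain functions on the format (the names `hasJSONOld` and `hasJSONNew` are hypothetical, not methods from the library), and prints how each version classifies every format. Only `verbose_json` changes.

```go
package main

import "fmt"

type AudioResponseFormat string

const (
	AudioResponseFormatJSON        AudioResponseFormat = "json"
	AudioResponseFormatText        AudioResponseFormat = "text"
	AudioResponseFormatSRT         AudioResponseFormat = "srt"
	AudioResponseFormatVerboseJSON AudioResponseFormat = "verbose_json"
	AudioResponseFormatVTT         AudioResponseFormat = "vtt"
)

// hasJSONOld mirrors the HasJSONResponse condition before this PR.
func hasJSONOld(f AudioResponseFormat) bool {
	return f == "" || f == AudioResponseFormatJSON
}

// hasJSONNew mirrors the HasJSONResponse condition after this PR.
func hasJSONNew(f AudioResponseFormat) bool {
	return f == "" || f == AudioResponseFormatJSON || f == AudioResponseFormatVerboseJSON
}

func main() {
	formats := []AudioResponseFormat{
		"", // no format sent: the API default applies
		AudioResponseFormatJSON,
		AudioResponseFormatText,
		AudioResponseFormatSRT,
		AudioResponseFormatVerboseJSON,
		AudioResponseFormatVTT,
	}
	for _, f := range formats {
		fmt.Printf("%-14q old=%-5t new=%t\n", f, hasJSONOld(f), hasJSONNew(f))
	}
}
```

The three text-based formats (`text`, `srt`, `vtt`) stay non-JSON under both versions, so whatever path the client uses for them is unaffected by the change.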