Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The flight.NewRecordWriter parameter is ambiguous #43443

Closed
mac-zhenfang opened this issue Jul 27, 2024 · 3 comments · Fixed by #43484
Closed

The flight.NewRecordWriter parameter is ambiguous #43443

mac-zhenfang opened this issue Jul 27, 2024 · 3 comments · Fixed by #43484
Assignees
Labels
Component: Go Type: usage Issue is a user question
Milestone

Comments

@mac-zhenfang
Copy link

Describe the usage question you have. Please include as many useful details as possible.

The func NewRecordWriter(w DataStreamWriter, opts ...ipc.Option) *Writer API indicates the DataStreamWriter is a required parameter, all the others are optional. But in the

func (w *Writer) start() error {
w.started = true

w.mapper.ImportSchema(w.schema)
w.lastWrittenDicts = make(map[int64]arrow.Array)

// write out schema payloads
ps := payloadFromSchema(w.schema, w.mem, &w.mapper)
defer ps.Release()

for _, data := range ps {
	err := w.pw.WritePayload(data)
	if err != nil {
		return err
	}
}

return nil

}

The w.schema looks a required parameter. If it s nil, will report arrow/ipc: unknown error while writing: runtime error: invalid memory address or nil pointer dereference error.

The request is to use the Schema in RecordBatch, instead of a input option

Component(s)

Go

@mac-zhenfang mac-zhenfang added the Type: usage Issue is a user question label Jul 27, 2024
@joellubi
Copy link
Member

Thanks for opening this. It indeed is the case that the schema is logically a required parameter; the IPC stream must have a consistent schema.

I agree that having it as an optional parameter may be misleading. The reason for this is because it passes through the general ipc.Option's that are shared across the different IPC readers/writers. Often the schema is optional because it can be inferred otherwise, such as from the footer of an IPC file in the case of reading. In that specific case the schema is not required, but if it is provided a check is performed to ensure it matches the footer schema.

I think a similar approach could make sense here in the case of the flight.Writer:

  • If no schema is provided in the constructor, set it for the whole stream based off the first record written. All subsequent records must match this schema.
  • If a schema is provided, check the first record's schema matches and proceed as before.

@zeroshade Do these semantics seem reasonable?

@zeroshade
Copy link
Member

@joellubi Yup, those are definitely reasonable. and a good idea for us to make the change.

@joellubi
Copy link
Member

Issue resolved by pull request 43484
#43484

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Go Type: usage Issue is a user question
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants