User-provided schema as to_gbq parameter #44

The `gbq.to_gbq` function currently does the schema conversion based on a given DataFrame's `dtypes` attribute, using the dtype -> BQ data type map. This reflection is then passed as the fields value in the BQ API call.

I'd like to propose that it should be possible to include a schema as an argument in the `to_gbq` call. This would save users from painful, unnecessary dtype conversion, as the BQ API does the same thing again on the create operation and is not provided a schema on the append load operation.

@jreback what do you think? I have started with the implementation in my fork's feature branch.
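For illustration, a minimal sketch of what the proposed call could look like. The `schema` keyword and its list-of-dicts shape (mirroring the BigQuery API's fields format) are assumptions drawn from this proposal, not a finalized API; the table and project names are made up.

```python
import pandas as pd
import pandas_gbq

df = pd.DataFrame({
    "event_ts": ["2017-05-01 12:00:00"],  # timestamps arriving as strings
    "is_active": [1],                     # booleans arriving as 0/1 integers
})

# Hypothetical `schema` keyword from this proposal: field dicts follow the
# BigQuery API's fields format, so no pandas-side casting is needed.
schema = [
    {"name": "event_ts", "type": "TIMESTAMP"},
    {"name": "is_active", "type": "BOOLEAN"},
]

pandas_gbq.to_gbq(df, "my_dataset.my_table", project_id="my-project",
                  schema=schema)  # keyword name is an assumption
```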
Not sure what you mean. How is this painful and unnecessary? What exactly are you proposing?
`to_gbq` would include a schema parameter. If this schema is provided as an argument, then `to_gbq` would not do the dtype-based conversion at all. What I mean by painful, unnecessary dtype conversion is that if you e.g. have a DataFrame whose dtypes do not map to the BigQuery types you actually want, you have to cast it first. Did you get my idea? Please see the code in the branch I linked to see how this would be done with minimal effort.
Again, not sure why this is necessary. We already know the types and simply need to map them. Can you show an example where you would not want to do this?
An option to provide the schema sounds reasonable - e.g. sometimes you want to specify a particular BigQuery type explicitly.
Let's take a practical example: a CSV file that would be read via `read_csv`, where a column's raw values need casting before the dtype map yields the intended BigQuery type. Or it would be an integer or string field (1/0 or true/false) which would be a boolean field in the destination table; some additional casting operations would be required here again. Or bytes. Or date and time.

The alternative to this would be to cast those columns on the pandas side first. At least I'm working with integrations which have a solid contract about their outputs' types, and which conform to some standard. Right now this requires unnecessary boilerplate in situations where the transform part of the ETL equation could be dropped altogether.

I could continue with the pythonic argument that explicit (providing the schema) is better than implicit (inferring the schema), but I think the previous points are already pretty valid.

This wouldn't risk breaking anything, as the parameter would have a None default and a simple conditional for whether the argument is not None. Probably some try/except in the connector to give a reasonable exception in case the user's schema doesn't cast the data right on the Google Cloud end.
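A hedged sketch of the casting boilerplate described above, with hypothetical column names (`is_active`, `created_at`) and file name:

```python
import pandas as pd

# Hypothetical CSV with a known, contracted schema.
df = pd.read_csv("export.csv")  # e.g. columns: id, is_active, created_at

# Today: boilerplate casts so that dtype inference maps to the right
# BigQuery types (int64 -> INTEGER happens to work, but the rest do not).
df["is_active"] = df["is_active"].astype(bool)       # 1/0 -> BOOLEAN
df["created_at"] = pd.to_datetime(df["created_at"])  # str -> TIMESTAMP

# With a user-provided schema these casts (the "transform" in ETL)
# could be dropped and the frame loaded as-is.
```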
@mremes I don't have a problem with providing a schema; this just puts the burden on BQ conversions rather than on the pandas side. So as an optional argument this would be OK. If it's provided, the df is passed as-is (no conversions done), so the burden moves entirely to the caller.
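A minimal sketch of that optional-argument behaviour. The `schema` keyword is the proposal's assumption, and `_generate_bq_schema` here is a stand-in stub for whatever the connector's real dtype-inference helper is called:

```python
def _generate_bq_schema(dataframe):
    # Stand-in for the existing dtype -> BQ type inference; the real
    # helper's name and logic live in the connector.
    type_map = {"i": "INTEGER", "f": "FLOAT", "b": "BOOLEAN",
                "M": "TIMESTAMP", "O": "STRING"}
    return [{"name": col, "type": type_map.get(dtype.kind, "STRING")}
            for col, dtype in dataframe.dtypes.items()]

def to_gbq(dataframe, destination_table, project_id, schema=None):
    # Proposed behaviour: a None default keeps existing callers unchanged.
    if schema is not None:
        # Caller-supplied schema: the frame is passed as-is (no
        # conversions), and BigQuery validates the types on its end.
        fields = schema
    else:
        # Existing behaviour: infer the fields from dataframe.dtypes.
        fields = _generate_bq_schema(dataframe)
    return fields  # in the real connector this feeds the load job config
```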
Just wanted to add a note of support for this feature. I'm having a hard time using …
Wishing this existed at the moment because of the DATE type issue.
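As an aside, a small sketch of why DATE is painful under inference, assuming the dtype map of the time sends `datetime64[ns]` to TIMESTAMP:

```python
import pandas as pd

df = pd.DataFrame({"d": pd.to_datetime(["2018-01-23"])})
print(df.dtypes)  # d    datetime64[ns]

# The dtype -> BQ map turns datetime64[ns] into TIMESTAMP, so inference
# alone cannot produce a DATE column; an explicit schema such as
# [{"name": "d", "type": "DATE"}] would sidestep this.
```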
With a PR... your wish can be answered! It's Aladdin's genie, but you get as many wishes as you want.
PR’s been ready for months... #46
OK, sorry for both the delay and the ignorance of the response above. I will ping the maintainer to try and get this merged (it needs a rebase first, though).
The author seems to have deleted his account. Unless he responds, would anyone like to take this up? I will commit to quarterbacking this through if someone can make the required changes.
I haven’t made a public pull request before but I can give it a shot.