Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query Basic Types in Binary Format #223

Merged
merged 26 commits into from
Aug 12, 2021
Merged

Query Basic Types in Binary Format #223

merged 26 commits into from
Aug 12, 2021

Conversation

samuel-massinon
Copy link

@samuel-massinon samuel-massinon commented Jun 29, 2021

TODO

  • Confirm this is a good way of doing this
  • More tests (for all the binary format functions)
  • Clean up
  • Compare metrics
    • Size of data
    • Speed of parsing
  • Document

This PR is focusing on using LibPQ to query integer values.

@samuel-massinon samuel-massinon changed the title Working solution to query in binary format Query Integers in Binary Format Jun 29, 2021
src/parsing.jl Outdated Show resolved Hide resolved
src/parsing.jl Outdated Show resolved Hide resolved
src/parsing.jl Outdated Show resolved Hide resolved
src/parsing.jl Outdated Show resolved Hide resolved
src/parsing.jl Outdated Show resolved Hide resolved
src/results.jl Outdated Show resolved Hide resolved
src/parsing.jl Outdated Show resolved Hide resolved
src/parsing.jl Outdated Show resolved Hide resolved
Can use binary data in all exectues
@samuel-massinon
Copy link
Author

Got around to testing this. Here's are some results.

Test Performed Execute & Download Speed (ns) Execute & Download Bytes Allocated Parse Speed (ns) Parse Bytes Allocated Bytes Transferred
String 1 7036557543 138313414 222047909 40757869 25556052
String 2 6928509582 138313414 228389655 40757869 25556052
String 3 6962369997 138313414 223927678 40757869 25556052
Binary 1 8206719558 138314518 224727759 40756541 29378777
Binary 2 7957701956 138314518 222612657 40756541 29378777
Binary 3 7957701956 138314518 222612657 40756541 29378777

In this test, using binary was slower to execute and download and transferred more bytes. All the other metrics are pretty indistinguishable.

This makes sense because transferring an integer in string format could save space by having variable length strings, where as transferring it as bytes causes use to transfer all the bytes to represent the number. (an example is the number 1 would be 1 or 2 bytes for a string, but could be 8 bytes as binary data.

@samuel-massinon
Copy link
Author

Is this ready to merge into master even though we've only implemented binary for 1 data type?

Should we hold off until we implement more? Should we add a warning if someone tries to use binary data? Add a warning in the docs? Or just have a generic catch all implementation that someone can override?

Copy link

@arnaudh arnaudh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I grasped (which is admittedly not much), the PR looks good to me.

Regarding whether to merge and how to signal to users, given that our plan is to only implement a subset of all types for now, we should be fine with merging this, but have clear warnings in place.
In the documentation and docstrings (saying binary format is an experimental feature, listing the types supported), but also the code itself. Ideally we should throw an exception before the query is executed if an unsupported type is requested using binary format, though it's probably not the end of the world if the exception happens after while parsing. As long as we error somewhere.

src/asyncresults.jl Outdated Show resolved Hide resolved
src/asyncresults.jl Outdated Show resolved Hide resolved
@samuel-massinon
Copy link
Author

samuel-massinon commented Jul 19, 2021

In the documentation and docstrings (saying binary format is an experimental feature, listing the types supported), but also the code itself. Ideally we should throw an exception before the query is executed if an unsupported type is requested using binary format, though it's probably not the end of the world if the exception happens after while parsing. As long as we error somewhere.

Currently, if we try parsing something like a decimal, it will error while parsing because instead of parsing something like 1.1 it is parsing \0\x03\0\x02\0\0\0\0\0\x15\x12\x8c\x0e? and erroring.

I could add something like

function Base.parse(::Type{T}, pqv::PQValue{R, BINARY})  where {T, R}
    error("Parsing for Type $T and OID $R in binary has not been implemented")
end

How does that seem?

@arnaudh
Copy link

arnaudh commented Jul 20, 2021

Currently, if we try parsing something like a decimal, it will error while parsing because instead of parsing something like 1.1 it is parsing \0\x03\0\x02\0\0\0\0\0\x15\x12\x8c\x0e? and erroring

Ah, yeah we shouldn't rely on the bytes being parseable or not for different types (I can imagine the same bytes being valid representations of different types). I was assuming that with the existing method signatures we'd get a "no method matching" error if we passed a type that we don't support for binary. If not, then yes it sounds like your proposed solution would achieve that (but better check with @iamed2).

@iamed2
Copy link
Collaborator

iamed2 commented Jul 21, 2021

Adding a not-implemented method is an antipattern; we should rely on errors before dispatch (MethodErrors or other). I think we can accomplish this by restricting a lot of the default/fallback methods to where they're actually applicable. For example, Base.parse(::Type{String}, pqv::PQValue) is invalid; this should be limited to Base.parse(::Type{String}, pqv::PQValue{OID,TEXT}) where OID and for BINARY only with string-like OIDs. You should do a survey of existing methods on PQValue and see if they apply to BINARY PQValues.

As an aside, it might be useful to have:

const PQTextValue{OID} = PQValue{OID,TEXT}
const PQBinaryValue{OID} = PQValue{OID,BINARY}

One thing I also see is the column type defaulting to String. Perhaps we can avoid that for binary results, as it doesn't make much sense as a default there. The default could be removed, or we could default to Vector{UInt8}.

@iamed2
Copy link
Collaborator

iamed2 commented Jul 21, 2021

Is this ready to merge into master even though we've only implemented binary for 1 data type?

Should we hold off until we implement more? Should we add a warning if someone tries to use binary data? Add a warning in the docs? Or just have a generic catch all implementation that someone can override?

I think we should implement, to start:

  • int2, int4, int8 (done)
  • float4, float8 (should be the same as int4 and int8 but reinterpreted as a Float32 or Float64)
  • "char", varchar, bpchar, text
  • if easy, numeric

@samuel-massinon samuel-massinon marked this pull request as ready for review July 30, 2021 18:40
src/LibPQ.jl Outdated Show resolved Hide resolved
src/LibPQ.jl Outdated Show resolved Hide resolved
src/parsing.jl Outdated Show resolved Hide resolved
src/parsing.jl Outdated Show resolved Hide resolved
src/parsing.jl Outdated

function parse_numeric_array(eltype::Type{T}, str::AbstractString) where T
function parse_numeric_array(eltype::Type{T}, str::AbstractString) where {T}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't put where clauses in brackets if they are in long-form function definitions and fit on a single line.

src/results.jl Outdated Show resolved Hide resolved
src/results.jl Outdated
jl_result.column_oids = col_oids = map(1:num_columns(jl_result)) do col_num
libpq_c.PQftype(jl_result.result, col_num - 1)
end
jl_result.column_oids =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please apply this in other places as well. I will make an issue for including specifically chained assignments on one line in BlueStyle as well to be official.

src/results.jl Outdated Show resolved Hide resolved
test/runtests.jl Outdated
@@ -275,6 +292,292 @@ end
close(conn)
end


@testset "Binary" begin
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How close is this test section to tests above? Could any of it be put in a loop over (TEXT, BINARY)? Especially around the data parsing, could it loop over the data and decide whether to mark it as broken/unimplemented depending on whether the type is part of an implemented list of types? This would help remove duplication of tests and avoid them getting out of sync.

@iamed2
Copy link
Collaborator

iamed2 commented Aug 3, 2021

Docstrings should be updated for the binary_format kwarg, noting that type support is very limited.

Please make this issue: #223 (comment)

@samuel-massinon
Copy link
Author

Here's the issue for numeric types #225

@samuel-massinon
Copy link
Author

It seems that most of the coverage misses in the diff are where the formatter added more lines. I don't think any code that I wrote and was being hit before is missed.

src/asyncresults.jl Outdated Show resolved Hide resolved
src/parsing.jl Show resolved Hide resolved
src/asyncresults.jl Outdated Show resolved Hide resolved
test/runtests.jl Outdated Show resolved Hide resolved
test/runtests.jl Outdated Show resolved Hide resolved
test/runtests.jl Outdated Show resolved Hide resolved
Samuel Massinon added 2 commits August 12, 2021 11:41
Copy link
Collaborator

@iamed2 iamed2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Be sure to squash when (or before) merging!

@samuel-massinon samuel-massinon changed the title Query Integers in Binary Format Query Basic Types in Binary Format Aug 12, 2021
@samuel-massinon samuel-massinon merged commit f1fe4c4 into master Aug 12, 2021
@samuel-massinon samuel-massinon deleted the sm/binary branch August 12, 2021 21:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants