Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement casting between bytes and json #2482

Merged
merged 3 commits into from
Nov 8, 2021
Merged

implement casting between bytes and json #2482

merged 3 commits into from
Nov 8, 2021

Conversation

fmoor
Copy link
Member

@fmoor fmoor commented Apr 21, 2021

fixes #2459

Casting between bytes and json is enabled by encoding bytes to base64 encoded json strings.

@fmoor fmoor marked this pull request as ready for review April 21, 2021 20:16
@fmoor fmoor requested a review from elprans April 21, 2021 20:17
Copy link
Member

@elprans elprans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Please add a test with shape casts to JSON as well.

r""" SELECT <json>b'foo'; """,
['Zm9v'],
['"Zm9v"'],
)
Copy link
Member

@1st1 1st1 Apr 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also add tests when there's a bytes type is in a shape, e.g.

SELECT <json>User {
  name,
  some_bytes := b'aaa'
}

and the same query without <json> but when compiled in JSON output mode.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Casting shapes with <json> and query_json() don't work with the current changes. I've been tinkering with the compiler for a while trying to find the relative parts. If anyone has any pointers let me know.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an interesting issue, actually. See compile_TypeCast in edgeql/compiler/expr.py. Specifically the trailing else: branch sets up a context to compile the casted expression. What we are missing here is another context flag that would tell the shape processor in viewgen.py that it is inside a JSON cast, and that every shape element value should be cast to <json>. We do this type of recursion for collections, but not shapes, evidently. This isn't an issue for most types, since PostgreSQL's cast is the same as ours, but bytes exposed the bug here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

@fmoor
Copy link
Member Author

fmoor commented Apr 28, 2021

This query SELECT <json>schema::Type{name} seems to trigger an edge case:

edb.errors.SchemaError: cannot redefine property 'builtin' of object type 'schema::Object' as scalar type 'std::json'

I haven't started debugging this yet, but it seems like there are two inconsistencies with this error

  1. the property builtin is not in the result shape
  2. even if it was builtin of type std::bool should be castable to json right?

@elprans
Copy link
Member

elprans commented Apr 28, 2021

It's not about castability, you cannot change the type of an existing link or property in a shape. This is problem with this approach. I think the solution would be to not to redefine the original properties, but inject magic computables instead, e.g __json_name := <json>.name and then teach the SQL compiler to look for these when serializing shapes.

@fmoor
Copy link
Member Author

fmoor commented Apr 28, 2021

I think the solution would be to not to redefine the original properties, but inject magic computables instead, e.g __json_name := <json>.name and then teach the SQL compiler to look for these when serializing shapes.

I have no idea where to start with this 😕

@fmoor
Copy link
Member Author

fmoor commented Apr 29, 2021

It's not about castability, you cannot change the type of an existing link or property in a shape.

Why does it only complain about the builtin property? It works fine with everything else.

@elprans
Copy link
Member

elprans commented Apr 29, 2021

Why does it only complain about the builtin property?

Because we do implicit rewrites on introspection schema objects to support field defaults, and evidently that rewrite interacts badly with this. See get_schema_reflection_default

@1st1
Copy link
Member

1st1 commented Jun 23, 2021

@fmoor @elprans Hm, what's the status of this PR?

@fmoor
Copy link
Member Author

fmoor commented Jun 23, 2021

I think it is postponed. Maybe it should just be closed for now? @elprans

@elprans elprans marked this pull request as draft June 23, 2021 18:46
@elprans
Copy link
Member

elprans commented Jun 23, 2021

I marked it as draft. We'll come back to this later.

@1st1
Copy link
Member

1st1 commented Oct 22, 2021

@msullivan Sully, please take a look at this, would be nice to fix this in 1.0. Ask Elvis about the current shortcomings of this pr.

fmoor and others added 2 commits November 5, 2021 20:52
fixes #2459

Casting between bytes and json is enabled by encoding bytes to base64
encoded json strings.
… casts

This requires some various plumbing changes to make this work in nested cases.

Some logic in the IR side cast generation needed to be updated to not
include an inaccurate intermediate type when casting a bytes array
literal to json.

I think that much of the logic on the IR side for compiling json casts
from arrays and tuples can be excised (though probably with the
addition of a call to eta_expand), but I'm leaving that to a follow-up.
@msullivan msullivan marked this pull request as ready for review November 6, 2021 04:09
@msullivan
Copy link
Member

I took a different approach than the proposed one.

In addition to making bytes render as base64 when cast to json, we need to have them render that way in json output mode too. Once that is fixed, the whole object casting problem goes away, since casting objects to json simply relies on first serializing the object in a json output mode.

@msullivan msullivan requested a review from elprans November 6, 2021 04:16
irtyputils.is_bytes(styperef)
and not expr.ser_safe
):
cast_name = s_casts.get_cast_fullname_from_names(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add a TODO: note here to remind us to find a way to generalize this, since bytes wouldn't be the last scalar with a custom JSON cast.

@@ -104,6 +105,16 @@ def is_abstract(typeref: irast.TypeRef) -> bool:
return typeref.is_abstract


def is_json(typeref: irast.TypeRef) -> bool:
"""Return True if *typeref* describes the json type."""
return typeref.id == s_obj.get_known_type_id('std::json')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use .base_type.id here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, good catch!

@msullivan msullivan merged commit 774bd10 into master Nov 8, 2021
@msullivan msullivan deleted the json-bytes branch November 8, 2021 19:57
msullivan added a commit that referenced this pull request Nov 8, 2021
msullivan added a commit that referenced this pull request Nov 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Inconsistent casting of bytes
4 participants