-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Value/Content APIs Factoring to Capture Structurally Similar Content Types #500
Comments
I don't think I like the struct Blob(Vec<u8>)
struct Clob(Vec<u8>)
enum Value {
Null(IonType),
Bool(bool),
Int(Int),
Float(f64),
Decimal(Decimal),
String(String),
Symbol(Symbol),
Blob(Blob),
Clob(Clob),
Timestamp(Timestamp),
SExp(SExp),
List(List),
Struct(Struct),
} If you want to operate on e.g. blobs and clobs the same way, you can match like this: match element.value() {
Value::Blob(Blob(bytes)) |
Value::Clob(Clob(bytes)) => {
bytes.len();
}
_ => todo!()
} If we're concerned about abstracting away the pub struct Lob {
bytes: LobBytes,
}
// Private type, analogous to `SymbolText`, but for Lobs.
enum LobBytes {
Vec(Vec<u8>),
StaticSlice(&'static [u8])
}
pub struct Sequence {
contents: Vec<Element>,
}
pub struct Blob(pub Lob)
pub struct Clob(pub Lob)
pub struct List(pub Sequence)
pub struct SExp(pub Sequence)
// ...
match element.value() {
Value::Blob(Blob(lob)) |
Value::Clob(Clob(lob)) => {
// Do something with the Lob.
}
Value::List(List(seq)) |
Value::SExp(SExp(seq)) => {
// Do something with the Sequence.
}
_ => todo!()
} We'll need to write a few more |
We are; it gives us the ability to replace the implementation later. I just gave Matt's proposal a whirl in the Rust playground. I like that it allows you to match on the two kinds at once OR individually without an additional guard. I'm ok with this if we don't mind the verbosity of the |
I like the idea of using the tuple struct for this to capture the tagging provided that the struct is fully transparent and we are just using it for the structure (e.g., I like the de-structuring of this a lot. The stuttering is regrettable, but not a deal breaker for me. |
In the case of sequences, we might want more functionality than a Whether we want to do this is another matter, though. |
Another (potentially controversial!) option to consider: leveraging In our case:
Of course, we could achieve the same thing with the Here's a playground demonstration. This would also work for the sequence types, having |
See issue #500 for discussion. This PR removes the `IonSequence` trait and reintroduces a concrete `Sequence` type. It also converts `List` and `SExp` into tuple structs that wrap `Sequence`. `ListBuilder` and `SExpBuilder` have also been removed in favor of a single `SequenceBuilder` implementation with `build_list()` and `build_sexp()` methods. `List` and `SExp` each implement `Deref<Target=Sequence>`, allowing them both to expose the functionality of their underlying `Sequence` of elements transparently. This eliminates a fair amount of boilerplate code that was needed to provide functionality for both types. It also enables unified handling for `List` and `SExp` in match expressions in situations where the author does not care about the Ion type distinction.
I put together PR #502 that implements this pattern for |
I like where this is going--one last thought about factoring (actually one I had originally but dismissed until @popematt brought up the tuple struct approach. What do you folks think about: pub enum Lob {
Blob(Bytes),
Clob(Bytes)
} versus two top level structs. It lets us treat these related things as a unit like the |
In the |
I agree with Zack. |
This is, I think, is the core of where I don't line up with your thinking. We have I think we shouldn't try to force the Ion type granularity on content part of For example, I think a bunch of In other words, I'd rather put the proper typing and API ergonomics at the When I get a chance, I'll draft something up to illustrate what I mean, but if you think I am being out of line to how the team wants the APIs to be factored, I am very much willing to disagree and commit. |
Okay, so I think we have a decision to make about whether there should be structs that represent a single, specific Ion type, but it is orthogonal to the question about having a
Either way, introducing a That being said, I think you're right that the Ion type-ing should probably be modeled on That being said, I could see something like this maybe being useful: pub struct BlobElement(pub Annotations, pub Bytes)
pub struct ClobElement(pub Annotations, pub Bytes)
pub struct SymbolElement(pub Annotations, pub Symbol)
// Etc. for every other Ion type.
enum Element {
Blob(BlobElement),
Clob(ClobElement),
// ...
}
enum LobElement {
Blob(BlobElement),
Clob(ClobElement),
}
// Also:
// NumberElement
// TextElement
// SequenceElement
impl TryFrom<Element> for LobElement {
fn try_from(element: Element) -> Result<LobElement, IonTypeCastError> {
match element {
Element::Blob(b) => Ok(LobElement::Blob(b)),
Element::Clob(c) => Ok(LobElement::Clob(c)),
other => Err(IonTypeCastError(format!("Expected a Blob or Clob; was {}", other.ion_type()))),
}
}
}
// Also:
impl TryFrom<LobElement> for BlobElement { /*...*/ }
impl TryFrom<LobElement> for ClobElement { /*...*/ }
impl TryFrom<Element> for BlobElement { /*...*/ }
impl TryFrom<Element> for ClobElement { /*...*/ }
// Etc. for every other possible narrowing in the type hierarchy.
// Also also:
impl From<BlobElement> for Element { /*...*/ }
impl From<ClobElement> for Element { /*...*/ }
impl From<LobElement> for Element { /*...*/ }
impl From<BlobElement> for LobElement { /*...*/ }
impl From<ClobElement> for LobElement { /*...*/ }
// Etc. for every other possible widening in the type hierarchy. Then, any time you want to accept a specific type of element, you can do |
So I think your That said, if we had the element APIs above and the trivial converters (or even trait layering on top with I might still consider factoring /// Represents the type and content of an Ion value.
/// The types of Ion values are encoded
enum Value {
Null(IonType),
Int(Int),
Float(f64),
Decimal(Decimal),
Timestamp(Timestamp),
String(Str),
Symbol(Symbol),
Bool(bool),
Binary(BinaryValue),
Elements(ElementsValue),
Struct(Struct),
}
/// Structural part of the [Value] that represents `blob`/`clob`.
/// Generally, users will not refer to this type directly, but as part of de-structuring
/// a [Value] and as an intermediate to construct a [Value] of the intended type.
enum BinaryValue {
Clob(Bytes),
Blob(Bytes),
}
impl BinaryValue {
fn bytes(&self) -> &Bytes {
match &self {
Blob(b) => b,
Clob(b) => b,
}
}
}
/// Structural part of the [Value] that represents `list`/`sexp`
/// Generally, users will not refer to this type directly, but as part of de-structuring
/// a [Value] and as an intermediate to construct a [Value] of the intended type.
enum ElementsValue {
List(Elements),
SExp(Elements),
} I named the sub- |
Here are some of the de-structuring cases that I think read better: // match any binary type
match &v {
Binary(b) => Ok(b.bytes()),
_ => Error::msg("Not what I was expecting"),
}
// match specifically BLOB
match &v {
Binary(Blob(b)) => Ok(b),
_ => Error::msg("Not what I was expecting"),
} |
Read better than what—the current API or something else?
I actually don't like this example because lots of matching like this will get very noisy—I'd rather just use the short-circuit return. I also think that using let Blob(b) = &v.try_into()?; I think we really ought to be using I think I've actually changed my mind since writing my last comment. After playing around with some examples, I'd like even // Using try_into() and infallible destructuring
let ListElement(annotations, some_sequence) = element_1.try_into()?;
let LobElement(_, bytes) = element_2.try_into()?;
// For a symbol:
let symbol_text = element_3.try_into::<SymbolElement>()?.text_or_err()?;
// Using try_into() and exhaustive pattern matching, ignoring annotations
let NumberElement(_, number) = element_3.try_into()?;
match number {
NumberValue::I64(i) => { /* handle int */ }
NumberValue::BigInt(i) => { /* handle int */ }
NumberValue::Float(f) => { /* handle float */ }
NumberValue::Decimal(d) => { /* handle decimal */ }
} Here's a real-world example—I'd love to be able to rewrite something like these ~55 lines of let ListElement(annotations, sequence) = value.try_into()?;
invalid_isl!(!annotations.is_empty(), "timestamp_offset list may not be annotated")?;
let valid_offsets: Vec<TimestampOffset> = sequence.iter()
.map(|it| {
let StringElement(annotations, text) = it.try_into()?;
invalid_isl!(!annotations.is_empty(), "timestamp_offset value may not be annotated")?;
text.try_into::<TimestampOffset>()
} )
.collect()?; As long as we have |
See issue #500 for discussion. This PR removes the `IonSequence` trait and reintroduces a concrete `Sequence` type. It also converts `List` and `SExp` into tuple structs that wrap `Sequence`. `ListBuilder` and `SExpBuilder` have also been removed in favor of a single `SequenceBuilder` implementation with `build_list()` and `build_sexp()` methods. `List` and `SExp` each delegate method calls to their underlying `Sequence` via the `delegate!` macro, allowing users to access its functionality transparently. This eliminates a fair amount of boilerplate code that was needed to provide functionality for both types. It also enables unified handling for `List` and `SExp` in match expressions in situations where the author does not care about the Ion type distinction.
Read better than the current match element.value() {
Value::Blob(Blob(lob)) |
Value::Clob(Clob(lob)) => {
// Do something with the Lob.
}
Value::List(List(seq)) |
Value::SExp(SExp(seq)) => {
// Do something with the Sequence.
}
_ => todo!()
} I would suggest we move the discussion about typed To be clear, I think we're in violent agreement about a typed |
I agree with you so much that you'd better watch your back. 😉
I think if we have properly typed If we have some sort of |
I'm still not sold on the value of unions like Having union variants for structurally similar types optimizes for the case in which users of the API:
I don't believe that this is a common enough case to optimize for. The vast majority of Ion users don't know what a I also don't think we should actively encourage users to treat Following PR #502, a user would do this to get a if let List(l) = value {
// Do something with a list
} They want a list, they get a list. If it's unified, they need to do: if let Sequence(List(l)) = value {
// Do something with `l`
} Which means:
As currently implemented, allowing unified handling of the if let List(List(sequence)) | SExp(SExp(sequence)) {
// do something with `sequence`
} I find the 1:1 mapping from |
I had a discussion with @popematt about this offline; I think I have a way to get the best of both worlds. Will follow up shortly. |
This PR partially addresses issue #500. It modifies the `Value` enum's `List` and `SExp` variants from: ```rust enum Value { // ... List(List), SExp(SExp), // ... } ``` to ```rust enum Value { // ... List(Sequence), SExp(Sequence), } ``` while also preserving the standalone `element::List(Sequence)` and `element::SExp(Sequence)` types, which are used primarily by the SequenceBuilder, the `ion_list!`, and `ion_sexp!`.
Please see PR #505 for my proposed fix. It removes the extra nesting from the |
This PR partially addresses issue #500. It modifies the `Value` enum's `List` and `SExp` variants from: ```rust enum Value { // ... List(List), SExp(SExp), // ... } ``` to ```rust enum Value { // ... List(Sequence), SExp(Sequence), } ``` while also preserving the standalone `element::List(Sequence)` and `element::SExp(Sequence)` types, which are used primarily by the SequenceBuilder, the `ion_list!`, and `ion_sexp!`.
This PR introduces new opaque wrappers for `Vec<u8>`, which was used throughout APIs to represent owned blobs and clobs. `Bytes` captures the core functionality of an owned byte array, but does not include Ion type information. `Blob` and `Clob` are both trivial tuple struct wrappers around `Bytes`, adding an Ion type to the data. The `Value::Blob` and `Value::Clob` enum variants (which already provide an Ion type) now wrap an instance of `Bytes`. This PR fixes #500.
PR #506 implements the proposed fix for |
This PR introduces new opaque wrappers for `Vec<u8>`, which was used throughout APIs to represent owned blobs and clobs. `Bytes` captures the core functionality of an owned byte array, but does not include Ion type information. `Blob` and `Clob` are both trivial tuple struct wrappers around `Bytes`, adding an Ion type to the data. The `Value::Blob` and `Value::Clob` enum variants (which already provide an Ion type) now wrap an instance of `Bytes`. This PR fixes #500.
Problem Statement
Ion contains types that are structurally identical, but differ only in their type. In our current factoring of
Value
, we end up creating variants that are basically the same content, but differ only in the type ofstruct
used in the variant. This leads to some code duplication and traits, that require generic code to be used if one wants to operate across these structurally identical types. I believe we could "push down" the value variants as a way to factor the API to avoid this boilerplate.This is a distillation of my thinking after an offline thread with @zslayton.
Background
In the original
Element
/Value
APIs theValue
was theenum
representing type and its contents of the value were modeled directly in theenum
. The following is the original factoring abbreviated for illustration purposes:The latest refactoring of these APIs have captured the type within the content for the variants that are useful as intermediaries for the various APIs that build up these types and need to capture the intent of the user (e.g., I would like to build this sequence that is to become a
Value::List(...)
) this need is captured for example in the builder APIs and macros PR. This factoring looks something like this (again, abbreviated/digested for illustration purposes):The problem with the above is two-fold, we have to sort of copy/paste the implementation
Clob
/Blob
andSExp
/List
. These types have no relationship to each other (modulo a shared trait) and have some minor ergonomic implications to matching on say anyBlob
/Clob
(i.e., work on any "LOB" value).Proposed Factoring
I think we can mitigate the boilerplate above while retaining the benefits of having content that is strongly typed and "knows" what kind of variant it needs to be by pushing down these shared types in the
Value
enum:In the above, de-structuring becomes a bit more nested and the representation is marginally bigger in memory, but we no longer need the bridge traits and code can construct
Lob
andSequence
independently fromValue
which eliminates the boilerplate and the need for the bridge trait. This factoring also makes it trivial to operate on the structurally similar types by destructing to the common variant (i.e.,Lob(...)
andSequence(...)
).The text was updated successfully, but these errors were encountered: