Skip to content

Commit 5f92fa7

Browse files
committed
Add minimal support for internally tagged and untagged enums (ron-rs#451)
* Add minimal support for internally tagged and untagged enums * Edge cases over edge cases * Add CHANGELOG entry and update README * Fix the PR references in the README
1 parent 271c1cd commit 5f92fa7

8 files changed

+441
-40
lines changed

CHANGELOG.md

+1
Original file line numberDiff line numberDiff line change
@@ -24,6 +24,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
2424
- [Non-API] Breaking: Bump `bitflags` dependency to 2.0, changes `serde` impls of `Extensions` ([#443](https://github.com/ron-rs/ron/pull/443))
2525
- Add `Map::retain` method ([#460](https://github.com/ron-rs/ron/pull/460))
2626
- Bump MSRV to 1.64.0 and bump dependency: `indexmap` to 2.0 ([#459](https://github.com/ron-rs/ron/pull/459))
27+
- Add minimal roundtripping support for `#[serde(tag = "tag")]`, `#[serde(tag = "tag", content = "content")]`, and `#[serde(untagged)]` enums ([#451](https://github.com/ron-rs/ron/pull/451))
2728

2829
## [0.8.0] - 2022-08-17
2930

README.md

+9-5
Original file line numberDiff line numberDiff line change
@@ -104,13 +104,17 @@ Note the following advantages of RON over JSON:
104104

105105
## Limitations
106106

107-
RON is not designed to be a fully self-describing format (unlike JSON) and is thus not guaranteed to work when [`deserialize_any`](https://docs.rs/serde/latest/serde/trait.Deserializer.html#tymethod.deserialize_any) is used instead of its typed alternatives. In particular, the following Serde attributes are not yet supported:
108-
- `#[serde(tag = "type")]`, i.e. internally tagged enums
109-
- `#[serde(untagged)]`, i.e. untagged enums
107+
RON is not designed to be a fully self-describing format (unlike JSON) and is thus not guaranteed to work when [`deserialize_any`](https://docs.rs/serde/latest/serde/trait.Deserializer.html#tymethod.deserialize_any) is used instead of its typed alternatives. In particular, the following Serde attributes only have limited support:
110108

111-
Furthermore, `#[serde(flatten)]` only has limited support and relies on a small hack [^serde-flatten-hack]. Specifically, flattened structs are only serialised as maps and deserialised from maps. However, this limited implementation supports full roundtripping.
109+
- `#[serde(tag = "tag")]`, i.e. internally tagged enums [^serde-enum-hack]
110+
- `#[serde(tag = "tag", content = "content")]`, i.e. adjacently tagged enums [^serde-enum-hack]
111+
- `#[serde(untagged)]`, i.e. untagged enums [^serde-enum-hack]
112+
- `#[serde(flatten)]`, i.e. flattening of structs into maps [^serde-flatten-hack]
112113

113-
[^serde-flatten-hack]: Deserialising a flattened struct from a map requires that the struct's [`Visitor::expecting`](https://docs.rs/serde/latest/serde/de/trait.Visitor.html#tymethod.expecting) implementation formats a string starting with `"struct "`. This is the case for automatically-derived [`Deserialize`](https://docs.rs/serde/latest/serde/de/trait.Deserialize.html) impls on structs.
114+
While data structures with any of these attributes should roundtrip through RON, their textual representation may not always match your expectation. For instance, flattened structs are only serialised as maps and deserialised from maps.
115+
116+
[^serde-enum-hack]: Deserialising an internally, adjacently, or un-tagged enum requires detecting `serde`'s internal `serde::__private::de::content::Content` content type so that RON can describe the deserialised data structure in serde's internal JSON-like format. This detection only works for the automatically-derived [`Deserialize`](https://docs.rs/serde/latest/serde/de/trait.Deserialize.html) impls on enums. See [#451](https://github.com/ron-rs/ron/pull/451) for more details.
117+
[^serde-flatten-hack]: Deserialising a flattened struct from a map requires that the struct's [`Visitor::expecting`](https://docs.rs/serde/latest/serde/de/trait.Visitor.html#tymethod.expecting) implementation formats a string starting with `"struct "`. This is the case for automatically-derived [`Deserialize`](https://docs.rs/serde/latest/serde/de/trait.Deserialize.html) impls on structs. See [#455](https://github.com/ron-rs/ron/pull/455) for more details.
114118

115119
## RON syntax overview
116120

src/de/mod.rs

+74-15
Original file line numberDiff line numberDiff line change
@@ -17,7 +17,7 @@ use crate::{
1717
error::{Result, SpannedResult},
1818
extensions::Extensions,
1919
options::Options,
20-
parse::{AnyNum, Bytes, ParsedStr, BASE64_ENGINE},
20+
parse::{AnyNum, Bytes, ParsedStr, StructType, BASE64_ENGINE},
2121
};
2222

2323
mod id;
@@ -33,6 +33,7 @@ mod value;
3333
pub struct Deserializer<'de> {
3434
bytes: Bytes<'de>,
3535
newtype_variant: bool,
36+
serde_content_newtype: bool,
3637
last_identifier: Option<&'de str>,
3738
recursion_limit: Option<usize>,
3839
}
@@ -56,6 +57,7 @@ impl<'de> Deserializer<'de> {
5657
let mut deserializer = Deserializer {
5758
bytes: Bytes::new(input)?,
5859
newtype_variant: false,
60+
serde_content_newtype: false,
5961
last_identifier: None,
6062
recursion_limit: options.recursion_limit,
6163
};
@@ -140,25 +142,49 @@ impl<'de> Deserializer<'de> {
140142
/// struct and deserializes it accordingly.
141143
///
142144
/// This method assumes there is no identifier left.
143-
fn handle_any_struct<V>(&mut self, visitor: V) -> Result<V::Value>
145+
fn handle_any_struct<V>(&mut self, visitor: V, ident: Option<&str>) -> Result<V::Value>
144146
where
145147
V: Visitor<'de>,
146148
{
147-
// Create a working copy
148-
let mut bytes = self.bytes;
149+
// HACK: switch to JSON enum semantics for JSON content
150+
// Robust impl blocked on https://github.com/serde-rs/serde/pull/2420
151+
let is_serde_content =
152+
std::any::type_name::<V::Value>() == "serde::__private::de::content::Content";
149153

150-
if bytes.consume("(") {
151-
bytes.skip_ws()?;
154+
let old_serde_content_newtype = self.serde_content_newtype;
155+
self.serde_content_newtype = false;
152156

153-
if bytes.check_tuple_struct()? {
154-
// first argument is technically incorrect, but ignored anyway
155-
self.deserialize_tuple(0, visitor)
156-
} else {
157+
match (self.bytes.check_struct_type()?, ident) {
158+
(StructType::Unit, Some(ident)) if is_serde_content => {
159+
// serde's Content type needs the ident for unit variants
160+
visitor.visit_str(ident)
161+
}
162+
(StructType::Unit, _) => visitor.visit_unit(),
163+
(_, Some(ident)) if is_serde_content => {
164+
// serde's Content type uses a singleton map encoding for enums
165+
visitor.visit_map(SerdeEnumContent {
166+
de: self,
167+
ident: Some(ident),
168+
})
169+
}
170+
(StructType::Named, _) => {
157171
// giving no name results in worse errors but is necessary here
158172
self.handle_struct_after_name("", visitor)
159173
}
160-
} else {
161-
visitor.visit_unit()
174+
(StructType::NewtypeOrTuple, _) if old_serde_content_newtype => {
175+
// deserialize a newtype struct or variant
176+
self.bytes.consume("(");
177+
self.bytes.skip_ws()?;
178+
let result = self.deserialize_any(visitor);
179+
self.bytes.skip_ws()?;
180+
self.bytes.consume(")");
181+
182+
result
183+
}
184+
(StructType::Tuple | StructType::NewtypeOrTuple, _) => {
185+
// first argument is technically incorrect, but ignored anyway
186+
self.deserialize_tuple(0, visitor)
187+
}
162188
}
163189
}
164190

@@ -243,14 +269,14 @@ impl<'de, 'a> de::Deserializer<'de> for &'a mut Deserializer<'de> {
243269
// `identifier` does not change state if it fails
244270
let ident = self.bytes.identifier().ok();
245271

246-
if ident.is_some() {
272+
if let Some(ident) = ident {
247273
self.bytes.skip_ws()?;
248274

249-
return self.handle_any_struct(visitor);
275+
return self.handle_any_struct(visitor, Some(std::str::from_utf8(ident)?));
250276
}
251277

252278
match self.bytes.peek_or_eof()? {
253-
b'(' => self.handle_any_struct(visitor),
279+
b'(' => self.handle_any_struct(visitor, None),
254280
b'[' => self.deserialize_seq(visitor),
255281
b'{' => self.deserialize_map(visitor),
256282
b'0'..=b'9' | b'+' | b'-' => {
@@ -925,3 +951,36 @@ fn struct_error_name(error: Error, name: Option<&str>) -> Error {
925951
e => e,
926952
}
927953
}
954+
955+
struct SerdeEnumContent<'a, 'de: 'a> {
956+
de: &'a mut Deserializer<'de>,
957+
ident: Option<&'a str>,
958+
}
959+
960+
impl<'de, 'a> de::MapAccess<'de> for SerdeEnumContent<'a, 'de> {
961+
type Error = Error;
962+
963+
fn next_key_seed<K>(&mut self, seed: K) -> Result<Option<K::Value>>
964+
where
965+
K: DeserializeSeed<'de>,
966+
{
967+
self.ident
968+
.take()
969+
.map(|ident| seed.deserialize(serde::de::value::StrDeserializer::new(ident)))
970+
.transpose()
971+
}
972+
973+
fn next_value_seed<V>(&mut self, seed: V) -> Result<V::Value>
974+
where
975+
V: DeserializeSeed<'de>,
976+
{
977+
self.de.bytes.skip_ws()?;
978+
979+
let old_serde_content_newtype = self.de.serde_content_newtype;
980+
self.de.serde_content_newtype = true;
981+
let result = seed.deserialize(&mut *self.de);
982+
self.de.serde_content_newtype = old_serde_content_newtype;
983+
984+
result
985+
}
986+
}

src/parse.rs

+65-8
Original file line numberDiff line numberDiff line change
@@ -427,17 +427,67 @@ impl<'a> Bytes<'a> {
427427
.map_or(false, |&b| is_ident_other_char(b))
428428
}
429429

430-
/// Should only be used on a working copy
431-
pub fn check_tuple_struct(mut self) -> Result<bool> {
432-
if self.identifier().is_err() {
433-
// if there's no field ident, this is a tuple struct
434-
return Ok(true);
430+
pub fn check_struct_type(&mut self) -> Result<StructType> {
431+
fn check_struct_type_inner(bytes: &mut Bytes) -> Result<StructType> {
432+
if !bytes.consume("(") {
433+
return Ok(StructType::Unit);
434+
}
435+
436+
bytes.skip_ws()?;
437+
438+
if bytes.identifier().is_ok() {
439+
bytes.skip_ws()?;
440+
441+
match bytes.peek() {
442+
// Definitely a struct with named fields
443+
Some(b':') => return Ok(StructType::Named),
444+
// Definitely a tuple struct with fields
445+
Some(b',') => return Ok(StructType::Tuple),
446+
// Either a newtype or a tuple struct
447+
Some(b')') => return Ok(StructType::NewtypeOrTuple),
448+
// Something else, let's investigate further
449+
_ => (),
450+
};
451+
}
452+
453+
let mut braces = 1;
454+
let mut comma = false;
455+
456+
// Skip ahead to see if the value is followed by a comma
457+
while braces > 0 {
458+
// Skip spurious braces in comments and strings
459+
bytes.skip_ws()?;
460+
let _ = bytes.string();
461+
462+
let c = bytes.eat_byte()?;
463+
if c == b'(' || c == b'[' || c == b'{' {
464+
braces += 1;
465+
} else if c == b')' || c == b']' || c == b'}' {
466+
braces -= 1;
467+
} else if c == b',' && braces == 1 {
468+
comma = true;
469+
break;
470+
}
471+
}
472+
473+
if comma {
474+
Ok(StructType::Tuple)
475+
} else {
476+
Ok(StructType::NewtypeOrTuple)
477+
}
435478
}
436479

437-
self.skip_ws()?;
480+
// Create a temporary working copy
481+
let mut bytes = *self;
438482

439-
// if there is no colon after the ident, this can only be a unit struct
440-
self.eat_byte().map(|c| c != b':')
483+
let result = check_struct_type_inner(&mut bytes);
484+
485+
if result.is_err() {
486+
// Adjust the error span to fit the working copy
487+
*self = bytes;
488+
}
489+
490+
result
441491
}
442492

443493
/// Only returns true if the char after `ident` cannot belong
@@ -996,6 +1046,13 @@ pub enum ParsedStr<'a> {
9961046
Slice(&'a str),
9971047
}
9981048

1049+
pub enum StructType {
1050+
NewtypeOrTuple,
1051+
Tuple,
1052+
Named,
1053+
Unit,
1054+
}
1055+
9991056
#[cfg(test)]
10001057
mod tests {
10011058
use super::*;

tests/117_untagged_tuple_variant.rs

+1-1
Original file line numberDiff line numberDiff line change
@@ -23,7 +23,7 @@ fn test_ebkalderon_case() {
2323
flags: [
2424
"--enable-thing",
2525
"--enable-other-thing",
26-
If("some-conditional", ["--enable-third-thing"]),
26+
("some-conditional", ["--enable-third-thing"]),
2727
]
2828
)
2929
"#;

tests/307_stack_overflow.rs

-38 Bytes
Binary file not shown.

0 commit comments

Comments
 (0)