Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add key and value methods to DebugMap #2696

Merged
merged 5 commits into from
Jul 7, 2019
Merged
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
265 changes: 265 additions & 0 deletions text/0000-debug-map-key-value.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,265 @@
- Feature Name: `debug_map_key_value`
- Start Date: 2019-05-01
- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000)
- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)

# Summary
[summary]: #summary

Add two new methods to `std::fmt::DebugMap` for writing the key and value part of a map entry separately:

```rust
impl<'a, 'b: 'a> DebugMap<'a, 'b> {
pub fn key(&mut self, key: &dyn Debug) -> &mut Self;
pub fn value(&mut self, value: &dyn Debug) -> &mut Self;
}
```

# Motivation
[motivation]: #motivation

The format builders available to `std::fmt::Debug` implementations through the `std::fmt::Formatter` help keep the textual debug representation of Rust structures consistent. They're also convenient to use and make sure the various formatting flags are retained when formatting entries. The standard formatting API in `std::fmt` is similar to `serde::ser`:

- `Debug` -> `Serialize`
- `Formatter` -> `Serializer`
- `DebugMap` -> `SerializeMap`
- `DebugList` -> `SerializeSeq`
- `DebugTuple` -> `SerializeTuple` / `SerializeTupleStruct` / `SerilizeTupleVariant`
- `DebugStruct` -> `SerializeStruct` / `SerializeStructVariant`

There's one notable inconsistency though: an implementation of `SerializeMap` must support serializing its keys and values independently. This isn't supported by `DebugMap` because its `entry` method takes both a key and a value together. That means it's not possible to write a `Serializer` that defers entirely to the format builders.

Adding separate `key` and `value` methods to `DebugMap` will align it more closely with `SerializeMap`, and make it possible to build a `Serializer` based on the standard format builders.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

In `DebugMap`, an entry is the pair of a key and a value. That means the following `Debug` implementation:

```rust
use std::fmt;

struct Map;

impl fmt::Debug for Map {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
let mut map = f.debug_map();

map.entry(&"key", &"value");

map.finish()
}
}
```

is equivalent to:

```rust
impl fmt::Debug for Map {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
let mut map = f.debug_map();

// Equivalent to map.entry
map.key(&"key").value(&"value");

map.finish()
}
}
```

Every call to `key` must be directly followed by a corresponding call to `value` to complete the entry:

```rust
impl fmt::Debug for Map {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
let mut map = f.debug_map();

map.key(&1);

// err: attempt to start a new entry without finishing the current one
map.key(&2);

map.finish()
}
}
```

`key` must be called before `value`:

```rust
impl fmt::Debug for Map {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
let mut map = f.debug_map();

// err: attempt to write a value without first writing its key
map.value(&"value");
map.key(&"key");

map.finish()
}
}
```

Each entry must be finished before the map can be finished:

```rust
impl fmt::Debug for Map {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
let mut map = f.debug_map();

map.key(&1);

// err: attempt to finish a map that has an incomplete key
map.finish()
}
}
```

Any incorrect calls to `key` and `value` will poison the `DebugMap` and cause it to return an error when calling `finish`.

## When to use `key` and `value`

Why would you want to use `key` and `value` directly if they're less convenient than `entry`? The reason is when the driver of the `DebugMap` is a framework like `serde` rather than a data structure directly:

```rust
struct DebugMap<'a, 'b: 'a>(fmt::DebugMap<'a, 'b>);

impl<'a, 'b: 'a> SerializeMap for DebugMap<'a, 'b> {
type Ok = ();
type Error = Error;

fn serialize_key<T: ?Sized>(&mut self, key: &T) -> Result<(), Self::Error>
where
T: Serialize,
{
self.0.key(&key.to_debug());
Ok(())
}

fn serialize_value<T: ?Sized>(&mut self, value: &T) -> Result<(), Self::Error>
where
T: Serialize,
{
self.0.value(&value.to_debug());
Ok(())
}

fn serialize_entry<K: ?Sized, V: ?Sized>(
&mut self,
key: &K,
value: &V,
) -> Result<(), Self::Error>
where
K: Serialize,
V: Serialize,
{
self.0.entry(&key.to_debug(), &value.to_debug());
Ok(())
}

fn end(self) -> Result<Self::Ok, Self::Error> {
self.0.finish().map_err(Into::into)
}
}
```

Consumers should prefer calling `entry` over `key` and `value`.

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

The `key` and `value` methods can be implemented on `DebugMap` by tracking the state of the current entry in a `bool`, and splitting the existing `entry` method into two:

```rust
pub struct DebugMap<'a, 'b: 'a> {
KodrAus marked this conversation as resolved.
Show resolved Hide resolved
has_key: bool,
..
}

pub fn debug_map_new<'a, 'b>(fmt: &'a mut fmt::Formatter<'b>) -> DebugMap<'a, 'b> {
DebugMap {
has_key: false,
..
}
}

impl<'a, 'b: 'a> DebugMap<'a, 'b> {
pub fn entry(&mut self, key: &dyn fmt::Debug, value: &dyn fmt::Debug) -> &mut DebugMap<'a, 'b> {
self.key(key).value(value)
}

pub fn key(&mut self, key: &dyn fmt::Debug) -> &mut DebugMap<'a, 'b> {
self.result = self.result.and_then(|_| {
// Make sure there isn't a partial entry
if self.has_key {
return Err(fmt::Error);
}

// write the key

// Mark that we're in an entry
self.has_key = true;
Ok(())
});

self
}

pub fn value(&mut self, value: &dyn fmt::Debug) -> &mut DebugMap<'a, 'b> {
self.result = self.result.and_then(|_| {
// Make sure there is a partial entry to finish
if !self.has_key {
return Err(fmt::Error);
}

// write the value

// Mark that we're not in an entry
self.has_key = false;
Ok(())
});

self.has_fields = true;
self
}

pub fn finish(&mut self) -> fmt::Result {
// Make sure there isn't a partial entry
if self.has_key {
return Err(fmt::Error);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating a fmt::Error like this isn't allowed. From https://doc.rust-lang.org/nightly/std/fmt/index.html#formatting-traits:

a formatting implementation must and may only return an error if the passed-in Formatter returns an error.

As this indicates a programming error I think an explicit panic here would be correct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! A panic seems reasonable.

}

self.result.and_then(|_| self.fmt.write_str("}"))
}
}
```

# Drawbacks
[drawbacks]: #drawbacks

The proposed `key` and `value` methods are't immediately useful for `Debug` implementors that are able to call `entry` instead. This creates a decision point where there wasn't one before. The proposed implementation is also going to be less efficient than the one that exists now because it introduces a few conditionals.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding efficiency, I would suggest testing this out with a PR and then doing a timer build to see how much perf regresses (that just affects compile times but it's better than nothing). With that data we can evaluate the drawback better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened up a PR (rust-lang/rust#60458) that we can use to check correctness and get some timings 👍

I don't expect the difference to be significant, since it amounts to a few conditionals but it'll be good to verify!

Copy link
Contributor Author

@KodrAus KodrAus May 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alrighty, I posted a quick benchmark in the PR, but also inlining the results here:

Before (current .entry()) After (.key().value())
36 ns/iter (+/- 7) 44 ns/iter (+/- 2)


On balance, the additional `key` and `value` methods are a small and unsurprising addition that enables a set of use-cases that weren't possible before, and aligns more closely with `serde`.

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

The universal alternative of simply _not doing this_ leaves consumers that do need to format map keys independently of values with a few options:

- Write an alternative implementation of the format builders. The output from this alternative implementation would need to be kept reasonably in-sync with the one in the standard library. It doesn't change very frequently, but does from time to time. It would also have to take the same care as the standard library implementation to retain formatting flags when working with entries.
- Buffer keys and format them together with values when the whole entry is available. Unless the key is guaranteed to live until the value is supplied (meaning it probably needs to be `'static`) then the key will need to be formatted into a string first. This means allocating (though the cost could be amortized over the whole map) and potentially losing formatting flags when buffering.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A third option here is to simply not error when keys and values come alone. That might actually be useful in some weird semi-map semi-array cases. That would also be more efficient?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that's an interesting thought 🤔 I think that might just mask incorrect implementations though. I imagine we'd be tracking the same additional state too so that we knew when a key or value called 'out of order' should be formatted in a set-like way rather than a map-like way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah quite possibly.
Would be good to record this in the text of the RFC in any case. :)

Another alternative is to avoid poisoning the map if the sequence of entries doesn't follow the expected pattern of `key` then `value`. Instead, `DebugMap` could make a best-effort attempt to represent keys without values and values without keys. However, this approach has the drawback of masking incorrect `Debug` implementations, may produce a surprising output and doesn't reduce the complexity of the implementation (we'd still need to tell whether a key should be followed by a `: ` separator or a `, `).

# Prior art
[prior-art]: #prior-art

The `serde::ser::SerializeMap` API (and `libserialize::Encoder` for what it's worth) requires map keys and values can be serialized independently. `SerializeMap` provides a `serialize_entry` method, which is similar to the existing `DebugMap::entry`, but is only supposed to be used as an optimization.

# Unresolved questions
[unresolved-questions]: #unresolved-questions

# Future possibilities
[future-possibilities]: #future-possibilities

The internal implementation could optimize the `entry` method to avoid a few redundant checks.