Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Addr::unchecked a const fn #869

Closed
wants to merge 1 commit into from
Closed

Make Addr::unchecked a const fn #869

wants to merge 1 commit into from

Conversation

webmaster128
Copy link
Member

This allows you to use Addr::unchecked("foo") in all the places where you can use "foo", specially const VARIABLES. This uses the facts that:

  1. We have a known max size and Addr can have a constant size. The memory layout can safely be copied.
  2. In contract to String, Addr is immutable. So it does not matter if we move or copy.

Comment on lines +60 to +63
while i < input_bytes.len() {
data[i] = input_bytes[i];
i += 1;
}
Copy link
Contributor

@yihuang yihuang Apr 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
while i < input_bytes.len() {
data[i] = input_bytes[i];
i += 1;
}
data[..input_bytes.len()].clone_from_slice(input.as_bytes());

I think we can do this here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we are very limited in what is available because it must be const fn compatible. We can't even rewrite the while loop to a for i in 0...input_bytes.len() loop. clone_from_slice is not a const fn, so it cannot be used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about optimizing it, I looked at the implementation of eg. https://doc.rust-lang.org/core/ptr/fn.copy_nonoverlapping.html but that just point to rust intrinsics, so I guess this is the best we can do

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow – seems like we can use copy_nonoverlapping directly because it became const (rust-lang/rust#79684). For some reason this is not shown in the docs yet. Will try.

Copy link
Member

@ethanfrey ethanfrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting approach. This does obviate the need for AddrRef, but at the same time creates a rather large array that must be copied every time we pass it through a function.

I am unsure of the benefits of this over using the AddrRef implementation I had, making AddrRef::unchecked(&str) a const fn and "promoting it" via Addr::from(addrRef). That would make the clone (copy data to heap) explicit and work well with the lifetimes we have in storage-plus.

We could then use AddrRef::unchecked("name") in all the test code and just name.into() when needed (no more unchecked). Can you explain the benefits of this approach?

#[derive(Serialize, Deserialize, Clone, Debug, PartialEq, Eq, Hash, JsonSchema)]
pub struct Addr(String);
#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash, JsonSchema)]
pub struct Addr(#[schemars(with = "String")] AddrInner);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two level nesting here is for schemars output?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eaxtly. If we find a better way to ensure it gets a "string" schema, we can simplify at any time because all the internals are private.

where
E: de::Error,
{
if v.len() > ADDR_MAX_LENGTH {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice to assert this here.
Should we assert this in unchecked as well? Or would that violate the const fn (you could just panic there, but it would ensure nothing gets chopped off by accident)

let mut data = [0u8; ADDR_MAX_LENGTH];
let mut i = 0;
while i < input_bytes.len() {
data[i] = input_bytes[i];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can buffer overflow if we don't have a length assertion for input

@@ -93,25 +166,25 @@ impl PartialEq<Addr> for String {

impl From<Addr> for String {
fn from(addr: Addr) -> Self {
addr.0
addr.as_str().to_owned()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also call this to_string? Sometimes it is nice to have explicit names (as_str, to_string) and not just from and into

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to_string is already implemented via the Display trait implementation. This implies a ToString implementation.

}
}

impl From<&Addr> for String {
fn from(addr: &Addr) -> Self {
addr.0.clone()
addr.as_str().to_owned()
}
}

impl From<Addr> for HumanAddr {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we remove these - let's not support HumanAddr with the new code (and if someone wants that can always do addr.as_str().into().

/// The maximum allowed size for bech32 (https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki#bech32)
const ADDR_MAX_LENGTH: usize = 90;

#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to reflect on the performance changes of this. We should add some guidance on usage.

  1. Passing Addr as an argument copies 94 bytes, rather than a 4 byte pointer (like .clone() of the past version, but no heap allocations).
  2. Passing &Addr is cheap and can be restored to Addr via copy() or clone().
  3. Addr/&Addr.as_str() is cheap
  4. Addr.to_string() requires a heap allocation (not just consuming the original Addr).

With this in mind, I think we can recommend returning Addr and passing in &Addr to functions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if we pass Addr references to functions (which is the recommended approach in general), and only return Addr, I think this would be OK, performance/memory-wise.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to play around with this a bit next week. The size of Addr is also my major (and only) concern here. But since we do not have huge amounts of addresses in memory at the same time and we have a well known size limit, I think it is fair to store the addresses on the stack.

We can always push them to heap by using Box<Addr>. Then we get what we had before, just without the mutability of a String implementation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the problem you are trying to solve?
I feel this will just make my life harder and not solve the problem that motivated AddrRef

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the problem you are trying to solve?

Make using owned Addr easier, especially literals. Then in many places you suddenly get an Addr->&str conversion instead of the opposite direction.

I'm not claiming this solves all issues and it is well possible that it does not replace what AddrRef does.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make using owned Addr easier, especially literals.

I didn't have any issues there. And am afraid I will have to think way too much about unneeded copies of this struct if it happens. See #872 for another solution that provide simple literals for tests

Comment on lines +60 to +63
while i < input_bytes.len() {
data[i] = input_bytes[i];
i += 1;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about optimizing it, I looked at the implementation of eg. https://doc.rust-lang.org/core/ptr/fn.copy_nonoverlapping.html but that just point to rust intrinsics, so I guess this is the best we can do

Copy link
Contributor

@maurolacy maurolacy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work.

packages/std/src/addresses.rs Show resolved Hide resolved
packages/std/src/addresses.rs Show resolved Hide resolved
@ethanfrey ethanfrey mentioned this pull request Apr 9, 2021
@webmaster128
Copy link
Member Author

We don't do that

@webmaster128 webmaster128 deleted the stack-addr branch April 12, 2021 12:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants