
Clarity Value Serialization and Name String Guarding #1099

Merged (21 commits), Sep 10, 2019

Conversation

kantai (Member) commented Aug 22, 2019

This PR implements a standard serialization and deserialization scheme for Clarity Values -- they are serialized to a subset of legal JSON strings. The deserializer uses Serde to parse the JSON, so the deserializer is more permissive than the serializer.

  • Changes the deserialization interface of Clarity values to have two options:
    • With an expected type: this is how the Clarity VM loads data out of the database: it has an expectation of the type (because of the type declarations of data maps, data vars, etc.)
    • Without an expected type: this is how the wire format deserializer should call it.
  • Refactor of the types.rs file into a folder module -- it had grown pretty unwieldy.
  • Some refactoring of the TypeSignature and Value code to eliminate repetition, dead code, and some increased test coverage.
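The two deserialization entry points described above can be sketched as follows. This is an illustrative toy, not the PR's actual API: the value and type enums are reduced to two variants, and the function names only mirror the idea of an expected-type path versus a permissive wire-format path.

```rust
// Toy Value / TypeSignature pair standing in for the real Clarity types.
#[derive(Debug, PartialEq)]
enum Value {
    Int(i128),
    Bool(bool),
}

#[derive(Debug, PartialEq)]
enum TypeSignature {
    IntType,
    BoolType,
}

fn type_of(v: &Value) -> TypeSignature {
    match v {
        Value::Int(_) => TypeSignature::IntType,
        Value::Bool(_) => TypeSignature::BoolType,
    }
}

// Wire-format path: no type expectation; accept any well-formed value.
fn try_deserialize(json: &str) -> Result<Value, String> {
    match json {
        "true" => Ok(Value::Bool(true)),
        "false" => Ok(Value::Bool(false)),
        s => s.parse::<i128>().map(Value::Int).map_err(|e| e.to_string()),
    }
}

// VM path: the caller knows the expected type (e.g., from a data map's
// type declaration) and deserialization fails on a mismatch.
fn try_deserialize_expected(json: &str, expected: &TypeSignature) -> Result<Value, String> {
    let v = try_deserialize(json)?;
    if &type_of(&v) == expected {
        Ok(v)
    } else {
        Err("type mismatch against expected type".to_string())
    }
}
```

The VM path can reject a value the permissive path would accept, which matches the asymmetry the PR describes between the serializer and the Serde-based deserializer.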

Other objects will also need special serializers (anything that goes into the MARF should have a specified serialization), but that's going to be left for later work, since the wire format won't block on those.

This PR also adds guarding to the representation of names in Clarity (implementation of #1100) -- Contract names and general names (variables, functions, tuple fields) all now use one of two special types:

ClarityName
ContractName

These are guarded wrappers around String types. The guards ensure that any such names are valid Clarity names (a subset of ASCII).
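A minimal sketch of the guarded-wrapper idea: validation happens once, in a fallible constructor, so any instance of the type is known-valid. The character set below approximates the Clarity name alphabet; the real guarded_string! macro and its regex differ.

```rust
// Guarded name wrapper: the inner String is private, so the only way to
// obtain a ClarityName is through the validating constructor.
#[derive(Debug, Clone, PartialEq)]
struct ClarityName(String);

impl ClarityName {
    fn try_from(s: &str) -> Result<ClarityName, String> {
        let valid = !s.is_empty()
            && s.is_ascii()
            && s.chars()
                .all(|c| c.is_ascii_alphanumeric() || "-_!?+<>=/*".contains(c));
        if valid {
            Ok(ClarityName(s.to_string()))
        } else {
            Err(format!("invalid Clarity name: {:?}", s))
        }
    }

    fn as_str(&self) -> &str {
        &self.0
    }
}
```

Because construction is the only entry point, downstream code (serializers, the MARF, the wire format) can rely on every ClarityName being valid ASCII without re-checking.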

codecov bot commented Aug 22, 2019

Codecov Report

Merging #1099 into develop will increase coverage by 0.14%.
The diff coverage is 84.6%.


@@            Coverage Diff             @@
##           develop   #1099      +/-   ##
==========================================
+ Coverage    80.16%   80.3%   +0.14%     
==========================================
  Files          103     106       +3     
  Lines        23619   24769    +1150     
==========================================
+ Hits         18933   19890     +957     
- Misses        4686    4879     +193
Impacted Files Coverage Δ
src/vm/database/structures.rs 100% <ø> (ø) ⬆️
src/vm/analysis/type_checker/contexts.rs 93.91% <100%> (+0.1%) ⬆️
src/vm/tests/simple_apply_eval.rs 96.42% <100%> (-0.05%) ⬇️
src/vm/database/clarity_db.rs 92.76% <100%> (+0.18%) ⬆️
src/vm/analysis/type_checker/mod.rs 88.67% <100%> (+0.04%) ⬆️
src/vm/functions/define.rs 98.07% <100%> (ø) ⬆️
src/vm/analysis/type_checker/natives/maps.rs 100% <100%> (ø) ⬆️
src/vm/analysis/definition_sorter/mod.rs 91.44% <100%> (ø) ⬆️
src/vm/analysis/build_contract_interface/mod.rs 100% <100%> (ø) ⬆️
src/vm/functions/mod.rs 96.66% <100%> (+0.02%) ⬆️
... and 24 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

"Invalid principal literal: expected a `.` in a qualified contract name".to_string()).into());
}
let sender = Self::parse_standard_principal(split[0])?;
let name = split[1].to_string();
Reviewer (Member):

Can we be sure that name is an ASCII string?

kantai (Member Author):

Not as written -- but that's going to be addressed in #1078 (is my understanding), at which point we should have a simple guarded string type.

kantai (Member Author):

I'm merging my guarded strings work into this PR, so this should be addressed now


fn to_hex(val: &i128) -> String {
if *val >= 0 {
format!("{:x}", val)
Reviewer (Member):

Maybe this is a paranoid delusion on my part, but how stable can we expect formatting rules to be in practice? I'm mainly concerned about whether or not the user's locale settings can affect the formatter's behavior (e.g. by making characters upper-case, by tacking on one or more leading 0's, by flipping the endianness, and so on).

kantai (Member Author):

I share this fear, so it's at least a shared delusion.

kantai (Member Author):

I spent a while poking at the LowerHex docs, but produced no satisfactory specification, so I'll just update this to do the hexing manually.

Reviewer (Member):

Thank you for feeding my paranoia and general mistrust of GNU libc locale settings 👻
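The "do the hexing manually" fix mentioned above could look like this sketch: it builds the hex string digit by digit so the output cannot depend on formatter or locale behavior. This is illustrative, not the PR's actual code (which only handled the non-negative case at the excerpted site); negative values are rendered here with a leading minus sign as one possible convention.

```rust
// Manual, locale-independent lowercase hex encoding of an i128.
fn to_hex(val: &i128) -> String {
    let (sign, mag): (&str, u128) = if *val < 0 {
        ("-", val.unsigned_abs())
    } else {
        ("", *val as u128)
    };
    if mag == 0 {
        return "0".to_string();
    }
    const DIGITS: &[u8; 16] = b"0123456789abcdef";
    let mut out = Vec::new();
    let mut m = mag;
    while m > 0 {
        out.push(DIGITS[(m % 16) as usize]); // least-significant digit first
        m /= 16;
    }
    out.reverse();
    format!("{}{}", sign, String::from_utf8(out).unwrap())
}
```

In practice Rust's LowerHex output for integers is not locale-sensitive, but pinning the byte-level algorithm makes the on-wire guarantee explicit rather than inherited from formatter documentation.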

@kantai kantai self-assigned this Aug 22, 2019
}


#[cfg(test)]
Reviewer (Member):

I'm not sure how reasonable or difficult this is, so feel free to dismiss.

Would it be desirable to add an "exhaustive" test that tries to serialize/deserialize all possible type combinations up to a certain (small) recursion depth? Might be the job of a fuzzer to do this; just something to think about.

kantai (Member Author):

Yeah -- the Value enum is fairly amenable to such a test.

@kantai kantai changed the title Clarity Value Serialization Clarity Value Serialization and Name String Guarding Aug 26, 2019
kantai added 2 commits August 26, 2019 12:29
…ype to be queried (fixes a bug in the type checker for (legal) lists with notypes (e.g., (list none none)). add typemap to the contract analysis object: allows querying inferred types. used now just for tests, but will eventually have other uses as well.
}
}

guarded_string!(ClarityName, "ClarityName", Regex::new("^([a-zA-Z]|[-!?+<>=/*])([a-zA-Z0-9]|[-_!?+<>=/*])*$"));
Reviewer (Member):

Can ClarityNames start with punctuation?

kantai (Member Author):

Currently, yes -- things like the arithmetic operators are legal names (+, -, etc.).

However, we could change this so that the arithmetic operators are legal names, but nothing else that starts with punctuation is:

(^([a-zA-Z])([a-zA-Z0-9]|[-_!?+<>=/*])*)|^[-+=/*]$|^[<>][=]?$

Which I think is a good idea -- will change that.

kantai (Member Author):

Updated in 74257cd
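The revised rule discussed above (names must start with a letter, except for a whitelist of operator names) can be expressed without a regex engine; this stdlib-only sketch mirrors the proposed pattern for illustration, while the actual guard compiles the regex shown in the thread.

```rust
// Check the revised ClarityName rule: either a whitelisted operator name,
// or a letter followed by letters, digits, and permitted punctuation.
fn is_valid_clarity_name(s: &str) -> bool {
    // Operator names matched by ^[-+=/*]$ and ^[<>][=]?$ in the thread.
    const OPERATORS: [&str; 9] = ["+", "-", "=", "/", "*", "<", ">", "<=", ">="];
    if OPERATORS.contains(&s) {
        return true;
    }
    let mut chars = s.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() => {}
        _ => return false, // empty, or starts with non-letter punctuation
    }
    chars.all(|c| c.is_ascii_alphanumeric() || "-_!?+<>=/*".contains(c))
}
```

Under this rule a name like `?foo` is rejected even though `?` remains legal in later positions, which is exactly the tightening the reviewer asked about.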

StandardPrincipal(StandardPrincipalData),
ContractPrincipal(ContractName),
QualifiedContractPrincipal { sender: StandardPrincipalData,
name: ContractName },
Reviewer (Member):

Not sure if this deserves a top-level issue (it's already present in the versioning issue), but a qualified contract principal will also take a nonce. This would be done in a separate PR though; I'm only raising it to ask how you want the issue recorded.

kantai (Member Author):

Yeah -- I think we just log that in the versioning issue, and ensure that the PR implementing that issue makes that change.

Time("time"),
VrfSeed("vrf-seed"),
HeaderHash("header-hash"),
BurnchainHeaderHash("burnchain-header-hash"),
Reviewer (Member):

Don't we also need the block height and the parent header hash?

kantai (Member Author):

block-height is a native global, rather than an argument to the block-info function. It's defined in the variables.rs file.

// Aaron: at this point, we've _already_ allocated memory for this type.
// (e.g., from a (map ...) call, or a (list ...) call).
// this is a problem _if_ the static analyzer cannot already prevent
// this case. This applies to all the constructor size checks.
Reviewer (Member):

Would this mean that a high-dimensional list with one large dimension value could either (1) trigger OOM or (2) bill the contract creator with a high tx fee as if the list were a high-dimensional "cube", like before? We probably want to avoid both (1) and (2) if at all possible.

kantai (Member Author):

Yep -- the type checker will need to save us in this case, which it will be able to do (once we merge https://github.com/blockstack/blockstack-core/tree/feature/value-size-bounding), since two properties are true:

  1. TypeSignatures will Error rather than construct a too-large-value.
  2. Every node in the AST will be associated with a TypeSignature following a successful type_checker pass.

}

pub fn buff_from(buff_data: Vec<u8>) -> Result<Value> {
if buff_data.len() > u32::max_value() as usize {
Reviewer (Member):

Is this check necessary, given that MAX_VALUE_SIZE is smaller than 4 billion bytes? Is there a material difference between BufferTooLarge and ValueTooLarge?

kantai (Member Author):

Nope -- though in some indeterminate future where MAX_VALUE_SIZE may be much larger, this check will be necessary. Also, no, there's no reason for both BufferTooLarge and ValueTooLarge to exist; BufferTooLarge is eliminated in the value-bounding branch. (For what it's worth, this code isn't really new in this PR -- it only appears new because of breaking types.rs into mod.rs, signatures.rs, and serialization.rs.)
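The guarded-constructor pattern under discussion can be sketched like this: both limits are enforced at construction time, so an oversized buffer is simply unrepresentable downstream. MAX_VALUE_SIZE here is an illustrative placeholder, not the project's real constant, and the error strings are stand-ins for the real error enum.

```rust
// Illustrative size cap; the real MAX_VALUE_SIZE is defined elsewhere
// in the codebase and is well under u32::MAX.
const MAX_VALUE_SIZE: usize = 1024 * 1024;

#[derive(Debug)]
struct BuffData(Vec<u8>);

// Fallible constructor: callers can only obtain a BuffData whose length
// fits in a u32 and respects the value-size cap.
fn buff_from(buff_data: Vec<u8>) -> Result<BuffData, String> {
    if buff_data.len() > u32::MAX as usize {
        return Err("buffer length exceeds u32 range".to_string());
    }
    if buff_data.len() > MAX_VALUE_SIZE {
        return Err("value too large".to_string());
    }
    Ok(BuffData(buff_data))
}
```

With MAX_VALUE_SIZE below the u32 range, the first check is indeed redundant today, which is the reviewer's point; it only becomes load-bearing if the cap ever grows past 4 GiB.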

}
}

pub fn size(&self) -> Result<i128> {
Reviewer (Member):

How can this method fail? What does it mean for it to fail?

kantai (Member Author):

Similar to the u32 max value check -- type sizes should never result in arithmetic overflows, but that's a property of the MAX_VALUE_SIZE check, not the Rust types. Furthermore, in the value bounding branch, .size() is the only method that may be invoked during TypeSignature construction (i.e., the only function which may be called on a too-large type), and it is used for the actual MAX_VALUE_SIZE enforcement -- meaning that some code could conceivably trigger an arithmetic overflow in size().

Reviewer (Member):

I feel like this is a risky approach. I think it might be prudent to make the TypeSignature construction itself fail if the size would be too big (thereby making such type unrepresentable), and in so doing, deny the instantiation of Clarity code that could produce unrepresentable types. What are your thoughts?

kantai (Member Author):

That's exactly what it will do -- type signature constructors are all guarded. However, rather than implement size computation twice, the constructors call size to check whether or not the size of the type is too large -- this means that the size function needs to be able to handle the case in which a type signature construction will fail. As for the size() invocation in Value, it should never fail, and this can be enforced with an expects().


let items: InterpreterResult<_> = entries
.drain(..)
.map(|value| Value::try_deserialize_parsed(value, entry_type.as_ref()))
Reviewer (Member):

Should this method bound its recursion, i.e. as a safety/sanity check?

kantai (Member Author):

I don't think so -- any recursion happening here has already needed to perform the same recursion during the construction of the value.

.unwrap_or_else(|_| "INVALID_C32_ADD".to_string());
write!(f, "'CT{}.{}", c32_str, name)
}
pub fn size(&self) -> Result<i128> {
Reviewer (Member):

Do you think we could structure lists so that the size calculation simply can't overflow? We'd need to make it so you can't construct a list that has the capacity for so many items that an overflow in this method is possible (which is probably desirable just for resource utilization purposes).

kantai (Member Author):

Yep -- I think this will be more or less addressed in the value bounding branch. It will still perform checked arithmetic for reasons mentioned elsewhere, but in non-construction contexts, I think it can be made infallible.
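The "checked arithmetic" mentioned above amounts to refusing to construct a list type whose total size would overflow. A minimal sketch, with hypothetical names (the real size computation lives on TypeSignature and accounts for type overhead):

```rust
// Checked size computation for a list type: max_len entries of
// entry_size units each. Overflow reports an error instead of wrapping
// or panicking, so a guarded constructor can reject the type.
fn list_type_size(max_len: i128, entry_size: i128) -> Result<i128, String> {
    max_len
        .checked_mul(entry_size)
        .ok_or_else(|| "list type size overflows i128".to_string())
}
```

A constructor that calls this and propagates the error makes overflow-prone list types unconstructable, which is the structural fix the reviewer is asking about.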

jcnelson (Member) left a comment:

This all looks great to me, but I did have one concern that I think we should talk about before merging it. That concern is making it impossible to construct Clarity objects whose size is so big that we can't express it in an i128. I think that at some point we're going to need to take a hard look at all the constructor code and make sure that Clarity programs that could instantiate larger-than-life objects are rejected by the network. This all could be the work of a future PR, but I just want some clarity on how we're going to approach this (no pun intended). Thoughts?

kantai (Member Author) commented Sep 4, 2019

For the size checks, this is what is mentioned in the value bounding branch:

// 1. A TypeSignature constructor will always fail rather than construct a
// type signature for a too large or invalid type. This is why any variable length
// type signature has a guarded constructor.
// 2. The only methods which may be called on TypeSignatures that are too large
// (i.e., the only function that can be called by the constructor before
// it fails) is the .size() method, which may be used to check the size.

The rationale for (1) is that enforcing a size check using the rust type system would require rust to have support for dependent types. Enforcing through guarded constructors (and private instance variables) is, I think, the next best approach.

The rationale for (2) is that the size checks would otherwise need to be implemented twice: once for use in constructors, and second for use elsewhere. I can split this into a private function (fallible) and a public wrapper (infallible).
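The split described in (2) could be sketched as follows, with hypothetical names and a toy size formula: a private, fallible size computation shared with the guarded constructor, plus a public infallible wrapper that is safe precisely because construction already proved the size computable.

```rust
// Toy TypeSignature whose size is derived from a single field.
struct TypeSignature {
    units: i128,
}

impl TypeSignature {
    // Fallible size computation, shared with the guarded constructor.
    // This is the only method that may run against a would-be too-large
    // type, and it reports overflow instead of panicking.
    fn inner_size(&self) -> Result<i128, String> {
        self.units
            .checked_mul(16) // illustrative per-unit cost
            .ok_or_else(|| "type size overflows i128".to_string())
    }

    // Guarded constructor: refuses to build a type whose size is not
    // computable, making oversized types unrepresentable.
    fn new(units: i128) -> Result<TypeSignature, String> {
        let sig = TypeSignature { units };
        sig.inner_size()?;
        Ok(sig)
    }

    // Infallible on any successfully constructed TypeSignature, since
    // the constructor already verified inner_size() succeeds.
    fn size(&self) -> i128 {
        self.inner_size()
            .expect("constructed TypeSignatures always have a computable size")
    }
}
```

This realizes both invariants quoted above without implementing the size arithmetic twice.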

jcnelson (Member):
Sounds good to me. As long as this is handled in another branch, then I'm good to go on this PR.

@jcnelson jcnelson self-requested a review September 10, 2019 17:52
@kantai kantai requested a review from lgalabru September 10, 2019 18:04
kantai (Member Author) commented Sep 10, 2019

Merging! 🤘

@kantai kantai merged commit 6e7e17f into develop Sep 10, 2019
@kantai kantai deleted the feature/clarity-serialization branch January 27, 2021 23:13
blockstack-devops (Contributor):

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@stacks-network stacks-network locked as resolved and limited conversation to collaborators Nov 25, 2024