Clarity Value Serialization and Name String Guarding #1099
Conversation
…lize(), deserialize using specific JSON parsing struct
…still needs to be fully enforced.
…s with provided types (lists, tuples)
Codecov Report
```
@@           Coverage Diff            @@
##           develop    #1099   +/-  ##
==========================================
+ Coverage    80.16%    80.3%   +0.14%
==========================================
  Files          103      106       +3
  Lines        23619    24769    +1150
==========================================
+ Hits         18933    19890     +957
- Misses        4686     4879     +193
==========================================
```
Continue to review full report at Codecov.
src/vm/types/mod.rs
Outdated
```rust
                "Invalid principal literal: expected a `.` in a qualified contract name".to_string()).into());
        }
        let sender = Self::parse_standard_principal(split[0])?;
        let name = split[1].to_string();
```
Can we be sure that `name` is an ASCII string?
Not as written -- but that's going to be addressed in #1078 (is my understanding), at which point we should have a simple guarded string type.
I'm merging my guarded strings work into this PR, so this should be addressed now
src/vm/types/serialization.rs
Outdated
```rust
fn to_hex(val: &i128) -> String {
    if *val >= 0 {
        format!("{:x}", val)
```
Maybe this is a paranoid delusion on my part, but how stable can we expect formatting rules to be in practice? I'm mainly concerned about whether or not the user's locale settings can affect the formatter's behavior (e.g. by making characters upper-case, by tacking on one or more leading 0's, by flipping the endianness, and so on).
I share this fear, so it's at least a shared delusion.
I spent a while poking at the LowerHex docs, but produced no satisfactory specification, so I'll just update this to do the hexing manually.
Thank you for feeding my paranoia and general mistrust of GNU libc locale settings 👻
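The "manual hexing" fix discussed here can be sketched as follows: build the string byte-by-byte from a fixed ASCII alphabet, so no formatter or locale machinery is involved. This is an illustrative stand-in (the function name and the choice of encoding the magnitude as a `u128` are assumptions, not the PR's actual code):

```rust
// Locale-independent hex encoding: emit nibbles from a fixed ASCII
// table rather than relying on format!("{:x}", ...) behavior.
fn to_hex_manual(mut val: u128) -> String {
    const HEX: &[u8; 16] = b"0123456789abcdef";
    if val == 0 {
        return "0".to_string();
    }
    let mut out = Vec::new();
    while val > 0 {
        // Take the low nibble, map it through the fixed alphabet.
        out.push(HEX[(val & 0xf) as usize]);
        val >>= 4;
    }
    out.reverse();
    String::from_utf8(out).expect("hex digits are always valid UTF-8")
}
```

Because the output alphabet is hard-coded, upper-casing, grouping separators, or any other locale-driven variation simply cannot occur.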
```rust
}

#[cfg(test)]
```
I'm not sure how reasonable or difficult this is, so feel free to dismiss.
Would it be desirable to add an "exhaustive" test that tries to serialize/deserialize all possible type combinations up to a certain (small) recursion depth? Might be the job of a fuzzer to do this; just something to think about.
Yeah -- the Value enum is fairly amenable to such a test.
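One half of such an exhaustive test is an enumerator over all values up to a small depth. The sketch below uses a toy enum and textual format (not the real Clarity `Value` or its JSON scheme); the distinctness check at the end is the injectivity property a serializer needs for roundtripping to be possible:

```rust
// Toy value type standing in for the real Clarity Value enum.
#[derive(Clone, Debug, PartialEq)]
enum Val {
    Int(i64),
    Bool(bool),
    List(Vec<Val>),
}

// A stand-in serializer: atoms get a one-letter tag, lists are
// parenthesized and space-separated.
fn serialize(v: &Val) -> String {
    match v {
        Val::Int(i) => format!("i{}", i),
        Val::Bool(b) => format!("b{}", b),
        Val::List(items) => {
            let inner: Vec<String> = items.iter().map(serialize).collect();
            format!("({})", inner.join(" "))
        }
    }
}

// Enumerate every value buildable from the atom seeds up to `depth`
// levels of list nesting (lists of length <= 2 keep the space small).
fn enumerate(depth: u32) -> Vec<Val> {
    let mut vals = vec![Val::Int(0), Val::Int(1), Val::Bool(true)];
    if depth > 0 {
        let inner = enumerate(depth - 1);
        for a in &inner {
            vals.push(Val::List(vec![a.clone()]));
            for b in &inner {
                vals.push(Val::List(vec![a.clone(), b.clone()]));
            }
        }
    }
    vals
}
```

In a real test, each enumerated value would be serialized and then deserialized against its type, asserting the roundtrip returns the original; a fuzzer could extend the same generator past the fixed depth.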
…trings' into feature/clarity-serialization
…ype to be queried (fixes a bug in the type checker for (legal) lists with no types (e.g., (list none none))). Add typemap to the contract analysis object: allows querying inferred types. Used now just for tests, but will eventually have other uses as well.
src/vm/representations.rs
Outdated
```rust
    }
}

guarded_string!(ClarityName, "ClarityName", Regex::new("^([a-zA-Z]|[-!?+<>=/*])([a-zA-Z0-9]|[-_!?+<>=/*])*$"));
```
Can ClarityNames start with punctuation?
Currently, yes -- things like the arithmetic operators are legal names (`+`, `-`, etc.).
However, we could change this so that the arithmetic operators are legal names, but nothing else that starts with punctuation is:
```
(^([a-zA-Z])([a-zA-Z0-9]|[-_!?+<>=/*])*)|^[-+=/*]$|^[<>][=]?$
```
Which I think is a good idea -- will change that.
Updated in 74257cd
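The revised rule can also be expressed without a regex engine, which makes the two legality cases explicit. This is a hand-rolled illustration of the pattern above (the function name is hypothetical, and the real code uses the `Regex` in `guarded_string!` instead):

```rust
// A name is legal if it is exactly one of the operator tokens, or it
// starts with an ASCII letter followed by letters, digits, or the
// allowed punctuation characters.
fn is_valid_clarity_name(name: &str) -> bool {
    // The operator tokens matched by ^[-+=/*]$ and ^[<>][=]?$
    const OPERATORS: &[&str] = &["-", "+", "=", "/", "*", "<", ">", "<=", ">="];
    if OPERATORS.contains(&name) {
        return true;
    }
    let mut chars = name.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() => {}
        _ => return false, // empty, or starts with non-letter punctuation
    }
    chars.all(|c| c.is_ascii_alphanumeric() || "-_!?+<>=/*".contains(c))
}
```

Under this rule `+` and `<=` remain legal, but arbitrary punctuation-initial names like `!foo` or `=>` are rejected.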
```rust
    StandardPrincipal(StandardPrincipalData),
    ContractPrincipal(ContractName),
    QualifiedContractPrincipal { sender: StandardPrincipalData,
                                 name: ContractName },
```
Not sure if this deserves a top-level issue (it's already present in the versioning issue), but a qualified contract principal will also take a nonce. This would be done in a separate PR though; I'm only raising it to ask how you want the issue recorded.
Yeah -- I think we just log that in the versioning issue, and ensure that the PR implementing that issue makes that change.
```rust
    Time("time"),
    VrfSeed("vrf-seed"),
    HeaderHash("header-hash"),
    BurnchainHeaderHash("burnchain-header-hash"),
```
Don't we also need the block height and the parent header hash?
`block-height` is a native global, rather than an argument to the `block-info` function. It's defined in the `variables.rs` file.
```rust
    // Aaron: at this point, we've _already_ allocated memory for this type.
    // (e.g., from a (map...) call, or a (list...) call.
    // this is a problem _if_ the static analyzer cannot already prevent
    // this case. This applies to all the constructor size checks.
```
Would this mean that a high-dimensional list with one large dimension value could either (1) trigger OOM or (2) bill the contract creator with a high tx fee as if the list were a high-dimensional "cube", like before? We probably want to avoid both (1) and (2) if at all possible.
Yep -- the type checker will need to save us in this case, which it will be able to do (once we merge https://github.com/blockstack/blockstack-core/tree/feature/value-size-bounding), since two properties are true:
- TypeSignatures will Error rather than construct a too-large-value.
- Every node in the AST will be associated with a TypeSignature following a successful type_checker pass.
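The first property -- constructors erroring out instead of building an oversized type -- can be sketched like this. The names, the `MAX_VALUE_SIZE` figure, and the two-field list signature are all illustrative assumptions, not the branch's actual definitions:

```rust
// Illustrative size ceiling; the real MAX_VALUE_SIZE differs.
const MAX_VALUE_SIZE: i128 = 1024;

#[derive(Debug)]
enum TypeError {
    ValueTooLarge,
}

#[derive(Debug)]
struct ListTypeSig {
    entry_size: i128,
    max_len: i128,
}

impl ListTypeSig {
    // The only way to obtain a ListTypeSig. In the real code the
    // fields would be private to the module, so the size check
    // cannot be bypassed.
    fn new(entry_size: i128, max_len: i128) -> Result<ListTypeSig, TypeError> {
        let total = entry_size
            .checked_mul(max_len)
            .ok_or(TypeError::ValueTooLarge)?;
        if total > MAX_VALUE_SIZE {
            return Err(TypeError::ValueTooLarge);
        }
        Ok(ListTypeSig { entry_size, max_len })
    }

    fn size(&self) -> i128 {
        // Infallible here: the constructor already bounded the product.
        self.entry_size * self.max_len
    }
}
```

With every AST node carrying such a signature after a successful type-check pass, a too-large value can never reach allocation in the first place.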
```rust
    }

    pub fn buff_from(buff_data: Vec<u8>) -> Result<Value> {
        if buff_data.len() > u32::max_value() as usize {
```
Is this check necessary, given that `MAX_VALUE_SIZE` is smaller than 4 billion bytes? Is there a material difference between `BufferTooLarge` and `ValueTooLarge`?
Nope -- though in some indeterminate future where MAX_VALUE_SIZE may be much larger, this check will be necessary. Also, no, there's no reason for both BufferTooLarge and ValueTooLarge to exist. BufferTooLarge is eliminated in the value-bounding branch. (For what it's worth, this code isn't really new in this PR -- it's just appearing as new because of breaking types.rs into mod.rs, signatures.rs, and serialization.rs.)
```rust
        }
    }

    pub fn size(&self) -> Result<i128> {
```
How can this method fail? What does it mean for it to fail?
Similar to the u32 max value check -- type sizes should never result in arithmetic overflows, but that's a property of the MAX_VALUE_SIZE check, not the Rust types. Furthermore, in the value bounding branch, `.size()` is the only method that is allowed to be invoked during TypeSignature construction (meaning it's the only function which may be called on a too-large type), and is used for the actual MAX_VALUE_SIZE enforcement, meaning that some code could conceivably result in an arithmetic overflow in size().
I feel like this is a risky approach. I think it might be prudent to make the TypeSignature construction itself fail if the size would be too big (thereby making such type unrepresentable), and in so doing, deny the instantiation of Clarity code that could produce unrepresentable types. What are your thoughts?
That's exactly what it will do -- type signature constructors are all guarded. However, rather than implement size computation twice, the constructors call `size` to check whether or not the size of the type is too large -- this means that the size function needs to be able to handle the case in which a type signature construction will fail. As for the size() invocation in Value, it should never fail, and this can be enforced with an `expects()`.
```rust
        let items: InterpreterResult<_> = entries
            .drain(..)
            .map(|value| Value::try_deserialize_parsed(value, entry_type.as_ref()))
```
Should this method bound its recursion, i.e. as a safety/sanity check?
I don't think so -- any recursion happening here has already needed to perform the same recursion during the construction of the value.
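For completeness, an explicit depth bound, were one ever wanted as a belt-and-braces check, would look roughly like the following. This is purely illustrative (toy types, made-up limit), and as the reply notes, it is redundant here since the value's own construction already performed the same recursion:

```rust
// Toy nested value standing in for a deserialized structure.
enum Nested {
    Leaf(i64),
    List(Vec<Nested>),
}

// Illustrative cap on nesting depth.
const MAX_DEPTH: u32 = 32;

// Walk the structure, failing if nesting exceeds the cap.
fn check_depth(v: &Nested, depth: u32) -> Result<(), String> {
    if depth > MAX_DEPTH {
        return Err("nesting too deep".to_string());
    }
    if let Nested::List(items) = v {
        for item in items {
            check_depth(item, depth + 1)?;
        }
    }
    Ok(())
}
```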
```rust
            .unwrap_or_else(|_| "INVALID_C32_ADD".to_string());
        write!(f, "'CT{}.{}", c32_str, name)
    }

    pub fn size(&self) -> Result<i128> {
```
Do you think we could structure lists so that the size calculation simply can't overflow? We'd need to make it so you can't construct a list that has the capacity for so many items that an overflow in this method is possible (which is probably desirable just for resource utilization purposes).
Yep -- I think this will be more or less addressed in the value bounding branch. It will still perform checked arithmetic for reasons mentioned elsewhere, but in non-construction contexts, I think it can be made infallible.
This all looks great to me, but I did have one concern that I think we should talk about before merging it. That concern is making it impossible to construct Clarity objects whose size is so big that we can't express it in an `i128`. I think that at some point we're going to need to take a hard look at all the constructor code and make sure that Clarity programs that could instantiate larger-than-life objects are rejected by the network. This all could be the work of a future PR, but I just want some clarity on how we're going to approach this (no pun intended). Thoughts?
For the size checks, this is what is mentioned in the value bounding branch:

> // 1. A TypeSignature constructor will always fail rather than construct a

The rationale for (1) is that enforcing a size check using the Rust type system would require Rust to have support for dependent types. Enforcing through guarded constructors (and private instance variables) is, I think, the next best approach. The rationale for (2) is that the size checks would otherwise need to be implemented twice: once for use in constructors, and a second time for use elsewhere. I can split this into a private function (fallible) and a public wrapper (infallible).
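The fallible/infallible split mentioned above can be sketched like this (names and types are illustrative, not the branch's actual code): one private checked computation used while validating a candidate type, and a public wrapper that is safe to call on any already-constructed type.

```rust
// Private, fallible half: checked arithmetic that a constructor uses
// to decide whether a candidate type is representable at all.
fn checked_list_size(entry_size: u64, max_len: u64) -> Option<u64> {
    entry_size.checked_mul(max_len)
}

struct BoundedList {
    entry_size: u64,
    max_len: u64,
}

impl BoundedList {
    // Public, infallible half: callable on any constructed value.
    fn size(&self) -> u64 {
        // Construction already proved the product fits, so the
        // expect() documents an invariant rather than a real failure.
        checked_list_size(self.entry_size, self.max_len)
            .expect("constructor guarantees size fits in u64")
    }
}
```

This keeps the size computation written once while giving callers outside construction an API that cannot fail.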
Sounds good to me. As long as this is handled in another branch, then I'm good to go on this PR.
Merging! 🤘
This PR implements a standard serialization and deserialization scheme for Clarity `Value`s -- they are serialized to a subset of legal JSON strings. The deserializer uses Serde to parse the JSON, so the deserializer is more permissive than the serializer.

This PR also breaks the `types.rs` file into a folder module -- it had grown pretty unwieldy. Other objects will also need special serializers (anything that goes into the MARF should have a specified serialization), but that's going to be left for later work, since the wire format won't block on those.

This PR also adds guarding to the representation of names in Clarity (implementation of #1100) -- contract names and general names (variables, functions, tuple fields) all now use one of two special types, `ClarityName` and `ContractName`, which are guarded wrappers around `String` types. These guards ensure that any such names are valid Clarity names (a subset of ASCII).