
Clarity Value Serialization and Name String Guarding #1099

Merged (21 commits), Sep 10, 2019

Conversation

kantai (Member) commented Aug 22, 2019

This PR implements a standard serialization and deserialization scheme for Clarity Values -- they are serialized to a subset of legal JSON strings. The deserializer uses Serde to parse the JSON, so the deserializer is more permissive than the serializer.

  • Changes the deserialization interface of Clarity values to have two options:
    • With an expected type: this is how the Clarity VM loads data out of the database: it has an expectation of the type (because of the type declarations of data maps, data vars, etc.)
    • Without an expected type: this is how the wire format deserializer should call it.
  • Refactor of the types.rs file into a folder module -- it had grown pretty unwieldy.
  • Some refactoring of the TypeSignature and Value code to eliminate repetition, dead code, and some increased test coverage.
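The two deserialization entry points described above can be sketched as follows. This is an illustrative toy, not the PR's actual API: the value and type enums are reduced to two variants, and the function names only mirror the idea of an expected-type path versus a permissive wire-format path.

```rust
// Toy Value / TypeSignature pair standing in for the real Clarity types.
#[derive(Debug, PartialEq)]
enum Value {
    Int(i128),
    Bool(bool),
}

#[derive(Debug, PartialEq)]
enum TypeSignature {
    IntType,
    BoolType,
}

fn type_of(v: &Value) -> TypeSignature {
    match v {
        Value::Int(_) => TypeSignature::IntType,
        Value::Bool(_) => TypeSignature::BoolType,
    }
}

// Wire-format path: no type expectation; accept any well-formed value.
fn try_deserialize(json: &str) -> Result<Value, String> {
    match json {
        "true" => Ok(Value::Bool(true)),
        "false" => Ok(Value::Bool(false)),
        s => s.parse::<i128>().map(Value::Int).map_err(|e| e.to_string()),
    }
}

// VM path: the caller knows the expected type (e.g., from a data map's
// type declaration) and deserialization fails on a mismatch.
fn try_deserialize_expected(json: &str, expected: &TypeSignature) -> Result<Value, String> {
    let v = try_deserialize(json)?;
    if &type_of(&v) == expected {
        Ok(v)
    } else {
        Err("type mismatch against expected type".to_string())
    }
}
```

The VM path can reject a value the permissive path would accept, which matches the asymmetry the PR describes between the serializer and the Serde-based deserializer.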

Other objects will also need special serializers (anything that goes into the MARF should have a specified serialization), but that's going to be left for later work, since the wire format won't block on those.

This PR also adds guarding to the representation of names in Clarity (implementation of #1100) -- Contract names and general names (variables, functions, tuple fields) all now use one of two special types:

ClarityName
ContractName

These are guarded wrappers around String types. The guards ensure that any such names are valid Clarity names (a subset of ASCII).
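A minimal sketch of the guarded-wrapper idea: validation happens once, in a fallible constructor, so any instance of the type is known-valid. The character set below approximates the Clarity name alphabet; the real guarded_string! macro and its regex differ.

```rust
// Guarded name wrapper: the inner String is private, so the only way to
// obtain a ClarityName is through the validating constructor.
#[derive(Debug, Clone, PartialEq)]
struct ClarityName(String);

impl ClarityName {
    fn try_from(s: &str) -> Result<ClarityName, String> {
        let valid = !s.is_empty()
            && s.is_ascii()
            && s.chars()
                .all(|c| c.is_ascii_alphanumeric() || "-_!?+<>=/*".contains(c));
        if valid {
            Ok(ClarityName(s.to_string()))
        } else {
            Err(format!("invalid Clarity name: {:?}", s))
        }
    }

    fn as_str(&self) -> &str {
        &self.0
    }
}
```

Because construction is the only entry point, downstream code (serializers, the MARF, the wire format) can rely on every ClarityName being valid ASCII without re-checking.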

codecov bot commented Aug 22, 2019

Codecov Report

Merging #1099 into develop will increase coverage by 0.14%.
The diff coverage is 84.6%.


@@            Coverage Diff             @@
##           develop   #1099      +/-   ##
==========================================
+ Coverage    80.16%   80.3%   +0.14%     
==========================================
  Files          103     106       +3     
  Lines        23619   24769    +1150     
==========================================
+ Hits         18933   19890     +957     
- Misses        4686    4879     +193
Impacted Files Coverage Δ
src/vm/database/structures.rs 100% <ø> (ø) ⬆️
src/vm/analysis/type_checker/contexts.rs 93.91% <100%> (+0.1%) ⬆️
src/vm/tests/simple_apply_eval.rs 96.42% <100%> (-0.05%) ⬇️
src/vm/database/clarity_db.rs 92.76% <100%> (+0.18%) ⬆️
src/vm/analysis/type_checker/mod.rs 88.67% <100%> (+0.04%) ⬆️
src/vm/functions/define.rs 98.07% <100%> (ø) ⬆️
src/vm/analysis/type_checker/natives/maps.rs 100% <100%> (ø) ⬆️
src/vm/analysis/definition_sorter/mod.rs 91.44% <100%> (ø) ⬆️
src/vm/analysis/build_contract_interface/mod.rs 100% <100%> (ø) ⬆️
src/vm/functions/mod.rs 96.66% <100%> (+0.02%) ⬆️
... and 24 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

"Invalid principal literal: expected a `.` in a qualified contract name".to_string()).into());
}
let sender = Self::parse_standard_principal(split[0])?;
let name = split[1].to_string();
Reviewer (Member):

Can we be sure that name is an ASCII string?

kantai (Member Author):

Not as written -- but that's going to be addressed in #1078 (is my understanding), at which point we should have a simple guarded string type.

kantai (Member Author):

I'm merging my guarded strings work into this PR, so this should be addressed now


fn to_hex(val: &i128) -> String {
if *val >= 0 {
format!("{:x}", val)
Reviewer (Member):

Maybe this is a paranoid delusion on my part, but how stable can we expect formatting rules to be in practice? I'm mainly concerned about whether or not the user's locale settings can affect the formatter's behavior (e.g. by making characters upper-case, by tacking on one or more leading 0's, by flipping the endianness, and so on).

kantai (Member Author):

I share this fear, so it's at least a shared delusion.

kantai (Member Author):

I spent a while poking at the LowerHex docs, but produced no satisfactory specification, so I'll just update this to do the hexing manually.

Reviewer (Member):

Thank you for feeding my paranoia and general mistrust of GNU libc locale settings 👻
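The "do the hexing manually" fix mentioned above could look like this sketch: it builds the hex string digit by digit so the output cannot depend on formatter or locale behavior. This is illustrative, not the PR's actual code (which only handled the non-negative case at the excerpted site); negative values are rendered here with a leading minus sign as one possible convention.

```rust
// Manual, locale-independent lowercase hex encoding of an i128.
fn to_hex(val: &i128) -> String {
    let (sign, mag): (&str, u128) = if *val < 0 {
        ("-", val.unsigned_abs())
    } else {
        ("", *val as u128)
    };
    if mag == 0 {
        return "0".to_string();
    }
    const DIGITS: &[u8; 16] = b"0123456789abcdef";
    let mut out = Vec::new();
    let mut m = mag;
    while m > 0 {
        out.push(DIGITS[(m % 16) as usize]); // least-significant digit first
        m /= 16;
    }
    out.reverse();
    format!("{}{}", sign, String::from_utf8(out).unwrap())
}
```

In practice Rust's LowerHex output for integers is not locale-sensitive, but pinning the byte-level algorithm makes the on-wire guarantee explicit rather than inherited from formatter documentation.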

@kantai kantai self-assigned this Aug 22, 2019
}


#[cfg(test)]
Reviewer (Member):

I'm not sure how reasonable or difficult this is, so feel free to dismiss.

Would it be desirable to add an "exhaustive" test that tries to serialize/deserialize all possible type combinations up to a certain (small) recursion depth? Might be the job of a fuzzer to do this; just something to think about.

kantai (Member Author):

Yeah -- the Value enum is fairly amenable to such a test.

@kantai kantai changed the title Clarity Value Serialization Clarity Value Serialization and Name String Guarding Aug 26, 2019
kantai added 2 commits August 26, 2019 12:29
…ype to be queried (fixes a bug in the type checker for (legal) lists with notypes (e.g., (list none none)). add typemap to the contract analysis object: allows querying inferred types. used now just for tests, but will eventually have other uses as well.
}
}

guarded_string!(ClarityName, "ClarityName", Regex::new("^([a-zA-Z]|[-!?+<>=/*])([a-zA-Z0-9]|[-_!?+<>=/*])*$"));
Reviewer (Member):

Can ClarityNames start with punctuation?

kantai (Member Author):

Currently, yes -- things like the arithmetic operators are legal names (+, -, etc.).

However, we could change this so that the arithmetic operators are legal names, but nothing else that starts with punctuation is:

(^([a-zA-Z])([a-zA-Z0-9]|[-_!?+<>=/*])*)|^[-+=/*]$|^[<>][=]?$

Which I think is a good idea -- will change that.

kantai (Member Author):

Updated in 74257cd
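The revised rule discussed above (names must start with a letter, except for a whitelist of operator names) can be expressed without a regex engine; this stdlib-only sketch mirrors the proposed pattern for illustration, while the actual guard compiles the regex shown in the thread.

```rust
// Check the revised ClarityName rule: either a whitelisted operator name,
// or a letter followed by letters, digits, and permitted punctuation.
fn is_valid_clarity_name(s: &str) -> bool {
    // Operator names matched by ^[-+=/*]$ and ^[<>][=]?$ in the thread.
    const OPERATORS: [&str; 9] = ["+", "-", "=", "/", "*", "<", ">", "<=", ">="];
    if OPERATORS.contains(&s) {
        return true;
    }
    let mut chars = s.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() => {}
        _ => return false, // empty, or starts with non-letter punctuation
    }
    chars.all(|c| c.is_ascii_alphanumeric() || "-_!?+<>=/*".contains(c))
}
```

Under this rule a name like `?foo` is rejected even though `?` remains legal in later positions, which is exactly the tightening the reviewer asked about.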

StandardPrincipal(StandardPrincipalData),
ContractPrincipal(ContractName),
QualifiedContractPrincipal { sender: StandardPrincipalData,
name: ContractName },
Reviewer (Member):

Not sure if this deserves a top-level issue (it's already present in the versioning issue), but a qualified contract principal will also take a nonce. This would be done in a separate PR though; I'm only raising it to ask how you want the issue recorded.

kantai (Member Author):

Yeah -- I think we just log that in the versioning issue, and ensure that the PR implementing that issue makes that change.

Time("time"),
VrfSeed("vrf-seed"),
HeaderHash("header-hash"),
BurnchainHeaderHash("burnchain-header-hash"),
Reviewer (Member):

Don't we also need the block height and the parent header hash?

kantai (Member Author):

block-height is a native global, rather than an argument to the block-info function. It's defined in the variables.rs file.

// Aaron: at this point, we've _already_ allocated memory for this type.
// (e.g., from a (map ...) call, or a (list ...) call).
// this is a problem _if_ the static analyzer cannot already prevent
// this case. This applies to all the constructor size checks.
Reviewer (Member):

Would this mean that a high-dimensional list with one large dimension value could either (1) trigger OOM or (2) bill the contract creator with a high tx fee as if the list were a high-dimensional "cube", like before? We probably want to avoid both (1) and (2) if at all possible.

kantai (Member Author):

Yep -- the type checker will need to save us in this case, which it will be able to do (once we merge https://github.com/blockstack/blockstack-core/tree/feature/value-size-bounding), since two properties are true:

  1. TypeSignatures will Error rather than construct a too-large-value.
  2. Every node in the AST will be associated with a TypeSignature following a successful type_checker pass.

}

pub fn buff_from(buff_data: Vec<u8>) -> Result<Value> {
if buff_data.len() > u32::max_value() as usize {
Reviewer (Member):

Is this check necessary, given that MAX_VALUE_SIZE is smaller than 4 billion bytes? Is there a material difference between BufferTooLarge and ValueTooLarge?

kantai (Member Author):

Nope -- though in some indeterminate future where MAX_VALUE_SIZE may be much larger, this check will be necessary. Also, no, there's no reason for both BufferTooLarge and ValueTooLarge to exist; BufferTooLarge is eliminated in the value-bounding branch. (For what it's worth, this code isn't really new in this PR -- it only appears new because of breaking types.rs into mod.rs, signatures.rs, and serialization.rs.)
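The guarded-constructor pattern under discussion can be sketched like this: both limits are enforced at construction time, so an oversized buffer is simply unrepresentable downstream. MAX_VALUE_SIZE here is an illustrative placeholder, not the project's real constant, and the error strings are stand-ins for the real error enum.

```rust
// Illustrative size cap; the real MAX_VALUE_SIZE is defined elsewhere
// in the codebase and is well under u32::MAX.
const MAX_VALUE_SIZE: usize = 1024 * 1024;

#[derive(Debug)]
struct BuffData(Vec<u8>);

// Fallible constructor: callers can only obtain a BuffData whose length
// fits in a u32 and respects the value-size cap.
fn buff_from(buff_data: Vec<u8>) -> Result<BuffData, String> {
    if buff_data.len() > u32::MAX as usize {
        return Err("buffer length exceeds u32 range".to_string());
    }
    if buff_data.len() > MAX_VALUE_SIZE {
        return Err("value too large".to_string());
    }
    Ok(BuffData(buff_data))
}
```

With MAX_VALUE_SIZE below the u32 range, the first check is indeed redundant today, which is the reviewer's point; it only becomes load-bearing if the cap ever grows past 4 GiB.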

}
}

pub fn size(&self) -> Result<i128> {
Reviewer (Member):

How can this method fail? What does it mean for it to fail?

kantai (Member Author):

Similar to the u32 max value check -- type sizes should never result in arithmetic overflows, but that's a property of the MAX_VALUE_SIZE check, not the Rust types. Furthermore, in the value bounding branch, .size() is the only method that may be invoked during TypeSignature construction (i.e., the only function which may be called on a too-large type), and it is used for the actual MAX_VALUE_SIZE enforcement -- meaning that some code could conceivably trigger an arithmetic overflow in size().

Reviewer (Member):

I feel like this is a risky approach. I think it might be prudent to make the TypeSignature construction itself fail if the size would be too big (thereby making such type unrepresentable), and in so doing, deny the instantiation of Clarity code that could produce unrepresentable types. What are your thoughts?

kantai (Member Author):

That's exactly what it will do -- type signature constructors are all guarded. However, rather than implement size computation twice, the constructors call size to check whether or not the size of the type is too large -- this means that the size function needs to be able to handle the case in which a type signature construction will fail. As for the size() invocation in Value, it should never fail, and this can be enforced with an expects().


let items: InterpreterResult<_> = entries
.drain(..)
.map(|value| Value::try_deserialize_parsed(value, entry_type.as_ref()))
Reviewer (Member):

Should this method bound its recursion, i.e. as a safety/sanity check?

kantai (Member Author):

I don't think so -- any recursion happening here has already needed to perform the same recursion during the construction of the value.

.unwrap_or_else(|_| "INVALID_C32_ADD".to_string());
write!(f, "'CT{}.{}", c32_str, name)
}
pub fn size(&self) -> Result<i128> {
Reviewer (Member):

Do you think we could structure lists so that the size calculation simply can't overflow? We'd need to make it so you can't construct a list that has the capacity for so many items that an overflow in this method is possible (which is probably desirable just for resource utilization purposes).

kantai (Member Author):

Yep -- I think this will be more or less addressed in the value bounding branch. It will still perform checked arithmetic for reasons mentioned elsewhere, but in non-construction contexts, I think it can be made infallible.
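The "checked arithmetic" mentioned above amounts to refusing to construct a list type whose total size would overflow. A minimal sketch, with hypothetical names (the real size computation lives on TypeSignature and accounts for type overhead):

```rust
// Checked size computation for a list type: max_len entries of
// entry_size units each. Overflow reports an error instead of wrapping
// or panicking, so a guarded constructor can reject the type.
fn list_type_size(max_len: i128, entry_size: i128) -> Result<i128, String> {
    max_len
        .checked_mul(entry_size)
        .ok_or_else(|| "list type size overflows i128".to_string())
}
```

A constructor that calls this and propagates the error makes overflow-prone list types unconstructable, which is the structural fix the reviewer is asking about.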

jcnelson (Member) left a comment:

This all looks great to me, but I did have one concern that I think we should talk about before merging it. That concern is making it impossible to construct Clarity objects whose size is so big that we can't express it in an i128. I think that at some point we're going to need to take a hard look at all the constructor code and make sure that Clarity programs that could instantiate larger-than-life objects are rejected by the network. This all could be the work of a future PR, but I just want some clarity on how we're going to approach this (no pun intended). Thoughts?

kantai (Member Author) commented Sep 4, 2019

For the size checks, this is what is mentioned in the value bounding branch:

// 1. A TypeSignature constructor will always fail rather than construct a
// type signature for a too large or invalid type. This is why any variable length
// type signature has a guarded constructor.
// 2. The only methods which may be called on TypeSignatures that are too large
// (i.e., the only function that can be called by the constructor before
// it fails) is the .size() method, which may be used to check the size.

The rationale for (1) is that enforcing a size check using the rust type system would require rust to have support for dependent types. Enforcing through guarded constructors (and private instance variables) is, I think, the next best approach.

The rationale for (2) is that the size checks would otherwise need to be implemented twice: once for use in constructors, and second for use elsewhere. I can split this into a private function (fallible) and a public wrapper (infallible).
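The split described in (2) could be sketched as follows, with hypothetical names and a toy size formula: a private, fallible size computation shared with the guarded constructor, plus a public infallible wrapper that is safe precisely because construction already proved the size computable.

```rust
// Toy TypeSignature whose size is derived from a single field.
struct TypeSignature {
    units: i128,
}

impl TypeSignature {
    // Fallible size computation, shared with the guarded constructor.
    // This is the only method that may run against a would-be too-large
    // type, and it reports overflow instead of panicking.
    fn inner_size(&self) -> Result<i128, String> {
        self.units
            .checked_mul(16) // illustrative per-unit cost
            .ok_or_else(|| "type size overflows i128".to_string())
    }

    // Guarded constructor: refuses to build a type whose size is not
    // computable, making oversized types unrepresentable.
    fn new(units: i128) -> Result<TypeSignature, String> {
        let sig = TypeSignature { units };
        sig.inner_size()?;
        Ok(sig)
    }

    // Infallible on any successfully constructed TypeSignature, since
    // the constructor already verified inner_size() succeeds.
    fn size(&self) -> i128 {
        self.inner_size()
            .expect("constructed TypeSignatures always have a computable size")
    }
}
```

This realizes both invariants quoted above without implementing the size arithmetic twice.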

jcnelson (Member):
Sounds good to me. As long as this is handled in another branch, then I'm good to go on this PR.

@jcnelson jcnelson self-requested a review September 10, 2019 17:52
@kantai kantai requested a review from lgalabru September 10, 2019 18:04
kantai (Member Author) commented Sep 10, 2019

Merging! 🤘

@kantai kantai merged commit 6e7e17f into develop Sep 10, 2019
@kantai kantai deleted the feature/clarity-serialization branch January 27, 2021 23:13
blockstack-devops (Contributor):

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@stacks-network stacks-network locked as resolved and limited conversation to collaborators Nov 25, 2024