TEP: TVM update #88

142 changes: 142 additions & 0 deletions text/0088-tvm-update.md
@@ -0,0 +1,142 @@
- **TEP**: [88](https://github.com/ton-blockchain/TEPs/pull/87)
- **title**: TVM update
- **status**: Draft
- **type**: Core
- **authors**: [@SpyCheese](https://github.com/SpyCheese) [@EmelyanenkoK](https://github.com/EmelyanenkoK)
- **created**: 07.07.2022
- **replaces**: -
- **replaced by**: -

# Summary

This proposal suggests to:
- add new TVM op-codes related to block data retrieval, hashing, cryptographic signing, and message sending
- include in the TVM `c7` register the smart contract's own code, additional data on the transaction, and information about masterchain blocks

# Motivation

The current TVM implementation makes some common tasks hard to implement. This proposal aims to make it easier and cheaper to:
- hash big chunks of data (including with non-sha256 hashes)
- work with Bitcoin/Ethereum-compatible signatures
- prove TON onchain data
- calculate the fee for sending messages

# Guide

A draft implementation is available [here](https://github.com/SpyCheese/ton/commits/new-opcodes).

This proposal suggests to:
1. Extend the **c7** tuple from 10 to 14 elements:
- **10**: code of the smart contract.
- **11**: value of the incoming message.

Contributor

Currently we have it on the stack when processing an internal message, and that is OK. I don't think we need to move it to c7, at least because we have it only for internal messages; the stack looks pretty fine. Less universal and not so useful, IMO.

- **12**: fees collected in the storage phase.

Can we just adjust the balance by this fee?

Member Author

Not sure what you mean here. The contract balance which can be retrieved by get_balance() (and is stored in the 7th element of c7) is already the balance after the storage fee is deducted. Here we want to give the contract an option to account for the storage fee if necessary.

- **13**: information about previous blocks.
2. Add the following TVM op-codes:
- **c7** primitives: `MYCODE`, `INCOMINGVALUE`, `STORAGEFEES`, `PREVBLOCKSINFOTUPLE`
- Block primitives: `PREVMCBLOCKS`, `PREVKEYBLOCK`
- Hash primitives: `HASHSTART`, `HASHEND`, `HASHENDST`, `HASHINFO`, `HASHAPPU`, `HASHAPPI`, `HASHAPPS`, `HASHAPPB`
- Cryptography primitives: `ECRECOVER`
- Message primitives: `SENDMSG`

3. Change the sending fee calculation: deduct the fee related to message estimation even if sending fails.

# Specification

## c7 update
The **10th** element of the **c7** tuple contains a cell with the executed smart-contract code (from the init_state of the incoming message if applicable).

The **11th** element of the **c7** tuple contains the value of the incoming message. It is a tuple with two elements: the incoming value (the same value that is put on the stack as the 4th element) and a dictionary of extra currencies. If not applicable, the value is `[0, null]`.

The **12th** element of the **c7** tuple contains the fee debited in the storage phase.

The **13th** element of the **c7** tuple contains a tuple with two elements: `last_mc_blocks` and `prev_key_block`.
- `last_mc_blocks` is a tuple of up to 16 elements which contains info on the most recent masterchain blocks (ordered by seqno). Each block is represented as a tuple of 5 integers: `(workchain, shard, seqno, root_hash, file_hash)`
- `prev_key_block` is a tuple of 5 integers `(workchain, shard, seqno, root_hash, file_hash)` corresponding to the most recent key block.
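
As an illustration of the layout described above, the new `c7` entries can be modelled off-chain in Python. This is only a sketch: the names `BlockId`, `PrevBlocksInfo`, `NO_INCOMING_VALUE`, `find_mc_block` and `is_known_root_hash` are hypothetical and are not part of the proposal.

```python
from typing import NamedTuple, Optional, Tuple


class BlockId(NamedTuple):
    """Five integers describing a block, in the order given in the text."""
    workchain: int
    shard: int
    seqno: int
    root_hash: int
    file_hash: int


class PrevBlocksInfo(NamedTuple):
    """Hypothetical model of the 13th element of c7."""
    last_mc_blocks: Tuple[BlockId, ...]  # up to 16 entries
    prev_key_block: BlockId


# Model of the 11th element when no incoming value applies: [0, null].
NO_INCOMING_VALUE = (0, None)


def find_mc_block(info: PrevBlocksInfo, seqno: int) -> Optional[BlockId]:
    """Return the listed masterchain block with this seqno, or None."""
    for block in info.last_mc_blocks:
        if block.seqno == seqno:
            return block
    return None


def is_known_root_hash(info: PrevBlocksInfo, seqno: int, root_hash: int) -> bool:
    """Check a claimed (seqno, root_hash) pair against the listed blocks."""
    block = find_mc_block(info, seqno)
    return block is not None and block.root_hash == root_hash
```

A check like `is_known_root_hash` is the kind of lookup a contract could perform before trusting a Merkle proof anchored in a recent masterchain block.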

## TVM update
### New stack type
This proposal introduces a new TVM type, `Hasher`: an opaque type used to hash data chunks consecutively. This type behaves the same way as the others and can be considered immutable: `HASHAPP*` ops (described below) "consume" the old hasher and return an updated one (so if the old hasher was copied, its copies are completely independent). This object is not serializable and thus cannot be returned from TVM (for instance, as the result of a get-method).

Member

I think it's better to use the name "Digest" as in popular languages like Java or GoLang, because this is a more abstract name for hashing/checksum calculation algorithms.

Member Author

It looks like "digest" refers to the result of hashing; here we use Hasher for an intermediate object with some internal state.

Couldn't this state be implemented via a generic slice?

Member Author

Well, any state can be represented as some state. The question is: do we really need it?

Contributor

@xssnick, Sep 7, 2022

I don't see the new Hasher object as such a good idea.

IMO the ideology of TON is to keep everything as universal as possible, e.g. everything is a cell, wallets = smart contracts, etc.

Maybe we can implement other hashes in the same way as HASHCU/HASHSU work?

I don't actually think it will be less efficient; I think in most use cases we will already have a data cell/slice to hash as input.

Contributor

@xssnick, Sep 7, 2022

Also, I have an alternative approach which will be as efficient as Hasher, but without introducing a new type.

Let's imagine we have an opcode HASH_SHA512 which accepts the number of stack elements to hash as the first stack element (s0).

So if we want to hash 1 uint256 and 1 slice, currently we will do something like:

HASHSTART_SHA512
PUSHINT 7777777777
HASHAPPU
PUSHSLICE {001010010101}
HASHAPPS
HASHEND

(stack ops are omitted)

But we can do it without hasher type, like:

PUSHSLICE {001010010101}
PUSHINT 7777777777
PUSHINT 2
HASH_SHA512

So we read the first stack value; it is 2, which means we need to read 2 more values and hash them, in reverse order.

This way I think it will be much clearer, and keep the architecture cleaner and even faster :)

Member Author

@EmelyanenkoK, Sep 8, 2022

Not sure which "universality" is broken here.
We are looking for methods which can be used to hash large chunks of data, like 100kB. The approach with hasher objects allows copying partially hashed data and getting the digest of data with the same prefix more cheaply (not sure that we practically need it, though).

Contributor

@xssnick, Sep 8, 2022

By universality I mean that we are introducing a new type which cannot be returned or serialized to a cell.

I'm also not sure where we can use the feature of copying a hasher during a single contract call, since it cannot be saved and reused later.

I could propose 3 options here:

  1. Do it without the hasher object and without the copy feature: the HASH_SHA512 approach proposed one message earlier looks more efficient and smaller (fewer stack ops needed).
  2. Add cell [de]serialization to the hasher; then it could be saved, returned and accepted as input. This way the prefix feature has more potential, I think. And it is actually not a problem to serialize the mid state of a hasher and unpack it later.
  3. Combine the 2 approaches: add something like HASH_SHA512_STATE, which will put a cell with the hasher state on the stack, and add HASH_SHA512, which will have a flag in the opcode saying whether it accepts a hasher or not. Example:
PUSHSLICE {001010010101}
PUSHINT 7777777777
PUSHINT 2
HASH_SHA512_STATE  <- put cell with mid state of hasher to stack

PUSHSLICE {11110}
PUSHINT 1
PUSH s2 <- push hasher state
HASH_SHA512_END <- computes final hash which consists from 3 values, 2 of which was in state

This way it will be smaller and consume fewer stack ops, I think. Maybe it can even accept a tuple instead of args; then it will be 1 push, without a push of the length.

> Not sure which "universality" is broken here.
> We are looking for methods which can be used to hash large chunks of data, like 100kB. The approach with hasher objects allows copying partially hashed data and getting the digest of data with the same prefix more cheaply (not sure that we practically need it, though).

How can you hash 100 kB of data on a blockchain? It requires much more gas than is possible.

Member Author

@EmelyanenkoK, Sep 8, 2022

> How can you hash 100 kB of data on a blockchain? It requires much more gas than is possible.

Currently yes, but it doesn't look like intended behavior: things which are cheap to nodes (and hashing is cheap) should not be prohibitively expensive for contracts.

### New opcodes
#### c7 and block ops
`MYCODE` - `0xF82A` - ` - c` - return cell with currently executed code. Equivalent to `10 GETPARAM`.

`INCOMINGVALUE` - `0xF82B` - ` - i` - return integer with TON amount in incoming message (or zero if not applicable). Equivalent to `11 GETPARAM`.

`STORAGEFEES` - `0xF82C` - ` - i` - return integer value of fee in nanoTONs debited in storage phase. Equivalent to `12 GETPARAM`.

`PREVBLOCKSINFOTUPLE` - `0xF82D` - ` - t` - return tuple with previous blocks info. Equivalent to `13 GETPARAM`.

`PREVMCBLOCKS` - `0xF83400` - ` - t` - return tuple with previous masterchain blocks info. Equivalent to `13 GETPARAM; FIRST`.

`PREVKEYBLOCK` - `0xF83401` - ` - t` - return tuple with previous keyblock id. Equivalent to `13 GETPARAM; SECOND`.

#### Hash primitives
`HASHSTART_SHA256` - `0xF90300` - ` - h` - create `sha256` hasher object.

`HASHSTART_SHA512` - `0xF90301` - ` - h` - create `sha512` hasher object.

`HASHSTART_BLAKE2B` - `0xF90302` - ` - h` - create `blake2b` hasher object.

`HASHSTART_KECCAK256` - `0xF90303` - ` - h` - create `keccak256` hasher object.

`HASHSTART_KECCAK512` - `0xF90304` - ` - h` - create `keccak512` hasher object.


`HASHEND` - `0xF904` - `h - ... ` - calculate hash from hasher and put it on the stack. If the size of the hash does not exceed 256 bits, it is returned as a 256-bit unsigned integer (e.g. sha256). Otherwise it is returned as a tuple of 256-bit integers (e.g. sha512 - tuple of two integers). If the bit length of the data in the hasher is not divisible by eight, throws a cell underflow exception.

`HASHENDST` - `0xF905` - `b h - b' ` - calculate hash from hasher and store it into the builder.


`HASHINFO` - `0xF906` - `h - i` - return hash type of hasher (`sha256` - `0`, `sha512` - `1`, `blake2b` - `2`, `keccak256` - `3`, `keccak512` - `4`).

Member

I think that hardcoding magic numbers is always a bad practice. There are many algorithms with different names and for sure in the future the list of them will be replenished. If at this stage it is assumed that the ID for the hasher will be chosen purely for "historical reasons", then for many it will cause misunderstanding. I suggest using the same function instead of ordinal numbers as for determining the get-methods ID in smart contracts



`HASHAPPU` - `0xF907xx` - `h i - h'` - serialize unsigned integer in `xx+1` number of bits and put it into hasher.

`HASHAPPI` - `0xF907xx` - `h i - h'` - serialize signed integer in `xx+1` number of bits and put it into hasher.

`HASHAPPS` - `0xF909` - `h s - h'` - put data bits from slice to hasher.

`HASHAPPB` - `0xF909` - `h b - h'` - put data bits from builder to hasher.
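
The semantics of the hash primitives above can be sketched in Python as follows. This is an illustrative model rather than the draft implementation: the function names are hypothetical, only the `sha256` and `sha512` variants are modelled (Python's `hashlib` has no original Keccak, and its `sha3_*` functions use different padding), and a `ValueError` stands in for the cell underflow exception.

```python
import hashlib
from dataclasses import dataclass

# Hash type ids as listed for HASHINFO (only the modelled subset).
HASH_IDS = {"sha256": 0, "sha512": 1}


@dataclass(frozen=True)
class Hasher:
    """Immutable stand-in for the proposed Hasher type."""
    algo: str
    bits: str = ""  # data appended so far, as a string of '0'/'1'


def hash_start(algo: str) -> Hasher:
    """Like HASHSTART_*: create an empty hasher."""
    if algo not in HASH_IDS:
        raise ValueError(f"unsupported algorithm: {algo}")
    return Hasher(algo)


def hash_app_u(h: Hasher, value: int, nbits: int) -> Hasher:
    """Like HASHAPPU: append an unsigned integer serialized in nbits bits."""
    if not 0 <= value < (1 << nbits):
        raise OverflowError("value does not fit in nbits")
    # A new object is returned; the old hasher stays valid and unchanged.
    return Hasher(h.algo, h.bits + format(value, f"0{nbits}b"))


def hash_info(h: Hasher) -> int:
    """Like HASHINFO: return the hash type id."""
    return HASH_IDS[h.algo]


def hash_end(h: Hasher):
    """Like HASHEND: an int for digests up to 256 bits, else a tuple of ints."""
    if len(h.bits) % 8 != 0:
        raise ValueError("bit length is not divisible by eight")
    data = int(h.bits, 2).to_bytes(len(h.bits) // 8, "big") if h.bits else b""
    digest = hashlib.new(h.algo, data).digest()
    if len(digest) <= 32:
        return int.from_bytes(digest, "big")
    return tuple(
        int.from_bytes(digest[i:i + 32], "big") for i in range(0, len(digest), 32)
    )
```

Because each append returns a new object, two hashers derived from a common prefix can be finished independently, which is the copy behaviour the `Hasher` type is meant to provide.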

#### Cryptography primitives

`ECRECOVER` - `0xF912` - `m v r s - i i i -1 or 0` - takes a secp256k1 signature as three integers `v` (8 bit), `r` (256 bit) and `s` (256 bit) and the signed message hash `m` as a 256-bit integer, recovers the public key using the elliptic curve DSA recovery mechanism, and returns the pubkey curve point (a header-byte integer and two 256-bit integers for the coordinates) and `-1` if the signature is valid, or `0` if not.
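
For readers unfamiliar with ECDSA public key recovery, the mechanism can be sketched in pure Python. The sketch rests on several assumptions that the proposal does not state: `v` is taken to be a recovery id of `0` or `1` giving the parity of the `y` coordinate of the nonce point, the header byte is taken to be `4` (the SEC 1 prefix of an uncompressed point), and the rare case where the nonce point's `x` coordinate exceeds the group order is ignored. The function names are hypothetical, and the code is not constant-time.

```python
# secp256k1 domain parameters.
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (
    0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
    0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8,
)


def ec_add(a, b):
    """Add two curve points; None represents the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if a == b:
        lam = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)


def ec_mul(k, point):
    """Scalar multiplication by double-and-add."""
    result = None
    while k:
        if k & 1:
            result = ec_add(result, point)
        point = ec_add(point, point)
        k >>= 1
    return result


def ecrecover(m, v, r, s):
    """Return (4, x, y, -1) on success or (0,) on failure (assumed encoding)."""
    if not (1 <= r < N and 1 <= s < N and v in (0, 1)):
        return (0,)
    y_sq = (pow(r, 3, P) + 7) % P
    y = pow(y_sq, (P + 1) // 4, P)  # square root works because P % 4 == 3
    if y * y % P != y_sq:
        return (0,)  # r is not the x coordinate of a curve point
    if (y & 1) != v:
        y = P - y
    # Q = r^-1 * (s*R - m*G)
    q = ec_mul(pow(r, -1, N), ec_add(ec_mul(s, (r, y)), ec_mul(-m % N, G)))
    if q is None:
        return (0,)
    return (4, q[0], q[1], -1)
```

Note that recovery always yields some key for well-formed inputs; the caller is expected to compare the result with the expected public key or an address derived from it.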

#### Message primitives
`SENDMSG` - `0xFB08` - `c m - i` - sends a raw message contained in Cell `c` with mode `m` (the same way as `SENDRAWMSG`) and returns the approximate action fee. The mode is interpreted differently from `SENDRAWMSG` as follows: `+1024` only calculates the fee (does not create an output action), `+64` uses the whole value of the incoming message (since the exact value cannot be determined until the end of the Computation phase), `+128` uses the whole balance of the account (since the exact value cannot be determined until the end of the Computation phase).
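
The mode bits mentioned above can be written down as flags. The following Python sketch only names the three bits discussed here; the identifiers `SendMsgMode` and `creates_output_action` are hypothetical.

```python
from enum import IntFlag


class SendMsgMode(IntFlag):
    """Mode bits whose meaning is described for SENDMSG (hypothetical names)."""
    USE_INCOMING_VALUE = 64   # estimate with the whole incoming value
    USE_WHOLE_BALANCE = 128   # estimate with the whole account balance
    FEE_ONLY = 1024           # calculate the fee, create no output action


def creates_output_action(mode: int) -> bool:
    """True unless the +1024 (fee only) bit is set."""
    return not (mode & SendMsgMode.FEE_ONLY)
```

For example, a mode of `1024 + 128` would only estimate the fee for sending the whole balance, without queuing the message.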

### Change in message fee
It is suggested to deduct the fee related to the calculation of `fwd_fee` (cell graph traversal) in the case of unsuccessful sending. The fine for visiting a cell in a message that is not sent is proposed to be 1/4 of msg\_cell\_price.
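
The proposed fine amounts to simple arithmetic, sketched below in Python. The function name is hypothetical, the price is assumed to be given in the same units as the resulting fee, and the rounding (floor of the total) is an assumption, since the text does not specify it.

```python
def unsent_message_fine(visited_cells: int, msg_cell_price: int) -> int:
    """Fine for traversing a message that was not sent: 1/4 of the price per cell.

    Rounding down over the total is an assumption of this sketch.
    """
    if visited_cells < 0 or msg_cell_price < 0:
        raise ValueError("arguments must be non-negative")
    return visited_cells * msg_cell_price // 4
```

Under these assumptions, a failed send that visited 10 cells at a cell price of 1000 units would be charged 2500 units.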

## Update activation
It is expected that the update will be activated as soon as at least 2/3 of validators update and vote for a version greater than or equal to `4` in `GlobalVersion`: `ConfigParam 8`.

# Drawbacks

The downsides of this update are:
- complication of TVM
- introduction of a non-serializable TVM type

# Rationale and alternatives

Generally, hash, cryptography and message primitives can be implemented without introducing new opcodes. However, this would be much more expensive and resource-intensive: the same operation would needlessly require much more computation. Cheap and ready-to-go hash and cryptography operations will simplify cross-chain interactions with other networks.

Having the contract code in c7 will eliminate the need to store the contract code explicitly in storage.

Having the incoming message value in c7 will eliminate the need to pass the value from the stack (or manually save it into registers).

Having block info in c7 will make it possible to prove onchain data: for instance, it is possible to prove the existence of some transaction or account state, or even a get-method result at some point in time, via Merkle proofs.

Fee deduction for an unsent message eliminates the unpaid work of calculating the message forward fee.

# Prior art

Contract code was put into the same tuple index as in Everscale; while networks are not fully compatible (and compatibility is not pursued) some applications may still work in both.

# Unresolved questions

-

# Future possibilities

-