-
Notifications
You must be signed in to change notification settings - Fork 520
/
Copy pathPaper.tex
812 lines (582 loc) · 40.4 KB
/
Paper.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
\documentclass[9pt,oneside]{amsart}
%\usepackage{tweaklist}
\usepackage{url}
\usepackage{xspace}
\usepackage{graphicx}
\usepackage{multicol}
\usepackage{subfig}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage[a4paper,width=170mm,top=18mm,bottom=22mm,includeheadfoot]{geometry}
\usepackage{booktabs}
\usepackage{array}
\usepackage{verbatim}
\usepackage{caption}
\usepackage{natbib}
\usepackage{float}
\usepackage{pdflscape}
\newcommand*\eg{e.g.\@\xspace}
\newcommand*\Eg{e.g.\@\xspace}
\newcommand*\ie{i.e.\@\xspace}
%\renewcommand{\itemhook}{\setlength{\topsep}{0pt} \setlength{\itemsep}{0pt}\setlength{\leftmargin}{15pt}}
\title{Ethereum: A Secure Decentralised Compute Platform}
\author{Gavin Wood, Vitalik Buterin}
\begin{document}
\begin{abstract}
The block-chain paradigm when coupled with cryptographically-secured transactions has demonstrated its utility through a number of projects, not least Bitcoin. Each such project can be seen as a simple application on a decentralised, but singleton, compute resource. We can call this paradigm a transactional singleton machine with shared-state.
Ethereum implements this paradigm in a generalised manner. Furthermore it provides a plurality of such resources, each with a distinct state and operating code but able to interact through a message-passing framework with others. We discuss its design, implementation issues, the opportunities it provides and the future hurdles we envisage.
\end{abstract}
\maketitle
\setlength{\columnsep}{20pt}
\begin{multicols}{2}
\section{Introduction}\label{sec:introduction}
With ubiquitous internet connections in most places of the world, global information transmission has become incredibly cheap. Technological-rooted movements like Bitcoin have demonstrated through the power of the default, consensus mechanisms and voluntary respect of the social contract that it is possible to use the internet to make a decentralised value-transfer system, shared across the world and virtually free to use. This system can be said to be a very specialised version of a cryptographically secure, transaction-based state machine. Follow-up systems such as namecoin adapted this original ``currency application'' of the technology into other applications allbeit rather simplistic ones.
Ethereum is a project that attempts to build the generalised technology; technology on which all transaction-based state machine concepts may be built. Moreover it aims to provide to the end-developer a tightly integrated end-to-end system for building software on a hitherto unexplored compute paradigm in the mainstream: a trustful object messaging compute framework.
\subsection{Driving Factors} \label{ch:driving}
Transparency. Open state. User can be guaranteed that backend will do what front-end says it will. Parties can analyse and be wholly confident in outcomes given events.
\subsection{Previous Work} \label{ch:previous}
E language. Smart contracts. Bitcoin, namecoin \&c. Mastercoin.
\section{Proposal} \label{ch:proposal}
\subsection{Overview} \label{ch:overview}
Ethereum, taken as a whole, can be viewed as a transaction-based state machine: we begin with a genesis state and incrementally execute transactions to morph it into some final state. Mathematically:
\begin{equation}
S_{t+1} := f(S_t, T)
\end{equation}
and $f$ is the Ethereum state transition function. In Ethereum, $f$, together with $S$ are considerably more powerful then any existing comparable system; $f$ allows components to carry out arbitrary computation to be done while $S$ allows components to store arbitrary state between transactions.
%where
%\begin{eqnarray}
%S_{t+1} := F(S_t, \mathbb{T}) \\
%\mathbb{T} = [ T_0, T_1, ... ] \\
%F(S, \mathbb{T}) := g(f(f(S, T_0), T_1) ...)
%\end{eqnarray}
Transactions are collated into blocks; we then chain blocks together using a cryptographic hash as a means of reference. Blocks function as a journal, recording a series of transactions together with the previous block and an identifier for the final state (though do not store the final state itself---that would be far too big).
Since the system is decentralised and all parties have an opportunity to create a new block on some older pre-existing block, there must be a method of selecting a single block-chain from the block-tree that results. If there is ever a disagreement between parties as to which root-to-leaf path down the block-tree is the `best', then a \textit{fork} occurs and there is a co-existance of multiple states of the system; this is to be avoided at all costs. In order to generate consensus we specify an algorithm to determine the canonical `best' block-chain given the block-tree; we use a simplified version of the GHOST algorithm \cite{ghost}.
\subsection{Blocks, State and Transactions} \label{ch:bst}
\subsubsection{World State} \label{ch:state}
The world-state (\textit{state}, see Appendix \ref{app:state}), is a mapping between addresses (160-bit identifiers) and account states (a data structure serialised as RLP). Though not stored on the block chain, it is assumed that the implementation will maintain this mapping in a modified Merkle Patricia tree (\textit{trie}, see Appendix \ref{app:trie}). The trie requires a simple database back-end that maintains a mapping of bytearrays to bytearrays; we name this underlaying database the state database. This has a number of benefits; firstly the root node of this structure is cryptographically dependent on all internal data and as such its hash can be used as a secure identity for the entire system state. Secondly, being an immutable data structure, it allows any previous state (whose root hash is known) to be recalled by simply altering the root hash accordingly. Since we store all such root hashes in the block chain, we are able to trivially revert to old states.
The account state comprises the first two, and potentially the last two, of the following fields:
\begin{description}
\item[nonce] A scalar value equal to the number of transactions sent from this address, or, in the case of contract accounts, the number of contract-creations made by this account.
\item[balance] A scalar value equal to the number of wei owned by this address.
\item[stateRoot] A 256-bit hash equal to the root node of a further trie structure that encodes the storage contents of the contract. Though a separate data structure, this trie is still stored using the same underlying state database. This trie takes the form as a simple mapping between domains of 256-bit values.
\item[codeHash] The hash of the EVM code of this contract---this is the code that gets executed should this address reveive a call; it is immutable and thus, unlike all other fields, cannot be changed after construction. All such code fragments are contained in the state database under their corresponding hashes for later retrieval.
\end{description}
If the latter two fields are missing, the node represents a simple (non-contract) account: it maintains no state and executes no code on receipt of a transaction.
\subsubsection{Transaction} \label{ch:transaction}
A transaction is a single cryptographically signed instruction sent by an actor external to Ethereum. e.g. This might be from a person (via a mobile device or desktop computer) or could be from a piece of automated software running on a server. There are two types of transactions: those which result in message calls and those which result in the creation of new contracts. Both types specify a number of common fields:
\begin{description}
\item[nonce] A scalar value equal to the number of transactions sent by the sender.
\item[value] A scalar value equal to the number of wei to be transferred to the message call's recipient or, in the case of contract creation, as an endowment to the newly created contract's account.
\item[gasPrice] A scalar value equal to the number of wei to be paid per unit of gas for all computation costs incurred as a result of the execution of this transaction.
\item[gasLimit] A scalar value equal to the maximum amount of gas that should be used in executing this transaction. This is paid up-front, before any computation is done and may not be topped-up later.
\item[to] The 160-bit address of the message call's recipient or the zero address for a contract-creation transaction.
\item[v, r, s] Three scalar values corresponding to the signature of the transaction and used to determine the sender of the transaction. See Appendix \ref{app:signing}.
\end{description}
Additionally, a contract-creation transaction contains:
\begin{description}
\item[body] An unlimited size byte array specifying the EVM-code for the contract.
\item[init] An unlimited size byte array specifying the EVM-code for the contract initialisation procedure.
\end{description}
Both \textbf{body} and \textbf{init} are EVM-code fragments; \textbf{body} is executed each time the contract account receives a message call (either through a transaction or due to the internal execution of code). \textbf{init} is a piece of code executed only once and at contract creation. It gets discarded immediately.
Whereas a message call transaction contains:
\begin{description}
\item[data] An unlimited size byte array specifying the input data of the message call.
\end{description}
\subsubsection{Block} \label{ch:block}
The block in Ethereum is the collection of relevant pieces of information (known as the block \textit{header}) together with a set of transactions and a set of other block headers that are known to have a parent equal to the present block's parent's parent (such blocks are known as \textit{uncles}). The block header contains several pieces of information:
\begin{description}
\item[parentHash] The SHA3 256-bit hash of the parent block, in its entirety.
\item[unclesHash] The SHA3 256-bit hash of the uncles list portion of this block.
\item[coinbase] The 160-bit address to which all fees collected from the successful mining of this block be transferred.
\item[stateRoot] The SHA3 256-bit hash of the root node of the state trie, after all transactions are executed and finalisations applied.
\item[transactionsHash] The SHA3 256-bit hash of the transactions list portion of this block.
\item[difficulty] A scalar value corresonding to the difficulty level of this block. This can be calculated from the previous block's difficulty level and the timestamp.
\item[timestamp] A scalar value equal to the reasonable output of Unix's time() at this block's inception.
\item[extraData] An arbitrary byte array containing data relavant to this block. Generally 256-bit or less, but could be much larger in the genesis block.
\item[nonce] A 256-bit hash which proves that a sufficient amount of computation has been carried out on this block.
\end{description}
The difficulty, $d$, is defined as:
\begin{equation}
d' := \begin{cases}
2^{22} & \quad \text{if genesis block}\\
d' + \frac{d'}{1024} & \quad \text{if $t' < t + 42$}\\
d' - \frac{d'}{1024} & \quad \text{otherwise}\\
\end{cases}
\end{equation}
Where $d'$ is the previous block's difficulty, $t'$ is the prvious block's timestamp and $t$ is the current block's timestamp.
The nonce, $n$, must satisfy:
\begin{equation}
\mathcal{P}(n) < \frac{d}{2^{256}}
\end{equation}
Where $\mathcal{P}$ is the proof-of-work function (see section \ref{ch:pow}). The other two parts to a block are simply a list of the transactions (as defined previously) and a list of uncle headers (of the same format as above).
\subsection{Gas and Payment} \label{ch:payment}
In order to avoid issues of network abuse and to sidestep the inevitable questions stemming from Turing completeness, all programmable computation in Ethereum is subject to fees. The fee schedule is specified in units of \textit{gas} (see Appendix \ref{app:fees} for the fees associated with various computation). Thus any given fragment of programmable computation (this includes creating contracts, making message calls, utilising and accessing contract storage and executing operations on the virtual machine) has a universally agreed cost in terms of gas.
Every transaction has a specific amount of gas associated with it: \textbf{gasLimit}. This is the amount of gas which is implicitly purchased from the sender's account balance. The purchase happens at the according \textbf{gasPrice}, also specified in the transaction. The transaction is considered invalid if the account balance cannot support such a purchase. It is named \textbf{gasLimit} since any unused gas at the end of the transaction is refunded (at the same rate of purchase) to the sender account. Gas does not exist outside of the execution of a transaction. Thus for trusted contracts, a relatively high gas limit may be set and left alone.
Any ether paid for gas by the sender that is not refunded is delivered to the miner. Transactors are free to specify any \textbf{gasPrice} that they wish, however miners are free to ignore transactions as they choose. A higher gas price on a transaction will therefore cost the sender more in terms of ether and deliver a greater value to the miner and thus will more likely be selected for inclusion by more miners. Miners, in general, will choose to advertise the minimum gas price for which they will execute transactions and transactors will be free to canvas these prices in determining what gas price to offer. Since there will be a (weighted) distribution of minimum acceptable gas prices, transactors will necessarily have a trade-off to make between lowering the gas price and maximising the chance that their transaction will be mined in a timely manner.
%\subsubsection{Determining Computation Costs}
\subsection{Transaction Execution} \label{ch:execution}
The execution of a transaction is the most complex part of the Ethereum protocol: it defines the state transition function $f$ as specified before. It is assumed that any transactions executed first pass the initial tests of intrinsic validity. These include:
\begin{enumerate}
\item The transaction signature is valid;
\item the transaction nonce is valid (equivalent to the sender account's current nonce);
\item the gas limit is no smaller that the intrinsic gas, $i$, used by the transaction;
\item the sender account balance contains at least the cost, $c$, (in wei) required in up-front payment;
\end{enumerate}
where:
\begin{eqnarray}
i \equiv & \begin{cases}
|\mathbf{d}|.G_{data} + G_{call} & \quad \text{call} \\
(|\mathbf{i}| + |\mathbf{b}|).G_{data} + G_{create} & \quad \text{create} \\
\end{cases} \\
c \equiv & l.p + v
\end{eqnarray}
where $|\mathbf{d}|$, $|\mathbf{i}|$ and $|\mathbf{b}|$ are the sizes, in bytes, of the transaction's associated data, initialisation \& body EVM-code and $G$ is defined in Appendix \ref{app:fees}.
The execution of a valid transaction begins with an irrevocable change made to the state: the sender's nonce is incremented by one and the balance is reduced by the cost, $c$. The gas available for the proceeding computation, $g$, is defined as $l - i$. The proceeding computation, whether contract creation or a message call, results in an eventual state (which may legally be equivalent to the current state), the change to which is deterministic and never invalid: there can be no invalid transactions from this point.
After the message call or contract creation is processed, the state is finalised by refunding $g'$, the remaining gas, to the sender at the original rate, thus we increase the sender's balance by $g.p$.
\subsubsection{Create} \label{ch:create}
There are number of intrinsic parameters used when creating a contract: sender ($s$), nonce ($o$), available gas ($g$), endowment ($v$) together with the two arbitrary length byte arrays, $\mathbf{i}$ for the initialisation EVM code and $\mathbf{b}$ for the EVM code of the body.
The address of the new contract is defined as being the leftmost 160 bits of the SHA3 hash of RLP encoding of the structure \texttt{[ sender, nonce ]}. In in the unlikely event that the addresses is already in use, it is treated as a big-endian integer and incremented by one until an unused address is arrived at, thus:
\begin{eqnarray}
a' := \mathcal{L}_{160}(\mathcal{S}(\mathcal{R}([s, o])))\\
a := \arg \min_x : x \geq a' \wedge \mathbb{S}[x] = \emptyset
\end{eqnarray}
where $\mathcal{S}$ is the SHA3 256-bit hash function, $\mathcal{R}$ is the RLP encoding function, $\mathcal{L}_y(X)$ evaluates to the leftmost $y$ bits of the binary data $X$ and $\mathbb{S}[x]$ is the address state of $x$ or $\emptyset$ if none exists.
The account's nonce is initially defined as zero, the balance as the value passed, the storage as empty and the code hash as the SHA3 256-bit hash of the code, thus the new state becomes $\mathbb{S}'$:
\begin{equation}
\mathbb{S}'[r] := ( 0, v, \empty, \mathcal{S}(\mathbf{b}) )
\end{equation}
The state database is altered to include the pair $(\mathcal{S}(\mathbf{b}), \mathbf{b})$. Finally, the contract is initialised through the execution of the initialising EVM code $\mathbf{i}$ according to the execution model (see section \ref{ch:model}). Code execution can effect several events that are not internal to the execution state: the contract's storage can be altered, further contracts can be created and further message calls can be made.
Code execution depletes gas; thus it may exit before the code has come to a natural halting state. In this exceptional case we say an Out-of-Gas exception has occured and the state database is reverted to the point immediately prior to the account initialisation. All available gas is used and no gas is refunded to the originator. If such an exception does not occur, then the remaining gas is refunded to the originator and the now-altered state is allowed to persevere.
\subsubsection{Message Call} \label{ch:call}
In the case of executing a message call, several parameters are required: sender ($s$), recipient ($r$), available gas ($g$), value ($v$) together with an arbitrary length byte array, $\mathbf{d}$, the input data of the call. Message calls also have intrinsic return data; the call's output byte array $\mathbf{r}$. This is ignored when executing transactions, however message calls can be initiated due to VM-code execution and in this case the return data is used.
In executing a message call, the account balance of the recipient is increased by the value passed, thus $\mathbb{S}$ becomes replaced with $\mathbb{S}'$:
\begin{equation}
\mathbb{S}'[r]_{balance} := \mathbb{S}[r]_{balance} + v
\end{equation}
In the case that the recipient contains code to be executed (i.e. that the account is a contract), then the contract's body code, to be found in the state database with key $\mathbb{S}[r]_{code}$, is executed according to the execution model (see section \ref{ch:model}). Just as with contract creation, if the execution halts due to an exhausted gas supply, then no gas is refunded to the caller and the state is reverted to the point immediately prior to code execution. Unlike with contract creation,
\subsection{Execution Model} \label{ch:model}
The original caller (i.e. the original transaction's txsender) pre-pays for a specific amount of 'GAS'. What was baseFee now describes the ETH -> GAS price. Gas is the internal currency of the transaction. Execution operations deplete the GAS supply, as per the original ratios. Once the transaction is finished, any remaining GAS is reconverted back to ETH and refunded.
Storage remains 256-bit address, 256-bit data.
Word size: 256-bit (i.e. stack item size 256-bit).
Memory is byte array (256-bit address, 8-bit data); memoryFee is proportional to greatest index that has ever been referenced in this instance and thus it's highly unlikely addresses will ever go above 32-bit. Whenever a higher memory index is used, the fee difference to take it to the higher usage from the original (lower) usage is charged. MEMSIZE is initially just the size of the input data. Note: MSTORE and MLOAD increase the highest-accessed index to their target index + 31.
Given an instruction, it is possible to calculate the gas cost of executing it as follows:
\begin{itemize}
\item Unless covered by another rule below, an operation costs < STEPGAS > gas
\item SHA3 costs $G_{sha3}$ gas
\item SLOAD costs $G_{sload}$ gas
\item BALANCE costs $G_{balance}$ gas
\item SSTORE costs $d.G{sstore}$ gas where:
\begin{itemize}
\item $d = 2$ if the new value of the storage is non-zero and the old is zero;
\item $d = 0$ if the new value of the storage is zero and the old is non-zero;
\item $d = 1$ otherwise.
\end{itemize}
\item CALL costs $G_{call} + g$, where $g$ is the quantity of gas provided; some gas may later be refunded.
\item CREATE costs $G_{create} + g$ gas, where $g$ is the quantity of gas provided.
\item When you read or write memory with MSTORE, MLOAD, RETURN, SHA3, CALLDATA or CALL, enlarge the memory so that all bytes now fit in it (memory must always be a whole number of 32-byte words). Suppose that the highest previously accessed memory index is $m$, and the new index is $n$. If $(n + 31) / 32 > (m + 31) / 32$, add an additional $((n + 31) / 32 - (m + 31) / 32) * G_{memory}$ gas to the cost.
\end{itemize}
\begin{enumerate}
\item Let $\mathbb{S}' := \mathbb{S}$, the current state.
\item Let $PC := 0$, the program counter.
Let it be known that all storage operations operate on TO's state.
Note that the memory is empty, thus (last used index + 1) = 0.
\item Set TXDATA to the first INSIZE bytes of memory starting from INOFFSET in the caller memory.
\item Repeat
\begin{itemize}
\item Calculate the cost of the current instruction (see below), set to $C$.
\item If the instruction is invalid or STOP, goto step 6.
\item If $GAS < C$ then $GAS := 0$; $S := ROLLBACK$ and evaluation finishes, returning false.
\item $GAS := GAS - C$
\item Apply the instruction.
\end{itemize}
Until an operation execution error has occured or the instruction is STOP, RETURN or SUICIDE.
\item If the output data of the call (R) is specified (through RETURN), then let the OUTSIZE bytes of caller memory beginning at OUTOFFSET.
\item Returns true.
\end{enumerate}
When a contract address receives a transaction, a virtual machine is initiated with the contract's state.
\subsubsection{Terms}
There exists a stack of variable size that stores 256-bit values at each location. (Note: most instructions operate solely on the stack altering its state in some way.)
T'[i] is the ith item counting down from the top of the pre-stack (i.e. the stack immediately after instruction execution), with the top of the stack corresponding to i == 0.
T[i] is the ith item counting down from the top of the post-stack (i.e. the stack immediately prior to instruction execution), with the top of the stack corresponding to i == 0.
The exists a permanent store addressable by a 256-bit value that stores 256-bit values at each location.
S'[i] is the permanent store (sometimes refered to as 'state' or 'store') of the VM at index i counting from zero PRIOR to instruction execution.
S[i] is the permanent store (sometimes refered to as 'state' or 'store') of the VM at index i counting from zero AFTER to instruction execution.
The exists a temporary store addressable by a 256-bit value that stores 256-bit values at each location.
M'[i] is the temporary store (sometimes refered to as 'memory') of the VM at index i counting from zero PRIOR to instruction execution.
M[i] is the temporary store (sometimes refered to as 'memory') of the VM at index i counting from zero AFTER instruction execution.
C is the code, a byte sequence.
PC' is the program counter PRIOR to instruction execution.
PC is the program counter AFTER instruction execution.
FEE(I, S', P', D) is the fee associated with the execution of instruction I with a machine of stack S', permanent store P' and which has already completed D operations.
It is defined as F where:
TODO: No it's not!
IF I == SSTORE AND P[ S'[0] ] != 0 AND S'[1] == 0 THEN
F = S + dataFee - memoryFee
IF I == SSTORE AND P[ S'[0] ] == 0 AND S'[1] != 0 THEN
F = S + dataFee + memoryFee
IF I == SLOAD
F = S + dataFee
IF I == EXTRO OR I == BALANCE
F = S + extroFee
IF I == MKTX
F = S + txFee
IF I == SHA256 OR I == SHA3 OR I == RIPEMD160 OR I == ECMUL OR I == ECADD
OR I == ECSIGN OR I == ECRECOVER OR I == ECVALID THEN
F = S + cryptoFee
Where:
B[ A ] is the balance of the address given by A, with A interpreted as an address.
ADDRESS is the address of the present contract.
10. b. Initial Operation
STEPSDONE := 0
PC' := 0
FOR ALL i: M' is initialised to the empty byte sequence.
T' is initialised such that its cardinality is zero (i.e. the stack starts empty).
S' must be equal to the value of P when the previous execution halted.
10. c. General Operation
The given steps are looped:
1. Execution halts if B[ ADDRESS ] < F( C[PC'], S', P', STEPSDONE )
2. B[ ADDRESS ] := B[ ADDRESS ] - F( P'[PC'], S', P', STEPSDONE )
3. The operation given by P'[PC'] determines PC, P, T, S.
4. PC' := PC; P' := P; T' := T; S' := S; STEPSDONE := STEPSDONE + 1
10. d. VM Operations
Summary line:
<Op-code>: <Mnemonic> -<R> +<A>
If PC is not defined explicitly, then it must be assumed PC := PC' + 1. Exceptions are PUSH, JUMP and JUMPI.
The cardinality of T (i.e. size of the stack) is altered by A - R between T \& T', by adding or removing items as necessary from the front.
Where:
R: The minimal cardinality of the stack for this instruction to proceed. If this is not achieved then the machine halts with a stack underflow exception. (Note: In some cases of some implementations, this is also the number of values "popped" from the implementation's stack during the course of instruction execution.)
(A - R): The net change in cardinality of the stack over the course of instruction execution.
$\forall$ i: if T[i] is not defined explicitly, then it must be assumed T[i] := T'[i + R - A] where i + R >= A.
$\forall$ i: if M[i] is not defined explicitly, then it must be assumed M[i] := M'[i].
$\forall$ i: if S[i] is not defined explicitly, then it must be assumed S[i] := S'[i].
The expression (COND ? ONE : ZERO), where COND is an expression and ONE and ZERO are both value placeholders, evaluates to ONE if COND is true, and ZERO if COND is false. This is similar to the C-style ternary operator.
When a 32-byte machine datum is interpreted as a 160-bit address or hash, the rightwards 20 bytes are taken (i.e. the low-order bytes when interpreting the data as Big-Endian).
++ is the concatenation operator; all operands are byte arrays (mostly 32-byte arrays here, since that's the size of the VM's data \& address types).
LEFT\_BYTES(A, n) returns the array bytes comprising the first (leftmost) n bytes of the 32 byte array A, which can be considered equivalent to a single value in the VM.
\subsection{Block Finalisation} \label{ch:finalisation}
\subsection{Mining Proof-of-Work} \label{ch:pow}
\subsection{Block-tree to Block-chain} \label{ch:ghost}
\section{User Interface}
TODO
\section{Discussion} \label{ch:discussion}
\subsection{Examples} \label{ch:examples}
\section{Conclusion} \label{ch:conclusion}
\subsection{Further Work} \label{ch:further}
\bibliography{Biblio}
\bibliographystyle{plainnat}
\appendix
\section{Terminology}
\begin{description}
\item[External Actor] A person or other entity able to interface to an Ethereum node, but external to the world of Ethereum. It can interact with Ethereum through depositing signed Transactions and inspecting the block-chain and associated state. Has one (or more) intrinsic Accounts.
\item[Address] A 160-bit code used for identifying Accounts.
\item[Account] Accounts have an intrinsic balance and transaction count maintained as part of the Ethereum state. They are owned either by External Actors or intrinsically (as an identity) an Autonomous Object within Ethereum. If an Account identifies an Autonomous Object, then Ethereum will also maintain a Storage State particular to that Account. Each Account has a single Address that identifies it.
\item[Transaction] A piece of data, signed by an External Actor. It represents either a Message or a new Autonomous Object. Transactions are recorded into each block of the block-chain.
\item[Autonomous Object] A virtual object existant only within the hypothetical state of Ethereum. Has an intrinsic address. Incorporated only as the state of the storage component of the VM.
\item[Storage State] The information particular to a given Autonomous Object that is maintained between the times that it runs.
\item[Message] Data (as a set of bytes) and Value (specified as Ether) that is passed between two Accounts in a perfectly trusted way, either through the deterministic operation of an Autonomous Object or the cryptographically secure signature of the Transaction.
\item[Message Call] The act of passing a message from one Account to another. If the destination account is an Autonomous Object, then the VM will be started with the state of said Object and the Message acted upon. If the message sender is an Autonomous Object, then the Call passes any data returned from the VM operation.
\item[Gas] The fundamental network cost unit. Paid for exclusively by Ether (as of PoC-4), which is converted freely to and from Gas as required. Gas does not exist outside of the internal Ethereum computation engine; its price is set by the Transaction and miners are free to ignore Transactions whose Gas price is too low.
\item[Contract] Synonym for Autonomous Object used for non-technical audiences.
\item[Object] Synonym for Autonomous Object.
\item[App] An end-user-visible application hosted in the Ethereum Browser.
\item[Ethereum Browser] (aka Ethereum Reference Client) A cross-platform GUI of an interface similar to a simplified browser (ala Chrome) that is able to host sandboxed applications whose back-end is purely on the Ethereum protocol.
\item[Ethereum Virtual Machine] (aka EVM) The virtual machine that forms the key part of the execution model for a contract's program code.
\item[EVM Code] The bytecode that the EVM can natively execute.
\item[EVM Assembly] The human-readable form of EVM-code.
\item[LLL] The Lisp-like Low-level Language, a human-writable language used for authoring simple contracts.
\end{description}
\section{Recursive Length Prefix}\label{app:rlp}
\section{Modified Merkle Patricia Tree}\label{app:trie}
\section{Hex-Prefix Encoding}\label{app:hexprefix}
\section{Formal Specification of Structures}
Though RLP is data-agnostic, it does specify a canonical representation for integer quantities. It is big-endian with no leading zero-bytes. Thus for elements than feasibly can be stored as integers, it becomes important to specify whether they should adhere to this canonical representation or be left in some other (perhaps more 'native') format.
In the case of counts, balances, fees and amounts of wei, the canon-integer form must be used when storing in RLP. We call these INTs.
In the case of hashes (256-bit or 160-bit), user-visible strings and specialised byte-arrays (e.g. hex-prefix notation from the trie), they should be stored as unformatted byte-array data and not altered into some other form. We call these BINs.
When interpreting RLP data, clients are required to consider non-canonical INT fields in the same way as otherwise invalid RLP data and dismiss it completely.
Specifically:
for the Block header:
\begin{verbatim}
[
parentHash: BIN (256-bit),
unclesHash: BIN (256-bit),
coinbase: BIN (160-bit),
stateRoot: BIN (256-bit),
transactionsHash: BIN (256-bit),
difficulty: INT,
timestamp: INT,
extraData: BIN (typically 256-bit),
nonce: BIN (256-bit)
]
\end{verbatim}
(note: 'nonce', the last element, refers to a hash here and so is binary)
for entries in the State trie for normal addresses:
\begin{verbatim}
[
balance: INT,
nonce: INT
]
\end{verbatim}
and for contract addresses:
\begin{verbatim}
[
balance: INT,
nonce: INT,
storageRoot: BIN (256-bit),
codeHash: BIN (256-bit)
]
\end{verbatim}
(note: 'nonce', the second element, refers to a tx-count here and so is integer)
for message call transactions:
\begin{verbatim}
[
nonce: INT,
value: INT,
gasPrice: INT,
gas: INT,
recvAddr: BIN (160-bit),
data: BIN (arbitrary length),
v: INT,
r: INT,
s: INT
]
\end{verbatim}
or, for contract creation transactions,
\begin{verbatim}
[
nonce: INT,
value: INT,
gasPrice: INT,
gas: INT,
0: BIN (160-bit),
code: BIN (arbitrary length),
init: BIN (arbitrary length),
v: INT,
r: INT,
s: INT
]
\end{verbatim}
(note: 'nonce', the first element, refers to a tx-count here and so is integer)
The nonce in the transaction refers to the total amount of transactions sent from the address up until that moment in time.
for blocks, there are no immediate data field, but lists:
\begin{verbatim}
[
blockHeader: [...]
uncleList: [ [...], [...], ... ]
txList: [ [...], [...], ... ]
]
\end{verbatim}
Uncle-blocks contain only the uncle's header.
\section{The State DB}\label{app:state}
\section{Signing Transactions}\label{app:signing}
\section{Virtual Machine Specification}\label{app:vm}
0s: arithmetic operations
0x00: STOP -0 +0
Halts execution.
Any gas left over gets returned to caller (or in the case of the top-level call, the sender converted back to ETH).
0x01: ADD -2 +1
S[0] := S'[0] + S'[1]
0x02: MUL -2 +1
S[0] := S'[0] * S'[1]
0x03: SUB -2 +1
S[0] := S'[0] - S'[1]
0x04: DIV -2 +1
S[0] := S'[0] / S'[1]
0x05: SDIV -2 +1
S[0] := S'[0] / S'[1]
S'[0] \& S'[1] are interpreted as signed 256-bit values for the purposes of this operation.
0x06: MOD -2 +1
S[0] := S'[0] % S'[1]
0x07: SMOD -2 +1
S[0] := S'[0] % S'[1]
S'[0] \& S'[1] are interpreted as signed 256-bit values for the purposes of this operation.
0x08: EXP -2 +1
S[0] := S'[0] + S'[1]
0x09: NEG -1 +1
S[0] := -S'[0]
0x0a: LT -2 +1
S[0] := S'[0] < S'[1] ? 1 : 0
0x0b: GT -2 +1
S[0] := S'[0] > S'[1] ? 1 : 0
0x0c: EQ -2 +1
S[0] := S'[0] == S'[1] ? 1 : 0
0x0d: NOT -1 +1
S[0] := S'[0] == 0 ? 1 : 0
10s: bit operations
0x10: AND -2 +1
S[0] := S'[0] AND S'[1]
0x11: OR -2 +1
S[0] := S'[0] OR S'[1]
0x12: XOR -2 +1
S[0] := S'[0] XOR S'[1]
0x13: BYTE -2 +1
S[0] := S'[0]th byte of S'[1]
if S'[0] < 32
S[0] := 0
otherwise
for Xth byte, we count from left - 0th is therefore the leftmost (most significant in BE) byte.
20s: crypto opcodes
0x20: SHA3 -2 +1
S[0] := SHA3( T'[ S'[0] ++ ... ++ (S'[0] + S'[1]) ])
30s: closure state opcodes
0x30: ADDRESS -0 +1
S[0] := ADDRESS
i.e. the address of this closure.
0x31: BALANCE -1 +1
S[0] := B[ S'[0] ]
i.e. the balance of this closure.
0x32: ORIGIN -0 +1
S[0] := A
Where A is the address of the account that made the original transaction leading to the current closure and is paying the fees
0x33: CALLER -0 +1
S[0] := A
Where A is the address of the object that made this call.
0x34: CALLVALUE -0 +1
S[0] := V
Where V is the value attached to this call.
0x35: CALLDATALOAD -1 +0
S[0] := D[ S'[0] ... (S'[0] + 31) ]
Where D is the data attached to this call (as a byte array).
Any bytes that are out of bounds of the data are defined as zero.
0x36: CALLDATASIZE -0 +1
S[0] := DS
Where DS is the number of bytes of data attached to this call.
0x37: GASPRICE -0 +1
S[0] := V
Where V is the current gas price (né fee multiplier).
40s: block operations
0x40: PREVHASH -0 +1
S[0] := H
Where H is the SHA3 hash of the previous block.
0x41: COINBASE -0 +1
S[0] := A
Where A is the coinbase address of the current block.
0x42: TIMESTAMP -0 +1
S[0] := T
Where T is the timestamp of the current block (given as the Unix time\_t when this block began its existence).
0x43: NUMBER -0 +1
S[0] := N
Where N is the block number of the current block (counting upwards from genesis block which has N == 0).
0x44: DIFFICULTY -0 +1
S[0] := D
Where D is the difficulty of the current block.
0x45: GASLIMIT -0 +1
S[0] := L
Where L is the total gas limit of the current block. Always $10^6$.
50s: stack, memory, storage and execution path operations
0x51: POP -1 +0
0x52: DUP -1 +2
S[0] := S'[0]
0x53: SWAP -2 +2
S[0] := S'[1]
S[1] := S'[0]
0x54: MLOAD -1 +1
S[0] := T'[ S'[0] ... S'[0] + 31 ]
0x55: MSTORE -2 +0
T[ S'[0] ... S'[0] + 31 ] := S'[1]
0x56: MSTORE8 -2 +0
T[ S'[0] ... S'[0] + 31 ] := S'[1] \& 0xff
0x57: SLOAD -1 +1
S[0] := P'[ S'[0] ]
0x58: SSTORE -2 +0
P[ S'[0] ] := S'[1]
0x59: JUMP -1 +0
PC := S'[0]
0x5a: JUMPI -2 +0
PC := S'[1] == 0 ? PC' : S'[0]
0x:5b PC -0 +1
S[0] := PC
0x5c: MSIZE -0 +1
S[0] = sizeof(T)
0x5d: GAS
S[0] := G
Where G is the amount of gas remaining after executing the opcode.
60s \& 70s: push
0x60: PUSH1 0 +1
S[0] := C[ PC' + 1 ]
PC := PC' + 2
0x61: PUSH2 0 +1
S[0] := C[ PC' + 1 ] ++ C[ PC' + 2 ]
PC := PC' + 3
...
0x7f: PUSH32 0 +1
S[0] := C[ PC' + 1 ] ++ C[ PC' + 2 ] ++ ... ++ C[ PC' + 32 ]
PC := PC' + 33
f0s: closure-level operations
0xf0: CREATE -5 +1
Immediately creates a contract where:
The endowment is given by S'[0]
The body code of the eventual closure is given by T'[ S'[1] ... ( S'[1] + S'[2] - 1 ) ]
The initialisation code of the eventual closure is given by T'[ S'[3] ... ( S'[3] + S'[4] - 1 ) ]
(Thus the total number of bytes of the transaction data is given by S'[2] + S'[4].)
S[0] = A
where A is the address of the created contract or 0 if the creation failed.
Fees are deducted from the sender balance to pay for enough gas to complete the operation (i.e. contract creation fee + initial storage fee). If there was not enough gas to complete the operation, then all gas will be deducted and the operation fails.
0xf1: CALL -7 +1
Immediately executes a call where:
The recipient is given by S'[0], when interpreted as an address.
The value is given by S'[1]
The gas is given by S'[2]
The input data of the call is given by T'[ S'[3] ... ( S'[3] + S'[4] - 1 ) ]
(Thus the number of bytes of the transaction is given by S'[4].)
The output data of the call is given by T[ S'[5] ... ( S'[5] + MIN( S'[6], S[0] ) - 1 ) ]
If 0 gas is specified, transfer all gas from the caller to callee gas balance, otherwise transfer only the amount given. If there isn't enough gas in the caller balance, operation fails.
If the value is less than the amount in the caller's balance then nothing is executed and the S'[2] gas gets refunded to the caller.
See Appendix A for execution.
Add any remaining callee gas to the caller's gas.
S[0] = R
where R = 1 when the instrinsic return code of the call is true, R = 0 otherwise.
NOT YET: POST -5 +1
Registers for delayed execution a call where:
The recipient is given by S'[0], when interpreted as an address.
The value is given by S'[1]
The gas to supply the transaction with is given by S'[2] (paid for from the current gas balance)
The input data of the call is given by T'[ S'[3] ... ( S'[3] + S'[4] - 1 ) ]
(Thus the number of bytes of the transaction is given by S'[4].)
Contract pays for itself to run at a defered time from its own GAS supply. The miner has no choice but to execute.
NOT YET: ALARM -6 +1
Registers for delayed execution a call where:
The recipient is given by S'[0], when interpreted as an address.
The value is given by S'[1]
The gas (to convert from ETH at the later time) is given by S'[2]
The number of blocks to wait before executing is S'[3]
The input data of the call is given by T'[ S'[4] ... ( S'[4] + S'[5] - 1 ) ]
(Thus the number of bytes of the transaction is given by S'[5].)
Total gas used now is S'[3] * S'[5] * deferFee.
Contract pays for itself to run at the defered time converting given amount of gas from its ETH balance; if it cannot pay it terminates as a bad transaction. TODO: include baseFee and allow miner freedom to determine whether to execute or not. If not, the next miner will have the chance.
0xf2: RETURN -2 +0
Halts execution.
R := T[ S'[0] ... ( S'[0] + S'[1] - 1 ) ]
Where the output data of the call is specified as R.
Any gas left over gets returned to caller (or in the case of the top-level call, the sender converted back to ETH).
0xff: SUICIDE -1 +0
Halts execution.
FOR ALL i: IF P[i] NOT EQUAL TO 0 THEN B[ S'[0] ] := B[ S'[0] ] + memoryFee
B[ S'[0] ] := B[ S'[0] ] + B[ ADDRESS ]
Removes all contract-related information from the Ethereum system.
\section{Low-level Lisp-like Language}\label{app:lll}
\section{Wire Protocol}\label{app:wire}
\section{Peer Strategy}\label{app:peers}
\section{Genesis Block}\label{app:genesis}
The header of the genesis block is 9 items, and is specified thus:
$[z_{256}, \mathcal{S}(\mathcal{R}([])), z_{160}, stateRoot, \mathcal{S}(\mathcal{R}([])), 2^{22}, 0, \text{`'}, 42]$
Where:
$z_{256}$ refers to the parent hash, a 256-bit hash which is all zeroes.
$z_{160}$ refers to the coinbase address, a 160-bit hash which is all zeroes.
$2^{22}$ refers to the difficulty.
0 refers to the timestamp (the Unix epoch).
`' refers to the extradata, an empty byte array.
$\mathcal{S}(\mathcal{R}([]))$ values refer to the hashes of the transaction and uncle lists in RLP, both empty.
$stateRoot$ refers to the state root.
\section{Fee Schedule}\label{app:fees}
Constants:
* STEPGAS = 1
* SHA3GAS = 20
* SLOADGAS = 20
* SSTOREGAS = 100
* BALANCEGAS = 20
* CREATEGAS = 100
* CALLGAS = 20
* MEMORYGAS = 1
* TXDATAGAS = 5 [not used in the VM]
\end{multicols}
\end{document}