This article discusses Duktape specific performance characteristics and provides some hints to avoid Duktape specific performance pitfalls.
As general guidelines:
-
Gcc/clang
-O2
is a good default for performance optimization. -
When performance matters, profile guided optimization (PGO) is strongly recommended.
The difference between plain -O2
and -O2
with PGO can be very large,
often around 20%! For example:
# -O2 without PGO.
$ time ./duk.O2 tests/perf/test-mandel.js
[...]
real 0m3.061s
user 0m3.059s
sys 0m0.000s
# -O2 with PGO.
$ time ./duk-pgo.O2 tests/perf/test-mandel.js
[...]
real 0m2.488s
user 0m2.487s
sys 0m0.000s
In this particular case the performance improvement is almost 20%.
See:
Strings are interned: only a single copy of a certain string exists at any point in time. Interning a string involves hashing the string and looking up a global string table to see whether the string is already present. If so, a pointer to the existing string is returned; if not, the string is inserted into the string table, potentially involving a string table resize. While a string remains reachable, it has a unique and a stable pointer which allows byte-by-byte string comparisons to be converted to simple pointer comparisons. Also, string hashes are computed during interning which makes the use of string keys in internal hash tables efficient.
There are many downsides also. Strings cannot be modified in-place but
a copy needs to be made for every modification. For instance, repeated
string concatenation creates a temporary value for each intermediate string
which is especially bad if a result string is built one character at a time.
Duktape internal primitives, such as string case conversion and array
join()
, try to avoid these downsides by minimizing the number of temporary
strings created.
The internal memory representation for strings is extended UTF-8, which represents each ASCII character with a single byte but uses two or more bytes to represent non-ASCII characters. This reduces memory footprint for most strings and makes strings easy to interact with in C code. However, it makes random access expensive for non-ASCII strings because characters are represented by a variable number of bytes. Random access is needed for operations such as extracting a substring or looking up a character at a certain character index.
Duktape automatically detects pure ASCII strings (based on the fact that their character and byte length are identical) and provides efficient random access to such strings.
However, when a string contains non-ASCII characters a string cache
is used to resolve a character index to an internal byte index. Duktape
maintains a few string cache entries (internal define DUK_HEAP_STRCACHE_SIZE
,
currently 4) which remember the last byte offset and character offset for
recently accessed strings. Character index lookups near a cached
character/byte offset can be efficiently handled by scanning backwards or
forwards from the cached location. When a string access cannot be resolved
using the cache, the string is scanned either from the beginning or the end,
which is obviously very expensive for large strings. The cache is maintained
with a very simple LRU mechanism and is transparent to both ECMAScript and C
code.
The string cache makes simple loops like the following efficient:
var i;
var n = inp.length;
for (i = 0; i < n; i++) {
print(inp.charCodeAt(i));
}
When random accesses are made from here and there to multiple strings, the strings may very easily fall out of the cache and become expensive at least for longer strings.
Note that the cache never maintains more than one entry for each string, so the following would be very inefficient:
var i;
var n = inp.length;
for (i = 0; i < n; i++) {
// Accessing the string alternatively from beginning and end will
// have a major performance impact for large strings.
print(inp.charCodeAt(i));
print(inp.charCodeAt(n - 1 - i));
}
As mentioned above, these performance issues are avoided entirely for ASCII strings which behave as one would expect. More generally, Duktape provides fast paths for ASCII characters and pure ASCII strings in internal algorithms whenever applicable. This applies to algorithms such as case conversion, regexp matching, etc.
There is a fast path for reading and writing numeric indices of (dense)
arrays and plain buffer values, e.g. x = buf[123]
or buf[123] = x
.
The fast path avoids coercing the index to a string (here "123"
) before
attempting a lookup.
There's a similar fast path for buffer objects (Duktape.Buffer, Node.js Buffer, ArrayBuffer, and TypedArray views), but it is about 20% slower.
Object properties are stored in a linear key/value list which provides
stable ordering (insertion order). When an object has enough properties
(internal define DUK_HOBJECT_E_USE_HASH_LIMIT
, currently 32), a hash
lookup table is also allocated to speed up property lookups. Even in
this case the key ordering is retained which is a practical requirement
for an ECMAScript implementation. The hash part is avoided for most objects
because it increases memory footprint and doesn't significantly speed up
property lookups for very small objects.
For most objects property lookup thus involves a linear comparison against the object's property table. Because properties are kept in the property table in their insertion order, properties added earlier are slightly faster to access than those added later. When the object grows large enough to gain a hash table this effect disappears.
Array elements are stored in a special "array part" to reduce memory
footprint and to speed up access. Accessing an array with a numeric index
officially first coerces the number to a string (e.g. x[123]
to
x["123"]
) and then does a string key lookup; when an object has an array
part no temporary string is actually created in most read/write situations.
The array part can be "sparse", i.e. contain unmapped entries. Duktape
occasionally rechecks the density of the array part, and if it becomes too
sparse the array part is abandoned (current limit is roughly: if fewer than
25% of array part elements are mapped, the array part is abandoned). The
array entries are then converted to ordinary object properties, with every
mapped array index converted to an explicit string key (such as "123"
),
which is relatively expensive. If an array part has once been abandoned,
it is never recreated even if the object would be dense enough to warrant
an array part.
Elements in the array part are required to be plain properties (not accessors) and have default property attributes (writable, enumerable, and configurable). If any element deviates from this, the array part is again abandoned and array elements converted to ordinary properties.
Duktape has two modes for storing and accessing identifiers (function arguments, local variables, function declarations): a fast path and a slow path. The fast path is used when an identifier can be bound to a virtual machine register, i.e., a fixed index in a virtual stack frame allocated for a function. Identifier access is then simply an array lookup. The slow path is used when the fast path cannot be safely used; identifier accesses are then converted to explicit property lookups on either external or internal objects, which is more than an order of magnitude slower.
To keep identifier accesses in the fast path:
-
Execute (almost all) code inside ECMAScript functions, not in the top-level global or eval code: global/eval code never uses fast path identifier accesses (however, function code inside global/eval does).
-
Store frequently accessed values in local variables instead of looking them up from the global object or other objects.
When an object is enumerated, with either the for-in
statement or
Object.keys()
, Duktape first traverses the target object and its
prototype chain and forms an internal enumeration object, which contains
all the enumeration keys as strings. In particular, all array indices
(or character indices in case of strings) are converted and interned into
string values before enumeration and they remain interned until the
enumeration completes. This can be memory intensive especially if large
arrays or strings are enumerated.
Note, however, that iterating a string or an array with for-in
and
expecting the array elements or string indices to be enumerated in an
ascending order is non-portable. Such behavior, while guaranteed by many
implementations including Duktape, is not guaranteed by the ECMAScript
E5.1 standard.
ECMAScript has several features which make function entry and execution quite expensive. The general goal of the Duktape ECMAScript compiler is to avoid the troublesome features for most functions while providing full compatibility for the rest.
An ideal compiled function has all its variables and functions bound to
virtual machine registers to allow fast path identifier access, avoids
creation of the arguments
object on entry, avoids creation of explicit
lexical environment records upon entry and during execution, and avoids
storing any lexical environment related control information such as
internal identifier-to-register binding tables.
The following features have a significant impact on execution performance:
-
Access to the
arguments
object: requires creation of an expensive object upon function entry in case it is accessed. -
A direct call to
eval()
: requires initialization of thearguments
and full identifier binding information needs to be retained in case evaluated code needs it. -
Global and eval code in general: identifiers are never bound to virtual machine registers but use explicit property lookups instead. Statements in eval code also have implicit values which require bytecode instructions to evalute.
The following features have a more moderate impact:
-
try-catch-finally
statement: the dynamic binding required by the catch variable is relatively expensive at present. -
with
statement: the object binding required is relatively expensive at present. -
Use of bound functions, i.e. functions created with
Function.prototype.bind()
: an additional function object is created, and function invocation is slowed down by handling of bound function objects and argument shuffling. -
More than about 250 formal arguments, literals, and active temporaries: causes bytecode to use register shuffling which increases bytecode size and slows down execution.
To avoid these, isolate performance critical parts into separate minimal functions which avoid using the features mentioned above.
All temporary strings are interned. It is particularly bad to accumulate strings in a loop:
var t = '';
for (var i = 0; i < 1024; i++) {
t += 'x';
}
This will intern 1025 strings. Execution time is O(n^2)
where n
is the
loop limit. It is better to use a temporary array instead:
var t = [];
for (var i = 0; i < 1024; i++) {
t[i] = 'x';
}
t = t.join('');
Here, x
will be interned once into a function constant, and each array
entry simply refers to the same string, typically costing only 8 bytes
per array entry. The final Array.prototype.join()
avoids unnecessary
interning and creates the final string in one go.
Avoid operations which require access to a random character offset inside a large string containing one or more non-ASCII characters. Such accesses require use of the internal "string cache" and may, in the worst case, require a brute force scanning of the string to find the correct byte offset corresponding to the character offset.
Case conversion and other Unicode related operations have fast paths for ASCII codepoints but fall back to a slow path for non-ASCII codepoints. The slow path is size optimized, not speed optimized, and often involve linear range matching.
Both buffer objects and plain buffer values have a fast path when buffer contents are accessed with numeric indices. The fast path for plain buffer values is a bit faster, so your code will run faster if you get the plain buffer before iteration:
var b, i, n;
// Buffer object, typeof is 'object'
var bufferValue = new Duktape.Buffer('foo');
print(typeof bufferValue); // 'object'
// Get plain buffer, if already plain, no harm
b = bufferValue.valueOf();
print(typeof b); // always 'buffer'
n = b.length;
for (i = 0; i < n; i++) {
print(i, b[i]); // fast path for plain buffer access
}
When creating buffers, note that new Duktape.Buffer(x)
always creates a
Buffer object, while Duktape.Buffer(x)
returns a plain buffer value. This
mimics how ECMAScript new String()
and String()
work. Plain buffers
should be preferred whenever possible.
If an array becomes too sparse at any point, Duktape will abandon the array part permanently and convert array properties to explicit string keyed properties. This may happen for instance if an array is initialized with a descending index:
var arr = [];
for (var i = 1000; i >= 0; i--) {
// bad: first write will abandon array part permanently
arr[i] = i * i;
}
Right after the first array write the array part would contain 1001 entries with only one mapped array element. The density of the array would thus be less than 0.1%. This is way below the density limit for abandoning the array part, so the array part is abandoned immediately. At the end the array part would be 100% dense but will never be restored. Using an ascending index fixes the issue:
var arr = [];
for (var i = 0; i < 1000; i++) {
arr[i] = i * i;
}
Setting the length
property of an array manually does not, by itself, cause
an array part to be abandoned. To simplify a bit, the array density check
compares the number of mapped elements relative to the highest used element
(actually allocated size). The length
property does not affect the check.
Although setting an array length beforehand may effectively pre-allocate an
array in some implementations, it has no such effect in Duktape, at least at
the moment. For example:
var arr = [];
arr.length = 1001; // array part not abandoned, but no speedup in Duktape
for (var i = 0; i < 1000; i++) {
arr[i] = i * i;
}
Because the internal enumeration object contains all (used) array
indices converted to string values, avoid for-in
enumeration
of at least large arrays. As a concrete example, consider:
var a = [];
for (var i = 0; i < 1000000; i++) {
a[i] = i;
}
for (var i in a) {
// Before this loop is first entered, a million strings ("0", "1",
// ..., "999999") will be interned.
print(i, a[i]);
}
// The million strings become garbage collectable only here.
The internal enumeration object created in this example would contain a million interned string keys for "0", "1", ..., "999999". All of these keys would remain reachable for the entire duration of the enumeration. The following code would perform much better (and would be more portable, as it makes no assumptions on enumeration order):
var a = [];
for (var i = 0; i < 1000000; i++) {
a[i] = i;
}
var n = a.length;
for (var i = 0; i < n; i++) {
print(i, a[i]);
}
Identifier accesses in global and eval code always use slow path instructions to ensure correctness. This is at least a few orders of magnitude slower than the fast path where identifiers are mapped to registers of a function activation.
So, this is slow:
for (var i = 0; i < 100; i++) {
print(i);
}
Each read and write of i
will be an explicit environment record lookup,
essentially a property lookup from an internal environment record object,
with the string key i
.
Optimize by putting most code into a function:
function main() {
for (var i = 0; i < 100; i++) {
print(i);
}
}
main();
Here, i
will be mapped to a function register, and each access will be a
simple register reference (basically a pointer to a tagged value), which is
much faster than the slow path.
If you don't want to name an explicit function, use:
(function() {
var i;
for (i = 0; i < 100; i++) {
print(i);
}
})();
Eval code provides an implicit return value which also has a performance impact. Consider, for instance, the following:
var res = eval("if (4 >= 3) { 'foo'; } else { 'bar'; }");
print(res); // prints 'foo'
To support such code the compiler emits bytecode to store a statement's implicit return value to a temporary register in case it is needed. These instructions slow down execution and increase bytecode size unnecessarily.
When variables are bound to virtual machine registers, identifier lookups are much faster than using explicit property lookups on the global object or on other objects.
When an external value or function is required multiple times, copy it to a local variable instead:
function slow(x) {
var i;
// 'x.length' is an explicit property lookup and happens on every loop
for (i = 0; i < x.length; i++) {
// 'print' causes a property lookup to the global object
print(x[i]);
}
}
function fast(x) {
var i;
var n = x.length;
var p = print;
// every access in the loop now happens through register-bound identifiers
for (i = 0; i < n; i++) {
p(x[i]);
}
}
Use such optimizations only where it matters, because they often reduce code readability.
The undefined
value is interesting in ECMAScript because it's not actually
a literal but a global variable. As such, if you refer to it just as
undefined
, Duktape will currently read it using a slow path variable read.
For example:
print("undefined is: " + undefined);
produces bytecode like:
; constant 19: "undefined"
...
GETVAR r123, c19 ; load undefined
...
This is slower and uses larger bytecode than necessary. Using e.g. void null
(or equivalently void 0
, etc) produces better bytecode. For example:
print("undefined is: " + void null);
produces bytecode like:
...
LDUNDEF r123
...
This is faster because there's no slow path access. The code is also better
because it is no longer dependent on the global undefined
binding being
intact.
There's a fast path for JSON.stringify()
serialization which is used when
there is no "replacer" argument. Indent argument and JX/JC support was added
to the fast path in Duktape 1.4.0.
The fast path assumes that it can serialize the argument value without risking any side effects which might mutate the value being serialized, and will fall back to the slower default algorithm if necessary.
This happens at least when:
-
any object has a
toJSON()
property -
any object property is a getter
-
any object is a Proxy
See test-bi-json-enc-fastpath.js for detailed notes on current limitations; the fast path preconditions and limitations are very likely to change between Duktape releases.
The fast path is currently not enabled by default. To enable, ensure
DUK_USE_JSON_STRINGIFY_FASTPATH
is enabled in your duk_config.h
.