Buffers

Overview

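ECMAScript did not originally have a binary array (or binary string) data type, so various approaches are used, such as Khronos typed arrays (since merged into ES2015), the Node.js Buffer type, and Duktape's own plain buffer type.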

This document describes how various buffer types have been implemented in Duktape. The goal is to minimize footprint, so the internal buffer type implementation shares a lot of code even though multiple APIs are provided.

Duktape buffer support

Overview

Duktape currently supports the following buffer and buffer-related values:

  • Plain Duktape buffer
  • Node.js Buffer object
  • ArrayBuffer, DataView, and TypedArray (Uint8Array etc) objects

The plain buffer value forms the basic underlying type for all the other buffer types; the relationship is similar to a plain string and a String object. Plain buffers can be fixed, dynamic, or external:

  • Fixed buffers cannot be resized but have a stable data pointer.
  • Dynamic buffers can be resized at the cost of an unstable data pointer. You can also "steal" the current buffer allocation through the duk_steal_buffer() API call.
  • External buffers point to a user-allocated external data area whose pointer and length can be changed, but Duktape won't resize or automatically free the buffer.
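The three plain buffer kinds differ mainly in who may resize or repoint the data. The toy C model below illustrates that rule. The `plain_buf` type and both functions are hypothetical: they are not Duktape's internal `duk_hbuffer` layout or its C API.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of the three plain buffer kinds (not duk_hbuffer). */
typedef enum { BUF_FIXED, BUF_DYNAMIC, BUF_EXTERNAL } buf_kind;

typedef struct {
    buf_kind kind;
    unsigned char *data;  /* current data pointer */
    size_t size;          /* current size in bytes */
} plain_buf;

/* Engine-side resize: only dynamic buffers can be resized. The data pointer
 * may move, so any raw pointer saved earlier must be treated as stale. */
static int buf_resize(plain_buf *b, size_t new_size) {
    unsigned char *p;
    if (b->kind != BUF_DYNAMIC) {
        return 0;  /* fixed: size is immutable; external: owned by the user */
    }
    p = realloc(b->data, new_size > 0 ? new_size : 1);
    if (p == NULL) {
        return 0;
    }
    b->data = p;
    b->size = new_size;
    return 1;
}

/* User-side reconfiguration of an external buffer: the engine only records
 * the new pointer/length and never allocates or frees that memory itself. */
static int buf_config_external(plain_buf *b, unsigned char *ptr, size_t len) {
    if (b->kind != BUF_EXTERNAL) {
        return 0;
    }
    b->data = ptr;
    b->size = len;
    return 1;
}
```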

Plain buffers have Uint8Array-like virtual properties for buffer byte indices, .length, .byteOffset, .byteLength, and .BYTES_PER_ELEMENT. Assignment has the same semantics as Uint8Array: bytes are written with modulo 256 semantics, bytes read back as unsigned 8-bit values. The plain buffer type is designed to be as friendly as possible for low level embedded programming, and has a minimal footprint because there's no ECMAScript object associated with it. It is mostly intended to be accessed from C code. Duktape also uses buffer values internally.

The various buffer and view objects ultimately point to an underlying buffer and provide access to the full buffer or a slice/view of the buffer using either accessor methods (like getUint32()) or virtual index properties.

All buffer objects are internally implemented using the duk_hbufobj type, which makes it easy to mix values between different APIs. As a result, Duktape accepts, for example, a Node.js Buffer as input for a Khronos DataView.

duk_hbufobj

The internal duk_hbufobj type is a heap-allocated structure extended from duk_hobject. In addition to the usual object property tables and such, it has C struct fields for quick access:

  • A reference to an underlying plain buffer value (duk_hbuffer), which may be of any type: fixed, dynamic, or external.
  • A byte offset/length pair providing a window into the underlying buffer. These values directly map to the virtual byteOffset and byteLength properties.
  • An element type and a shift count. These provide enough information to support Khronos TypedArray views so that index values can be mapped to byte offsets and encoded/decoded appropriately. The virtual length property indicates the number of elements (not bytes) available, and is provided by dividing the byte length field by the element size (rounding downwards). The virtual BYTES_PER_ELEMENT is provided based on the element shift count (as "1 << shift").

The following figure illustrates these for a fictional Int16-view:

:  0  1: 2  3  4  5  6  7  8  9 10 11:12 13 14 15 :
+------+-----------------------------+------------+
| xx xx:xx xx xx xx xx xx xx xx xx xx:xx xx xx xx |   underlying buffer
+------+-----------------------------+------------+   (16 bytes)
       :     :     :     :     :     :
       :     :     :     :     :     :    shift is 1, element size is
       :     :     :     :     :     :    (1 << 1) => 2 bytes
       |-----|-----|-----|-----|-----|    (= .BYTES_PER_ELEMENT)
       : [0] : [1] : [2] : [3] : [4] :
       :     :                            elem. type is Int16 (signed)
       :     :
       :<--->:  (2-byte elements)         byte offset: 2 (= .byteOffset)
                                          byte length: 10 (= .byteLength)
                                          => view maps byte range [2,12[

                                          length in elements: 5 (= .length)
                                          virtual indices: 0, 1, 2, 3, 4
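The arithmetic in the figure can be written out directly. The following is a minimal C sketch. It assumes a simplified, hypothetical `view_t` descriptor with only the offset, length, and shift fields; the real `duk_hbufobj` has more fields and different names. The sketch also includes the element-alignment check that TypedArray constructors apply when a view is created.

```c
#include <stdint.h>

/* Hypothetical, simplified view descriptor; not the real duk_hbufobj. */
typedef struct {
    uint32_t offset;  /* .byteOffset */
    uint32_t length;  /* .byteLength */
    uint8_t shift;    /* element size is (1 << shift) bytes */
} view_t;

/* .BYTES_PER_ELEMENT */
static uint32_t view_bytes_per_element(const view_t *v) {
    return (uint32_t) 1 << v->shift;
}

/* .length: byte length divided by element size, rounding down. */
static uint32_t view_elem_count(const view_t *v) {
    return v->length >> v->shift;
}

/* Byte offset in the underlying buffer where element idx starts;
 * the caller must first check idx < view_elem_count(v). */
static uint32_t view_elem_byte_offset(const view_t *v, uint32_t idx) {
    return v->offset + (idx << v->shift);
}

/* TypedArray constructors reject a byteOffset that is not a multiple of
 * the element size (a RangeError in ES2015); DataView has no such rule. */
static int view_offset_is_aligned(uint32_t offset, uint8_t shift) {
    return (offset & (((uint32_t) 1 << shift) - 1)) == 0;
}
```

With the figure's values (offset 2, byte length 10, shift 1), `.length` is 5 and element [4] occupies bytes 10 and 11, ending just before byte 12.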

Each duk_hbufobj has virtual index behavior with indices mapping logically to elements in the range [0,length[. Elements may be signed or unsigned integers of multiple sizes, IEEE floats, or IEEE doubles. All accesses to the underlying buffer are byte-based, and no alignment is required by Duktape; however, the Khronos TypedArray specification restricts creation of non-element-aligned views. All multi-byte elements are accessed in the host endianness (this is required by the ES2015 TypedArray specification).

A duk_hbufobj acts as both a buffer representation (providing Node.js Buffer and ArrayBuffer) and a view representation (providing e.g. DataView, Uint8Array, and other TypedArray views). It supports both a direct 1:1 mapping to an underlying buffer and a slice/view mapping to a subset of the buffer.

The byteLength/byteOffset pair provides a logical window for the buffer object. The underlying buffer may be smaller, e.g. as a result of a dynamic buffer being resized after a duk_hbufobj was created. For example:

+------+---------------------+
| xx xx:xx xx xx xx xx xx xx | / / / /    underlying buffer resized to 9 bytes
+------+---------------------+
       :     :     :     :     :     :
       :     :     :     :  ?  :  ?  :    index 3 is only partially mapped
       :     :     :     :     :     :    index 4 is not mapped
       |-----|-----|-----|-----|-----:
       : [0] : [1] : [2] : [3] : [4] :

This is not intended to be a normal usage scenario, so the main goal for Duktape is only to provide memory safe behavior:

  • The virtual properties (byteLength, byteOffset, length) are unchanged.
  • Attempt to read outside the view (fully or partially) returns zero values.
  • Attempt to write outside the view (fully or partially) is silently ignored.
  • Other operations requiring access to the underlying buffer vary in behavior: some are silently skipped, some cause a TypeError, etc.
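The following is a minimal C sketch of the read and write rules above, applied to the resized example. It assumes a hypothetical `shrinkable_view` struct that caches the underlying buffer's current data pointer and size; in a real engine these would be looked up right before each access. An access is performed only if its whole byte range is covered by both the view and the current buffer. Otherwise, reads yield zeroes and writes are dropped.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical view whose underlying dynamic buffer may have shrunk. */
typedef struct {
    uint8_t *data;      /* current data pointer of the underlying buffer */
    uint32_t buf_size;  /* current size of the underlying buffer */
    uint32_t offset;    /* .byteOffset, fixed when the view was created */
    uint32_t length;    /* .byteLength, fixed when the view was created */
} shrinkable_view;

/* Is view-relative range [rel, rel + n) inside the view AND the buffer?
 * 64-bit sums avoid wrap-around for large 32-bit inputs. */
static int range_covered(const shrinkable_view *v, uint32_t rel, uint32_t n) {
    uint64_t end_in_view = (uint64_t) rel + n;
    uint64_t end_in_buf = (uint64_t) v->offset + rel + n;
    return end_in_view <= v->length && end_in_buf <= v->buf_size;
}

/* Fully or partially unmapped reads produce zero bytes. */
static void view_read(const shrinkable_view *v, uint32_t rel, uint8_t *out, uint32_t n) {
    if (range_covered(v, rel, n)) {
        memcpy(out, v->data + v->offset + rel, n);
    } else {
        memset(out, 0, n);
    }
}

/* Fully or partially unmapped writes are silently ignored. */
static void view_write(shrinkable_view *v, uint32_t rel, const uint8_t *in, uint32_t n) {
    if (range_covered(v, rel, n)) {
        memcpy(v->data + v->offset + rel, in, n);
    }
}
```

With a 9-byte buffer, element [2] (view-relative bytes 4 and 5) is still fully backed. Element [3] is only half backed, so it reads as zero and writes to it are ignored.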

Beyond memory safety, any specific behavior is not part of versioning guarantees and may change even between minor versions.

Summary of buffer-related values

| Type | Specification | .length | .byteLength | .byteOffset | .BYTES_PER_ELEMENT | .buffer | [index] | Element type | Read coercion | Write coercion | Endianness | Accessor methods | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| plain buffer | Duktape | yes (bytes) | yes | yes | 1 | no | yes | uint8 | uint8 | ToUint32() & 0xff | n/a | no | Mimic Uint8Array, inherit from Uint8Array.prototype. |
| Buffer | Node.js | yes (bytes) | yes | yes | 1 | no | yes | uint8 | uint8 | ToUint32() & 0xff | n/a | yes | Based on Node.js v0.12.1. |
| ArrayBuffer | TypedArray | no | yes | no | no | no | no | n/a | n/a | n/a | n/a | no | |
| DataView | TypedArray | yes (bytes) | yes | yes | 1 | yes | yes | uint8 | uint8 | ToUint32() & 0xff | n/a | yes | |
| Int8Array | TypedArray | yes (bytes) | yes | yes | 1 | yes | yes | int8 | int8 | ToUint32() & 0xff | n/a | no | |
| Uint8Array | TypedArray | yes (bytes) | yes | yes | 1 | yes | yes | uint8 | uint8 | ToUint32() & 0xff | n/a | no | |
| Uint8ClampedArray | TypedArray | yes (bytes) | yes | yes | 1 | yes | yes | uint8 | uint8 | special | n/a | no | Write: special clamp/round. |
| Int16Array | TypedArray | yes (elements) | yes | yes | 2 | yes | yes | int16 | int16 | ToUint32() & 0xffff | host | no | |
| Uint16Array | TypedArray | yes (elements) | yes | yes | 2 | yes | yes | uint16 | uint16 | ToUint32() & 0xffff | host | no | |
| Int32Array | TypedArray | yes (elements) | yes | yes | 4 | yes | yes | int32 | int32 | ToUint32() | host | no | |
| Uint32Array | TypedArray | yes (elements) | yes | yes | 4 | yes | yes | uint32 | uint32 | ToUint32() | host | no | |
| Float32Array | TypedArray | yes (elements) | yes | yes | 4 | yes | yes | float | float | cast to float | host | no | |
| Float64Array | TypedArray | yes (elements) | yes | yes | 8 | yes | yes | double | double | cast to double | host | no | |

Notes:

  • A plain buffer mimics a Uint8Array wherever possible, and inherits methods and other properties through Uint8Array.prototype.

  • DataView and Node.js Buffer inherit a set of accessor methods from their prototype. These accessors allow fields of different width and type to be manipulated directly. Endianness can be specified, but is limited to little/big (there's no support for ARM mixed endian IEEE doubles).

  • TypedArray views are host endian. Their byte offset relative to the ArrayBuffer they are used on must also be a multiple of the element size (i.e. views must be naturally aligned). These requirements are not very useful from Duktape's point of view, but they are required by the ES2015 specification.

    (It would be trivial to use a specific endianness or allow unaligned views because Duktape works with the values byte-by-byte anyway.)

  • Uint8ClampedArray has a very specific clamping and rounding behavior which differs from all other view types.

  • An unsigned ToUint32() coercion is used in writing signed values too. For the bytes written to memory the signedness of this coercion doesn't really matter.
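The integer write coercions in the table can be sketched as ToUint32() followed by a mask. The C version below is a simplification that assumes IEEE doubles and |x| < 2^95, so the intermediate int64 cast cannot overflow. It shows that writing -2 through the unsigned coercion yields the same 16-bit pattern that an Int16Array element reads back as -2.

```c
#include <stdint.h>

/* Simplified ToUint32(): NaN and +/-Infinity give 0, otherwise truncate
 * toward zero and reduce modulo 2^32. Assumes |x| < 2^95. */
static uint32_t to_uint32(double x) {
    const double two32 = 4294967296.0;
    double q;
    if (!(x - x == 0.0)) {
        return 0;  /* x - x is NaN only for NaN and the infinities */
    }
    q = (double) (int64_t) (x / two32);  /* whole multiples of 2^32 */
    x -= q * two32;                      /* exact; now |x| < 2^32 */
    return (uint32_t) (int64_t) x;       /* truncate, then wrap mod 2^32 */
}

/* Write coercions for 1- and 2-byte element types (signed or unsigned). */
static uint8_t coerce_8bit(double v) {
    return (uint8_t) (to_uint32(v) & 0xffU);
}

static uint16_t coerce_16bit(double v) {
    return (uint16_t) (to_uint32(v) & 0xffffU);
}
```

For example, `coerce_16bit(-2.0)` stores 0xfffe. Reading that back through an Int16Array reinterprets it as -2, which is why the signedness of the write-side coercion does not matter.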

Built-in objects related to buffers

Duktape plain buffer value:

  • None

Node.js Buffer:

  • Buffer
  • Buffer.prototype
  • SlowBuffer: only available via require("buffer"); omitted from the Duktape implementation

TypedArray:

  • ArrayBuffer
  • ArrayBuffer.prototype
  • DataView
  • DataView.prototype
  • Int8Array
  • Int8Array.prototype
  • Uint8Array
  • Uint8Array.prototype
  • Uint8ClampedArray
  • Uint8ClampedArray.prototype
  • Int16Array
  • Int16Array.prototype
  • Uint16Array
  • Uint16Array.prototype
  • Int32Array
  • Int32Array.prototype
  • Uint32Array
  • Uint32Array.prototype
  • Float32Array
  • Float32Array.prototype
  • Float64Array
  • Float64Array.prototype

None of the prototype objects are mandated by the Khronos specification but are present in ES2015.

Conversions between buffer values

Because Duktape supports three Buffer object APIs, it's important that buffer values can be comfortably exchanged between the APIs (none of the API specifications require such behavior, of course).

As a general rule:

  • Any Buffer object/view (implemented internally as a duk_hbufobj) is accepted by any API expecting a specific object/view. For example, Khronos DataView() constructor accepts a Node.js Buffer, and Node.js Buffer() accepts a Uint8Array as an input.
  • A plain Duktape buffer is accepted as if it was coerced to a Uint8Array. To simplify the implementation, many internals actually do an explicit Uint8Array coercion when given plain buffers.

These general rules are complicated by a few practical issues:

  • Some APIs create slices/views that share an underlying buffer value, while others create copies. Both behaviors are necessary in some situations.
  • A slice/view which doesn't map 1:1 to an underlying buffer cannot be coerced to a plain buffer value without copying, as the extra offset and length information is not supported for plain buffer values.

The current mixing behavior is described in the Duktape Wiki.

Buffer values in the Duktape C API

The C API for plain buffer and buffer object handling is described in the Duktape Wiki.

Node.js Buffer notes

The Node.js Buffer type is widely used in server-side programming but is not standardized as such.

Specification notes

Specification notes:

  • A Buffer may point to a slice of an underlying buffer.
  • String-to-buffer coercion supports a set of encodings besides UTF-8.
  • Buffer prototype's slice() does not copy contents of the slice, but creates a new Buffer which points to the same underlying buffer. This is similar to the TypedArray subarray() operation, but different from the ArrayBuffer slice() operation which creates a new buffer for the slice. With typed arrays a non-copying slice would just be a new view on top of a previous one instead of a new ArrayBuffer.
  • The slice() operation provides offsetted access to the underlying buffer (same as with e.g. Uint8Array). However, a slice is a fully fledged buffer and can be used to create another slice() etc.
  • Buffers have virtual index properties and a virtual 'length' property.
  • Reads and writes have an optional offset and value range check, which causes an error for out-of-bounds indices (RangeError) and values (TypeError); the behavior is not always consistent, and the chosen Duktape behavior is documented in testcases. When the checks are disabled (noAssert == true), Node.js behavior varies and can be memory unsafe. Duktape semantics are always memory safe, even at the cost of some performance.
  • Buffer accessor method read and write offsets are byte offsets regardless of data type being accessed. This is similar to Khronos DataView, but different from Khronos TypedArray views whose indices are element-based.
  • There are no alignment requirements for field access. This also matches Khronos DataView behavior, but differs from Khronos TypedArrays which must be aligned.
  • write(U)Int(LE|BE) and read(U)Int(LE|BE) operate on variable-size integers (up to 48 bits), and the caller selects the number of bytes (and endianness) to read or write.
  • Newly created Node.js buffers don't seem to be zeroed automatically. Duktape zeroes buffer data as a side effect of the underlying duk_hbuffer values being automatically zeroed; however, if DUK_USE_ZERO_BUFFER_DATA is not set, Duktape's Node.js Buffers are not zeroed either.
  • Buffer inspect() provides a limited hex dump of buffer contents. Duktape doesn't currently provide a similar function by default.
  • SlowBuffer: probably not needed.
  • User code can require('buffer'); this is not supported by Duktape.

Implementation notes

  • Representation must point to a plain buffer and also needs internal slice offset/length properties to implement slice semantics. Slices must be valid inputs for other slices; such slice-of-slice objects can point to the same plain buffer with offset/length pairs resolved at each step.
  • For fast operations, guaranteed property slots could be used. Alternatively a dedicated duk_hobject subtype can be used. (The latter was chosen.)
  • Should be optional and disabled by default because of footprint concerns.
  • Should have a toLogString() which prints inspect() output or some other useful oneliner?
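Resolving offsets at each slice step means every slice, including a slice of a slice, refers directly to the same plain buffer. The following is a minimal C sketch that assumes a hypothetical `slice_t` window type. The start/end clamping follows the Node.js `slice()` rules: negative values count from the end, out-of-range values are clamped, and `end < start` gives an empty slice. Treat the exact clamping as an assumption to verify against the Node.js version being emulated.

```c
#include <stdint.h>

/* Hypothetical slice descriptor: a window into one shared plain buffer. */
typedef struct {
    uint32_t offset;  /* byte offset into the underlying plain buffer */
    uint32_t length;  /* byte length of the slice */
} slice_t;

/* Clamp a relative index into [0, len]; negative values count from the end. */
static int64_t clamp_index(int64_t idx, int64_t len) {
    if (idx < 0) {
        idx += len;
        if (idx < 0) {
            idx = 0;
        }
    } else if (idx > len) {
        idx = len;
    }
    return idx;
}

/* buf.slice(start, end): no copy, just a new window resolved against the
 * parent, so the result points straight into the shared plain buffer. */
static slice_t slice_of(slice_t parent, int64_t start, int64_t end) {
    int64_t len = parent.length;
    slice_t s;
    start = clamp_index(start, len);
    end = clamp_index(end, len);
    if (end < start) {
        end = start;
    }
    s.offset = parent.offset + (uint32_t) start;
    s.length = (uint32_t) (end - start);
    return s;
}
```

For a 16-byte buffer, `slice(4, 12)` gives the window [4,12[. Slicing that again with `(-3, 100)` gives [9,12[, expressed directly in underlying-buffer offsets.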

Buffers are not automatically zeroed

> b = new Buffer(16)
<Buffer 00 99 f2 00 00 00 00 00 00 00 00 00 00 00 00 00>
> b.fill(0)
undefined
> b
<Buffer 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00>

Range checks and partial writes

By default offset and value ranges are checked:

> b.writeUInt8(0x101, 0)
TypeError: value is out of bounds
    at TypeError (<anonymous>)
    at checkInt (buffer.js:784:11)
    [...]

Passing noAssert = true as the last argument turns the checks off. With the checks disabled, invalid offsets are ignored and values are treated with modulo semantics:

> b.writeUInt8(0x101, 0, true)
undefined
> b
<Buffer 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00>

When writing values larger than a byte, partial writes are allowed:

> b.fill(0)
undefined
> b.writeUInt32BE(0xdeadbeef, 13)
RangeError: Trying to write outside buffer length
    at RangeError (<anonymous>)
    at checkInt (buffer.js:788:11)
    [...]
> b.writeUInt32BE(0xdeadbeef, 13, true)
undefined
> b
<Buffer 00 00 00 00 00 00 00 00 00 00 00 00 00 de ad be>
> b.fill(0)
undefined
> b.writeUInt32BE(0xdeadbeef, -1, true)
undefined
> b
<Buffer ad be ef 00 00 00 00 00 00 00 00 00 00 00 00 00>

However, such bytes are not actually "dropped": they can be read back with an unchecked out-of-bounds read:

> b = new Buffer(16); b.fill(0); b.writeUInt32BE(0xdeadbeef, -1, true); b
<Buffer ad be ef 00 00 00 00 00 00 00 00 00 00 00 00 00>
> b.readUInt32BE(-1, true).toString(16)
'deadbeef'
> b.fill(1); b
<Buffer 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01>
> b.readUInt32BE(-1, true).toString(16)
'de010101'

This is not just a "safe zone" to avoid implementing partial writes: the out-of-bounds offsets can be large:

> b = new Buffer(16); b.fill(0); b.writeUInt32BE(0xdeadbeef, -10000, true); b
<Buffer 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00>
> b.readUInt32BE(-10003, true).toString(16)
'de'
> b.readUInt32BE(-10000, true).toString(16)
'deadbeef'

Running this under valgrind produces no warnings, so apparently this is supported behavior. It might be caused by "buffer sharing", where Node.js uses a large Buffer to provide multiple smaller Buffers (as slices), and these out-of-bounds accesses hit the shared large Buffer. Sometimes memory unsafe behavior occurs, though.

This behavior is difficult to implement in Duktape, so probably the best approach is to either ignore partial reads/writes, or implement them in an actual "clipping" manner.
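The "clipping" option could look like the following minimal C sketch for a 32-bit big-endian write. The function name is hypothetical. Only the bytes that land inside [0, len[ are stored, and memory outside the buffer is never touched.

```c
#include <stdint.h>

/* Clipping write of a 32-bit big-endian value at a possibly out-of-range
 * offset: bytes falling outside [0, len) are discarded. */
static void write_u32be_clipped(uint8_t *buf, int64_t len, int64_t offset, uint32_t value) {
    int i;
    for (i = 0; i < 4; i++) {
        int64_t pos = offset + i;
        if (pos >= 0 && pos < len) {
            /* Most significant byte first. */
            buf[pos] = (uint8_t) (value >> (24 - 8 * i));
        }
    }
}
```

For offsets 13 and -1, this produces the same visible buffer contents as the Node.js session above. Unlike Node.js, however, the clipped bytes are simply discarded and cannot be read back.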

Khronos typed array notes

The Khronos typed array specification is related to HTML canvas and WebGL programming. Some of the design choices are affected by this, e.g. the endianness handling and clamped byte write support. The Khronos specification has been refined and merged into ES2015 so this specification has an official status now.

Specification notes

  • ArrayBuffer wraps an underlying buffer object; ArrayBufferView and DataView classes provide "windowed" access to some underlying ArrayBuffer. A buffer object can be "neutered". Apparently neutering happens when "transferring" an ArrayBuffer, which is HTML specific. It's unclear whether neutering needs to be supported.

  • ArrayBuffer does not have virtual indices or 'length' behavior, but TypedArray views do. DataView does not have virtual indices but e.g. V8 provides them in practice.

  • ArrayBuffer has 'byteLength'. Views have a 'byteLength' and a 'length', where 'length' refers to number of elements, not bytes. For example a Uint32Array view with length 4 would have byteLength 16. (For internal reasons, all Duktape ArrayBuffer and view objects provide 'length', 'byteLength', and 'byteOffset'.)

  • ArrayBufferView classes are host endian. DataView is endian independent because caller specifies endianness for each call.

  • TypedArray instances must be created with a byte offset that is a multiple of the element size (i.e. aligned). DataView doesn't have this restriction. (This requirement is unnecessary for Duktape because the implementation never assumes alignment. But, this requirement is implemented for compatibility.)

  • NaN handling is rather fortunate, as it is compatible with packed duk_tval: in other words, NaNs can be substituted with one another. When coerced to integer, NaN is coerced to zero.

  • Modulo semantics are used for number writes, except for Uint8ClampedArray, which provides clamped semantics with special rounding when writing values. Both modulo and clamping coerce NaN to zero. With modulo semantics, truncation toward zero is used (1.999 writes as 1), while clamped semantics uses a specific form of rounding.

  • For the clamping behavior, the steps for unsigned byte (octet) clamped coercion are:

    • Set x to min(max(x, 0), 2^8 - 1).
    • Round x to the nearest integer, choosing the even integer if it lies halfway between two, and choosing +0 rather than -0.
    • Return the IDL octet value that represents the same numeric value as x.
  • Error is thrown for out-of-bounds accesses.

  • When using set() the arrays may refer to the same underlying array and the write source and destination may overlap. Must handle as if a temporary copy was made, i.e. like memmove().

  • DataView and Node.js buffer have similar (but not identical) methods, which can share the same underlying implementation. Endianness is specified with an argument in DataView but is implicit in Node.js buffer:

    // DataView
    setUint16(unsigned long byteOffset, unsigned short value, optional boolean littleEndian)
    
    // Node.js buffer
    buf.writeUInt16LE(value, offset, [noAssert])
    buf.writeUInt16BE(value, offset, [noAssert])
    

    Unfortunately, the argument order is also swapped (offset/value in DataView, value/offset in Node.js Buffer).

  • There are explicit zeroing guarantees for the ArrayBuffer and TypedArray constructors, so buffer data must be zeroed even when DUK_USE_ZERO_BUFFER_DATA is not set.
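The clamping steps above can be sketched in C as follows. This is a minimal version that avoids libm and detects NaN with `x != x`. The halfway case rounds to even, so 2.5 writes as 2 but 3.5 writes as 4, unlike the truncation used by the other view types.

```c
#include <stdint.h>

/* Uint8ClampedArray write coercion: clamp to [0, 255], then round to
 * nearest with ties to even. NaN and -0 become +0. */
static uint8_t to_uint8_clamped(double x) {
    double f, diff;
    if (x != x || x <= 0.0) {
        return 0;  /* NaN, negatives, +0 and -0 */
    }
    if (x >= 255.0) {
        return 255;  /* includes +Infinity */
    }
    f = (double) (int) x;  /* floor(x), since 0 < x < 255 */
    diff = x - f;          /* exact fractional part */
    if (diff > 0.5) {
        return (uint8_t) (f + 1.0);
    }
    if (diff < 0.5) {
        return (uint8_t) f;
    }
    /* Exactly halfway: choose the even neighbor. */
    return ((int) f % 2 == 0) ? (uint8_t) f : (uint8_t) (f + 1.0);
}
```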

Implementation notes

  • ArrayBuffer wraps an underlying buffer object. A buffer object can be "neutered".
  • ArrayBufferView classes and DataView refer to an underlying ArrayBuffer, and may have an offset. These could be implemented similarly to Node.js Buffer: refer to a plain underlying buffer, byte offset, and byte length in internal properties. A reference to the original ArrayBuffer (boxed buffer) is unfortunately also needed, via the '.buffer' property.
  • There are a lot of classes in the typed array specification. Each class is an object, so this is rather heavyweight.
  • Should be optional and disabled by default because of footprint concerns.

Merged read/write algorithm for element access

This section describes a merged algorithm for reading and writing fields (uint8, int8, uint16, int16, etc) with the explicit read/write calls provided by DataView and Node.js Buffer. The same native code can be used, with a "magic" value providing flags for the differences in behavior.

Virtual index properties also need handling; they can either be implemented separately or call into this algorithm.
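The following is a minimal C sketch of such a merged reader. It assumes hypothetical flag names `FIELD_SIGNED`, `FIELD_FLOAT`, and `FIELD_LITTLE` (Duktape's actual magic encoding is not shown here) and that bounds checks happen before the call. Under these assumptions, a Node.js `readInt16BE()` maps to `FIELD_SIGNED` with 2 bytes, a DataView `getUint16(off, true)` maps to `FIELD_LITTLE` with 2 bytes, and a TypedArray index read passes `FIELD_LITTLE` only on a little-endian host.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical "magic" flag bits; Duktape's real encoding differs. */
#define FIELD_SIGNED 0x01U
#define FIELD_FLOAT  0x02U
#define FIELD_LITTLE 0x04U

/* Merged field read. nbytes is 1-6 for integers (covering the Node.js
 * 48-bit readIntLE() etc.) or 4/8 for IEEE floats. Bounds checks are
 * assumed to have been done by the caller. */
static double read_field(const uint8_t *p, unsigned nbytes, unsigned magic) {
    uint64_t bits = 0;
    unsigned i;
    for (i = 0; i < nbytes; i++) {
        /* Accumulate the most significant byte first. */
        unsigned src = (magic & FIELD_LITTLE) ? (nbytes - 1 - i) : i;
        bits = (bits << 8) | p[src];
    }
    if (magic & FIELD_FLOAT) {
        if (nbytes == 4) {
            uint32_t b32 = (uint32_t) bits;
            float f;
            memcpy(&f, &b32, sizeof f);
            return (double) f;
        } else {
            double d;
            memcpy(&d, &bits, sizeof d);
            return d;
        }
    }
    if (magic & FIELD_SIGNED) {
        uint64_t sign = (uint64_t) 1 << (8 * nbytes - 1);
        if (bits & sign) {
            /* Two's complement: subtract 2^(8 * nbytes); exact up to 48 bits. */
            return (double) bits - (double) (sign << 1);
        }
    }
    return (double) bits;
}
```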

Summary of read methods

Related methods are summarized in the table below. Notes:

  • "buf.XXX" refers to Node.js Buffer instance methods (inherited)
  • "DataView.XXX" refers to Khronos DataView instance methods (inherited)
  • "XyzArray index" refers to Khronos typed array view number index reads
  • Endianness "user" means that caller gives a littleEndian flag so that effective endianness is either big or little (there's no support for ARM mixed endian)
  • Endianness "host" means that host endianness is used
  • When reading values, there's no clamping behavior because integers are converted to IEEE doubles upon read in the natural way (zeroes read out as positive zeroes).
  • Bounds "arg" means argument indicates yes/no, "yes" means bounds are checked, "n/a" means not applicable. Virtual indices don't really have bounds checking, as any reads outside the range [0,length[ just become concrete string-keyed property lookups.
| Method | Endian | Bytes | Bounds | Notes |
|---|---|---|---|---|
| buf.readIntLE | little | 1-6 | arg | Can read up to 48-bit integers, caller specifies |
| buf.readIntBE | big | 1-6 | arg | Can read up to 48-bit integers, caller specifies |
| buf.readUIntLE | little | 1-6 | arg | Can read up to 48-bit integers, caller specifies |
| buf.readUIntBE | big | 1-6 | arg | Can read up to 48-bit integers, caller specifies |
| buf.readInt8 | n/a | 1 | arg | |
| buf.readUInt8 | n/a | 1 | arg | |
| buf.readInt16LE | little | 2 | arg | |
| buf.readInt16BE | big | 2 | arg | |
| buf.readUInt16LE | little | 2 | arg | |
| buf.readUInt16BE | big | 2 | arg | |
| buf.readInt32LE | little | 4 | arg | |
| buf.readInt32BE | big | 4 | arg | |
| buf.readUInt32LE | little | 4 | arg | |
| buf.readUInt32BE | big | 4 | arg | |
| buf.readFloatLE | little | 4 | arg | |
| buf.readFloatBE | big | 4 | arg | |
| buf.readDoubleLE | little | 8 | arg | |
| buf.readDoubleBE | big | 8 | arg | |
| DataView.getInt8 | n/a | 1 | yes | |
| DataView.getUint8 | n/a | 1 | yes | |
| DataView.getInt16 | user | 2 | yes | |
| DataView.getUint16 | user | 2 | yes | |
| DataView.getInt32 | user | 4 | yes | |
| DataView.getUint32 | user | 4 | yes | |
| DataView.getFloat32 | user | 4 | yes | |
| DataView.getFloat64 | user | 8 | yes | |
| Int8Array index | n/a | 1 | n/a | |
| Uint8Array index | n/a | 1 | n/a | |
| Uint8ClampedArray index | n/a | 1 | n/a | |
| Int16Array index | host | 2 | n/a | |
| Uint16Array index | host | 2 | n/a | |
| Int32Array index | host | 4 | n/a | |
| Uint32Array index | host | 4 | n/a | |
| Float32Array index | host | 4 | n/a | |
| Float64Array index | host | 8 | n/a | |

Summary of write methods

Related methods are summarized in the table below. Notes:

  • "buf.XXX" refers to Node.js Buffer instance methods (inherited)
  • "DataView.XXX" refers to Khronos DataView instance methods (inherited)
  • "XyzArray index" refers to Khronos typed array view number index writes
  • Endianness "user" means that caller gives a littleEndian flag so that effective endianness is either big or little (there's no support for ARM mixed endian)
  • Endianness "host" means that host endianness is used
  • Coercion behavior describes how an input value is coerced into an integer value; usually this is truncation, but there are special cases. "truncate*" means that truncation happens in Node.js Buffer API calls when "noAssert == true"; otherwise a TypeError occurs for out-of-range values (though fractional values are still silently accepted).
  • Bounds "arg" means argument indicates yes/no, "yes" means bounds are checked, "n/a" means not applicable. Virtual indices don't really have bounds checking, as any writes outside the range [0,length[ just become concrete string-keyed properties of the object (provided the object is extensible).
  • Return value of Node.js Buffer write calls is the number of bytes written. TypedArray write return value is undefined.
  • Node.js Buffer write() method is left out because it's not an element write
| Method | Endian | Bytes | Bounds | Coercion | Notes |
|---|---|---|---|---|---|
| buf.writeIntLE | little | 1-6 | arg | truncate* | Can write up to 48-bit integers, caller specifies |
| buf.writeIntBE | big | 1-6 | arg | truncate* | Can write up to 48-bit integers, caller specifies |
| buf.writeUIntLE | little | 1-6 | arg | truncate* | Can write up to 48-bit integers, caller specifies |
| buf.writeUIntBE | big | 1-6 | arg | truncate* | Can write up to 48-bit integers, caller specifies |
| buf.writeInt8 | n/a | 1 | arg | truncate* | |
| buf.writeUInt8 | n/a | 1 | arg | truncate* | |
| buf.writeInt16LE | little | 2 | arg | truncate* | |
| buf.writeInt16BE | big | 2 | arg | truncate* | |
| buf.writeUInt16LE | little | 2 | arg | truncate* | |
| buf.writeUInt16BE | big | 2 | arg | truncate* | |
| buf.writeInt32LE | little | 4 | arg | truncate* | |
| buf.writeInt32BE | big | 4 | arg | truncate* | |
| buf.writeUInt32LE | little | 4 | arg | truncate* | |
| buf.writeUInt32BE | big | 4 | arg | truncate* | |
| buf.writeFloatLE | little | 4 | arg | truncate* | |
| buf.writeFloatBE | big | 4 | arg | truncate* | |
| buf.writeDoubleLE | little | 8 | arg | truncate* | |
| buf.writeDoubleBE | big | 8 | arg | truncate* | |
| DataView.setInt8 | n/a | 1 | yes | truncate | |
| DataView.setUint8 | n/a | 1 | yes | truncate | |
| DataView.setInt16 | user | 2 | yes | truncate | |
| DataView.setUint16 | user | 2 | yes | truncate | |
| DataView.setInt32 | user | 4 | yes | truncate | |
| DataView.setUint32 | user | 4 | yes | truncate | |
| DataView.setFloat32 | user | 4 | yes | truncate | |
| DataView.setFloat64 | user | 8 | yes | truncate | |
| Int8Array index | n/a | 1 | n/a | truncate | |
| Uint8Array index | n/a | 1 | n/a | truncate | |
| Uint8ClampedArray index | n/a | 1 | n/a | special | Coercion is rounding with specific rules |
| Int16Array index | host | 2 | n/a | truncate | |
| Uint16Array index | host | 2 | n/a | truncate | |
| Int32Array index | host | 4 | n/a | truncate | |
| Uint32Array index | host | 4 | n/a | truncate | |
| Float32Array index | host | 4 | n/a | truncate | |
| Float64Array index | host | 8 | n/a | truncate | |
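On the write side, once the value has been coerced, the merge mostly concerns byte placement. The following is a minimal C sketch with hypothetical function names. It assumes IEEE 754 floats, so `(float) v` of an out-of-range double becomes Infinity. Integer inputs are expected to be coerced already, for example with ToUint32() and a mask. Under these assumptions, a DataView `setFloat32(off, v, le)` would become `write_field(p + off, 4, le, float32_bits(v))`, and a TypedArray index write would pass `host_is_little_endian()` as the byte-order flag.

```c
#include <stdint.h>
#include <string.h>

/* 1 on a little-endian host; TypedArray index writes use host byte order. */
static int host_is_little_endian(void) {
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Store the low nbytes (1-8) of bits at p in the requested byte order. */
static void write_field(uint8_t *p, unsigned nbytes, int little_endian, uint64_t bits) {
    unsigned i;
    for (i = 0; i < nbytes; i++) {
        /* Byte i counts from the least significant end of bits. */
        unsigned dst = little_endian ? i : (nbytes - 1 - i);
        p[dst] = (uint8_t) (bits >> (8 * i));
    }
}

/* Float32 "cast to float" coercion, returned as raw IEEE bits. */
static uint64_t float32_bits(double v) {
    float f = (float) v;
    uint32_t b;
    memcpy(&b, &f, sizeof b);
    return b;
}

/* Float64 needs no coercion; just expose the IEEE bits. */
static uint64_t float64_bits(double v) {
    uint64_t b;
    memcpy(&b, &v, sizeof b);
    return b;
}
```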

Implementation notes

TypedArray inheritance

The prototype chain for a TypedArray instance in V8 is:

view object -> Uint8Array.prototype -> Object.prototype

This means that view properties like set() and subarray() are provided by the prototype, and each view type has its own prototype with these properties. This duplicates the properties several times.

Duktape now inherits from an intermediate object:

view object -> Uint8Array.prototype -> TypedArray prototype -> Object.prototype

The set() and subarray() methods are inherited from the intermediate prototype object. This reduces property count by about 16 at the cost of one additional object.

ES2015 makes this the standard model; the TypedArray prototype is referred to as the %TypedArrayPrototype% intrinsic object in the ES2015 specification.

View/slice notes

  • Affects all code that accesses the underlying buffer through an Object reference (Buffer, ArrayBuffer, DataView, Uint8Array, etc):
    • Must look up internal plain buffer but also check for offset/length information.
    • Lookups should be fast, so:
      • Use an extended structure like for compiled functions
      • Use slotted internal properties (must be non-configurable so that their location won't change by accident)
  • Need reference to underlying buffer:
    • Could use a raw pointer to the buffer data as long as there's also a buffer reference to avoid freeing the underlying data.
    • But a raw pointer would only work with a fixed buffer which has a stable buffer pointer.
    • So, must reference the original buffer and figure out its data area dynamically.
  • Need byte offset and length for the view:
    • These should be validated on creation so that sanity checks are not necessary for every access.
    • If internal properties, should be non-writable and non-configurable to ensure that only C code can create a situation where assertions fail.
  • Need element size for the view:
    • For Node.js Buffer the element size is the byte size. For TypedArrays it may be 1, 2, 4, or 8 bytes.
    • Virtual "length" property must provide length in elements. Maintain two length fields (byte and element), or only one of them and shift as necessary.
    • Storing the length in elements: easier index/bound checks, and the virtual "length" read needs no change, but the shift must be taken into account when the byte length is needed.

Buffer validity checks and unbacked buffers

To ensure memory safety, all memory accesses need to be checked against the size of the underlying buffer even if the access is within the configured view/slice. This is needed because an underlying buffer may be dynamic or external and can be resized/reconfigured at any point.

In particular, the underlying buffer may be resized as a side effect of any operation that triggers code to run: the code may call into user code which manipulates the buffer.

As a result, the following checks must be made just before an operation and there must be no side effects between the check and the operation:

  • Checking that byte range is covered by underlying buffer
  • Checking whether the buffer object is neutered (buf == NULL vs. buf != NULL)

Future work

Missing ES2015 features

General semantics:

  • ToLength() coercion allows ArrayBuffer and typed array length up to 2^53 - 1.
  • Virtual index getters/setters don't handle out-of-bound accesses correctly (they should not be inherited through the inheritance chain).
  • The behavior of "detached" ArrayBuffers doesn't necessarily match the behavior described in http://www.ecma-international.org/ecma-262/6.0/#sec-properties-of-the-arraybuffer-instances: "... all operators to access or modify data contained in the ArrayBuffer instance will fail." However, there's currently no way to create a detached buffer, so this doesn't really matter.
  • Coercion behavior may be correct, but needs to be checked for typed arrays and DataView.
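For reference, the ES2015 ToLength() coercion mentioned above can be sketched in C, operating on a double. NaN and non-positive values give 0, and everything at or above 2^53 - 1, including +Infinity, is clamped to 2^53 - 1. This sketch covers only the coercion itself, not any change to how Duktape stores buffer lengths internally.

```c
#include <stdint.h>

/* ES2015 ToLength(): ToInteger() followed by clamping to [0, 2^53 - 1].
 * NaN, -0 and negative values give +0; +Infinity gives 2^53 - 1. */
static double to_length(double x) {
    const double max_len = 9007199254740991.0;  /* 2^53 - 1 */
    if (x != x || x <= 0.0) {
        return 0.0;
    }
    if (x >= max_len) {
        return max_len;
    }
    return (double) (int64_t) x;  /* truncate toward zero */
}
```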

ArrayBuffer:

  • ArrayBuffer.prototype[@@toStringTag] missing.

DataView:

  • DataView.prototype.buffer is an accessor property, currently .buffer is a concrete property.
  • DataView.prototype[@@toStringTag] missing.

Typed arrays:

  • Typed array constructors like Uint8Array should inherit from an unnamed prototype object which hosts shared properties like .from().
  • %TypedArray%.from missing.
  • %TypedArray%.of missing.
  • %TypedArray%.prototype is %TypedArrayPrototype% which Duktape is actually already using.
  • The .buffer property should be inherited from %TypedArray%.prototype.buffer instead of being a concrete property.
  • The .byteLength property should be inherited, but is virtual. The difference matters if inheritance relationship is altered.
  • The .byteOffset property should be inherited, but is virtual.
  • The .length property should be inherited, but is virtual.
  • %TypedArray%.prototype.copyWithin() is missing.
  • %TypedArray%.prototype.entries() is missing.
  • %TypedArray%.prototype.every() is missing.
  • %TypedArray%.prototype.fill() is missing.
  • %TypedArray%.prototype.filter() is missing.
  • %TypedArray%.prototype.find() is missing.
  • %TypedArray%.prototype.findIndex() is missing.
  • %TypedArray%.prototype.forEach() is missing.
  • %TypedArray%.prototype.indexOf() is missing.
  • %TypedArray%.prototype.join() is missing.
  • %TypedArray%.prototype.keys() is missing.
  • %TypedArray%.prototype.lastIndexOf() is missing.
  • %TypedArray%.prototype.map() is missing.
  • %TypedArray%.prototype.reduce() is missing.
  • %TypedArray%.prototype.reduceRight() is missing.
  • %TypedArray%.prototype.reverse() is missing.
  • %TypedArray%.prototype.set() exists, semantics need to be checked.
  • %TypedArray%.prototype.slice() is missing.
  • %TypedArray%.prototype.some() is missing.
  • %TypedArray%.prototype.sort() is missing.
  • %TypedArray%.prototype.subarray() exists, semantics need to be checked.
  • %TypedArray%.prototype.toLocaleString() is missing.
  • %TypedArray%.prototype.toString() is missing.
  • %TypedArray%.prototype.values() is missing.
  • %TypedArray%.prototype[@@iterator]() is missing.
  • get %TypedArray%.prototype[@@toStringTag] is missing.
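One way to see the target ES2015 layout is to probe it in a spec-conforming engine such as Node.js. The sketch below (the checks object is purely illustrative) covers several items from the lists above: the shared %TypedArray% constructor parent, an inherited .from(), inherited accessors for .length and .buffer, the DataView .buffer accessor, and the @@toStringTag-based results of Object.prototype.toString(). Every check should be true in such an engine; going by the lists above, some of them would currently be false in Duktape.

```javascript
// Probe the ES2015 typed array layout in a spec-conforming engine.
const TypedArray = Object.getPrototypeOf(Uint8Array);   // %TypedArray%
const TypedArrayPrototype = TypedArray.prototype;       // %TypedArrayPrototype%
const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);
const getterOf = (obj, key) => {
  const desc = Object.getOwnPropertyDescriptor(obj, key);
  return desc ? desc.get : undefined;
};

const checks = {
  // All typed array constructors share one unnamed parent constructor.
  sharedConstructorParent: Object.getPrototypeOf(Int16Array) === TypedArray,
  // .from() lives on %TypedArray% and is inherited, not an own property.
  fromIsInherited: !hasOwn(Uint8Array, 'from') && Uint8Array.from === TypedArray.from,
  sharedPrototypeParent: Object.getPrototypeOf(Uint8Array.prototype) === TypedArrayPrototype,
  // .length and .buffer are inherited accessors, not own properties of instances.
  lengthIsInheritedAccessor: !hasOwn(new Uint8Array(4), 'length') &&
    typeof getterOf(TypedArrayPrototype, 'length') === 'function',
  bufferIsInheritedAccessor: !hasOwn(new Uint8Array(4), 'buffer') &&
    typeof getterOf(TypedArrayPrototype, 'buffer') === 'function',
  dataViewBufferIsAccessor: typeof getterOf(DataView.prototype, 'buffer') === 'function',
  toStringTags:
    Object.prototype.toString.call(new Uint8Array(1)) === '[object Uint8Array]' &&
    Object.prototype.toString.call(new ArrayBuffer(1)) === '[object ArrayBuffer]' &&
    Object.prototype.toString.call(new DataView(new ArrayBuffer(1))) === '[object DataView]'
};
```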

The initial implementations for some of the missing methods can be the equivalent methods in Array.prototype with the caveat that .length should be accessed directly without invoking side effects. For now this would not be an issue because typed array .length is a virtual own property, and accessing it has no side effects.
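A minimal sketch of the "borrow from Array.prototype" approach, written as a plain function rather than installed on a prototype (typedArrayMethod is a hypothetical name). It also shows two caveats that the generic algorithms bring with them: methods that create a new array return a plain Array rather than a typed array, and sort() without a comparator compares elements as strings, whereas %TypedArray%.prototype.sort() compares them numerically.

```javascript
// Hypothetical helper: apply a generic Array.prototype method to a typed array.
function typedArrayMethod(name, view, ...args) {
  // The Array.prototype algorithms read view.length with an ordinary [[Get]];
  // in Duktape that is side effect free because .length is a virtual own property.
  return Array.prototype[name].apply(view, args);
}

const u8 = new Uint8Array([10, 9, 1]);

const idx = typedArrayMethod('indexOf', u8, 9);                   // 1
const joined = typedArrayMethod('join', u8, ',');                 // '10,9,1'
const sum = typedArrayMethod('reduce', u8, (a, b) => a + b, 0);   // 20

// Caveat 1: map() creates a plain Array, so results are not wrapped modulo 256
// as %TypedArray%.prototype.map() would wrap them for a Uint8Array.
const mapped = typedArrayMethod('map', u8, (x) => x * 30);        // [300, 270, 30]

// Caveat 2: without a comparator, sort() compares elements as strings.
const sorted = typedArrayMethod('sort', new Uint8Array([10, 9, 1]));  // Uint8Array [1, 10, 9]
```

Methods such as indexOf(), join(), reduce(), every(), or some() behave as expected this way; map(), filter(), slice(), and sort() would need typed-array-specific handling.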

Update to newer Node.js Buffer API version

Current: Node.js Buffer API v0.12.1

Latest at time of writing: Node.js Buffer API v6.9.1

Gap between current implementation and latest:

  • Buffers can be for-of iterated; buf.values(), buf.keys(), and buf.entries() create iterators. For-of iteration requires @@iterator support.

  • new Buffer(...) is deprecated in favor of Buffer.from(), Buffer.alloc(), and Buffer.allocUnsafe().

  • new Buffer(arrayBuffer[, byteOffset [, length]]) variant is not supported. This variant is already deprecated.

  • new Buffer(string[, encoding]) does not handle encoding correctly. The internal string representation (CESU-8, extended UTF-8, or even invalid UTF-8) is used as buffer bytes as-is. This is also incorrect for Node.js Buffer v0.12.1 (and already incorrect in master).

  • Buffer.alloc(size[, fill[, encoding]]) is missing.

  • Buffer.allocUnsafe(size) is missing. In practice it could be implemented as Buffer.alloc(size), ignoring any further arguments, so that a zero-filled buffer is allocated.

  • Buffer.allocUnsafeSlow(size) is missing. It could likewise be implemented as Buffer.alloc(size), ignoring any further arguments.

  • Buffer.byteLength(string[, encoding]) ignores the encoding argument and just returns the byte length of the internal string representation (typically, but not always, CESU-8). It also doesn't handle Buffer, Uint8Array, etc. arguments, which have special handling in v6.9.1.

  • Buffer.from() is missing. It can share most code with the constructor.

  • Buffer.isEncoding() is implemented but (still) only narrowly recognizes the exact string "utf8".

  • Buffer.poolSize is not provided. The value is meaningless if it's not used by the implementation (i.e. no "unsafe" buffers are implemented).

  • SlowBuffer is not implemented; it's part of v0.12.1 and deprecated (but present) in v6.9.1. If deprecated features are supported, it should be implemented.

  • buf.compare() has additional arguments in v6.9.1 (source/target indices) which are not implemented.

  • buf.copy() now has an explicitly specified return value (the number of bytes copied); it must be compared against the current implementation, also for truncated copies.

  • buf.entries() missing. Depends on general iterator support.

  • buf.fill() has an explicit encoding argument which has an effect when the fill argument is a string. This depends on general string encoding support in the Buffer API.

  • buf.indexOf() missing. Note that this is not the same as the typed array indexOf() because it recognizes Buffers.

  • buf.includes() missing. Note that this is not the same as typed array includes().

  • buf.keys() missing.

  • buf.lastIndexOf() missing. Note that this is not the same as typed array lastIndexOf().

  • buf.length writability comments in v6.9.1 may need documentation.

  • buf.readDoubleBE(), buf.writeDoubleBE(), and all the other read/write accessors seem to be the same in v6.9.1. Duktape doesn't implement the noAssert argument and always checks the offsets, which should be within the specification because:

    Setting noAssert to true allows offset to be beyond the end of buf, but the result should be considered undefined behavior.

  • buf.swap16(), buf.swap32(), buf.swap64() missing.

  • buf.toString() always decodes the buffer using UTF-8 (with replacement characters for invalid sequences), and ignores the encoding argument.

  • buf.values() missing.

  • buf.write() doesn't implement encoding. In both v0.12.1 and v6.9.1 partially encoded characters won't be written at all so that a few bytes at the end of the buffer may (apparently) be left untouched on a truncated write. Duktape doesn't currently implement this behavior.

  • buffer.INSPECT_MAX_BYTES not implemented. It's a property on the require('buffer') module rather than Buffer or a Buffer instance.

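The suggested Buffer.alloc(), Buffer.allocUnsafe(), and Buffer.allocUnsafeSlow() fallbacks, and the missing swap16(), swap32(), and swap64() methods, could be modeled roughly as follows. This is a sketch on plain Uint8Arrays with hypothetical function names; it ignores string fills and encodings entirely and is not how Duktape or Node.js implement these calls.

```javascript
// Sketch only: zero-filled allocation with an optional numeric fill.
function bufferAlloc(size, fill) {
  const buf = new Uint8Array(size);   // new typed arrays are always zero-filled
  if (typeof fill === 'number') {
    buf.fill(fill & 0xff);            // numeric fill value is used modulo 256
  }
  return buf;
}

// The "unsafe" variants just return a zero-filled buffer; extra arguments are ignored.
function bufferAllocUnsafe(size) { return bufferAlloc(size); }
function bufferAllocUnsafeSlow(size) { return bufferAlloc(size); }

// Reverse the byte order of each `width`-byte group in place, as swap16(),
// swap32(), and swap64() do for widths 2, 4, and 8.
function swapBytes(buf, width) {
  if (buf.length % width !== 0) {
    throw new RangeError('buffer size must be a multiple of ' + (width * 8) + ' bits');
  }
  for (let base = 0; base < buf.length; base += width) {
    for (let i = 0, j = width - 1; i < j; i++, j--) {
      const tmp = buf[base + i];
      buf[base + i] = buf[base + j];
      buf[base + j] = tmp;
    }
  }
  return buf;
}
```

For example, swapBytes(Uint8Array.of(1, 2, 3, 4), 2) yields the bytes 2, 1, 4, 3, while a width of 4 yields 4, 3, 2, 1.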

Other notes:

  • Deprecated features could be moved behind a config option, e.g. DUK_USE_NODEJS_BUFFER_DEPRECATED.
  • Node.js Buffer binding should have its own config option, e.g. DUK_USE_NODEJS_BUFFER.

Improve consistency of argument coercion

For the Node.js Buffer bindings there's considerable variation in how arguments are coerced, both in Node.js and in Duktape, and the two don't always agree. Improve consistency either by matching Node.js more closely, or by making Duktape-specific behavior more internally consistent.

Add support for neutering (detached buffer)

Currently not supported. Neutering an ArrayBuffer must also affect all views referencing that ArrayBuffer. Because duk_hbufobj has a direct duk_hbuffer pointer (not a pointer to the ArrayBuffer, which is stored as .buffer), neutering cannot be implemented by setting the duk_hbuffer pointer to NULL, as that wouldn't affect the other views sharing the same buffer.

Instead, neutering probably needs to be implemented at the plain buffer level; for example, by adding a "neutered" flag to duk_hbuffer. A dynamic buffer can also be resized to zero bytes at neutering time.

Another option is to support neutering only when the underlying buffer is dynamic, and simply resize the buffer to zero bytes. This produces much of the required behavior (e.g. zero .byteLength) but not all (e.g. zero .byteOffset). So an explicit neutered check, or a change in data structures, may be necessary.
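A small model of the "neutered flag at the plain buffer level" option may make the sharing issue concrete. PlainBuffer and BufferView are hypothetical stand-ins for duk_hbuffer and duk_hbufobj (they are not Duktape code), and access to a neutered buffer is assumed to throw a TypeError, as for detached buffers in ES2015. Because every view refers to the same PlainBuffer, setting the flag once affects all of them, and the explicit flag check also makes .byteOffset read as zero, which resizing alone would not do.

```javascript
// Hypothetical model: the neutered flag lives on the shared plain buffer.
class PlainBuffer {
  constructor(size) {
    this.data = new Uint8Array(size);
    this.neutered = false;
  }
  neuter() {
    this.neutered = true;
    this.data = new Uint8Array(0);  // like resizing a dynamic buffer to zero bytes
  }
}

// DataView-like view holding a direct reference to the plain buffer.
class BufferView {
  constructor(plain, byteOffset, byteLength) {
    this.plain = plain;             // shared by all views on the same buffer
    this.offset = byteOffset;
    this.len = byteLength;
  }
  // The explicit flag check zeroes both values once the buffer is neutered.
  get byteOffset() { return this.plain.neutered ? 0 : this.offset; }
  get byteLength() { return this.plain.neutered ? 0 : this.len; }

  checkAccess(index) {
    if (this.plain.neutered) {
      throw new TypeError('view references a detached buffer');
    }
    if (index < 0 || index >= this.len) {
      throw new RangeError('offset is outside the bounds of the view');
    }
  }
  getUint8(index) {
    this.checkAccess(index);
    return this.plain.data[this.offset + index];
  }
  setUint8(index, value) {
    this.checkAccess(index);
    this.plain.data[this.offset + index] = value & 0xff;  // modulo 256, like setUint8()
  }
}
```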

In ES2015 neutering seems to be covered under the name "detached buffer" and many operations on detached buffers (like reads and writes) throw a TypeError which is close to what current code is doing:

Configurable endianness for TypedArray views

Change duk_hbufobj so that it records requested endianness explicitly: host, little, or big endian. Then use the specified endianness in readfield and writefield internal primitives.

This should be relatively straightforward to do, and perhaps useful.
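A sketch of readfield/writefield-style primitives that take an explicit endianness, modeled with DataView, whose accessors already accept a littleEndian flag. The Endian values and the readField()/writeField() names are hypothetical; in this design a view would record its endianness once at construction and pass it to every access.

```javascript
// Hypothetical endianness setting recorded per view.
const Endian = Object.freeze({ HOST: 'host', LITTLE: 'little', BIG: 'big' });

// Detect host byte order once: on little endian hosts the first byte of 1 is 1.
const HOST_IS_LITTLE = new Uint8Array(new Uint16Array([1]).buffer)[0] === 1;

function isLittleEndian(endian) {
  return endian === Endian.LITTLE || (endian === Endian.HOST && HOST_IS_LITTLE);
}

function dataViewFor(bytes) {
  return new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
}

function readField(bytes, byteOffset, type, endian) {
  const dv = dataViewFor(bytes);
  const little = isLittleEndian(endian);
  switch (type) {
    case 'uint16': return dv.getUint16(byteOffset, little);
    case 'uint32': return dv.getUint32(byteOffset, little);
    case 'float64': return dv.getFloat64(byteOffset, little);
    default: throw new TypeError('unsupported field type: ' + type);
  }
}

function writeField(bytes, byteOffset, type, endian, value) {
  const dv = dataViewFor(bytes);
  const little = isLittleEndian(endian);
  switch (type) {
    case 'uint16': dv.setUint16(byteOffset, value, little); break;
    case 'uint32': dv.setUint32(byteOffset, value, little); break;
    case 'float64': dv.setFloat64(byteOffset, value, little); break;
    default: throw new TypeError('unsupported field type: ' + type);
  }
}
```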

Allow non-aligned views

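The ES2015 alignment requirement (a typed array's byteOffset must be a multiple of its element size, otherwise the constructor throws a RangeError) is not necessary with Duktape because all element accesses are ultimately done using byte-by-byte reads without making any alignment assumptions.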
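The difference can be shown in a spec-conforming engine: an unaligned Uint32Array view is rejected, while a byte-by-byte read works at any offset. readUint32LE() is a hypothetical helper illustrating the kind of access described above, not Duktape's internal code.

```javascript
// A spec-conforming engine rejects a Uint32Array view at byte offset 1.
function tryUnalignedView(arrayBuffer) {
  try {
    new Uint32Array(arrayBuffer, 1, 1);
    return 'allowed';
  } catch (e) {
    return e instanceof RangeError ? 'RangeError' : 'other error';
  }
}

// Byte-by-byte little endian read: no alignment assumption is needed.
function readUint32LE(bytes, byteOffset) {
  return (bytes[byteOffset] |
          (bytes[byteOffset + 1] << 8) |
          (bytes[byteOffset + 2] << 16) |
          (bytes[byteOffset + 3] << 24)) >>> 0;  // >>> 0 keeps the result unsigned
}
```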

Additional arguments to TypedArray .set()

It would be nice to be able to specify an offset/length (or offset/end) for a .set() call, so that one could:

v1.set(v2, 5, 10);

Currently one needs to do something like:

v1.set(v2.subarray(5, 15));
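Until such an extension exists, the same effect can be had with a small helper built on subarray(). Note that the standard .set(source, offset) already uses its second argument as the offset into the target, so the hypothetical setRange() below keeps the source range and the target offset as separate parameters.

```javascript
// Hypothetical helper: copy `length` elements starting at `sourceOffset` in
// `source` into `target`, starting at `targetOffset` (default 0).
function setRange(target, source, sourceOffset, length, targetOffset = 0) {
  // subarray() creates a view sharing the source storage, so the only element
  // copy is the one performed by set().
  target.set(source.subarray(sourceOffset, sourceOffset + length), targetOffset);
  return target;
}
```

With this helper, setRange(v1, v2, 5, 10) is equivalent to v1.set(v2.subarray(5, 15)).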

Additional arguments to TypedArray constructor

It would be nice to have offset/length when constructing a TypedArray from another TypedArray.
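A sketch of the two obvious meanings such an offset/length could have, written as hypothetical helpers: copying a range (what the ES2015 constructor does for a whole typed array argument) or creating a new view over the same ArrayBuffer. The shared variant takes its length in elements of the new view's type and is subject to the ES2015 alignment rule.

```javascript
// Hypothetical: copy `length` elements starting at element `offset` of `src`.
function fromRangeCopy(Ctor, src, offset, length) {
  return new Ctor(src.subarray(offset, offset + length));
}

// Hypothetical: create a view sharing the same ArrayBuffer as `src`.
function fromRangeShared(Ctor, src, offset, length) {
  // Element offset -> byte offset; in ES2015 the result must be a multiple of
  // Ctor.BYTES_PER_ELEMENT or the constructor throws a RangeError.
  return new Ctor(src.buffer, src.byteOffset + offset * src.BYTES_PER_ELEMENT, length);
}
```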

Node.js .parent property

Not currently included in Node.js Buffer instances.

Testcase coverage improvements

  • Fine-grained tests for argument/this coercion
  • Property attributes
  • Object.defineProperty() and Object.getOwnPropertyDescriptor() for virtual properties
  • Constructing DataView and TypedArray from another view (allowed now but semantics may need improvement)
  • Node.js Buffer slice() coverage, argument coercion, etc.

Low memory support

Implement low-memory support (16-bit fields, pointer compression, etc.) for Buffer objects. Currently buffer objects have "long" fields.

Improve fastint support

Improve fastint handling for buffer indices, lengths, values, etc.

Unsorted future work

  • Clean up duk_hbufobj buf == NULL handling. Perhaps don't allow NULL at all; this depends on the neutering / detached buffer solution.
  • Implement and test for integer arithmetic wrap checks e.g. when coercing an index into a byte offset by shifting.
  • duk_to_buffer(): coerce a Buffer object into a plain buffer value (similarly to how duk_to_string() coerces a String to a plain string)? Slice information will be lost unless a copy is made.
  • duk_is_buffer(): return true for a Buffer object? For comparison, duk_is_string() returns false for a String object, so returning false might be most consistent.
  • Other Duktape C API changes to interact with Buffer objects.
  • Node.js Buffer.isBuffer(): what is the best behavior for plain buffer and other buffer object values?
  • What to do with Node.js SlowBuffer, INSPECT_MAX_BYTES, and code that does require('buffer')?
  • Mixing buffer types between APIs: go through the various cases, document, add testcases, etc.
  • Implement a fast path for the Node.js Buffer constructor when the argument is another duk_hbufobj (the constructor currently reads indexed properties explicitly).
  • Duktape C API tests for buffer handling.
  • Duktape C API test exercising "underlying buffer doesn't cover logical buffer slice" cases which cannot be exercised with plain ECMAScript code.
  • Document Buffer object relationship to JSON, JX, and JC.
  • Explicit maximum element and byte size checks for all operations that create new buffer objects.
  • Change the TypedArray subarray() implementation to use a "default" prototype for the result (e.g. Uint8Array.prototype) instead of copying the argument's internal prototype, which may be different.
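For the integer wrap check item above, the failure mode can be modeled in JavaScript by using >>> 0 to emulate 32-bit unsigned C arithmetic. The function names are hypothetical, and index is assumed to already be an integer in the range [0, 0xFFFFFFFF].

```javascript
// Unchecked: (index << shift) >>> 0 wraps exactly like a uint32 shift in C.
function indexToByteOffsetUnchecked(index, shift) {
  return (index << shift) >>> 0;
}

// Checked: reject any index whose shifted value would not fit in 32 bits.
function indexToByteOffsetChecked(index, shift) {
  if (index > (0xFFFFFFFF >>> shift)) {
    throw new RangeError('index too large for a 32-bit byte offset');
  }
  return (index << shift) >>> 0;
}
```

For example, with 4-byte elements (shift 2) the unchecked version maps index 0x40000000 to byte offset 0, silently aliasing the start of the buffer, while the checked version throws.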