Downstream all aarch64-related patches from vanilla LuaJIT repo #5629

Closed
6 tasks done
igormunkin opened this issue Dec 15, 2020 · 1 comment

Comments

Collaborator

igormunkin commented Dec 15, 2020

There are several issues blocking Tarantool from being used on arm64. One of the showstoppers is #2712. The first step toward arm64 stability is to sync all aarch64-related patches from the vanilla LuaJIT repo, following the procedure developed in the scope of #5534.

Here is the list of related issues:

@igormunkin igormunkin removed the tmp label Jan 12, 2021
@kyukhin kyukhin added this to the 2.8.1 milestone Feb 12, 2021
@kyukhin kyukhin added the tmp label Feb 20, 2021
@kyukhin kyukhin modified the milestones: 2.8.1, 2.9.1 Feb 20, 2021
igormunkin added a commit that referenced this issue Apr 27, 2021
This patch fixes inaccuracy in Tarantool build configuration introduced
by commit 07c83aa ('build: adjust
LuaJIT build system'). All those MacOS-related tweaks for __PAGEZERO
size and preferred load address for the bundle are necessary only for
builds with 32-bit GC area on 64-bit host. The only case fitting these
conditions is x86_64 with no LUAJIT_ENABLE_GC64. All other 64-bit builds
use 64-bit GC area unconditionally.

Part of #5983
Needed for #5629
Follows up #4862

Signed-off-by: Igor Munkin <[email protected]>
igormunkin added a commit that referenced this issue Apr 28, 2021
This patch fixes inaccuracy in Tarantool build configuration introduced
by commit 07c83aa ('build: adjust
LuaJIT build system'). All those MacOS-related tweaks for __PAGEZERO
size and preferred load address for the bundle are necessary only for
builds with 32-bit GC area on 64-bit host. The only case fitting these
conditions is x86_64 with no LUAJIT_ENABLE_GC64. All other 64-bit builds
use 64-bit GC area unconditionally.

Part of #5983
Needed for #5629
Follows up #4862

Reviewed-by: Sergey Kaplun <[email protected]>
Signed-off-by: Igor Munkin <[email protected]>
igormunkin added a commit that referenced this issue Apr 28, 2021
This patch fixes inaccuracy in Tarantool build configuration introduced
by commit 07c83aa ('build: adjust
LuaJIT build system'). All those MacOS-related tweaks for __PAGEZERO
size and preferred load address for the bundle are necessary only for
builds with 32-bit GC area on 64-bit host. The only case fitting these
conditions is x86_64 with no LUAJIT_ENABLE_GC64. All other 64-bit builds
use 64-bit GC area unconditionally.

Part of #5983
Needed for #5629
Follows up #4862

Reviewed-by: Sergey Kaplun <[email protected]>
Reviewed-by: Nikita Pettik <[email protected]>
Reviewed-by: Sergey Ostanevich <[email protected]>
Signed-off-by: Igor Munkin <[email protected]>
igormunkin added a commit to tarantool/luajit that referenced this issue Apr 28, 2021
igormunkin pushed a commit to tarantool/luajit that referenced this issue Apr 28, 2021
@igormunkin igormunkin self-assigned this Apr 28, 2021
@kyukhin kyukhin removed the tmp label Apr 30, 2021
igormunkin added a commit that referenced this issue Apr 30, 2021
This patch fixes inaccuracy in Tarantool build configuration introduced
by commit 07c83aa ('build: adjust
LuaJIT build system'). All those MacOS-related tweaks for __PAGEZERO
size and preferred load address for the bundle are necessary only for
builds with 32-bit GC area on 64-bit host. The only case fitting these
conditions is x86_64 with no LUAJIT_ENABLE_GC64. All other 64-bit builds
use 64-bit GC area unconditionally.

Part of #5983
Needed for #5629
Follows up #4862

Reviewed-by: Sergey Kaplun <[email protected]>
Reviewed-by: Nikita Pettik <[email protected]>
Reviewed-by: Sergey Ostanevich <[email protected]>
Signed-off-by: Igor Munkin <[email protected]>
(cherry picked from commit e50a6d9)
igormunkin pushed a commit to tarantool/luajit that referenced this issue May 3, 2021
(cherry picked from commit 2e2fb8f)

Part of tarantool/tarantool#5629
Relates to tarantool/tarantool#5983

Signed-off-by: Igor Munkin <[email protected]>
igormunkin pushed a commit to tarantool/luajit that referenced this issue May 3, 2021
Thanks to Igor Munkin.

(cherry picked from commit 521b367)

Part of tarantool/tarantool#5629
Relates to tarantool/tarantool#5983

Signed-off-by: Igor Munkin <[email protected]>
igormunkin added a commit that referenced this issue May 3, 2021
Since commit c9d88d5 ('Fix #984: add
jit.* library to the binary') all required modules implemented in Lua
are bundled (i.e. compiled to the binary as a C literal) into Tarantool
executable. To save the memory footprint (this is the only reason I can
imagine as a rationale) Lua sources related to unsupported platforms are
not bundled. While making Tarantool work on ARM64 hosts, it turned out
the module specific for this arch (i.e. jit/dis_arm64.lua) is missing.
As a result of this patch, <jit.dump> loads fine on ARM64 platforms.

Part of #5983
Relates to #5629
Follows up #984

Signed-off-by: Igor Munkin <[email protected]>
Buristan pushed a commit to tarantool/luajit that referenced this issue Sep 8, 2021
Reported by XmiliaH.

(cherry picked from commit 16d38a4)

This patch fixes the regression introduced in scope of
fa8e7ffefb715abf55dc5b0c708c63251868 ('Add support for full-range
64 bit lightuserdata.').

The maximum available lightuserdata segment number is 255. So the
high bits of this lightuserdata TValue are 0xfffe7fff. The same high
bits are set for the special control variable on the stack for
ITERN/ITERC bytecodes via the ISNEXT bytecode. When the ITERN bytecode
is despecialized to the ITERC bytecode and a table has a lightuserdata
with the maximum available segment number as a key, the special control
variable is mistaken for this key and iteration is broken.

This patch forbids using more than 254 lightuserdata segments to
avoid clashing with the aforementioned control variable. When a user
tries to create a lightuserdata with the 255th segment number, the
error "bad light userdata pointer" is raised.
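
The clash described above can be checked with a little bit arithmetic. The following is a minimal sketch, not LuaJIT code: it assumes a 64-bit TValue made of a 17-bit tag, an 8-bit segment number and a 39-bit low part, and it assumes the lightuserdata type tag is `~3` (i.e. -4). The names `lightud_tvalue`, `high_word` and `ITERN_CONTROL_HI` are hypothetical and exist only for this illustration.

```python
TAG_SHIFT = 47                   # payload width below the NaN tag and type bits
SEG_BITS = 8                     # segment number width (up to 256 segments)
LO_BITS = TAG_SHIFT - SEG_BITS   # 39 low address bits kept in the TValue
MASK64 = (1 << 64) - 1

LJ_TLIGHTUD = -4                 # assumption: lightuserdata type tag is ~3
ITERN_CONTROL_HI = 0xFFFE7FFF    # high bits quoted in the commit message


def lightud_tvalue(seg, lo):
    """Pack a segment number and the low address bits into a 64-bit value."""
    if not 0 <= seg < (1 << SEG_BITS):
        raise ValueError("segment number out of range")
    if not 0 <= lo < (1 << LO_BITS):
        raise ValueError("low part out of range")
    tag = (LJ_TLIGHTUD << TAG_SHIFT) & MASK64
    return tag | (seg << LO_BITS) | lo


def high_word(tvalue):
    """Return the upper 32 bits of a 64-bit value."""
    return tvalue >> 32


max_lo = (1 << LO_BITS) - 1
print(hex(high_word(lightud_tvalue(255, max_lo))))  # segment 255 can clash
print(hex(high_word(lightud_tvalue(254, max_lo))))  # segment 254 cannot
```

Under these assumptions only segment 255 can produce the same high word as the control variable, which is consistent with the patch capping the number of usable segments.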

Sergey Kaplun:
* added the description and the test for the problem

Part of tarantool/tarantool#5629
Buristan pushed a commit to tarantool/luajit that referenced this issue Sep 20, 2021
This patch only performs a code movement of lightuserdata interning to
<lj_udata.c> file and does nothing else. This patch is backported to
simplify syncing with the upstream.

Sergey Kaplun:
* added the description for the patch

Needed for tarantool/tarantool#5629
@kyukhin kyukhin modified the milestones: 2.10.0-rc1, 2.10.1 Dec 30, 2021
@kyukhin kyukhin modified the milestones: 2.10.0, 2.11.0 Jun 1, 2022
Collaborator

Buristan commented Jun 2, 2022

This issue was about ARM64-related show stoppers found via Tarantool CI. Further backporting activity will proceed within #6548 and #7230.

@Buristan Buristan closed this as completed Jun 2, 2022
igormunkin added a commit to tarantool/luajit that referenced this issue Jun 16, 2022
There were issues with configuring LuaJIT on Apple machines, since
<LuaJITTestArch> CMake auxiliary routine fails to locate system headers
(e.g. assert.h in case when LUA_USE_ASSERT is enabled). As a result
platform detection fails and LuaJIT configuration ends with the fatal
error. This patch adds the necessary flags to help the routine to find
the required system headers.

Needed for tarantool/tarantool#6065
Relates to tarantool/tarantool#5629
Follows up tarantool/tarantool#4862

Reviewed-by: Sergey Kaplun <[email protected]>
Reviewed-by: Sergey Ostanevich <[email protected]>
Signed-off-by: Igor Munkin <[email protected]>
igormunkin pushed a commit to tarantool/luajit that referenced this issue Jun 16, 2022
(cherry picked from commit 2e2fb8f)

After Apple released Macs working on ARM64, the previous recipe in
lj_arch.h for detecting various Apple platforms is not valid anymore.
Fortunately, there is a system header (namely, TargetConditionals.h),
provided by SDK with the proper defines to be set. Starting from this
patch, LuaJIT identifies Apple hosts via this header.

Since testing machinery assumes that LuaJIT is built with JIT support
being enabled unconditionally, a smoke test for it is also added
alongside with this patch.

Igor Munkin:
* added the description and the test for the problem
* backported the original patch to tarantool/luajit repo

Resolves tarantool/tarantool#6065
Part of tarantool/tarantool#5629
Relates to tarantool/tarantool#5983

Reviewed-by: Sergey Kaplun <[email protected]>
Reviewed-by: Sergey Ostanevich <[email protected]>
Signed-off-by: Igor Munkin <[email protected]>
igormunkin pushed a commit to tarantool/luajit that referenced this issue Jun 16, 2022
Thanks to Igor Munkin.

(cherry picked from commit 521b367)

This patch fixes the issue introduced by commit
2e2fb8f ('OSX/iOS: Handle iOS simulator
and ARM64 Macs.'). Within the mentioned commit LJ_TARGET_IOS define is
set via Apple system header to enable several features (e.g. JIT and
external unwinder) on ARM64 Macs, but its usage was not adjusted
source-wide. This is done for FFI machinery within this commit.

All LJ_TARGET_IOS uses in FFI sources are done with LJ_TARGET_ARM64
define being set, so we can simply replace these occurrences with
LJ_TARGET_OSX.

Igor Munkin:
* added the description and the test for the problem

Resolves tarantool/tarantool#6066
Part of tarantool/tarantool#5629
Relates to tarantool/tarantool#5983

Reported-by: Nikita Pettik <[email protected]>
Reviewed-by: Sergey Kaplun <[email protected]>
Reviewed-by: Sergey Ostanevich <[email protected]>
Signed-off-by: Igor Munkin <[email protected]>
igormunkin pushed a commit to tarantool/luajit that referenced this issue Jun 16, 2022
Thanks to Javier Guerra Giraldez.

(cherry picked from commit ae20998)

This patch fixes the issue introduced by commits
f307d0a ('ARM64: Add build
infrastructure and initial port of interpreter.') for arm64 and
73ef845 ('Add special bytecodes for
builtins.') for arm and ppc. Within the mentioned commits the new
bytecode TSETR is introduced for the corresponding architectures.

When the new index of the table processed by this bytecode is an
integer greater than asize of the table, the VM falls back to
vmeta_tsetr to call
lj_tab_setinth(lua_State *L, GCtab *t, int32_t key). The first argument
CARG1 is not set in the VM to the Lua thread being executed and contains
an invalid value, so the mentioned call leads to a crash.
This patch adds the missing assignment of CARG1 to the right value.

Sergey Kaplun:
* added the description and the test for the problem

Resolves tarantool/tarantool#6084
Part of tarantool/tarantool#5629

Reviewed-by: Sergey Ostanevich <[email protected]>
Reviewed-by: Igor Munkin <[email protected]>
Signed-off-by: Igor Munkin <[email protected]>
igormunkin pushed a commit to tarantool/luajit that referenced this issue Jun 16, 2022
(cherry picked from commit e9af1ab)

LuaJIT uses a special NaN-tagging technique to store the internal type
on the Lua stack. In case of LJ_GC64 the first 13 bits are set in the
special NaN type (0xfff8...). The next 4 bits are used for the internal
LuaJIT type of the object on the stack. The next 47 bits are used for
storing this object's content. For userdata, it is its address. On arm64
a pointer can have more than 47 significant bits [1]. In this case the
BADLU error is raised.

For the support of full 64-bit range lightuserdata pointers two new
fields in GCState are added:

`lightudseg` - vector of segments of lightuserdata. Each element keeps
32-bit value. 25 MSB equal to MSB of lightuserdata 64-bit address, the
rest are filled with zeros. The length of the vector is power of 2.

`lightudnum` - the length - 1 of aforementioned vector (up to 255).

When lightuserdata is pushed on the stack and its segment is not yet
stored in the vector, a new value is appended to this vector. The
maximum number of segments is 256. The BADLU error is raised when a user
tries to add userdata with a new 257th segment, so the whole VA space
isn't covered by this patch.
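
The segment bookkeeping described above can be modelled roughly as follows. This is a simplified sketch rather than the LuaJIT implementation: the `SegmentTable` class, its methods and the `BadLightUserdata` exception are hypothetical names, the table stores the bare 25 upper bits instead of a zero-padded 32-bit value, and the power-of-2 growth of the vector is not modelled.

```python
LO_BITS = 39                      # low address bits kept in the TValue payload
SEG_BITS = 8                      # segment index width
MAX_SEGMENTS = 1 << SEG_BITS      # 256 segments at most
LO_MASK = (1 << LO_BITS) - 1


class BadLightUserdata(Exception):
    """Stands in for the BADLU error in this model."""


class SegmentTable:
    def __init__(self):
        self.segments = []        # models the `lightudseg` vector

    def intern(self, addr):
        """Map a 64-bit address to a 47-bit payload (segment index + low bits)."""
        upper = addr >> LO_BITS   # the 25 most significant bits of the address
        if upper in self.segments:
            seg = self.segments.index(upper)
        else:
            if len(self.segments) >= MAX_SEGMENTS:
                raise BadLightUserdata("bad light userdata pointer")
            self.segments.append(upper)
            seg = len(self.segments) - 1
        return (seg << LO_BITS) | (addr & LO_MASK)

    def resolve(self, payload):
        """Recover the original 64-bit address from a payload."""
        seg = payload >> LO_BITS
        return (self.segments[seg] << LO_BITS) | (payload & LO_MASK)


table = SegmentTable()
payload = table.intern(0xFFFF800012345678)
print(hex(payload), hex(table.resolve(payload)))
```

Addresses that share their upper 25 bits reuse one segment, so in this model the limit is only hit by a program that scatters lightuserdata pointers across more than 256 distinct 512 GiB regions.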

Also, in this patch all internal usage of lightuserdata (for hooks,
profilers, built-in package, IR and so on) is changed to special values
on Lua Stack.

Also, conversion of TValue to FFI C type with store is no longer
compiled for lightuserdata.

[1]: https://www.kernel.org/doc/html/latest/arm64/memory.html

Sergey Kaplun:
* added the description and the test for the problem

Resolves tarantool/tarantool#2712
Needed for tarantool/tarantool#6154
Part of tarantool/tarantool#5629

Reviewed-by: Igor Munkin <[email protected]>
Reviewed-by: Sergey Ostanevich <[email protected]>
Signed-off-by: Igor Munkin <[email protected]>
igormunkin pushed a commit to tarantool/luajit that referenced this issue Jun 16, 2022
This reduces overall performance on ARM64, but we have no choice.
Linux kernel default userspace VA is 48 bit, but we'd need 47 bit.
mremap() ignores address hints due to a kernel API issue. The mapping
may move to an undesired address which will cause an assert or crash.

Reported by Raymond W. Ko.

(cherry picked from commit 67dbec8)

47-bit VA space is required by LuaJIT for keeping a GC object pointer in
TValue. In case of huge blobs that are mapped directly, `mremap()` may
move the chunk out of 47-bit range of VA space on ARM64. `mremap()`
accepts the fifth argument (new address hint) only with MREMAP_FIXED
flag. In that case it unmaps any other mapping to specified address.

To avoid this behaviour, this patch prevents `mremap()` from relocating
the mapping to a new virtual address by setting the CALL_MREMAP_NOMOVE
flag instead of CALL_MREMAP_MAYMOVE for the arm64 architecture.
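
The invariant that a relocated mapping could violate can be written as a plain range check. This is only an illustrative sketch: `mapping_fits_47bit` is a hypothetical helper, not a LuaJIT or kernel function, and the sample addresses are made up.

```python
VA_BITS = 47                 # address width LuaJIT needs for GC object pointers
VA_LIMIT = 1 << VA_BITS


def mapping_fits_47bit(addr, size):
    """Return True if every byte of [addr, addr + size) has a 47-bit address."""
    if addr < 0 or size <= 0:
        raise ValueError("addr must be non-negative and size positive")
    return addr + size <= VA_LIMIT


# A chunk below the limit is fine; the same chunk moved into the upper
# half of a 48-bit userspace range is not.
print(mapping_fits_47bit(0x0000_7000_0000_0000, 1 << 30))
print(mapping_fits_47bit(0x0000_8000_0000_0000, 1 << 30))
```

With a no-move remap the kernel either grows the chunk in place or fails, so a chunk that satisfied this check before the call still satisfies it afterwards.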

Sergey Kaplun:
* added the description and the test for the problem

Needed for tarantool/tarantool#6154
Part of tarantool/tarantool#5629

Reviewed-by: Igor Munkin <[email protected]>
Reviewed-by: Sergey Ostanevich <[email protected]>
Signed-off-by: Igor Munkin <[email protected]>
igormunkin pushed a commit to tarantool/luajit that referenced this issue Jun 16, 2022
Contributed by Javier Guerra Giraldez.

(cherry picked from commit c785131)

Closed upvalues are never gray. Hence, when a closed upvalue is marked,
it is marked as black. Black objects can't refer to white objects, so
for storing a white value in a closed upvalue, we need to move the
barrier forward and color our value gray by using `lj_gc_barrieruv()`.
This function can't be called on closed upvalues with non-white values
since there is no need to mark them again.

USETS bytecode for arm64 architecture has the incorrect NZCV condition
flag value in the instruction that checks the upvalue is closed:
| tst TMP1w, #LJ_GC_WHITES
| ccmp TMP0w, #0, #0, ne
| beq <1 // branch out from barrier movement
`TMP0w` contains the `upvalue->closed` field, so the upvalue is open if
this field equals zero (the first zero in `ccmp`). The second zero is
the value assigned to the NZCV condition flags[1] if the specified
condition (`ne`) is not met for the current values of the condition
flags[2]. Hence, if the value to be stored is not white (`TMP1w` holds
its color), then the condition is FALSE and all flag bits are set to
zero, so the branch is not taken (the Zero flag is not set). If this
happens at the propagate or atomic GC phase, the `lj_gc_barrieruv()`
function is called and the gray value to be set is marked as if it were
white. That leads to the assertion failure in the `gc_mark()` function.

This patch changes NZCV condition flag to 4 (Zero flag is set) to take
the correct branch after `ccmp` instruction.
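
The effect of the NZCV immediate can be traced with a small model of the three instructions. This is a sketch under stated assumptions: only the Zero flag is modelled, `LJ_GC_WHITES` is assumed to be the two low mark bits (0x03), and `barrier_called` is a hypothetical function written for this illustration.

```python
LJ_GC_WHITES = 0x03   # assumption: both white bits of the GC mark byte
Z_FLAG = 0b0100       # position of the Zero flag inside the NZCV immediate


def barrier_called(value_marked, upvalue_closed, nzcv_imm):
    """Model tst / ccmp / beq and report whether the barrier path is reached."""
    # tst TMP1w, #LJ_GC_WHITES: Z is set when the value is not white.
    z = (value_marked & LJ_GC_WHITES) == 0
    # ccmp TMP0w, #0, #nzcv, ne
    if not z:
        # `ne` holds (the value is white): compare the closed field with zero.
        z = upvalue_closed == 0
    else:
        # `ne` does not hold: the flags are loaded from the immediate.
        z = bool(nzcv_imm & Z_FLAG)
    # beq <1: the branch skips the barrier when Z is set.
    return not z


BLACK = 0x04  # assumption: a non-white mark value
print(barrier_called(BLACK, 1, 0))  # old immediate: barrier for a non-white value
print(barrier_called(BLACK, 1, 4))  # new immediate: barrier skipped
```

In this model the immediate only matters for non-white values. White values stored into a closed upvalue reach the barrier with either immediate, and open upvalues skip it.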

Sergey Kaplun:
* added the description and the test for the problem

[1]: https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-1-condition-flags-and-codes
[2]: https://developer.arm.com/documentation/dui0801/g/pge1427897656225

Part of tarantool/tarantool#5629

Reviewed-by: Igor Munkin <[email protected]>
Reviewed-by: Sergey Ostanevich <[email protected]>
Signed-off-by: Igor Munkin <[email protected]>
igormunkin pushed a commit to tarantool/luajit that referenced this issue Jun 16, 2022
(cherry picked from commit 2f3f078)

When LuaJIT is built with LJ_FR2 (e.g. with GC64 mode enabled),
information about a frame takes two slots -- the first takes the TValue
with the function to be called, the second takes the framelink. The JIT
recording machinery does much the same -- the function IR_KGC is
loaded in the first slot, and the second is set to the TREF_FRAME value.
This value should be rewritten after return from a callee. This slot is
cleared either by return values or manually (set to zero), when there
are no values to return. The latter case is done by the next bytecode
with RA dst mode. This requires that the destination of RA takes the
next slot after TREF_FRAME. Hence, an earlier instruction must use the
smallest possible destination register (see `lj_record_ins()` for the
details).

Bytecode emitter swaps operands for ISGT and ISGE comparisons. As a
result, the aforementioned rule for registers allocations may be
violated. When it happens for a chunk being recorded, the slot with
TREF_FRAME is not rewritten (but the next empty slot after TREF_FRAME
is). This leads to JIT slots inconsistency and assertion failure in
`rec_check_slots()` during recording of the next bytecode instruction.

This patch fixes bytecode register allocation by changing the VM
register allocation order in case of ISGT and ISGE bytecodes.

Sergey Kaplun:
* added the description and the test for the problem

Resolves tarantool/tarantool#6227
Part of tarantool/tarantool#5629

Reviewed-by: Sergey Ostanevich <[email protected]>
Reviewed-by: Igor Munkin <[email protected]>
Signed-off-by: Igor Munkin <[email protected]>
igormunkin pushed a commit to tarantool/luajit that referenced this issue Jun 16, 2022
Contributed by Javier Guerra Giraldez.

(cherry picked from commit 9da0653)

When a side trace is assembled, it is linked to its parent trace. For
this purpose, the JIT runs through the parent trace mcode and updates
the jump instructions targeting the corresponding exitno. Prior to this
patch, these instructions were patched unconditionally, which leads to
errors if the jump target address is out of the value ranges specified
in the ARM64 references[1][2][3][4][5][6].

As a result of the patch, <lj_asm_patchexit> considers the value ranges
of the jump targets and directly updates only those instructions that
fit the particular jump range. Moreover, the corresponding jump in the
pad leading to <lj_vm_exit_handler> is also patched, so the instructions
that are not updated directly reach the linked side trace too.
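
The range check that decides whether a jump can be patched directly can be sketched as below. This is an illustration, not the code of `lj_asm_patchexit`: `branch_reaches` and `BRANCH_IMM_BITS` are hypothetical names, and the immediate widths (26 bits for B, 19 for B.cond/CBZ/CBNZ, 14 for TBZ/TBNZ, all counted in 4-byte instruction words) are taken from the A64 instruction encodings.

```python
# Width of the signed, word-scaled immediate for each A64 branch kind.
BRANCH_IMM_BITS = {
    "B": 26,
    "B.cond": 19,
    "CBZ": 19,
    "CBNZ": 19,
    "TBZ": 14,
    "TBNZ": 14,
}


def branch_reaches(kind, source, target):
    """Return True if a `kind` branch at `source` can encode a jump to `target`."""
    delta = target - source
    if delta % 4 != 0:
        raise ValueError("A64 branch targets are 4-byte aligned")
    words = delta // 4
    bits = BRANCH_IMM_BITS[kind]
    return -(1 << (bits - 1)) <= words < (1 << (bits - 1))


# A side trace 1 MiB away is reachable by B but not by B.cond or TBZ.
for kind in ("B", "B.cond", "TBZ"):
    print(kind, branch_reaches(kind, 0, 1 << 20))
```

When the check fails for a conditional branch, the commit message's approach is to leave that instruction alone and let it reach the side trace through the patched jump in the exit pad.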

Additionally, there is some refactoring of jump targets assembling in
scope of this patch.

Igor Munkin:
* added the description and the test for the problem

[1]: https://developer.arm.com/documentation/dui0801/g/A64-General-Instructions/B
[2]: https://developer.arm.com/documentation/dui0801/g/A64-General-Instructions/B-cond
[3]: https://developer.arm.com/documentation/dui0801/g/A64-General-Instructions/CBZ
[4]: https://developer.arm.com/documentation/dui0801/g/A64-General-Instructions/CBNZ
[5]: https://developer.arm.com/documentation/dui0801/g/A64-General-Instructions/TBZ
[6]: https://developer.arm.com/documentation/dui0801/g/A64-General-Instructions/TBNZ

Resolves tarantool/tarantool#6098
Part of tarantool/tarantool#5629

Reviewed-by: Sergey Kaplun <[email protected]>
Reviewed-by: Kirill Yukhin <[email protected]>
Signed-off-by: Igor Munkin <[email protected]>
igormunkin pushed a commit to tarantool/luajit that referenced this issue Jun 16, 2022
Contributed by Javier Guerra Giraldez.

(cherry picked from commit c785131)

Closed upvalues are never gray. Hence, when a closed upvalue is marked,
it is marked as black. Black objects can't refer to white objects, so
to store a white value in a closed upvalue, we need to move the barrier
forward and color the value gray by using `lj_gc_barrieruv()`. This
function must not be called on closed upvalues with non-white values,
since there is no need to mark them again.

The USETS bytecode for the arm64 architecture has an incorrect NZCV
condition flag value in the instruction that checks whether the upvalue
is closed:
| tst TMP1w, #LJ_GC_WHITES
| ccmp TMP0w, #0, #0, ne
| beq <1 // branch out from barrier movement
`TMP0w` contains the `upvalue->closed` field, so the upvalue is open if
this field equals zero (the first zero in `ccmp`). The second zero is
the value of the NZCV condition flags[1] set if the specified condition
(`ne`) is not met for the current values of the condition flags[2].
Hence, if the value to be stored is not white (`TMP1w` holds its
color), then the condition is FALSE and all flag bits are set to zero,
so the branch is not taken (the Zero flag is not set). If this happens
during the propagate or atomic GC phase, the `lj_gc_barrieruv()`
function is called and the gray value to be stored is marked as if it
were white. That leads to the assertion failure in the `gc_mark()`
function.

This patch changes the NZCV condition flag value to 4 (the Zero flag is
set) so that the correct branch is taken after the `ccmp` instruction.
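
The flag logic of the three instructions above can be modelled in plain
C. This is a simplified sketch, not VM code: the function name
`barrier_skipped()` is hypothetical, only the Zero flag is tracked, and
the color bit values are assumed to match LuaJIT's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* GC color bits (values assumed to match LuaJIT's definitions). */
enum { WHITE0 = 0x01, WHITE1 = 0x02, BLACK = 0x04, WHITES = WHITE0 | WHITE1 };

/*
** Model of tst/ccmp/beq. `color` stands for TMP1w, `closed` for TMP0w,
** `nzcv_imm` for the immediate flags operand of ccmp.
** Returns non-zero if `beq` is taken, i.e. if the barrier is skipped.
*/
static int barrier_skipped(uint32_t color, uint32_t closed, unsigned nzcv_imm)
{
  int z;
  assert(nzcv_imm <= 15);
  /* tst TMP1w, #LJ_GC_WHITES: Z is set if no white bit is set. */
  z = (color & WHITES) == 0;
  if (!z) {
    /* `ne` holds (the value is white): flags come from cmp TMP0w, #0. */
    z = (closed == 0);
  } else {
    /* `ne` fails (the value is not white): flags come from the immediate. */
    z = (nzcv_imm >> 2) & 1;  /* Z is bit 2 of NZCV. */
  }
  /* beq: taken if Z is set. */
  return z;
}
```

In this model a black value stored to a closed upvalue skips the
barrier only with the immediate 4: `barrier_skipped(BLACK, 1, 0)`
yields 0, while `barrier_skipped(BLACK, 1, 4)` yields 1.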

Sergey Kaplun:
* added the description and the test for the problem

[1]: https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/condition-codes-1-condition-flags-and-codes
[2]: https://developer.arm.com/documentation/dui0801/g/pge1427897656225

Part of tarantool/tarantool#5629

Reviewed-by: Igor Munkin <[email protected]>
Reviewed-by: Sergey Ostanevich <[email protected]>
Signed-off-by: Igor Munkin <[email protected]>
igormunkin pushed a commit to tarantool/luajit that referenced this issue Jun 16, 2022
(cherry picked from commit 2f3f078)

When LuaJIT is built with LJ_FR2 (e.g. with GC64 mode enabled),
information about a frame takes two slots -- the first takes the TValue
with the function to be called, the second takes the framelink. The JIT
recording machinery does pretty much the same -- the function IR_KGC is
loaded in the first slot, and the second is set to the TREF_FRAME value.
This value should be rewritten after return from a callee. This slot is
cleared either by return values or manually (set to zero), when there
are no values to return. The latter case is done by the next bytecode
with RA dst mode. This requires the destination of RA to take the next
slot after TREF_FRAME. Hence, an earlier instruction must use the
smallest possible destination register (see `lj_record_ins()` for the
details).

The bytecode emitter swaps operands for ISGT and ISGE comparisons. As a
result, the aforementioned rule for register allocation may be
violated. When this happens for a chunk being recorded, the slot with
TREF_FRAME is not rewritten (but the next empty slot after TREF_FRAME
is). This leads to JIT slot inconsistency and an assertion failure in
`rec_check_slots()` during recording of the next bytecode instruction.

This patch fixes bytecode register allocation by changing the VM
register allocation order in case of ISGT and ISGE bytecodes.

Sergey Kaplun:
* added the description and the test for the problem

Resolves tarantool/tarantool#6227
Part of tarantool/tarantool#5629

Reviewed-by: Sergey Ostanevich <[email protected]>
Reviewed-by: Igor Munkin <[email protected]>
Signed-off-by: Igor Munkin <[email protected]>
igormunkin pushed a commit to tarantool/luajit that referenced this issue Jun 16, 2022
Contributed by Javier Guerra Giraldez.

(cherry picked from commit 9da0653)

When a side trace is assembled, it is linked to its parent trace. For
this purpose, the JIT runs through the parent trace mcode and updates
the jump instructions targeting the corresponding exitno. Prior to this
patch, these instructions were patched unconditionally, which led to
errors if the jump target address was out of the value ranges specified
in the ARM64 references[1][2][3][4][5][6].

With this patch, <lj_asm_patchexit> considers the value ranges of the
jump targets and directly updates only those instructions that fit the
particular jump range. Moreover, the corresponding jump in the pad
leading to <lj_vm_exit_handler> is also patched, so the instructions
that are not updated directly target the linked side trace too.
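
The range check can be sketched in C as follows. This is a minimal
sketch, not the code of <lj_asm_patchexit>: the names `branch_fits()`,
`IMM_B`, `IMM_BCOND` and `IMM_TB` are hypothetical. It relies on the
A64 encodings, where a branch stores a signed offset counted in 4-byte
instructions: 26 bits for B, 19 bits for B.cond, CBZ and CBNZ, and 14
bits for TBZ and TBNZ.

```c
#include <assert.h>
#include <stdint.h>

/* Width of the signed word-offset field of the A64 branch encodings. */
enum {
  IMM_B = 26,      /* B: +/-128MB. */
  IMM_BCOND = 19,  /* B.cond, CBZ, CBNZ: +/-1MB. */
  IMM_TB = 14      /* TBZ, TBNZ: +/-32KB. */
};

/*
** Returns non-zero if a branch with an `imm_bits` wide offset field can
** encode a jump of `delta` bytes (target address minus branch address).
*/
static int branch_fits(int64_t delta, unsigned imm_bits)
{
  int64_t limit;
  assert(imm_bits >= 1 && imm_bits <= 32);
  /* 2^(imm_bits-1) instructions of 4 bytes each. */
  limit = (int64_t)1 << (imm_bits + 1);
  /* Branch targets must be 4-byte aligned. */
  if (((uint64_t)delta & 3u) != 0) return 0;
  return delta >= -limit && delta < limit;
}
```

For example, a side trace located 1MB after a B.cond does not fit
(`branch_fits(1 << 20, IMM_BCOND)` is 0), while an unconditional B
reaches it (`branch_fits(1 << 20, IMM_B)` is 1).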

Additionally, the assembling of jump targets is slightly refactored in
scope of this patch.

Igor Munkin:
* added the description and the test for the problem

[1]: https://developer.arm.com/documentation/dui0801/g/A64-General-Instructions/B
[2]: https://developer.arm.com/documentation/dui0801/g/A64-General-Instructions/B-cond
[3]: https://developer.arm.com/documentation/dui0801/g/A64-General-Instructions/CBZ
[4]: https://developer.arm.com/documentation/dui0801/g/A64-General-Instructions/CBNZ
[5]: https://developer.arm.com/documentation/dui0801/g/A64-General-Instructions/TBZ
[6]: https://developer.arm.com/documentation/dui0801/g/A64-General-Instructions/TBNZ

Resolves tarantool/tarantool#6098
Part of tarantool/tarantool#5629

Reviewed-by: Sergey Kaplun <[email protected]>
Reviewed-by: Kirill Yukhin <[email protected]>
Signed-off-by: Igor Munkin <[email protected]>
@igormunkin igormunkin removed the teamL label Oct 20, 2022