Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backslash, then Ctrl-B produces unexpected results in Emacs mode #8

Closed
nickpapadonis opened this issue Mar 24, 2020 · 30 comments
Closed

Comments

@nickpapadonis
Copy link

nickpapadonis commented Mar 24, 2020

Test case A:

  1. Type backslash
  2. Type Ctrl-B
  3. Carrot B appears
  4. The expected result is to go backwards one character, instead of printing Carrot B

There is been discussion that this behavior is expected based on p106 of Korn's book. At the same time typing backslash and Ctrl-C works as expected without printing Carrot C. Backslash Ctrl-D behaves the same as Test case A.

There is a video reproducing this at:
https://www.youtube.com/watch?v=ojfAgdnwQzw&feature=youtu.be

Notes from KSH mailing list discussion:
Kurtis: That behavior is present in the ksh93u+ release and thus is not something introduced since the most recent stable release seven years ago. Note that the unexpected behavior is due to the backslash. Without the char under the cursor being a backslash [ctrl-B] and [ctrl-F] behave as you would expect. This affects pretty much every control char such as [ctrl-A] to move to the start of the line. Whether this behavior is good or bad is debatable. If you feel it should be changed, despite being long standing behavior, please open an issue

The problem isn't the presence of parenthesis; it's the backslash. Type a\[ctrl-B]. This behavior only happens if the most recent character you typed is a backslash. So you can type a()b, then move the cursor inside the parens, type \[ctrl-B] and the same thing will happen. Personally, I think the magic backslash is a misfeature. You can always type [ctrl-V][ctrl-B] to insert a literal [ctrl-B]. Feel free to open an issue. Even better, create a patch for us to review.

On that platform /bin/ksh is 93t+ 2010-03-05 and exhibits the behavior you described. You should be able to run /bin/ksh --version to figure out which ksh variant you're using.

Having said all that I agree with you the behavior is confusing and should be changed regardless of how many years it has been in effect.

@saper
Copy link

saper commented Mar 24, 2020

@nickpapadonis There are some bits missing in the way you describe how to reproduce the issue, probably something got cut out by the markdown editor

@nickpapadonis
Copy link
Author

Fixed the comment thanks

@jghub
Copy link
Member

jghub commented Mar 25, 2020

I am using vi mode so never noticed this behaviour. I now have looked it up in the bolsky/korn book (@nickpapadonis: if you like and care for ksh I can really recommend that you get a hardcopy (long out of print, used samples you can get for 5$ or so from abebooks/amazon)) and it proofs that the behaviour you see is explicitly intended. from page 106:

\ Escapes next character. Editiing characters and the user's Erase, Kill, and interrupt characters may be entered in a command line or search string if preceded by \. The \ removes the next character's editing features, if any.

this seems clear enough to me. I disagree with krader's "Having said all that I agree with you the behavior is confusing and should be changed regardless of how many years it has been in effect."

while it is correct, that control chars like ^B might also be inserted via ^V^B this convenience feature for emacs mode users has to be kept in place and the issue then could be closed as "not-a-bug" I believe. @dannyweldon @marcastel (and everybodye;)): what do you think?

@marcastel
Copy link
Member

This is also documented with the same wording in the man page (src/cmd/ksh93/sh.1):

\
Escape next character. Editing characters, the user's erase, kill and interrupt (normally ^?) characters may be entered in a command line or in a search string if preceded by a . The \ removes the next character's editing features (if any).

I concur with @jghub in that this appears compliant. I am not an emacs user either.

@saper
Copy link

saper commented Mar 25, 2020

It also works this way in the vi mode as well without any editing mode at all.

There is one problem though - the documentation says this is a "command", so I'd expect that typing `` (backslash) and then some character quotes it. So far so fine.

However, it seems to work like a permanent marker though: just type backslash, a, b to get \ab. Then press backspace once (\a), again (\) and again (^? in my case - my erase character).

Is the "command" something that is there permanently in the line buffer or is this just a keystroke pressed?

@jghub
Copy link
Member

jghub commented Mar 25, 2020

oh, you are right regarding also happens in vi mode! it never happened to me since I have added emacs-like cursor movement keybindings to my vi mode so that I can use ^B and ^F from within vi insert mode. and in this combo \^B actually just backspaces :).

regarding your question: it seems to come about because you never leave input mode (in vi, I mean, in emacs no difference...) so you delete the b than the a. at this point your whole input is \ and the next keystroke looses its special meaning -- including del (ascii octal 177). and when you hit del the \del combination inserts the \0177 char. seems consistent, no?

@jghub
Copy link
Member

jghub commented Mar 25, 2020

so I have learned something new. I never was aware of \ doing more than protecting symbols special to the shell. I completely missed the line editor support to input control chars in this way...

@nickpapadonis
Copy link
Author

I update the issue as some characters were lost. There are many comments above. I'm confused, is this behavior a feature? What does the feature enable the user to do?

@jghub
Copy link
Member

jghub commented Mar 25, 2020

yes, it is a "feature" since explained in the book and the manpage. it can also be confusing, obviously, since the two builtin editors (vi and emacs) in ksh thus treat \ specially, not as an ordinary character.

what it allows, is to easily enter verbatim those control-chars into string definitions. if you type \^B it is interpreted as "insert ascii character with octal code 002 (stx)" (i.e. the control-B character) instead of "move cursor one character to the left". it just is not what you expected to happen ...

@nickpapadonis
Copy link
Author

What page is this described in the book? I have it. Thanks

@jghub
Copy link
Member

jghub commented Mar 25, 2020

p. 106

@saper
Copy link

saper commented Mar 26, 2020

at this point your whole input is \ and the next keystroke looses its special meaning

I have mistakenly used the word "command" which means something different in the Korn shell. This is an "edit command". "All edit commands operate from any place on the line (not just at the beginning)". So in that way '' is just a keystroke (like ^A moving the cursor) but it leaves some visual indication the next character is about to quoted.

An '' already entered in the buffer is a different thing in my opinion. It just sits there.

Regarding ^V - ksh uses this as "Display version of the shell", it works in the emacs/gmacs mode as well as in the command mode of vi mode. However, it is also a default value of the lnext capability of the terminal discipline - this means the next character (keystroke!) is about to be entered literally.

This can be changed or turned off using stty lnext ..., at least on FreeBSD, this leaves `' edit command as the only way to enter them.

@jghub
Copy link
Member

jghub commented Mar 26, 2020

An '' already entered in the buffer is a different thing in my opinion. It just sits there.

I see your point but I am not averse to the present behaviour (which also might be justified). but I only now have noted that the two editors behave different in this situation. vi does what you describe (and what is ok behaviour with me) while emacs does what you seem to prefer in this situation (just treat \ as ordinary character in the buffer when deleting)

maybe this difference is also documented and thus a 'feature' (did not check), but probably not. in which case this is simply inconsistent behaviour and it would be desirable to change it in either vi or emacs mode.

@nickpapadonis
Copy link
Author

nickpapadonis commented Mar 26, 2020

stty lnext ...

Did not resolve the issue in Debian 10

Did not resolve the issue in FreeBSD 12.1

Did I enter the correct command?

I am still unclear on how this feature is useful. What does it provide? I.e. typing \ (back slash) then Ctrl-B shows ^B. Is the intension to type \ (back slash), then some control characters?

@saper
Copy link

saper commented Mar 26, 2020

stty lnext is not a solution. I was referring to this:

You can always type [ctrl-V][ctrl-B] to insert a literal [ctrl-B].

[ctrl-v] is not a feature of ksh, it is a feature of a so-called "line discipline" that controls the terminal behaviour in UNIX. It can be turned off or assigned to a different key with stty lnext.

To answer your question:

The way I understand this is the intention - to add control characters that would otherwise do something else, similar to Ctrl-V with default setting of most terminals.

@nickpapadonis
Copy link
Author

I noticed that the behavior works with other characters other then Ctrl-B. For instance backslash then Ctrl-D.

Some questions:
Should Ctrl-B be removed from the possible control characters that can be entered after a backslash.? Potentially only in interactive mode. Thereby giving precedence to the Emacs editing of the buffer? Unsure if the book describes precedence of input characters in their interactive editing mode.

Can we conclude that typing backslash then a control character is a feature based on p106 of the book David Korn authored?

Thanks

@jghub
Copy link
Member

jghub commented Mar 28, 2020

I noticed that the behavior works with other characters other then Ctrl-B. For instance backslash then Ctrl-D.

yes. as already stated in my first reply: "\ Escapes next character". any character that is. those with no special meaning. those special to the shell. those special to the editor. other control characters.

Some questions:
Should Ctrl-B be removed from the possible control characters that can be entered after a backslash.? Potentially only in interactive mode. Thereby giving precedence to the Emacs editing of the buffer? Unsure if the book describes precedence of input characters in their interactive editing mode.

I would say "no" since one cannot predict whether overall users prefer the emacs "meaning" of ^B (users of vi mode obviously don't...) or want consistent means to input all control characters.

and, more important, \ escapes all characters. it would seem totally wrong to exclude one or two of those (^B,^F) from this treatment "just" because they happen to have a meaning to the chosen editor mode.

I understand that the behaviour (can't backspace with ^B over \ immediately after typing it) irritated you but it seems tolerable to now remember it and use the left arrow key for a change if the situation happens again (or switch to vi mode ;)).

Can we conclude that typing backslash then a control character is a feature based on p106 of the book David Korn authored?

I believe the manpage together with the book is the de facto "specification" of the KornLanguage. in this special case, as @marcastel already pointed out in this thread, it is explained not only in the book but also in the manpage in identical words. so it is not a random thing. it is fully intentional and defined behaviour.

in view of all this, my proposal would be to close this issue, right?

Thanks

@nickpapadonis
Copy link
Author

If one types backslash and Ctrl-C, Ctrl-C is not output to the screen with a carrot in front of it. In this case Ctrl-C is captured for it's special meaning. If one types backslash and Ctrl-D, Ctrl-D is output to the screen and it is not captured for it's special meaning. In this sense, I would argue that the behavior is non-uniform for at least Ctrl-C. Does this make sense?

@nickpapadonis nickpapadonis changed the title Ctrl-B and Ctrl-F produces unexpected results in Emacs mode Backslash, then Ctrl-B produces unexpected results in Emacs mode Mar 28, 2020
@nickpapadonis
Copy link
Author

Can someone add the lines from Korn's book p106 that justify this behavior? I have my book in storage. Thanks

@posguy99
Copy link

posguy99 commented Mar 28, 2020

I'm just a ksh user, but here's the page.

Kornshell pg 106

Learning the Korn Shell, 2nd ed dances around the same issue beginning on page 27.

@nickpapadonis
Copy link
Author

Wow there it is. I agree we can close this issue out as expected behavior. Thanks for all the input!

@jghub jghub closed this as completed Mar 28, 2020
@saper
Copy link

saper commented Mar 30, 2020

maybe this difference is also documented and thus a 'feature' (did not check), but probably not. in which case this is simply inconsistent behaviour and it would be desirable to change it in either vi or emacs mode.

From what I can see this is not documented and might be a bug indeed.

asciicast

Looks like a no-editor mode (neither vi not emacs) does behave the same as vi. I prefer emacs behavior. But we should not break the functionality on totally dumb terminals.

@saper
Copy link

saper commented Mar 30, 2020

Also found backspacing mode in vi which makes me stuck forever behind the backslash:

  • Start editing in vi mode
  • After backslash type something then ESC to switch to command mode
  • Use shift-T, backslash command to get behind the backslash
  • Use shift-C to replace the text till the end of line
  • Type something in the input mode
  • Backspace all out - you’ll be stuck “forever” behind the backslash

Note: in the vi’s change command it is not allowed to backspace to the left where one has started

Weird, but consistent with how the change command behaves...

asciicast

@dwierenga
Copy link

oh, you are right regarding also happens in vi mode! it never happened to me since I have added emacs-like cursor movement keybindings to my vi mode so that I can use ^B and ^F from within vi insert mode. and in this combo \^B actually just backspaces :).

Sorry to comment on a closed issue, but is there any chance you could share those keybindings? Maybe start a wiki page in this project? There's a general lack of information around the web for the esoteric way to get KSH to interpret key strokes.

@marcastel
Copy link
Member

@dwierenga Excellent idea. Could I kindly ask you to open an issue here and we will follow-up.

We are working on setting up centralised information and documentation on the KornShell and this definitively belongs there :-)

@nickpapadonis
Copy link
Author

nickpapadonis commented May 4, 2020 via email

@nickpapadonis
Copy link
Author

nickpapadonis commented May 4, 2020 via email

@McDutchie
Copy link

It's worth pointing out here that ksh's "raw" POSIX (neither vi nor emacs) editing mode is implemented in two different ways, depending on whether a UTF-8/multibyte locale is active or not. By default, raw POSIX editing mode is handled in edit/edit.c. But in UTF-8 locales, control is passed to edit/vi.c.

And vi.c contains a number of conditionals that are supposed to disable vi features if the -o vi shell option is not set. Unfortunately that is very buggy, so you get e.g. extremely broken tab completion (which is already fixed in 93u+m).

This also explains the bugs @saper made videos of (see above); evidently, a UTF-8 locale was active. If you set LANG=C or LC_ALL=C you'll see those bugs disappear.

@nickpapadonis
Copy link
Author

nickpapadonis commented Jul 3, 2020 via email

@McDutchie
Copy link

What is shown in this comment is a bug: #8 (comment)

What is shown in this comment is part intended, part bug: #8 (comment)
The ^? is a bug, but it's not a bug that Shift+C puts you in a separate buffer that you can't backslash past until you escape out of it again; that is standard vi behaviour (though vim acts differently).

McDutchie pushed a commit to ksh93/ksh that referenced this issue Jul 3, 2020
This commit fixes the following bugs in the 'vi' editing mode
backslash escape feature. Ref.: Bolsky & Korn (1995), p. 106, or:
ksh-community/ksh#8 (comment)

1. The vi mode now only escapes the next character if the last
   character input was a backslash, fixing the bug demonstrated at:
   https://asciinema.org/a/E3Rq3et07MMQG5BaF7vkXQTg0
2. Escaping backslashes are now disabled in vi.c if the vi mode is
   disabled (note that vi.c handles raw editing mode in UTF-8
   locales). This makes the behavior of the raw editing mode
   consistent in C/POSIX and UTF-8 locales.
3. An odd interaction with Backspace when the character prior to a
   separate buffer entered with Shift-C was a backslash has been
   fixed. Demonstration at: https://asciinema.org/a/314833
   ^? will no longer be output repeatedly when attempting to erase
   a separate buffer with a Backspace, although, to be consistent
   with vi(1), you still cannot backspace past it before escaping
   out of it. Ref.:
   #56 (comment)

src/cmd/ksh93/edit/vi.c:
- Prevent a backslash from escaping the next input if the previous
  input wasn't a backslash. This is done by unsetting a variable
  named backslash if a backslash escaped a character. backslash is
  set to the result of c == '\\' when the user enters a new
  character.
- Disable escaping backslashes in the raw editing mode because
  it should not be enabled there.

src/cmd/ksh93/tests/pty.sh:
- Add some tests for how ksh handles backslashes in each
  editing mode to test for the bugs fixed by this commit.

Fixes #56.
aweeraman pushed a commit to aweeraman/ksh that referenced this issue Feb 8, 2025
Backport the ksh2020 fix for timezone name determination

Partial fix for ksh-community#6.
aweeraman pushed a commit to aweeraman/ksh that referenced this issue Feb 8, 2025
I didn't trust this back in e3d7bf1 (which disabled it for
interactive shells) and I trust it less now. In af6a32d/6b380572,
this was also disabled for virtual subshells as it caused program
flow corruption there. Now, on macOS 10.14.6, a crash occurs when
repeatedly running a command with this optimisation:

$ ksh -c 'for((i=0;i<100;i++));do print -n "$i ";(sleep 1&);done'
0 1 2 3 4 5 6 7 Illegal instruction

Oddly enough it seems that I can only reproduce this crash on macOS
-- not on Linux, OpenBSD, or Solaris. It could be a macOS bug,
particularly given the odd message in the stack trace below.

I've had enough, though. Out it comes. Things now work fine, the
reproducer is fixed on macOS, and it didn't optimise much anyway.

The double-fork issue discussed in e3d7bf1 remains.
________
For future reference, here's an lldb debugger session with a stack
trace. It crashes on calling calloc() (via sh_calloc(), via
sh_newof()) in jobsave_create(). This is not an invalid pointer
problem as we're allocating new memory, so it does look like an OS
bug. The "BUG IN CLIENT OF LIBPLATFORM" message is interesting.

$ lldb -- arch/*/bin/ksh -c 'for((i=0;i<100;i++));do print -n "$i ";(sleep 1&);done'
(lldb) target create "arch/darwin.i386-64/bin/ksh"
Current executable set to 'arch/darwin.i386-64/bin/ksh' (x86_64).
(lldb) settings set -- target.run-args  "-c" "for((i=0;i<100;i++));do print -n \"$i \";(sleep 1&);done"
(lldb) run
error: shell expansion failed (reason: lldb-argdumper exited with error 2). consider launching with 'process launch'.
(lldb) process launch
Process 35038 launched: '/usr/local/src/ksh93/ksh/arch/darwin.i386-64/bin/ksh' (x86_64)
0 1 2 3 4 5 6 7 8 9 Process 35038 stopped
* thread ksh-community#1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
    frame #0: 0x00007fff70deb1c2 libsystem_platform.dylib`_os_unfair_lock_recursive_abort + 23
libsystem_platform.dylib`_os_unfair_lock_recursive_abort:
->  0x7fff70deb1c2 <+23>: ud2

libsystem_platform.dylib`_os_unfair_lock_unowned_abort:
    0x7fff70deb1c4 <+0>:  movl   %edi, %eax
    0x7fff70deb1c6 <+2>:  leaq   0x1a8a(%rip), %rcx        ; "BUG IN CLIENT OF LIBPLATFORM: Unlock of an os_unfair_lock not owned by current thread"
    0x7fff70deb1cd <+9>:  movq   %rcx, 0x361cb16c(%rip)    ; gCRAnnotations + 8
Target 0: (ksh) stopped.
(lldb) bt
* thread ksh-community#1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
  * frame #0: 0x00007fff70deb1c2 libsystem_platform.dylib`_os_unfair_lock_recursive_abort + 23
    frame ksh-community#1: 0x00007fff70de7c9a libsystem_platform.dylib`_os_unfair_lock_lock_slow + 239
    frame ksh-community#2: 0x00007fff70daa3bd libsystem_malloc.dylib`tiny_malloc_should_clear + 188
    frame ksh-community#3: 0x00007fff70daa20f libsystem_malloc.dylib`szone_malloc_should_clear + 66
    frame ksh-community#4: 0x00007fff70dab444 libsystem_malloc.dylib`malloc_zone_calloc + 99
    frame ksh-community#5: 0x00007fff70dab3c4 libsystem_malloc.dylib`calloc + 30
    frame ksh-community#6: 0x000000010003fa5d ksh`sh_calloc(nmemb=1, size=16) at init.c:264:13
    frame ksh-community#7: 0x000000010004f8a6 ksh`jobsave_create(pid=35055) at jobs.c:272:8
    frame ksh-community#8: 0x000000010004ed42 ksh`job_reap(sig=20) at jobs.c:363:9
    frame ksh-community#9: 0x000000010004ff6f ksh`job_waitsafe(sig=20) at jobs.c:511:3
    frame ksh-community#10: 0x00007fff70de9b5d libsystem_platform.dylib`_sigtramp + 29
    frame ksh-community#11: 0x00007fff70d39ac4 libsystem_kernel.dylib`__fork + 12
    frame ksh-community#12: 0x00007fff70c57d80 libsystem_c.dylib`fork + 17
    frame ksh-community#13: 0x000000010009590d ksh`sh_exec(t=0x0000000101005d30, flags=4) at xec.c:1883:16
    frame ksh-community#14: 0x0000000100096013 ksh`sh_exec(t=0x0000000101005d30, flags=4) at xec.c:2019:4
    frame ksh-community#15: 0x0000000100096c4f ksh`sh_exec(t=0x0000000101005a40, flags=5) at xec.c:2213:9
    frame ksh-community#16: 0x0000000100096013 ksh`sh_exec(t=0x0000000101005a40, flags=5) at xec.c:2019:4
    frame ksh-community#17: 0x000000010001c23f ksh`exfile(iop=0x0000000100405750, fno=-1) at main.c:603:4
    frame ksh-community#18: 0x000000010001b23c ksh`sh_main(ac=3, av=0x00007ffeefbff4f0, userinit=0x0000000000000000) at main.c:365:2
    frame ksh-community#19: 0x0000000100000776 ksh`main(argc=3, argv=0x00007ffeefbff4f0) at pmain.c:45:9
    frame ksh-community#20: 0x00007fff70bfe3d5 libdyld.dylib`start + 1
aweeraman pushed a commit to aweeraman/ksh that referenced this issue Feb 8, 2025
The ASan crash in basic.sh when sourcing multiple files is caused by
a bug that is similar to the crash fixed in 59a5672. This is the
trace for the regression test crash (note that in order to see the
trace, the 2>/dev/null redirect must be disabled):

==1899388==ERROR: AddressSanitizer: heap-use-after-free on address 0x6150000005b0 at pc 0x55a5e3f9432a bp 0x7ffeb91ea110 sp 0x7ffeb91ea100
WRITE of size 8 at 0x6150000005b0 thread T0
    #0 0x55a5e3f94329 in funct /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:967
    ksh-community#1 0x55a5e3f96f77 in item /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:1349
    ksh-community#2 0x55a5e3f90c9f in term /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:642
    ksh-community#3 0x55a5e3f90ac1 in list /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:613
    ksh-community#4 0x55a5e3f90845 in sh_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:561
    ksh-community#5 0x55a5e3f909e0 in sh_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:586
    ksh-community#6 0x55a5e3f8fd5e in sh_parse /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:438
    ksh-community#7 0x55a5e3fc43c1 in sh_eval /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:635
    ksh-community#8 0x55a5e4012172 in b_dot_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/bltins/misc.c:318
    ksh-community#9 0x55a5e3fca3cb in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1254
    ksh-community#10 0x55a5e3fd01d4 in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1932
    ksh-community#11 0x55a5e3fc4544 in sh_eval /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:651
    ksh-community#12 0x55a5e4012172 in b_dot_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/bltins/misc.c:318
    ksh-community#13 0x55a5e3fca3cb in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1254
    ksh-community#14 0x55a5e3ecc1cd in exfile /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:604
    ksh-community#15 0x55a5e3ec9e7f in sh_main /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:369
    ksh-community#16 0x55a5e3ec801d in main /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/pmain.c:41
    ksh-community#17 0x7f637b4db2cf  (/usr/lib/libc.so.6+0x232cf)
    ksh-community#18 0x7f637b4db389 in __libc_start_main (/usr/lib/libc.so.6+0x23389)
    ksh-community#19 0x55a5e3ec7f24 in _start ../sysdeps/x86_64/start.S:115

Code in question:
https://github.com/ksh93/ksh/blob/8d57369b0cb39074437dd82924b604155e30e1e0/src/cmd/ksh93/sh/parse.c#L963-L968

To avoid any more similar crashes, all of the fixes introduced
in 7e317c5 that set slp->slptr to null have been improved with the
fix in 59a5672.
aweeraman pushed a commit to aweeraman/ksh that referenced this issue Feb 8, 2025
When ksh executes a script without a #! path (note that the AT&T
team had a real disliking for #! paths), ksh forks and goes through
a quick reinitialisation procedure. This is much faster than
invoking a fully new shell but should have the same effect if it
all works well. Unfortunately it's not worked all that well so far.
Even after recent improvements (see referenced commits) I've been
finding corner case problems.

FYI, running a script without #! basically goes like this:
* in path_spawn(), execve() fails with ENOEXEC because the file is
  not a binary executable and does not start with #!
* this triggers 'case ENOEXEC:' which:
  * forks ksh
  * calls exscript()
* exscript() cleans up & calls siglongjmp(*sh.jmplist,SH_JMPSCRIPT)
* SH_JMPSCRIPT is the highest longjmp value, so *all* the previous
  sigsetjmp/sh_pushcontext calls are unwinded in reverse order,
  triggering all sorts of cleanup, state restoration, removal of
  local scopes, etc.
* eventually, this lands us at the top sigsetjmp in sh_main()
* sh_main() calls sh_reinit(), then resumes as if the shell had
  just been started

This commit makes the following interrelated changes for the
correct functioning of this procedure:
1. exscript() now exports the environment into a dedicated Stk_t
   buffer and sets environ[] to that.
2. Instead of re-using existing variables, sh_reinit() deletes
   everything and reinits all name-value trees from scratch,
   then re-imports the environment from environ[].
3. Variable values that were imported from the environment are no
   longer treated specially with an NV_IMPORT attribute and the
   np->nvenv pointer to their value in environ[], fixing at least
   one crash.[*1]

Details of the changes follow:

src/cmd/ksh93/sh/path.c:
- exscript(): Generate a new environ[] by activating a dedicated
  AST stack that will not be overwritten before calling
  sh_envgen(). This will allow sh_reinit() to delete all variables
  and then reimport the environment. The exporting must be done
  here, before siglongjmp, otherwise locally scoped exported
  variables won't be included (siglongjmp with SH_JMPSCRIPT
  triggers cleanup of all scopes).

src/cmd/ksh93/sh/init.c:
- sh_reinit(): Largely rewritten as follows.
  - Reset shell options first. This has the beneficial side effect
    of unsetting SH_RESTRICTED which interferes with unsetting
    certain variables, like PATH.
  - Remove workarounds for FPATH, SHLVL and tilde expansion
    disciplines; these will not be needed now.
  - Properly unset and delete all functions and built-ins. Since we
    now unset a function before deleting it, this should now free
    up their memory. (See nvdisc.c below for a change allowing
    removal of special built-ins.)
  - Properly unset all variables (which includes any associated
    discipline functions). Incorporate here the needed logic from
    sh_envnolocal() in name.c; most of it is unneeded (that
    function was previously used to cleanup local variables but has
    not been used for that for decades). So sh_envnolocal() is now
    unused.
  - Delete variables in a separate pass after unsetting variables
    and unsetting and deleting functions; this avoids use-after-
    free problems as well as possible "no parent" problems with
    namespace variables (e.g., .rc.status in our new kshrc.sh).
  - After all that, close and free up all function, alias, tracked
    alias, type and variable trees.
  - Free the contiguous built-in node space and the Init_t init
    context (with all the special variable discipline pointers).
  - Call nv_init (previously only called from sh_init) to
    reinitialise all of the above name-value stuff from scratch.
    It's the only way to be sure.
  - Re-import the environment as stored by exscript() above.
- env_init():
  - Per item 3 above and footnote 1 below, no longer set NV_IMPORT
    attribute and no longer point np->nvenv to the item in environ.
  - POSIX says, for 'environ': "Any application that directly
    modifies the pointers to which the environ variable points has
    undefined behavior."[*2] Yet, env_init() is indeed juggling the
    environ[] pointers to deal with variables that cannot be
    imported because their names are invalid (because they still
    need to be saved to be passed on to child processes). Replace
    the current approach with one where those env vars get
    allocated on the heap, pointed to by sh.save_env and counted by
    sh.save_env_n (renamed from sh.nenv). This only needs to be
    done once as ksh cannot use or change these variables.

src/cmd/ksh93/sh/name.c:
- sh_envgen(): Update to match env_init() change above.
- pushnam() (called by sh_envgen()): Remove NV_IMPORT attribute
  check as per above and never get the value from the nvenv pointer
  -- simply always use nv_getval(). As of this change, the
  NV_IMPORT attribute is unused. The next commit will remove it
  and do related cleanups.
- staknam(): is only called if value!=NULL, so remove that 'if'.
- sh_envnolocal(): Removed.

src/cmd/ksh93/sh/nvdisc.c:
- assign(): Remove a check for the SH_INIT state bit that avoids
  freeing functions during sh_reinit(). This works fine now.
- sh_addbuiltin(): Allow sh_reinit() to delete special builtins by
  checking for the SH_INIT state bit before throwing an error.

src/cmd/ksh93/sh/nvtree.c:
- outval(): Add a workaround for a use-after-free, introduced by
  the changes above, that occurred in the types.sh tests for
  #!-less scripts (types.sh:675-722). The use-after-free occurred
  here (abridged ASan trace follows; line numbers are current as of
  this commit):
  ==30849==ERROR: AddressSanitizer: heap-use-after-free [...]
    #0 in dttree dttree.c:393
    ksh-community#1 in sh_reinit init.c:1637
    ksh-community#2 in sh_main main.c:136
    [...]
  The pointer was freed in the same loop via nv_delete() in outval:
    #0 in wrap_free+0x98 (libclang_rt.asan_osx_dynamic.dylib:[...])
    ksh-community#1 in nv_delete name.c:1318
    ksh-community#2 in outval nvtree.c:731
    ksh-community#3 in genvalue nvtree.c:905
    ksh-community#4 in walk_tree nvtree.c:1042
    ksh-community#5 in put_tree nvtree.c:1108
    ksh-community#6 in nv_putv nvdisc.c:144
    ksh-community#7 in _nv_unset name.c:2437
    ksh-community#8 in sh_reinit init.c:1645
    ksh-community#9 in sh_main main.c:136
    [...]
  So, what happened was that the nv_delete() call on name.c:1318
  (eventually resulting from the _nv_unset call on init.c:1645)
  freed the node pointed to by np, so that the next loop iteration
  crashed on line 1637 as the dtnext() macro now gets a freed np.
      Now, why on earth should _nv_unset() *ever* indirectly call
  nv_delete()? That's a question for another day; I suspect it may
  be a bug, or it may be needed for compound variables for some
  reason. For now, I'm adding a workaround: simply avoid calling
  nv_delete() if the SH_INIT state bit is on, indicating
  sh_reinit() is in the call stack. This allows the variables unset
  loop in sh_reinit() to continue without crashing. sh_reinit()
  handles deletion later anyway.

src/cmd/ksh93/sh/main.c:
- sh_main(): remove zeroing of sh.fun_depth and sh.dot_depth; these
  are known to be 0, coming from either sh_init() or sh_reinit().

________
[*1] This NV_IMPORT/nvenv usage is a redundant holdout from ancient
     ksh code; the imported value is easily available as a normal
     shell variable value via nv_getval(). Plus, the nvenv pointer
     is overloaded with too many other purposes: so far I've
     discovered it's used for pointers to subarrays of arrays
     (multidimentional arrays), compound variables, builtins, and
     other things.
     This mess caused at least one crash in set_instance() (xec.c)
     due to incorrectly using that nvenv pointer. The current kshrc
     script triggers this. Reproducer:
        $ export PS1
        $ bin/package use
        «0»26:…/src/ksh93/ksh[dev] $ typeset +x PS1
     ...and crash. That is now fixed.

[*2] https://pubs.opengroup.org/onlinepubs/9699919799/functions/environ.html
aweeraman pushed a commit to aweeraman/ksh that referenced this issue Feb 8, 2025
The referenced commit left one test unexecuted because it crashes.

Minimal reproducer:

  typeset -a arr=((a b c) 1)
  got=$( typeset -a arr=( ( ((a b c)1))) )

The crash occurs when the array is redefined in a subshell.

Here are abridged ASan stack traces for the crash, for the use
after free, and for when it was freed:

=================================================================
==73147==ERROR: AddressSanitizer: heap-use-after-free [snippage]
READ of size 8 at 0x000107403eb0 thread T0
    #0 0x104fded40 in nv_search nvdisc.c:1007
    ksh-community#1 0x104fbeb1c in nv_create name.c:860
    ksh-community#2 0x104fb8b9c in nv_open name.c:1440
    ksh-community#3 0x104fb1edc in nv_setlist name.c:309
    ksh-community#4 0x104fb4a30 in nv_setlist name.c:475
    ksh-community#5 0x105055b58 in sh_exec xec.c:1079
    ksh-community#6 0x105045cd4 in sh_subshell subshell.c:654
    ksh-community#7 0x104f92c1c in comsubst macro.c:2266
[snippage]

0x000107403eb0 is located 0 bytes inside of 80-byte region [snippage]
freed by thread T0 here:
    #0 0x105c5ade4 in wrap_free+0x98 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x3ede4)
    ksh-community#1 0x105261da0 in dtclose dtclose.c:52
    ksh-community#2 0x104f178cc in array_putval array.c:671
    ksh-community#3 0x104fd7f4c in nv_putv nvdisc.c:144
    ksh-community#4 0x104fbc5f0 in _nv_unset name.c:2435
    ksh-community#5 0x104fb3250 in nv_setlist name.c:364
    ksh-community#6 0x105055b58 in sh_exec xec.c:1079
    ksh-community#7 0x105045cd4 in sh_subshell subshell.c:654
    ksh-community#8 0x104f92c1c in comsubst macro.c:2266
[snippage]

So the crash is caused because array_putval (array.c:671) calls
dtclose, freeing ap->table, which is then reused after a recursive
nv_setlist call via nv_open() -> nv_create() -> nv_search().
This only happens whwn we're in a virtual subshell.

src/cmd/ksh93/sh/array.c:
- array_putval(): When redefining an array in a virtual subshell,
  do not free the old ap->table; it will be needed by the parent
  shell environment.
aweeraman pushed a commit to aweeraman/ksh that referenced this issue Feb 8, 2025
I didn't trust this back in e3d7bf1 (which disabled it for
interactive shells) and I trust it less now. In af6a32d/6b380572,
this was also disabled for virtual subshells as it caused program
flow corruption there. Now, on macOS 10.14.6, a crash occurs when
repeatedly running a command with this optimisation:

$ ksh -c 'for((i=0;i<100;i++));do print -n "$i ";(sleep 1&);done'
0 1 2 3 4 5 6 7 Illegal instruction

Oddly enough it seems that I can only reproduce this crash on macOS
-- not on Linux, OpenBSD, or Solaris. It could be a macOS bug,
particularly given the odd message in the stack trace below.

I've had enough, though. Out it comes. Things now work fine, the
reproducer is fixed on macOS, and it didn't optimise much anyway.

The double-fork issue discussed in e3d7bf1 remains.
________
For future reference, here's an lldb debugger session with a stack
trace. It crashes on calling calloc() (via sh_calloc(), via
sh_newof()) in jobsave_create(). This is not an invalid pointer
problem as we're allocating new memory, so it does look like an OS
bug. The "BUG IN CLIENT OF LIBPLATFORM" message is interesting.

$ lldb -- arch/*/bin/ksh -c 'for((i=0;i<100;i++));do print -n "$i ";(sleep 1&);done'
(lldb) target create "arch/darwin.i386-64/bin/ksh"
Current executable set to 'arch/darwin.i386-64/bin/ksh' (x86_64).
(lldb) settings set -- target.run-args  "-c" "for((i=0;i<100;i++));do print -n \"$i \";(sleep 1&);done"
(lldb) run
error: shell expansion failed (reason: lldb-argdumper exited with error 2). consider launching with 'process launch'.
(lldb) process launch
Process 35038 launched: '/usr/local/src/ksh93/ksh/arch/darwin.i386-64/bin/ksh' (x86_64)
0 1 2 3 4 5 6 7 8 9 Process 35038 stopped
* thread ksh-community#1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
    frame #0: 0x00007fff70deb1c2 libsystem_platform.dylib`_os_unfair_lock_recursive_abort + 23
libsystem_platform.dylib`_os_unfair_lock_recursive_abort:
->  0x7fff70deb1c2 <+23>: ud2

libsystem_platform.dylib`_os_unfair_lock_unowned_abort:
    0x7fff70deb1c4 <+0>:  movl   %edi, %eax
    0x7fff70deb1c6 <+2>:  leaq   0x1a8a(%rip), %rcx        ; "BUG IN CLIENT OF LIBPLATFORM: Unlock of an os_unfair_lock not owned by current thread"
    0x7fff70deb1cd <+9>:  movq   %rcx, 0x361cb16c(%rip)    ; gCRAnnotations + 8
Target 0: (ksh) stopped.
(lldb) bt
* thread ksh-community#1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
  * frame #0: 0x00007fff70deb1c2 libsystem_platform.dylib`_os_unfair_lock_recursive_abort + 23
    frame ksh-community#1: 0x00007fff70de7c9a libsystem_platform.dylib`_os_unfair_lock_lock_slow + 239
    frame ksh-community#2: 0x00007fff70daa3bd libsystem_malloc.dylib`tiny_malloc_should_clear + 188
    frame ksh-community#3: 0x00007fff70daa20f libsystem_malloc.dylib`szone_malloc_should_clear + 66
    frame ksh-community#4: 0x00007fff70dab444 libsystem_malloc.dylib`malloc_zone_calloc + 99
    frame ksh-community#5: 0x00007fff70dab3c4 libsystem_malloc.dylib`calloc + 30
    frame ksh-community#6: 0x000000010003fa5d ksh`sh_calloc(nmemb=1, size=16) at init.c:264:13
    frame ksh-community#7: 0x000000010004f8a6 ksh`jobsave_create(pid=35055) at jobs.c:272:8
    frame ksh-community#8: 0x000000010004ed42 ksh`job_reap(sig=20) at jobs.c:363:9
    frame ksh-community#9: 0x000000010004ff6f ksh`job_waitsafe(sig=20) at jobs.c:511:3
    frame ksh-community#10: 0x00007fff70de9b5d libsystem_platform.dylib`_sigtramp + 29
    frame ksh-community#11: 0x00007fff70d39ac4 libsystem_kernel.dylib`__fork + 12
    frame ksh-community#12: 0x00007fff70c57d80 libsystem_c.dylib`fork + 17
    frame ksh-community#13: 0x000000010009590d ksh`sh_exec(t=0x0000000101005d30, flags=4) at xec.c:1883:16
    frame ksh-community#14: 0x0000000100096013 ksh`sh_exec(t=0x0000000101005d30, flags=4) at xec.c:2019:4
    frame ksh-community#15: 0x0000000100096c4f ksh`sh_exec(t=0x0000000101005a40, flags=5) at xec.c:2213:9
    frame ksh-community#16: 0x0000000100096013 ksh`sh_exec(t=0x0000000101005a40, flags=5) at xec.c:2019:4
    frame ksh-community#17: 0x000000010001c23f ksh`exfile(iop=0x0000000100405750, fno=-1) at main.c:603:4
    frame ksh-community#18: 0x000000010001b23c ksh`sh_main(ac=3, av=0x00007ffeefbff4f0, userinit=0x0000000000000000) at main.c:365:2
    frame ksh-community#19: 0x0000000100000776 ksh`main(argc=3, argv=0x00007ffeefbff4f0) at pmain.c:45:9
    frame ksh-community#20: 0x00007fff70bfe3d5 libdyld.dylib`start + 1
aweeraman pushed a commit to aweeraman/ksh that referenced this issue Feb 8, 2025
The ASan crash in basic.sh when sourcing multiple files is caused by
a bug that is similar to the crash fixed in f24040e. This is the
trace for the regression test crash (note that in order to see the
trace, the 2>/dev/null redirect must be disabled):

==1899388==ERROR: AddressSanitizer: heap-use-after-free on address 0x6150000005b0 at pc 0x55a5e3f9432a bp 0x7ffeb91ea110 sp 0x7ffeb91ea100
WRITE of size 8 at 0x6150000005b0 thread T0
    #0 0x55a5e3f94329 in funct /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:967
    ksh-community#1 0x55a5e3f96f77 in item /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:1349
    ksh-community#2 0x55a5e3f90c9f in term /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:642
    ksh-community#3 0x55a5e3f90ac1 in list /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:613
    ksh-community#4 0x55a5e3f90845 in sh_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:561
    ksh-community#5 0x55a5e3f909e0 in sh_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:586
    ksh-community#6 0x55a5e3f8fd5e in sh_parse /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/parse.c:438
    ksh-community#7 0x55a5e3fc43c1 in sh_eval /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:635
    ksh-community#8 0x55a5e4012172 in b_dot_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/bltins/misc.c:318
    ksh-community#9 0x55a5e3fca3cb in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1254
    ksh-community#10 0x55a5e3fd01d4 in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1932
    ksh-community#11 0x55a5e3fc4544 in sh_eval /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:651
    ksh-community#12 0x55a5e4012172 in b_dot_cmd /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/bltins/misc.c:318
    ksh-community#13 0x55a5e3fca3cb in sh_exec /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/xec.c:1254
    ksh-community#14 0x55a5e3ecc1cd in exfile /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:604
    ksh-community#15 0x55a5e3ec9e7f in sh_main /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/main.c:369
    ksh-community#16 0x55a5e3ec801d in main /home/johno/GitRepos/KornShell/ksh/src/cmd/ksh93/sh/pmain.c:41
    ksh-community#17 0x7f637b4db2cf  (/usr/lib/libc.so.6+0x232cf)
    ksh-community#18 0x7f637b4db389 in __libc_start_main (/usr/lib/libc.so.6+0x23389)
    ksh-community#19 0x55a5e3ec7f24 in _start ../sysdeps/x86_64/start.S:115

Code in question:
https://github.com/ksh93/ksh/blob/8d57369b0cb39074437dd82924b604155e30e1e0/src/cmd/ksh93/sh/parse.c#L963-L968

To avoid any more similar crashes, all of the fixes introduced
in 69d37d5 that set slp->slptr to null have been improved with the
fix in f24040e.
aweeraman pushed a commit to aweeraman/ksh that referenced this issue Feb 8, 2025
When ksh executes a script without a #! path (note that the AT&T
team had a real disliking for #! paths), ksh forks and goes through
a quick reinitialisation procedure. This is much faster than
invoking a fully new shell but should have the same effect if it
all works well. Unfortunately it's not worked all that well so far.
Even after recent improvements (see referenced commits) I've been
finding corner case problems.

FYI, running a script without #! basically goes like this:
* in path_spawn(), execve() fails with ENOEXEC because the file is
  not a binary executable and does not start with #!
* this triggers 'case ENOEXEC:' which:
  * forks ksh
  * calls exscript()
* exscript() cleans up & calls siglongjmp(*sh.jmplist,SH_JMPSCRIPT)
* SH_JMPSCRIPT is the highest longjmp value, so *all* the previous
  sigsetjmp/sh_pushcontext calls are unwinded in reverse order,
  triggering all sorts of cleanup, state restoration, removal of
  local scopes, etc.
* eventually, this lands us at the top sigsetjmp in sh_main()
* sh_main() calls sh_reinit(), then resumes as if the shell had
  just been started

This commit makes the following interrelated changes for the
correct functioning of this procedure:
1. exscript() now exports the environment into a dedicated Stk_t
   buffer and sets environ[] to that.
2. Instead of re-using existing variables, sh_reinit() deletes
   everything and reinits all name-value trees from scratch,
   then re-imports the environment from environ[].
3. Variable values that were imported from the environment are no
   longer treated specially with an NV_IMPORT attribute and the
   np->nvenv pointer to their value in environ[], fixing at least
   one crash.[*1]

Details of the changes follow:

src/cmd/ksh93/sh/path.c:
- exscript(): Generate a new environ[] by activating a dedicated
  AST stack that will not be overwritten before calling
  sh_envgen(). This will allow sh_reinit() to delete all variables
  and then reimport the environment. The exporting must be done
  here, before siglongjmp, otherwise locally scoped exported
  variables won't be included (siglongjmp with SH_JMPSCRIPT
  triggers cleanup of all scopes).

src/cmd/ksh93/sh/init.c:
- sh_reinit(): Largely rewritten as follows.
  - Reset shell options first. This has the beneficial side effect
    of unsetting SH_RESTRICTED which interferes with unsetting
    certain variables, like PATH.
  - Remove workarounds for FPATH, SHLVL and tilde expansion
    disciplines; these will not be needed now.
  - Properly unset and delete all functions and built-ins. Since we
    now unset a function before deleting it, this should now free
    up their memory. (See nvdisc.c below for a change allowing
    removal of special built-ins.)
  - Properly unset all variables (which includes any associated
    discipline functions). Incorporate here the needed logic from
    sh_envnolocal() in name.c; most of it is unneeded (that
    function was previously used to cleanup local variables but has
    not been used for that for decades). So sh_envnolocal() is now
    unused.
  - Delete variables in a separate pass after unsetting variables
    and unsetting and deleting functions; this avoids use-after-
    free problems as well as possible "no parent" problems with
    namespace variables (e.g., .rc.status in our new kshrc.sh).
  - After all that, close and free up all function, alias, tracked
    alias, type and variable trees.
  - Free the contiguous built-in node space and the Init_t init
    context (with all the special variable discipline pointers).
  - Call nv_init (previously only called from sh_init) to
    reinitialise all of the above name-value stuff from scratch.
    It's the only way to be sure.
  - Re-import the environment as stored by exscript() above.
- env_init():
  - Per item 3 above and footnote 1 below, no longer set NV_IMPORT
    attribute and no longer point np->nvenv to the item in environ.
  - POSIX says, for 'environ': "Any application that directly
    modifies the pointers to which the environ variable points has
    undefined behavior."[*2] Yet, env_init() is indeed juggling the
    environ[] pointers to deal with variables that cannot be
    imported because their names are invalid (because they still
    need to be saved to be passed on to child processes). Replace
    the current approach with one where those env vars get
    allocated on the heap, pointed to by sh.save_env and counted by
    sh.save_env_n (renamed from sh.nenv). This only needs to be
    done once as ksh cannot use or change these variables.

src/cmd/ksh93/sh/name.c:
- sh_envgen(): Update to match env_init() change above.
- pushnam() (called by sh_envgen()): Remove NV_IMPORT attribute
  check as per above and never get the value from the nvenv pointer
  -- simply always use nv_getval(). As of this change, the
  NV_IMPORT attribute is unused. The next commit will remove it
  and do related cleanups.
- staknam(): is only called if value!=NULL, so remove that 'if'.
- sh_envnolocal(): Removed.

src/cmd/ksh93/sh/nvdisc.c:
- assign(): Remove a check for the SH_INIT state bit that avoids
  freeing functions during sh_reinit(). This works fine now.
- sh_addbuiltin(): Allow sh_reinit() to delete special builtins by
  checking for the SH_INIT state bit before throwing an error.

src/cmd/ksh93/sh/nvtree.c:
- outval(): Add a workaround for a use-after-free, introduced by
  the changes above, that occurred in the types.sh tests for
  #!-less scripts (types.sh:675-722). The use-after-free occurred
  here (abridged ASan trace follows; line numbers are current as of
  this commit):
  ==30849==ERROR: AddressSanitizer: heap-use-after-free [...]
    #0 in dttree dttree.c:393
    ksh-community#1 in sh_reinit init.c:1637
    ksh-community#2 in sh_main main.c:136
    [...]
  The pointer was freed in the same loop via nv_delete() in outval:
    #0 in wrap_free+0x98 (libclang_rt.asan_osx_dynamic.dylib:[...])
    ksh-community#1 in nv_delete name.c:1318
    ksh-community#2 in outval nvtree.c:731
    ksh-community#3 in genvalue nvtree.c:905
    ksh-community#4 in walk_tree nvtree.c:1042
    ksh-community#5 in put_tree nvtree.c:1108
    ksh-community#6 in nv_putv nvdisc.c:144
    ksh-community#7 in _nv_unset name.c:2437
    ksh-community#8 in sh_reinit init.c:1645
    ksh-community#9 in sh_main main.c:136
    [...]
  So, what happened was that the nv_delete() call on name.c:1318
  (eventually resulting from the _nv_unset call on init.c:1645)
  freed the node pointed to by np, so that the next loop iteration
  crashed on line 1637 as the dtnext() macro now gets a freed np.
      Now, why on earth should _nv_unset() *ever* indirectly call
  nv_delete()? That's a question for another day; I suspect it may
  be a bug, or it may be needed for compound variables for some
  reason. For now, I'm adding a workaround: simply avoid calling
  nv_delete() if the SH_INIT state bit is on, indicating
  sh_reinit() is in the call stack. This allows the variables unset
  loop in sh_reinit() to continue without crashing. sh_reinit()
  handles deletion later anyway.

src/cmd/ksh93/sh/main.c:
- sh_main(): remove zeroing of sh.fun_depth and sh.dot_depth; these
  are known to be 0, coming from either sh_init() or sh_reinit().

________
[*1] This NV_IMPORT/nvenv usage is a redundant holdout from ancient
     ksh code; the imported value is easily available as a normal
     shell variable value via nv_getval(). Plus, the nvenv pointer
     is overloaded with too many other purposes: so far I've
     discovered it's used for pointers to subarrays of arrays
     (multidimentional arrays), compound variables, builtins, and
     other things.
     This mess caused at least one crash in set_instance() (xec.c)
     due to incorrectly using that nvenv pointer. The current kshrc
     script triggers this. Reproducer:
        $ export PS1
        $ bin/package use
        «0»26:…/src/ksh93/ksh[dev] $ typeset +x PS1
     ...and crash. That is now fixed.

[*2] https://pubs.opengroup.org/onlinepubs/9699919799/functions/environ.html
aweeraman pushed a commit to aweeraman/ksh that referenced this issue Feb 8, 2025
The referenced commit left one test unexecuted because it crashes.

Minimal reproducer:

  typeset -a arr=((a b c) 1)
  got=$( typeset -a arr=( ( ((a b c)1))) )

The crash occurs when the array is redefined in a subshell.

Here are abridged ASan stack traces for the crash, for the use
after free, and for when it was freed:

=================================================================
==73147==ERROR: AddressSanitizer: heap-use-after-free [snippage]
READ of size 8 at 0x000107403eb0 thread T0
    #0 0x104fded40 in nv_search nvdisc.c:1007
    ksh-community#1 0x104fbeb1c in nv_create name.c:860
    ksh-community#2 0x104fb8b9c in nv_open name.c:1440
    ksh-community#3 0x104fb1edc in nv_setlist name.c:309
    ksh-community#4 0x104fb4a30 in nv_setlist name.c:475
    ksh-community#5 0x105055b58 in sh_exec xec.c:1079
    ksh-community#6 0x105045cd4 in sh_subshell subshell.c:654
    ksh-community#7 0x104f92c1c in comsubst macro.c:2266
[snippage]

0x000107403eb0 is located 0 bytes inside of 80-byte region [snippage]
freed by thread T0 here:
    #0 0x105c5ade4 in wrap_free+0x98 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x3ede4)
    ksh-community#1 0x105261da0 in dtclose dtclose.c:52
    ksh-community#2 0x104f178cc in array_putval array.c:671
    ksh-community#3 0x104fd7f4c in nv_putv nvdisc.c:144
    ksh-community#4 0x104fbc5f0 in _nv_unset name.c:2435
    ksh-community#5 0x104fb3250 in nv_setlist name.c:364
    ksh-community#6 0x105055b58 in sh_exec xec.c:1079
    ksh-community#7 0x105045cd4 in sh_subshell subshell.c:654
    ksh-community#8 0x104f92c1c in comsubst macro.c:2266
[snippage]

So the crash is caused because array_putval (array.c:671) calls
dtclose, freeing ap->table, which is then reused after a recursive
nv_setlist call via nv_open() -> nv_create() -> nv_search().
This only happens whwn we're in a virtual subshell.

src/cmd/ksh93/sh/array.c:
- array_putval(): When redefining an array in a virtual subshell,
  do not free the old ap->table; it will be needed by the parent
  shell environment.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants