Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Terminal resize causes wait to exit #1067

Closed
lukaswrz opened this issue Jan 4, 2022 · 13 comments
Closed

Terminal resize causes wait to exit #1067

lukaswrz opened this issue Jan 4, 2022 · 13 comments

Comments

@lukaswrz
Copy link

lukaswrz commented Jan 4, 2022

It looks like wait doesn't behave correctly when the terminal window it runs in gets resized.

Example in oil:

oil$ set -x
oil$ fork { sleep 10 }
. builtin fork
| fork 26004
oil$   . 26004 exec sleep 10

oil$ wait
> wait
< wait
  wait
  ^~~~
[ interactive ]:4: errexit PID 25816: Command failed with status 128

Setting a signal handler for SIGWINCH before running the above code shows that it attempts to run the signal handler (without any output?) and immediately kills the child process:

oil$ trap 'echo SIGWINCH' SIGWINCH
oil$ set -x
oil$ fork { sleep 10 }
. builtin fork
| fork 27362
oil$   . 27362 exec sleep 10

oil$ wait
> wait
  ! Received signal 28.  Will run handler in main loop
< wait
  wait
  ^~~~
[ interactive ]:5: errexit PID 27242: Command failed with status 156

The same happens with osh regardless of whether errexit is set:

osh-0.9.6$ trap 'echo SIGWINCH' SIGWINCH
osh-0.9.6$ set -e
osh-0.9.6$ set -x
osh-0.9.6$ sleep 10 &
osh-0.9.6$ + 34506 sleep 10

osh-0.9.6$ wait
+ wait
osh-0.9.6$

In the case of osh, it doesn't seem to output any information outside of what set -x provides.

I assume this is not intended behavior since it's not consistent with bash.
I've tested this on the most recent release (0.9.6) and a build straight from the master branch and the behavior is the same.

@andychu
Copy link
Contributor

andychu commented Jan 4, 2022

Ah I can reproduce this, thank you ...

There was another bug related to SIGWINCH, but I don't remember offhand which one.

@andychu
Copy link
Contributor

andychu commented Jan 5, 2022

OK there is actually an architecture issue here -- Oil registers a handler for SIGWINCH and keeps it on all the time. The interactive shell needs to know when the terminal width has changed.

In contrast, bash relies on GNU readline to listen for SIGWINCH. And it ENABLES and disables the handler when entering and exiting readline(). So none of the other shell code has to worry about that.

We should probably do the same thing.

And this is basically the same bug as #292 with wait rather than read.

Thanks again for the report!

@andychu
Copy link
Contributor

andychu commented Jan 5, 2022

Although I also wonder if bash is incorrect because of this? What happens if you do

  • wait and then resize terminal while it's blocking. SIGWINCH is ignored
  • hit TAB to get completions. Is the width messed up? Or maybe it recalculates the width every time it enters GNU readline? In Oil we do that with libc.get_terminal_width() and an is_dirty flag

Hm no it seems alright. So I guess it must recompute it every time it prints the prompt

andychu pushed a commit that referenced this issue Jan 22, 2022
The line editor is the only thing that needs to know the width, so I
think we should let it handle the problem.

Unfortunately GNU readline is not what we want, but that's a separate
issue.

This means the core/comp_ui.py now doesn't adjust when the window is
resized.  But that dosn't make the shell unusuable.  We should revisit
it later.

Apparently fish and zsh do something different anyway -- they REDRAW on
SIGWINCH.  They don't merely update the width.

- Added a test in test/interactive.py.
- Addresses #1067
@andychu
Copy link
Contributor

andychu commented Jan 22, 2022

OK I just made OSH ignore SIGWINCH, which is what most shells do. I still want to figure out what to do with core/comp_ui.py -- is there a way to salvage the nice behavior, or should we revert to bash-like UI? But that will come a little later.

Thanks for the report!

lukaswrz added a commit to lukaswrz/oil that referenced this issue Jan 22, 2022
Related to oils-for-unix#1067.  This commit fixes a problem with the width not being
updated in the completion UI as a side effect of ignoring SIGWINCH.
@lukaswrz
Copy link
Author

That seems to solve the problem, thank you.

I'm still hesitant to close this since, as you mentioned, 3ab4e69 causes the completion UI to break.

So, I experimented a little and tried to fix the completion issue: 3e4c8dd

That seems to be working fine for me.

It just registers a signal handler when it asks for user input and then unregisters it when it tries to execute a command. This might be a little too much brute-force since (AFAIK) it's basically unnecessary to unregister the signal handler if the command is not a builtin.

Not sure if that would break anything else.

@andychu
Copy link
Contributor

andychu commented Jan 22, 2022

Hmm yeah this is not bad... I can't see an obvious reason this would break. I was worried about it not being exception safe but I think all the exceptions are caught.

Ahhh one problem might be shell completion plugins run when you hit TAB. That happens within the area where SIGWINCH is enabled.

In that case the wait but may reappear. I think probably almost no completion plugins use wait, but it does feel like that will be a very difficult bug if they do. Having the signal handling state be different in completion plugins vs. regular code seems possibly scary.


I realize that punting on the whole issue seems lame, but the core issue is that the current approach is not that great because of #257 either. That is, I'd rather have a line editing experience like zsh or fish in Oil, but that appears impossible given that we use GNU readline. I think the author of ble.sh concurs with this... (this is a line editor in pure bash!)

And as mentioned, zsh and fish appear to actually redraw the screen on SIGWINCH! They have to keep track of a lot of state that's not exposed by GNU readline.


I think what I'm leaning toward is making the default experience like bash, but then having an option to try to keep the prompt stable like zsh/fish?

Another possibility is to ask GNU readline to expose LINES and COLUMNS to us. Maybe that is what bash does? I will look.

@andychu
Copy link
Contributor

andychu commented Jan 22, 2022

Another related issue is #460 to build with libedit and such. That will restrict the soluion space even more though ...

And #459 for LINES and COLUMNS

andychu pushed a commit that referenced this issue Jan 24, 2022
OSH now queries this state, instead of trying to keep track of it
ourselves.

So OSH behaves like bash and other shells here:

- If trap 'echo X' SIGWINCH, then SIGWINCH will interrupt the wait
  builtin.
- Otherwise it is ignored.

We take care to only enable the SIGWINCH handler inside call_readline(),
when shell code isn't running.

When the TAB completion callback is invoked, we do the reverse: we
disable our SIGWINCH tracking.

So we can lose some resizes, but this is what other shells do.  TODO:
Test this out a bit more.

Addresses #1067.
@andychu
Copy link
Contributor

andychu commented Jan 24, 2022

OK I mostly fixed this:

  • wait doesn't abort on SIGWINCH
  • AND we update the terminal size on SIGWINCH

BUT:

  • The policy of disabling SIGWINCH in shell code (which I thought bash did), means that you can type sleep 10, resize the window, and it won't get reflected. Somehow bash doesn't suffer from this .... thinking about it.

Related: this area is very hairy in bash too ... So I feel better about butting my head against it :-/

bash-4.4$ grep -C1 -i SIGWINCH CHANGES 
h.  Use pselect(2), if available, to wait for input before calling read(2), so
    a SIGWINCH can interrupt it, since it doesn't interrupt read(2).

--
f.  Readline now calls the signal hook after resizing the terminal when it
    receives a SIGWINCH.

--
c.  Fixed a bug that caused readline to try and run code to modify its idea
    of the screen size in a signal handler context upon receiving a SIGWINCH.

--

a.  The SIGWINCH signal handler now avoids calling the redisplay code if
    one arrives while in the middle of redisplay.
--
m.  Fixed a bug that caused readline to reference uninitialized data structures
    if it received a SIGWINCH before completing initialzation.

--
g.  Fixed a synchronization problem that could cause core dumps when handling
    a SIGWINCH.

--
c.  Fixed the redisplay code to avoid core dumps resulting from a poorly-timed
    SIGWINCH.

--

s.  Make sure system calls are restarted after a SIGWINCH is received using
    SA_RESTART.
--

d.  Fixes to the SIGWINCH code so that a multiple-line prompt with escape
    sequences is redrawn correctly.
--
          SIGTTIN, and SIGTTOU;
        o A new variable, rl_catch_sigwinch, is available to application
          writers to indicate to readline whether or not it should install its
          own signal handler for SIGWINCH, which will chain to the calling
          applications's SIGWINCH handler, if one is installed;
        o There is a new function, rl_free_line_state, for application signal
--
b.  There is a new function, rl_resize_terminal, to reset readline's idea of
    the screen size after a SIGWINCH.

--

c.  Changes were made to the SIGWINCH handling code so that prompt redisplay
    is done better.

@andychu
Copy link
Contributor

andychu commented Jan 24, 2022

Also I found #1080 , which is related

Related question: why are read and wait different ?

  • read handles EINTR and retries
  • wait gets interrupted on EINTR

I guess this is because read can run traps and continue to do its job ... it sorta makes sense. wait is only doing one thing

@andychu
Copy link
Contributor

andychu commented Jan 24, 2022

Doh this might have actually been a bug with wait itself ... not the SIGWINCH handling! The return value of WaitForOne distinguishes between 3 cases, and we are returning if it's ECHILD or signal. But we shouldn't return if it's a signal.

Oh well this actually motivated some cleanup and simplification

@andychu
Copy link
Contributor

andychu commented Jan 25, 2022

Original bugs: #858 and the related bug for read #986

So we have to fix this without regressing those ...

@andychu
Copy link
Contributor

andychu commented Jan 27, 2022

Hm looks like the last commit fixed this in a better way (nevermind, this was a bug, but the rest of this comment is true, and the tests caught it)

985de3c

It's now tested in test/interactive


BTW this issue caused me to pull on a VERY LONG THREAD of fixing interactive shell and signal behavior, which is still ongoing, but it is going to be good. If interested there is a ton on Zulip:

https://oilshell.zulipchat.com/#narrow/stream/121539-oil-dev/topic/test.2Finteractive.20is.20a.20useful.20new.20methodology

Thanks for the nudge in the right direction :)

andychu pushed a commit that referenced this issue Jan 27, 2022
Hm there was spurious fix for #1067 here ... what's the right fix?
andychu pushed a commit that referenced this issue Jan 28, 2022
This is the same bug as #1067, but with 'wait -n' instead of 'wait'.

I changed it to loop until one fewer job is running.  This also had the
side effect of fixing another test case, so there's only one failure
now!

- Clean up 'wait -n' spec tests.
@andychu andychu reopened this Jan 28, 2022
@andychu
Copy link
Contributor

andychu commented Jan 30, 2022

Done in 0.9.7: https://www.oilshell.org/blog/2022/01/notes-themes.html

Thanks again for the report, this ended up being very productive, as I described in the blog post!

Let me know if you see any other issues

@andychu andychu closed this as completed Jan 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants