
Whonix DispVM Fails with "Got empty response from qubesd" "KeyError" unhandled exception while calling src=b'dom0' meth=b'admin.vm.List' #5121

Closed
maltfield opened this issue Jun 26, 2019 · 23 comments
Labels: C: Whonix (This issue impacts Qubes-Whonix); eol-4.0 (Closed because Qubes 4.0 has reached end-of-life (EOL)); P: default (Priority: default. Default priority for new issues, to be replaced given sufficient information.)

Comments

@maltfield

Qubes OS version:

Qubes release 4.0 (R4.0)

Affected component(s) or functionality:

whonix-ws-14-dvm DispVM


Steps to reproduce the behavior:

  1. Attempt to start a new disposable VM based on the 'whonix-ws-14-dvm' template -- either in the GUI or by executing the below command in dom0:
[user@dom0 ~]$ qvm-run --dispvm whonix-ws-14-dvm xterm
  2. Wait for the desktop notification indicating that "Domain dispXXXX is starting".
  3. Wait for the desktop notification indicating that "Domain dispXXXX has started".
  4. Wait for the desktop notification indicating that "Domain dispXXXX has halted".

The dom0 terminal where I launched the VM eventually returns this (after some delay following the "halted" notify-send):

[user@dom0 ~]$ disp8109: command failed with code: 1
Traceback (most recent call last):
  File "/usr/bin/qvm-run", line 5, in <module>
    sys.exit(main())
  File "/usr/lib/python3.5/site-packages/qubesadmin/tools/qvm_run.py", line 290, in main
    dispvm.cleanup()
  File "/usr/lib/python3.5/site-packages/qubesadmin/vm/__init__.py", line 389, in cleanup
    self.kill()
  File "/usr/lib/python3.5/site-packages/qubesadmin/vm/__init__.py", line 125, in kill
    self.qubesd_call(self._method_dest, 'admin.vm.Kill')
  File "/usr/lib/python3.5/site-packages/qubesadmin/base.py", line 68, in qubesd_call
    payload_stream)
  File "/usr/lib/python3.5/site-packages/qubesadmin/app.py", line 557, in qubesd_call
    return self._parse_qubesd_response(return_data)
  File "/usr/lib/python3.5/site-packages/qubesadmin/base.py", line 81, in _parse_qubesd_response
    'Got empty response from qubesd. See journalctl in dom0 for '
qubesadmin.exc.QubesDaemonNoResponseError: Got empty response from qubesd. See journalctl in dom0 for details.

[1]+  Exit 1                  qvm-run --dispvm whonix-ws-14-dvm torbrowser

Per the message above, I left journalctl -f running. Between the "started" notify-send and the "halted" notify-send, it output the following:

Jun 26 17:00:40 dom0 qmemman.daemon.algo[2001]: balance_when_enough_memory(xen_free_memory=53548, total_mem_pref=5594737459.200001, total_available_memory=1549900882.7999997)
Jun 26 17:00:43 dom0 qmemman.daemon.algo[2001]: balance_when_enough_memory(xen_free_memory=53548, total_mem_pref=5594737459.200001, total_available_memory=1549900882.7999997)
Jun 26 17:00:44 dom0 qmemman.daemon.algo[2001]: balance_when_enough_memory(xen_free_memory=451946218, total_mem_pref=5594737459.200001, total_available_memory=2001793552.7999997)
Jun 26 17:00:44 dom0 qmemman.systemstate[2001]: stat: dom '0' act=2349760537 pref=1836403097.6000001 last_target=2349760537
Jun 26 17:00:44 dom0 qmemman.systemstate[2001]: stat: dom '10' act=508186142 pref=398332313.6 last_target=508186142
Jun 26 17:00:44 dom0 qmemman.systemstate[2001]: stat: dom '7' act=518817651 pref=406665625.6 last_target=518817651
Jun 26 17:00:44 dom0 qmemman.systemstate[2001]: stat: dom '3' act=464002540 pref=363699814.40000004 last_target=464002540
Jun 26 17:00:44 dom0 qmemman.systemstate[2001]: stat: dom '6' act=3303817924 pref=2589636608.0 last_target=3303817924
Jun 26 17:00:44 dom0 qmemman.systemstate[2001]: stat: xenfree=504375018 memset_reqs=[('3', 493337548), ('10', 540314510), ('7', 551618160), ('0', 2490973506), ('6', 3512690754)]
Jun 26 17:00:44 dom0 qmemman.systemstate[2001]: mem-set domain 3 to 493337548
Jun 26 17:00:44 dom0 qmemman.systemstate[2001]: mem-set domain 10 to 540314510
Jun 26 17:00:44 dom0 qmemman.systemstate[2001]: mem-set domain 7 to 551618160
Jun 26 17:00:44 dom0 qmemman.systemstate[2001]: mem-set domain 0 to 2490973506
Jun 26 17:00:44 dom0 qmemman.systemstate[2001]: mem-set domain 6 to 3512690754
Jun 26 17:00:46 dom0 runuser[17790]: pam_unix(runuser:session): session opened for user user by (uid=0)
Jun 26 17:00:46 dom0 kernel: audit: type=1103 audit(1561547746.569:641): pid=17790 uid=0 auid=4294967295 ses=4294967295 msg='op=PAM:setcred grantors=pam_rootok acct="user" exe="/usr/sbin/runuser" hostname=? addr=? terminal=? res=success'
Jun 26 17:00:46 dom0 kernel: audit: type=1105 audit(1561547746.570:642): pid=17790 uid=0 auid=4294967295 ses=4294967295 msg='op=PAM:session_open grantors=pam_keyinit,pam_limits,pam_unix acct="user" exe="/usr/sbin/runuser" hostname=? addr=? terminal=? res=success'
Jun 26 17:00:46 dom0 audit[17790]: CRED_ACQ pid=17790 uid=0 auid=4294967295 ses=4294967295 msg='op=PAM:setcred grantors=pam_rootok acct="user" exe="/usr/sbin/runuser" hostname=? addr=? terminal=? res=success'
Jun 26 17:00:46 dom0 audit[17790]: USER_START pid=17790 uid=0 auid=4294967295 ses=4294967295 msg='op=PAM:session_open grantors=pam_keyinit,pam_limits,pam_unix acct="user" exe="/usr/sbin/runuser" hostname=? addr=? terminal=? res=success'
Jun 26 17:00:46 dom0 qubesd[1987]: unhandled exception while calling src=b'dom0' meth=b'admin.vm.property.Get' dest=b'disp9327' arg=b'qid' len(untrusted_payload)=0
Jun 26 17:00:46 dom0 qubesd[1987]: Traceback (most recent call last):
Jun 26 17:00:46 dom0 qubesd[1987]:   File "/usr/lib/python3.5/site-packages/qubes/api/__init__.py", line 264, in respond
Jun 26 17:00:46 dom0 qubesd[1987]:     self.send_event)
Jun 26 17:00:46 dom0 qubesd[1987]:   File "/usr/lib/python3.5/site-packages/qubes/api/__init__.py", line 125, in __init__
Jun 26 17:00:46 dom0 qubesd[1987]:     self.dest = self.app.domains[dest.decode('ascii')]
Jun 26 17:00:46 dom0 qubesd[1987]:   File "/usr/lib/python3.5/site-packages/qubes/app.py", line 467, in __getitem__
Jun 26 17:00:46 dom0 qubesd[1987]:     raise KeyError(key)
Jun 26 17:00:46 dom0 qubesd[1987]: KeyError: 'disp9327'
Jun 26 17:00:46 dom0 qubesd[1987]: unhandled exception while calling src=b'dom0' meth=b'admin.vm.List' dest=b'disp9327' arg=b'' len(untrusted_payload)=0
Jun 26 17:00:46 dom0 qubesd[1987]: Traceback (most recent call last):
Jun 26 17:00:46 dom0 qubesd[1987]:   File "/usr/lib/python3.5/site-packages/qubes/api/__init__.py", line 264, in respond
Jun 26 17:00:46 dom0 qubesd[1987]:     self.send_event)
Jun 26 17:00:46 dom0 qubesd[1987]:   File "/usr/lib/python3.5/site-packages/qubes/api/__init__.py", line 125, in __init__
Jun 26 17:00:46 dom0 qubesd[1987]:     self.dest = self.app.domains[dest.decode('ascii')]
Jun 26 17:00:46 dom0 qubesd[1987]:   File "/usr/lib/python3.5/site-packages/qubes/app.py", line 467, in __getitem__
Jun 26 17:00:46 dom0 qubesd[1987]:     raise KeyError(key)
Jun 26 17:00:46 dom0 qubesd[1987]: KeyError: 'disp9327'
Jun 26 17:00:47 dom0 runuser[17790]: pam_unix(runuser:session): session closed for user user
Jun 26 17:00:47 dom0 qubesd[1987]: Removing volume root: qubes_dom0/vm-disp9327-root
Jun 26 17:00:47 dom0 qubesd[1987]: Removing volume private: qubes_dom0/vm-disp9327-private
Jun 26 17:00:47 dom0 qubesd[1987]: Removing volume kernel: 4.14.119-2
Jun 26 17:00:47 dom0 qubesd[1987]: Removing volume volatile: qubes_dom0/vm-disp9327-volatile
Jun 26 17:00:47 dom0 audit[17790]: USER_END pid=17790 uid=0 auid=4294967295 ses=4294967295 msg='op=PAM:session_close grantors=pam_keyinit,pam_limits,pam_unix acct="user" exe="/usr/sbin/runuser" hostname=? addr=? terminal=? res=success'
Jun 26 17:00:47 dom0 audit[17790]: CRED_DISP pid=17790 uid=0 auid=4294967295 ses=4294967295 msg='op=PAM:setcred grantors=pam_rootok acct="user" exe="/usr/sbin/runuser" hostname=? addr=? terminal=? res=success'
Jun 26 17:00:47 dom0 kernel: audit: type=1106 audit(1561547747.682:643): pid=17790 uid=0 auid=4294967295 ses=4294967295 msg='op=PAM:session_close grantors=pam_keyinit,pam_limits,pam_unix acct="user" exe="/usr/sbin/runuser" hostname=? addr=? terminal=? res=success'
Jun 26 17:00:47 dom0 kernel: audit: type=1104 audit(1561547747.682:644): pid=17790 uid=0 auid=4294967295 ses=4294967295 msg='op=PAM:setcred grantors=pam_rootok acct="user" exe="/usr/sbin/runuser" hostname=? addr=? terminal=? res=success'
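
(For reference - a minimal way to narrow the journal to just the qubesd tracebacks above, assuming the standard qubesd unit name in dom0; the qmemman lines come from a separate service:)

[user@dom0 ~]$ # follow only the qubesd service log in dom0
[user@dom0 ~]$ journalctl -f -u qubesd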

As for the guest console log, sometimes it finishes as expected with the "host login:" line, and sometimes it ends prematurely. This log doesn't appear to be relevant; the issue appears to lie in dom0 (I think).

I never have this issue when attempting to launch the actual TemplateVM = 'whonix-ws-14-dvm'.

Expected or desired behavior:

I can use apps in my new disposable Whonix 14 VM without it halting prematurely.

Actual behavior:

The VM halts unexpectedly.

General notes:

Unfortunately, this is not 100% reproducible. About 5% of the time, my whonix DispVM starts as desired. 95% of the time, it just halts prematurely.

I upped the qrexec_timeout to 300 to be sure; no change.
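
(For completeness, this is roughly how I set it in dom0 - a sketch using the documented qvm-prefs syntax:)

[user@dom0 ~]$ # raise the qrexec startup timeout on the DisposableVM Template (default is 60 s)
[user@dom0 ~]$ qvm-prefs --set whonix-ws-14-dvm qrexec_timeout 300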

I checked /var/cache on the TemplateVM 'whonix-ws-14-dvm', which is 301M, and there is only one tor-browser dir in /var/cache/tb-browser/.tb -- no backup tor-browser dirs.
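
(Roughly how I checked - a sketch that runs the check from dom0 via qvm-run, using the path mentioned above:)

[user@dom0 ~]$ # start the DisposableVM Template and inspect its cache from dom0
[user@dom0 ~]$ qvm-run -p whonix-ws-14-dvm 'du -sh /var/cache; ls /var/cache/tb-browser/.tb'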


I have consulted the following relevant documentation:

https://dev.qubes-os.org/projects/core-admin-client/en/latest/manpages/qvm-run.html

I am aware of the following related, non-duplicate issues:

#4918

@andrewdavidwong added the labels C: Whonix, P: default, and T: bug on Jun 27, 2019
@andrewdavidwong added this to the Release 4.0 updates milestone on Jun 27, 2019
@andrewdavidwong (Member)

Does this happen with non-Whonix DisposableVMs?

I never have this issue when attempting to launch the actual TemplateVM = 'whonix-ws-14-dvm'.

Note that whonix-ws-14-dvm is a DisposableVM Template, not a TemplateVM.

@maltfield (Author)

Hi Andrew, thanks for the fast response :)

Does this happen with non-Whonix DisposableVMs?

No. I only experience this issue with the whonix DisposableVM.

Note that whonix-ws-14-dvm is a DisposableVM Template, not a TemplateVM.

Noted, thank you.

@marmarek (Member)

The error is happening on dispvm.cleanup(), so something has already caused the DispVM to stop.
I think the original reason for not starting is #4918 . See #4918 (comment) and the responses for a possible workaround and to confirm this.

@maltfield (Author)

maltfield commented Jun 27, 2019

@marmarek

I upped the qrexec_timeout to 300 to be sure; no change.

I checked /var/cache on the DisposableVM Template 'whonix-ws-14-dvm', which is 301M, and there is only one tor-browser dir in /var/cache/tb-browser/.tb -- no backup tor-browser dirs.

....unless I'm missing something else from #4918?

@marmarek (Member)

No, so this indeed looks different.
Try adding the -p option to qvm-run; you should see the command output (and hopefully why it fails).

@maltfield (Author)

I just timed it. Over two runs, the times from initiating the qvm-run command to each notification were:

notify-send "Domain dispXXXX is starting" = 2-3 seconds
notify-send "Domain dispXXXX has started" = 22-25 seconds
notify-send "Domain dispXXXX has halted" = 33-37 seconds

@maltfield (Author)

Try adding the -p option to qvm-run; you should see the command output (and hopefully why it fails).

Sure, here it is:

[user@dom0 ~]$ qvm-run -p --dispvm whonix-ws-14-dvm torbrowser
vchan connection timeout
Traceback (most recent call last):
  File "/usr/bin/qvm-run", line 5, in <module>
    sys.exit(main())
  File "/usr/lib/python3.5/site-packages/qubesadmin/tools/qvm_run.py", line 290, in main
    dispvm.cleanup()
  File "/usr/lib/python3.5/site-packages/qubesadmin/vm/__init__.py", line 389, in cleanup
    self.kill()
  File "/usr/lib/python3.5/site-packages/qubesadmin/vm/__init__.py", line 125, in kill
    self.qubesd_call(self._method_dest, 'admin.vm.Kill')
  File "/usr/lib/python3.5/site-packages/qubesadmin/base.py", line 68, in qubesd_call
    payload_stream)
  File "/usr/lib/python3.5/site-packages/qubesadmin/app.py", line 557, in qubesd_call
    return self._parse_qubesd_response(return_data)
  File "/usr/lib/python3.5/site-packages/qubesadmin/base.py", line 81, in _parse_qubesd_response
    'Got empty response from qubesd. See journalctl in dom0 for '
qubesadmin.exc.QubesDaemonNoResponseError: Got empty response from qubesd. See journalctl in dom0 for details.

Process Process-1:
Traceback (most recent call last):
  File "/usr/lib64/python3.5/multiprocessing/process.py", line 252, in _bootstrap
    self.run()
  File "/usr/lib64/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.5/site-packages/qubesadmin/tools/qvm_run.py", line 126, in copy_stdin
    stream.flush()
BrokenPipeError: [Errno 32] Broken pipe
[user@dom0 ~]$ 

@marmarek (Member)

vchan connection timeout

This is the problem. It still looks very similar to #4918 . Can you confirm that setting qrexec_timeout was effective? In one terminal try to start a DispVM, and from another terminal use qvm-prefs to read its properties - when you see the dispXXXX starting notification, call qvm-prefs on that VM and check the qrexec_timeout property.

@maltfield (Author)

Can you confirm that setting qrexec_timeout was effective? In one terminal try to start a DispVM, and from another terminal use qvm-prefs to read its properties - when you see the dispXXXX starting notification, call qvm-prefs on that VM and check the qrexec_timeout property.

Here's the output before running qvm-run on the DisposableVM Template (I made this change to 300 before today, and I've since rebooted; this output confirms its persistence).

[user@dom0 ~]$ qvm-prefs --get whonix-ws-14-dvm qrexec_timeout
300
[user@dom0 ~]$ 

And here's the query after initiating the qvm-run for the individual DispVM:

[user@dom0 ~]$ qvm-prefs --get disp7846 qrexec_timeout
300
[user@dom0 ~]$ 

And note from my comment above that I verified this DispVM halts prematurely, within 40 seconds. IIRC even the default is 60, yes?

@marmarek (Member)

Yes, it should be 60 seconds by default...

Maybe for some reason qrexec_timeout isn't applied at all? Let's go deeper:

strace -f -v -e execve qvm-run -p --dispvm whonix-ws-14-dvm torbrowser

(you may need to install strace in dom0 first)
Check whether the qrexec-client call has a -w parameter and what its value is (this is the vchan connection timeout option).

@maltfield (Author)

maltfield commented Jun 27, 2019

OK, I installed strace with qubes-dom0-update strace and ran the command per your recommendation.

Indeed, it appears to pass -w 300 along to qrexec-client:

[root@dom0 ~]# strace -f -v -e execve qvm-run -p --dispvm whonix-ws-14-dvm torbrowser
execve("/usr/bin/qvm-run", ["qvm-run", "-p", "--dispvm", "whonix-ws-14-dvm", "torbrowser"], ["HOSTNAME=dom0", "SHELL=/bin/bash", "TERM=xterm-256color", "HISTSIZE=1000", "USER=root", "LS_COLORS=rs=0:di=38;5;33:ln=38;"..., "MAIL=/var/spool/mail/root", "PATH=/usr/local/sbin:/usr/local/"..., "PWD=/root", "LANG=en_US.UTF-8", "SSH_ASKPASS=/usr/libexec/openssh"..., "HISTCONTROL=ignoredups", "SHLVL=1", "HOME=/root", "LOGNAME=root", "DISPLAY=:0.0", "XAUTHORITY=/root/.xauthifHdXQ", "_=/usr/bin/strace"]) = 0
strace: Process 8569 attached
[pid  8569] execve("/usr/lib/qubes/qrexec-client", ["/usr/lib/qubes/qrexec-client", "-d", "disp6359", "-t", "-T", "-w", "300", "DEFAULT:QUBESRPC qubes.VMShell+W"...], ["HOSTNAME=dom0", "SHELL=/bin/bash", "TERM=xterm-256color", "HISTSIZE=1000", "USER=root", "LS_COLORS=rs=0:di=38;5;33:ln=38;"..., "MAIL=/var/spool/mail/root", "PATH=/usr/local/sbin:/usr/local/"..., "PWD=/root", "LANG=en_US.UTF-8", "SSH_ASKPASS=/usr/libexec/openssh"..., "HISTCONTROL=ignoredups", "SHLVL=1", "HOME=/root", "LOGNAME=root", "DISPLAY=:0.0", "XAUTHORITY=/root/.xauthifHdXQ", "_=/usr/bin/strace"]) = 0
strace: Process 8570 attached
vchan connection timeout
[pid  8569] +++ exited with 1 +++
[pid  8244] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=8569, si_uid=0, si_status=1, si_utime=0, si_stime=0} ---
Traceback (most recent call last):
  File "/usr/bin/qvm-run", line 5, in <module>
    sys.exit(main())
  File "/usr/lib/python3.5/site-packages/qubesadmin/tools/qvm_run.py", line 290, in main
    dispvm.cleanup()
  File "/usr/lib/python3.5/site-packages/qubesadmin/vm/__init__.py", line 389, in cleanup
    self.kill()
  File "/usr/lib/python3.5/site-packages/qubesadmin/vm/__init__.py", line 125, in kill
    self.qubesd_call(self._method_dest, 'admin.vm.Kill')
  File "/usr/lib/python3.5/site-packages/qubesadmin/base.py", line 68, in qubesd_call
    payload_stream)
  File "/usr/lib/python3.5/site-packages/qubesadmin/app.py", line 557, in qubesd_call
    return self._parse_qubesd_response(return_data)
  File "/usr/lib/python3.5/site-packages/qubesadmin/base.py", line 81, in _parse_qubesd_response
    'Got empty response from qubesd. See journalctl in dom0 for '
qubesadmin.exc.QubesDaemonNoResponseError: Got empty response from qubesd. See journalctl in dom0 for details.

[pid  8570] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=8570, si_uid=0} ---
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib64/python3.5/multiprocessing/process.py", line 252, in _bootstrap
    self.run()
  File "/usr/lib64/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.5/site-packages/qubesadmin/tools/qvm_run.py", line 126, in copy_stdin
    stream.flush()
BrokenPipeError: [Errno 32] Broken pipe
[pid  8570] +++ exited with 1 +++
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=8570, si_uid=0, si_status=1, si_utime=0, si_stime=0} ---
+++ exited with 1 +++
[root@dom0 ~]# 

@marmarek (Member)

I assume 5 minutes hadn't elapsed before that failure, right?

@marmarek (Member)

Another idea - check xl dmesg; maybe that DispVM was killed by Xen for some reason? That would be weird, since whonix-ws-14-dvm itself works for you, but let's check anyway.
I'm looking at the code and trying to figure out how it's possible that you got the vchan connection timeout message before 60 s elapsed. The function handling the timeout is here, called from here.

To get some deeper info, let's add select to that strace call, plus timestamps:

strace -f -tt -v -e execve,select qvm-run -p --dispvm whonix-ws-14-dvm torbrowser

Somewhere you should see a call like select(5, [4], NULL, NULL, {tv_sec=(almost 300), tv_usec=(something)}) = ....
The interesting values are tv_sec - is it really the right timeout? - and the return value of this call, which should be 0 on timeout, and in that case it should only come after 5 minutes...
Is your clock ok? Isn't it super speeding?
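
(A crude way to check this - not something from the thread, just a sketch - is to time a known sleep against a wall clock or stopwatch:)

[user@dom0 ~]$ # if the dom0 clock were running fast, the two timestamps would differ by much more than 60 s
[user@dom0 ~]$ date; sleep 60; date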

@maltfield (Author)

I assume 5 minutes hadn't elapsed before that failure, right?

It took 36 seconds between the time I executed the qvm-run command and the time that I was notified that it "halted" (it was "started" at 24 seconds).

It does take 5 minutes (actually, 5 minutes and 30 seconds) for the qvm-run command to finally terminate. It just sits for many minutes without output between the "Got empty response" line and the "select resumed" line.

strace -f -tt -v -e execve,select qvm-run -p --dispvm whonix-ws-14-dvm torbrowser

Here you go:

[user@dom0 ~]$ strace -f -tt -v -e execve,select qvm-run -p --dispvm whonix-ws-14-dvm torbrowser
13:53:03.169321 execve("/usr/bin/qvm-run", ["qvm-run", "-p", "--dispvm", "whonix-ws-14-dvm", "torbrowser"], ["XDG_VTNR=1", "XDG_SESSION_ID=2", "HOSTNAME=dom0", "XDG_GREETER_DATA_DIR=/var/lib/li"..., "GLADE_PIXMAP_PATH=:/usr/share/gl"..., "TERM=xterm-256color", "SHELL=/bin/bash", "XDG_MENU_PREFIX=xfce-", "VTE_VERSION=4603", "HISTSIZE=1000", "WINDOWID=92276927", "USER=user", "LS_COLORS=rs=0:di=38;5;33:ln=38;"..., "XDG_SESSION_PATH=/org/freedeskto"..., "GLADE_MODULE_PATH=:/usr/lib64/gl"..., "XDG_SEAT_PATH=/org/freedesktop/D"..., "SESSION_MANAGER=local/unix:@/tmp"..., "XDG_CONFIG_DIRS=/etc/xdg", "PATH=/usr/local/bin:/usr/bin:/bi"..., "MAIL=/var/spool/mail/user", "DESKTOP_SESSION=xfce", "XDG_SESSION_TYPE=x11", "PWD=/home/user", "LANG=en_US.utf8", "GDMSESSION=xfce", "SSH_ASKPASS=/usr/libexec/openssh"..., "HISTCONTROL=ignoredups", "XDG_SEAT=seat0", "HOME=/home/user", "SHLVL=3", "XDG_SESSION_DESKTOP=xfce", "LOGNAME=user", "XDG_DATA_DIRS=/usr/local/share:/"..., "DBUS_SESSION_BUS_ADDRESS=unix:pa"..., "XDG_RUNTIME_DIR=/run/user/1000", "DISPLAY=:0.0", "GLADE_CATALOG_PATH=:/usr/share/g"..., "XDG_CURRENT_DESKTOP=XFCE", "COLORTERM=truecolor", "XAUTHORITY=/var/run/lightdm/user"..., "_=/usr/bin/strace"]) = 0
strace: Process 6249 attached
[pid  6249] 13:53:25.747197 execve("/usr/lib/qubes/qrexec-client", ["/usr/lib/qubes/qrexec-client", "-d", "disp3262", "-t", "-T", "-w", "300", "DEFAULT:QUBESRPC qubes.VMShell+W"...], ["XDG_VTNR=1", "XDG_SESSION_ID=2", "HOSTNAME=dom0", "XDG_GREETER_DATA_DIR=/var/lib/li"..., "GLADE_PIXMAP_PATH=:/usr/share/gl"..., "TERM=xterm-256color", "SHELL=/bin/bash", "XDG_MENU_PREFIX=xfce-", "VTE_VERSION=4603", "HISTSIZE=1000", "WINDOWID=92276927", "USER=user", "LS_COLORS=rs=0:di=38;5;33:ln=38;"..., "XDG_SESSION_PATH=/org/freedeskto"..., "GLADE_MODULE_PATH=:/usr/lib64/gl"..., "XDG_SEAT_PATH=/org/freedesktop/D"..., "SESSION_MANAGER=local/unix:@/tmp"..., "XDG_CONFIG_DIRS=/etc/xdg", "PATH=/usr/local/bin:/usr/bin:/bi"..., "MAIL=/var/spool/mail/user", "DESKTOP_SESSION=xfce", "XDG_SESSION_TYPE=x11", "PWD=/home/user", "LANG=en_US.utf8", "GDMSESSION=xfce", "SSH_ASKPASS=/usr/libexec/openssh"..., "HISTCONTROL=ignoredups", "XDG_SEAT=seat0", "HOME=/home/user", "SHLVL=3", "XDG_SESSION_DESKTOP=xfce", "LOGNAME=user", "XDG_DATA_DIRS=/usr/local/share:/"..., "DBUS_SESSION_BUS_ADDRESS=unix:pa"..., "XDG_RUNTIME_DIR=/run/user/1000", "DISPLAY=:0.0", "GLADE_CATALOG_PATH=:/usr/share/g"..., "XDG_CURRENT_DESKTOP=XFCE", "COLORTERM=truecolor", "XAUTHORITY=/var/run/lightdm/user"..., "_=/usr/bin/strace"]) = 0
strace: Process 6250 attached
[pid  6249] 13:53:25.899974 select(5, [4], NULL, NULL, {tv_sec=299, tv_usec=999671} <unfinished ...>
[pid  6250] 13:53:25.910605 select(1, [0], [], [], NULL <unfinished ...>
[pid  6249] 13:58:25.985701 <... select resumed> ) = 0 (Timeout)
vchan connection timeout
[pid  6249] 13:58:25.996219 +++ exited with 1 +++
[pid  5974] 13:58:25.997111 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=6249, si_uid=1000, si_status=1, si_utime=0, si_stime=2} ---
Traceback (most recent call last):
  File "/usr/bin/qvm-run", line 5, in <module>
    sys.exit(main())
  File "/usr/lib/python3.5/site-packages/qubesadmin/tools/qvm_run.py", line 290, in main
    dispvm.cleanup()
  File "/usr/lib/python3.5/site-packages/qubesadmin/vm/__init__.py", line 389, in cleanup
    self.kill()
  File "/usr/lib/python3.5/site-packages/qubesadmin/vm/__init__.py", line 125, in kill
    self.qubesd_call(self._method_dest, 'admin.vm.Kill')
  File "/usr/lib/python3.5/site-packages/qubesadmin/base.py", line 68, in qubesd_call
    payload_stream)
  File "/usr/lib/python3.5/site-packages/qubesadmin/app.py", line 557, in qubesd_call
    return self._parse_qubesd_response(return_data)
  File "/usr/lib/python3.5/site-packages/qubesadmin/base.py", line 81, in _parse_qubesd_response
    'Got empty response from qubesd. See journalctl in dom0 for '
qubesadmin.exc.QubesDaemonNoResponseError: Got empty response from qubesd. See journalctl in dom0 for details.

[pid  6250] 13:58:31.567652 <... select resumed> ) = 1 (in [0])
[pid  6250] 13:58:31.569343 --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=6250, si_uid=1000} ---
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib64/python3.5/multiprocessing/process.py", line 252, in _bootstrap
    self.run()
  File "/usr/lib64/python3.5/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.5/site-packages/qubesadmin/tools/qvm_run.py", line 126, in copy_stdin
    stream.flush()
BrokenPipeError: [Errno 32] Broken pipe
[pid  6250] 13:58:31.585245 +++ exited with 1 +++
13:58:31.585353 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=6250, si_uid=1000, si_status=1, si_utime=0, si_stime=0} ---
13:58:31.613988 +++ exited with 1 +++
[user@dom0 ~]$ 

Another idea - check xl dmesg; maybe that DispVM was killed by Xen for some reason?

I did an xl dmesg -c to clear it, then executed the qvm-run. After it "halted", I checked xl dmesg and got the following:

[user@dom0 ~]$ xl dmesg
(XEN) p2m_pod_demand_populate: Dom10 out of PoD memory! (tot=102416 ents=447219 dom10)
(XEN) domain_crash called from p2m-pod.c:1218
(XEN) Domain 10 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-4.8.5-6.fc25  x86_64  debug=n   Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    0010:[<ffffffff81869dae>]
(XEN) RFLAGS: 0000000000050206   CONTEXT: hvm guest (d10v0)
(XEN) rax: 00005f2464bbd000   rbx: 0000000000001000   rcx: 0000000000001000
(XEN) rdx: 0000000000001000   rsi: 00005f2464bbc000   rdi: ffff88806cde7000
(XEN) rbp: 0000000000001000   rsp: ffffc90000d53cc0   r8:  ffffc90000d53e80
(XEN) r9:  0000000000000000   r10: 0000000000001000   r11: ffff88808b3b7270
(XEN) r12: ffff88806cde8000   r13: 0000000000001000   r14: ffff88806cde7000
(XEN) r15: ffffc90000d53e48   cr0: 0000000080050033   cr4: 00000000003606f0
(XEN) cr3: 0000000097c42006   cr2: 000070140fccd024
(XEN) fsb: 00007e6f352b0800   gsb: ffff8880f9c00000   gss: ffff8880f9c00000
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0018   cs: 0010
[user@dom0 ~]$ 

Is your clock ok? Isn't it super speeding?

I haven't noticed any time-drift issues with my system clock, and I'm not making any modifications to my CPU clock. FWIW, I have an Asus UX305 laptop.

@maltfield (Author)

maltfield commented Jun 28, 2019

OK, so I increased the "initial memory" of the DisposableVM Template whonix-ws-14-dvm to 1000M (from, IIRC, 400M), and I was able to start torbrowser in a new Whonix DispVM via the qvm-run command three times in a row!
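
(For anyone wanting the dom0 command-line equivalent of that change - a sketch, assuming the "initial memory" field maps to the memory property exposed by qvm-prefs:)

[user@dom0 ~]$ # 'memory' is the initial allocation; maxmem stays at 4000
[user@dom0 ~]$ qvm-prefs --set whonix-ws-14-dvm memory 1000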

Thanks for your assistance in this!

But I still don't understand why I never encountered issues when starting the DisposableVM Template itself.

Is 400M supposed to be generally sufficient for a whonix DispVM? Also, the max is 4000M (note the extra zero), so I'd expect it to grow to 4000M = 4G without crashing, right? Is this a bug?

My laptop has 8G of RAM, and I conducted these tests this morning with 5 VMs running. Just checking my RAM usage (after the tests) without the whonix DispVM running, xl info shows total_memory: 8091 and free_memory: 154. I'm assuming that this "free_memory" line excludes buffers/cache, and is not the best metric (?)

Edit: I also didn't get any notify-send alerts indicating that the VM couldn't start due to insufficient memory, which would be nice for UX if this isn't a bug.

@brendanhoar

See also #4135

@marmarek (Member)

Those numbers (initial 400MB, max 4000MB) should be OK. Do you use some custom kernel that could lack the balloon driver? Or some custom kernel parameters? Otherwise, the balloon driver in the Linux kernel should give the extra 3.6GB back to Xen before starting any userspace process (systemd etc.). And you clearly see those processes starting, no?
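
(One way to watch this from dom0 - a sketch, not something from the thread - is to keep an eye on the DispVM's memory assignment while it boots; it should drop from roughly the 4000 MB maximum back toward the 400 MB initial value as the balloon driver returns pages to Xen:)

[user@dom0 ~]$ # refresh the per-domain memory view every second while the DispVM starts
[user@dom0 ~]$ watch -n 1 xl list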

@maltfield (Author)

maltfield commented Jun 29, 2019

Those numbers (initial 400MB, max 4000MB) should be OK. Do you use some custom kernel that could lack the balloon driver? Or some custom kernel parameters?

I have made no modifications to the kernel or kernel parameters.

Otherwise, the balloon driver in the Linux kernel should give the extra 3.6GB back to Xen before starting any userspace process (systemd etc.). And you clearly see those processes starting, no?

Sorry, I know nothing about this Xen balloon driver. How can I trace the memory given to the VM (and returned to Xen)? Is there some log file I can watch to better understand what's happening?

you clearly see those processes starting, no?

I dropped the initial RAM back down to 400M and attempted to start a new Whonix DispVM to answer this question. I tail'd the guest console log and confirmed that it made it all the way to the final host login: line before I was alerted that the system "halted". Scrolling up in that guest console log, yes, I can confirm that many userspace processes (i.e. whonixcheck, the OpenVPN service, systemd, etc.) started prior to the halt.
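
(For anyone following along: the guest console log I tailed lives in dom0; the path below is an example - the filename contains the actual DispVM name:)

[user@dom0 ~]$ # follow the console log of the booting DispVM (disp1234 is a placeholder)
[user@dom0 ~]$ tail -f /var/log/xen/console/guest-disp1234.log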

Interestingly, when I was searching for 'systemd' in the guest console log, I saw these lines:

[    6.939652] systemd-journald[203]: Received request to flush runtime journal from PID 1
[  OK  ] Started udev Coldplug all Devices.
[  OK  ] Started Flush Journal to Persistent Storage.
[FAILED] Failed to start Load Kernel Modules.
See 'systemctl status systemd-modules-load.service' for details.
         Starting Apply Kernel Variables...
         Mounting Configuration File System...
         Starting Qubes DB agent...
[  OK  ] Mounted Configuration File System.
[  OK  ] Started udev Kernel Device Manager.
[  OK  ] Started Qubes DB agent.

That may or may not be an issue. Indeed, I also found a line suggesting that the Xen balloon driver was loaded (or at least that loading it was attempted).

[    0.120014] xen:balloon: Initialising balloon driver

@CrushedPixl

I encountered this issue as well and was able to fix it by closing some VMs in order to free up memory.

@kocmo

kocmo commented Jan 27, 2021

Qubes 4.0.3, Xen-4.8.5-28.fc25.

I ran into a similar problem after switching from the Ubuntu bionic template to Ubuntu focal by unman. Dependent VMs with large private storage crash on first boot under the new template with this error:

dom0 qubesd[28615]: Start failed: qrexec-daemon startup failed: Connection to the VM failed

In the xl dmesg output I saw:

(XEN) p2m_pod_demand_populate: Dom12 out of PoD memory! (tot=102416 ents=186848 dom12)
(XEN) domain_crash called from p2m-pod.c:1218

Trying different VM kernels didn't help. The affected VMs all have memory balancing enabled and the balloon driver loading, with 400 MB initial RAM and 4000 MB max RAM. What they have in common is a private storage max size of at least 20-30 GB; other VMs with less than 10 GB of private storage don't crash with the new template.

What seems to help is slightly increasing the VM's private storage max size via Qubes Manager and then booting the affected VM once with a large initial RAM assignment, 2-3 GB. After it boots successfully once, I can lower the initial RAM back to 400 MB and the VM keeps booting successfully.
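
(Roughly, the dom0 command-line equivalents of those two steps - a sketch, assuming the standard qvm-volume and qvm-prefs tools; <vmname> and the sizes are placeholders:)

[user@dom0 ~]$ # 1. grow the private volume slightly
[user@dom0 ~]$ qvm-volume resize <vmname>:private 21GiB
[user@dom0 ~]$ # 2. boot once with a larger initial RAM, then set it back to 400 MB afterwards
[user@dom0 ~]$ qvm-prefs --set <vmname> memory 2000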

@marmarek (Member)

A similar issue is fixed in R4.1 here:

I guess I need to backport those too.

@maltfield (Author)

FWIW, when a DispVM halts immediately after starting, I usually just launch it again with xterm. The chances of it halting prematurely are much lower with xterm than with gnome-terminal.

@andrewdavidwong added the eol-4.0 label (Closed because Qubes 4.0 has reached end-of-life (EOL)) on Aug 5, 2023
@github-actions

github-actions bot commented Aug 5, 2023

This issue is being closed because Qubes 4.0 has reached end-of-life (EOL).

If anyone believes that this issue should be reopened and reassigned to an active milestone, please leave a brief comment.
(For example, if a bug still affects Qubes OS 4.1, then the comment "Affects 4.1" will suffice.)

@github-actions bot closed this as not planned on Aug 5, 2023