Rocket.Chat vs. NodeJS 8.11.1 (or rather > 8.9.4): Random SEGV (segmentation violation) #10331

Closed
TwizzyDizzy opened this issue Apr 4, 2018 · 13 comments

@TwizzyDizzy

TwizzyDizzy commented Apr 4, 2018

Description:

This happens when running Rocket.Chat 0.61.2 as well as 0.63.0 on NodeJS 8.11.1. Neither version exhibits this behaviour when run on NodeJS 8.9.4. I had NodeJS 8.11.1 and Rocket.Chat running for a while on my testing instance, which didn't exhibit this behaviour. This leads me to think that the culprit is NodeJS 8.11.1 in combination with:

  • either the load of the Rocket.Chat server
  • or the data in MongoDB

Server Setup Information:

  • Version of Rocket.Chat Server: 0.63.0 & 0.61.2 (this may affect other versions)
  • Operating System: Oracle Linux 7
  • Deployment Method(snap/docker/tar/etc): tar
  • Number of Running Instances: 1
  • DB Replicaset Oplog: -
  • Node Version: 8.11.1
  • mongoDB Version: 2.6.12

Steps to Reproduce:

I can only guess here:

  • run a decently sized Rocket.Chat (in terms of amount of users) server on NodeJS 8.11.1

Expected behavior:

No SEGV

Actual behavior:

SEGV. Restart (due to systemd unit definition) of Rocket.Chat at random intervals

Relevant logs:

strace of the NodeJS process is available, but I will only share it as a last resort with one of the Rocket.Chat developers, as it possibly contains private/sensitive information.

Last lines in strace before SEGV:

read(12, "\27\3\3\0\265\252\244\276\253\262\345\32\335\230b\255\311H\331p\2200\10\245\222.\26\313\2035\210\327"..., 16384) = 9462
rt_sigprocmask(SIG_SETMASK, [], [], 8)  = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=0} ---
+++ killed by SIGSEGV +++

auditd-log of failing node processes:

ANOM_ABEND: Triggered when a process ends abnormally (with a signal that could cause a core dump, if enabled).

root@chat01 [/var/log] # ausearch --comm node
----
time->Wed Apr  4 14:35:17 2018
type=ANOM_ABEND msg=audit(1522845317.790:75): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=1043 comm="node" reason="memory violation" sig=11
----
time->Wed Apr  4 14:47:00 2018
type=ANOM_ABEND msg=audit(1522846020.846:1164): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=4595 comm="node" reason="memory violation" sig=11
----
time->Wed Apr  4 14:49:03 2018
type=ANOM_ABEND msg=audit(1522846143.096:1227): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=5458 comm="node" reason="memory violation" sig=11
----
time->Wed Apr  4 14:50:34 2018
type=ANOM_ABEND msg=audit(1522846234.113:1269): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=5562 comm="node" reason="memory violation" sig=11
----
time->Wed Apr  4 14:57:58 2018
type=ANOM_ABEND msg=audit(1522846678.970:1448): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=5643 comm="node" reason="memory violation" sig=11
----
time->Wed Apr  4 14:58:05 2018
type=ANOM_ABEND msg=audit(1522846685.473:1460): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=5878 comm="node" reason="memory violation" sig=11
----
time->Wed Apr  4 14:59:29 2018
type=ANOM_ABEND msg=audit(1522846769.954:1477): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=5929 comm="node" reason="memory violation" sig=11
----
time->Wed Apr  4 15:02:32 2018
type=ANOM_ABEND msg=audit(1522846952.269:1538): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=6007 comm="node" reason="memory violation" sig=11
----
time->Wed Apr  4 15:23:04 2018
type=ANOM_ABEND msg=audit(1522848184.122:2496): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=9055 comm="node" reason="memory violation" sig=11
----
time->Wed Apr  4 15:35:35 2018
type=ANOM_ABEND msg=audit(1522848935.572:3405): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=11501 comm="node" reason="memory violation" sig=11
----
time->Wed Apr  4 15:40:33 2018
type=ANOM_ABEND msg=audit(1522849233.904:3470): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=11899 comm="node" reason="memory violation" sig=11
----
time->Wed Apr  4 15:43:02 2018
type=ANOM_ABEND msg=audit(1522849382.823:3519): auid=4294967295 uid=990 gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=12040 comm="node" 
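
For anyone who wants to tabulate records like the ones above, here is a minimal Python sketch. The regular expression and the `parse_abend` helper are my own illustration, not part of auditd or ausearch. It assumes the field layout shown in this log and treats `sig` as optional, since the last record above is cut off. The epoch value inside `audit(...)` is decoded as UTC, whereas the `time->` headers printed by ausearch are in the host's local time.

```python
import re
from datetime import datetime, timezone

# Hypothetical pattern for the ANOM_ABEND layout shown in this issue.
ABEND_RE = re.compile(
    r'type=ANOM_ABEND msg=audit\((?P<epoch>\d+)\.\d+:(?P<serial>\d+)\):'
    r'.*?\bpid=(?P<pid>\d+)\b.*?\bcomm="(?P<comm>[^"]+)"'
    r'(?:.*?\bsig=(?P<sig>\d+))?'
)

def parse_abend(line):
    """Return a dict for one ANOM_ABEND line, or None if it does not match."""
    m = ABEND_RE.search(line)
    if m is None:
        return None
    sig = m.group("sig")
    return {
        "time_utc": datetime.fromtimestamp(int(m.group("epoch")), tz=timezone.utc),
        "serial": int(m.group("serial")),
        "pid": int(m.group("pid")),
        "comm": m.group("comm"),
        "sig": int(sig) if sig is not None else None,  # None for truncated records
    }

# First record from the log above.
SAMPLE = (
    'type=ANOM_ABEND msg=audit(1522845317.790:75): auid=4294967295 uid=990 '
    'gid=987 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 '
    'pid=1043 comm="node" reason="memory violation" sig=11'
)

record = parse_abend(SAMPLE)
print(record["time_utc"].isoformat(), record["pid"], record["sig"])
# prints: 2018-04-04T12:35:17+00:00 1043 11
```

Signal 11 is SIGSEGV on Linux, which matches the `status=11/SEGV` that systemd reports.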

/var/log/messages (note that the times match the auditd log above to within a second)

root@chat01 [~] # grep SEGV /var/log/messages
Apr  4 14:35:17 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
Apr  4 14:47:00 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
Apr  4 14:49:03 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
Apr  4 14:50:34 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
Apr  4 14:57:59 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
Apr  4 14:58:05 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
Apr  4 14:59:29 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
Apr  4 15:02:32 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
Apr  4 15:23:04 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
Apr  4 15:35:35 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
Apr  4 15:40:33 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
Apr  4 15:43:02 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
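
To put a number on "random intervals", the gaps between restarts can be computed from these lines. This is a rough Python sketch under a few assumptions: the helper names `segv_times` and `gaps_seconds` are made up for illustration, the year is supplied by hand because syslog timestamps omit it, and month abbreviations are parsed with `%b`, which expects an English or C locale.

```python
from datetime import datetime

def segv_times(lines, year=2018):
    """Collect timestamps of systemd 'status=11/SEGV' exit lines."""
    times = []
    for line in lines:
        if "status=11/SEGV" not in line:
            continue
        # Prefix looks like "Apr  4 14:35:17"; split() collapses the double space.
        month, day, clock = line.split()[:3]
        stamp = f"{year} {month} {day} {clock}"
        times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return times

def gaps_seconds(times):
    """Seconds between each crash and the next one."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]

# First three lines from the grep output above.
SAMPLE_LINES = [
    "Apr  4 14:35:17 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV",
    "Apr  4 14:47:00 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV",
    "Apr  4 14:49:03 chat01 systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV",
]

print(gaps_seconds(segv_times(SAMPLE_LINES)))
# prints: [703, 123]
```

Across all twelve lines above, the gaps range from 6 seconds to about 20 minutes, which fits the description of restarts at random intervals.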
@geekgonecrazy
Contributor

@TwizzyDizzy weird.. 0.63.0 in combo with Node.js 8.11.1 should have actually solved this seg fault

https://github.com/meteor/meteor/blob/devel/History.md#v1611-2018-04-02

According to the meteor release which we updated to for the 0.63.0 release... Node.js 8.11.1 actually solved the seg fault.

So are you sure it was 0.63.0 in combo with Node.js 8.11.1 that gave the seg fault, and not some other combo?

@TwizzyDizzy
Author

TwizzyDizzy commented Apr 4, 2018

Hi Aaron,

yes, I just replicated this: Rocket.Chat 0.63.0 vs. NodeJS 8.11.1.

  • Upgrade to NodeJS 8.11.1
  • do a clean (except for data in MongoDB) install via ansible
  • Server gets killed after some (not always the same) time.

... downgrade to NodeJS 8.9.4: no such behaviour anymore.

Cheers
Thomas

@ghost

ghost commented Apr 4, 2018

Same here. Still crashing. Rocketchat 0.63.0 / NodeJS 8.11.1. Downgrading NodeJS to 8.9.4 solves it.

cheers
t.

@geekgonecrazy
Contributor

@rodrigok @sampaiodiego thoughts? This seems to be doing the complete opposite of what upgrading to 8.11.1 was supposed to give us

@ghost

ghost commented Apr 4, 2018

Well, they said the patch that should solve the problem would be in 8.11.1. Maybe it's not? Here is the Node.js issue for reference: nodejs/node#19274

thanks and cheers

@geekgonecrazy
Contributor

meteor/meteor#9783 (comment) yup, looks like they didn't include the segfault fix in 8.11.1; instead it might be in 8.11.2 🙄

@trstn70

trstn70 commented Apr 5, 2018

How do I downgrade the NodeJs Version within the Rocketchat server (snap)? (for dummies?) ..

@sampaiodiego
Member

I don't think you can @trstn70 .. but @geekgonecrazy released a fix yesterday, please try running sudo snap refresh rocketchat-server

@graywolf336
Contributor

graywolf336 commented Apr 7, 2018

Closing this due to the merging of #10351 and the release of Rocket.Chat v0.63.1 :) @geekgonecrazy informs me that a snap release will follow suit in a day or so. :D

@gu1ll0me

gu1ll0me commented Apr 7, 2018

Still crashing with 0.63.1 and Node 8.11.1 here. Please re-open.

Reverting to 8.9.4 solves the problem.

We are not using the Snap release.

@geekgonecrazy
Contributor

If you are using 8.11.1, please downgrade your Node.js version to 8.9.4. Unfortunately, until Node.js releases another hotfix, we have no other choice. In snap installs we just downgraded to keep people from being affected. Docker images are already downgraded. Only manual installs are left: there you have to downgrade Node.js yourself if you did upgrade.

@geekgonecrazy
Contributor

geekgonecrazy commented Apr 7, 2018

Also updated release notes with this note
https://forums.rocket.chat/t/rocket-chat-0-63-0-released-updated-for-0-63-1/479

@TwizzyDizzy changed the title from "Rocket.Chat vs. NodeJS 8.11.1: Random SEGV (segmentation violation)" to "Rocket.Chat vs. NodeJS 8.11.1 (or rather > 8.9.4): Random SEGV (segmentation violation)" on Apr 24, 2018

@TwizzyDizzy
Author

NodeJS 8.11.2 is out. I've just upgraded my production instance and the behaviour described in this issue does not occur anymore. This is why I am closing this issue.

Cheers
Thomas
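
For manual installs, the outcome of this thread can be condensed into a small version check. This is only a sketch: the function and constant names are hypothetical, and the range is an inference. In this thread only 8.11.1 was actually observed crashing; 8.9.4 was reported stable and 8.11.2 reported fixed, and versions in between are included solely because of the "(or rather > 8.9.4)" retitle.

```python
def parse_version(text):
    """Turn 'v8.11.1' or '8.11.1' into a comparable tuple like (8, 11, 1)."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))

LAST_GOOD = (8, 9, 4)     # reported stable in this thread
FIRST_FIXED = (8, 11, 2)  # reported fixed in this thread

def in_reported_crash_range(text):
    """True if the version lies strictly between the two reported bounds."""
    return LAST_GOOD < parse_version(text) < FIRST_FIXED

for candidate in ("8.9.4", "v8.11.1", "8.11.2"):
    print(candidate, in_reported_crash_range(candidate))
# prints:
# 8.9.4 False
# v8.11.1 True
# 8.11.2 False
```

The string returned by `node --version` (for example `v8.11.1`) can be passed in directly, since the leading `v` is stripped.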
