SIGSEGV in RepReq's rep0 recv - use after free #1241
Comments
Thanks, will investigate. |
Can you see if this is reproducible with tonight's latest commits? I think there's a chance that this might be another manifestation of a problem I resolved in TCP. |
Unfortunately I can, just tried with the latest master, and I am getting the same backtraces as above. |
Thanks. I will look further into it. |
There's at least one bug in your program -- a use-after-free. It's not clear to me yet that there is a relationship with this particular crash though. I'll comment on your gist. |
Actually I may have misread the code. |
Your backtrace data above is not quite complete, since it doesn't include the arguments, which is unfortunate. It looks like this might be a race where the pipe is closed (and so members of the pipe itself are discarded) but the pipe is still on a per-socket recvpipes list. There might also be a bit of a race as we take it off that list. All of this is supposed to be done with the mutex for the socket held, so it should be safe. But something isn't right. |
Yep. rep0_pipe_close doesn't properly verify that the pipe isn't busy receiving. |
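The race described in the two comments above can be illustrated with a minimal sketch. This is not NNG's code: the `demo_sock` and `demo_pipe` types and every function name below are hypothetical, and the singly linked list stands in for the per-socket recvpipes list. The point it tries to show is that the close path unlinks the pipe from the ready list while holding the socket mutex, so the receive path can never pick up a pipe whose members have already been discarded.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types for illustration only; these are not NNG's structures. */
typedef struct demo_pipe {
    struct demo_pipe *next;    /* link on the socket's ready-to-receive list */
    bool              on_list; /* true while linked on that list */
    bool              closed;  /* set once the pipe has been closed */
} demo_pipe;

typedef struct demo_sock {
    pthread_mutex_t mtx;        /* protects recv_ready and every pipe's flags */
    demo_pipe      *recv_ready; /* head of the ready-to-receive list */
} demo_sock;

void demo_sock_init(demo_sock *s)
{
    pthread_mutex_init(&s->mtx, NULL);
    s->recv_ready = NULL;
}

/* Mark a pipe as having data; a closed pipe is never (re)added. */
void demo_pipe_ready(demo_sock *s, demo_pipe *p)
{
    pthread_mutex_lock(&s->mtx);
    if (!p->closed && !p->on_list) {
        p->next       = s->recv_ready;
        s->recv_ready = p;
        p->on_list    = true;
    }
    pthread_mutex_unlock(&s->mtx);
}

/* Close a pipe: unlink it from the ready list under the same lock. */
void demo_pipe_close(demo_sock *s, demo_pipe *p)
{
    pthread_mutex_lock(&s->mtx);
    p->closed = true;
    if (p->on_list) {
        demo_pipe **pp = &s->recv_ready;
        while (*pp != NULL && *pp != p) {
            pp = &(*pp)->next;
        }
        if (*pp == p) {
            *pp = p->next;
        }
        p->next    = NULL;
        p->on_list = false;
    }
    pthread_mutex_unlock(&s->mtx);
}

/* Receive path: pop the next ready pipe, or NULL if there is none. */
demo_pipe *demo_sock_next(demo_sock *s)
{
    pthread_mutex_lock(&s->mtx);
    demo_pipe *p = s->recv_ready;
    if (p != NULL) {
        assert(!p->closed); /* invariant: closed pipes are never on the list */
        s->recv_ready = p->next;
        p->next       = NULL;
        p->on_list    = false;
    }
    pthread_mutex_unlock(&s->mtx);
    return p;
}
```

If the close path skipped the unlink step, `demo_sock_next` could return a pipe that the caller has already torn down, which is the general shape of the use-after-free being discussed here.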
Are you using NNG 1.2.4 or HEAD (which is 1.3.x)? The Rust crash looks like it is from 1.2.4. There are rather large changes in 1.3.x. |
Ah, I thought it might be something like that, but the code was unfamiliar to me and I had a feeling you would identify the issue much faster. On the Rust front, they currently point at 1.2.4, but I pointed nng-sys at HEAD to ensure that I was using the latest version before reporting that one. So unless I made a typo, both versions were tested against HEAD. |
Is there any special trick to triggering the crash in your test program? I can't seem to reproduce it. Something here isn't adding up for me. |
Being a race condition, it does not always trigger the first time, so I tend to find I have to start the server again. It's a bit of a gamble, but I start the server, leave the client to run for about 3 seconds, then kill it. After a few attempts this seems to trigger it. If you still can't trigger it, I'll try to script a loop to trigger it. |
Are you killing the server, or the client? |
The client, which will then cause the server to crash. Hence why your reasoning above seems accurate. |
Ok, I just got it. |
Oh this is in a totally different place though. |
Hmm, different to both of the stack traces above, could that be due to recent changes in master? Would you like me to run again against the latest HEAD? |
Yes please. I'm seeing it on the send side not the receive side. |
Okay will try it now and get back to you asap. |
Interesting... different errors. Testing on Linux? I'm actually getting SIGPIPE on send which I thought I should not, but I'm testing under WSL. |
I wonder if the debugger or WSL are changing the signal disposition somehow. I'm also trying to test under the sanitizer. |
Yeah, Linux is where I develop these sorts of things, as they end up in Docker, etc. Hmm, odd, I wonder if I would get the same issue if I ran under WSL?! Which sanitizer are you running? I could try that instead of gdb, but WSL will be a bit of a pain as I'll need to set up a Windows dev VM. |
I was using the address sanitizer. For whatever reason your code falls down hard with the memory sanitizer, and I wasn't able to convince it to generate output that would help me debug that. |
Your crash indicates that the aio is garbage. I'm not seeing that at all on my side, but I think we're just seeing different variations of corruption. |
I am passing MSG_NOSIGNAL, but it didn't get honored. Don't know why not. Argh. |
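For context on what `MSG_NOSIGNAL` is expected to do, here is a minimal sketch, assuming a Linux system. The helper name `demo_send_to_closed_peer` is hypothetical and has nothing to do with NNG's transport code. It sends one byte on a UNIX-domain stream socket whose peer has already been closed; with the flag honored, `send()` should fail with `EPIPE` rather than raise `SIGPIPE`.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper for illustration only.
 * Returns the errno from send() to a closed peer, 0 if the send
 * unexpectedly succeeded, or -1 if the socket pair could not be created.
 */
int demo_send_to_closed_peer(void)
{
    int fds[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) {
        return -1;
    }
    close(fds[1]); /* simulate the peer going away */

    /* MSG_NOSIGNAL asks the kernel not to raise SIGPIPE for this call. */
    ssize_t n   = send(fds[0], "x", 1, MSG_NOSIGNAL);
    int     err = (n < 0) ? errno : 0;

    close(fds[0]);
    return err;
}
```

If the process dies with `SIGPIPE` instead of this function returning, the flag is not being honored in that environment, which matches the behavior reported in the comment above.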
I am not getting anything from the address sanitizer, but I am getting the strcmp error with the memory sanitizer. |
I couldn't figure out where the strcmp error was -- the sanitizer output wasn't very informative, and I was unable to set any useful break points (notably __msan_warning and __msan_warning_noreturn) in the debugger -- or rather said entry points did not fire. |
I'm also stuck with WSL 1, because I'm not really prepared to run the insider's version on my primary desktop. |
Meanwhile I'm firing up a Hyper-V guest with Ubuntu 20. We'll see if that goes any better. |
And looking at the sanitizer output, I think it's wrong. I think it's confused because we don't do a string copy to initialize the url->u_scheme, but do a character by character copy. I'll probably change this just to silence it. |
Interesting, gonna make that change too and see if it gets further, just out of curiosity. |
Trusty valgrind seems more useful :) |
Yep, that was quite helpful. |
I think I have a fix. I also think respondent suffers the same problem. |
This also affects the respondent protocol. Examination of the other protocols did not turn up any evidence of the same issue.
Please give this branch a try: https://github.com/nanomsg/nng/pull/1246/files I think it probably solves the problem. |
That looks like it fixes it. I just tried it ~10 times and could not get it to crash; I tried it on the Rust version too and could not get it to crash either. So it looks like that has fixed it, thanks for sorting that out. |
As briefly discussed in #1240 there appears to be a bug in the current implementation that results in a SIGSEGV when a socket is closed but the resend timeout fired at least once. Or that is my current theory on it at least.
SIGSEGV from C implementation:
SIGSEGV from Rust version: