Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

extra unref happening on net_context #4027

Closed
zephyrbot opened this issue Sep 6, 2017 · 3 comments
Closed

extra unref happening on net_context #4027

zephyrbot opened this issue Sep 6, 2017 · 3 comments
Assignees
Labels
area: Networking bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug

Comments

@zephyrbot
Copy link
Collaborator

zephyrbot commented Sep 6, 2017

Reported by Geoff Gustafson:

I'm seeing a problem where I get a net_context in a tcp_accepted_cb and I call net_context_ref on it so that I can keep it around and use it later. But if I instrument Zephyr code and monitor alls refs and unrefs, I see that it gets unref'd down to 0 even though I've never unref'd it. If I don't ref it I see it get unref'd down to -1. Then I changed things to print the caller function for every ref/unref. I see that tcp_established is calling unref twice in a row in these lines near the bottom of the function:

	if (sys_slist_is_empty(&context->tcp->sent_list)
	    && context->tcp->fin_rcvd
	    && context->tcp->fin_sent) {
		net_context_unref(context);
	}

This code looks to me like it might be at issue because it drops the reference and yet doesn't change any other state to indicate to the stack that it's been closed. It may be that it's expecting to be the last one to hold a reference, but in my case that's not true.

When I run samples/net/echo_server I don't see this problem. I suspect that's because the sample handles socket closure immediately in tcp_received. But in my more complex case with ZJS - JavaScript running in the main thread that I need to sync with, I have to hold on to the context and defer the work to the main thread.

Talking to [~ajross71] in IRC, he suggested that maybe elsewhere the socket should be getting moved to a new state to prevent re-entry into tcp_established, and that's where the bug is.

Note: I'm really working with 1.8 at the moment but I don't see anything that would have changed this in master. The bug I'm working on is the last thing holding up my 0.4 release, and after that we'll try to update to the latest Zephyr.

(Imported from Jira ZEP-2598)

@zephyrbot
Copy link
Collaborator Author

by Geoff Gustafson:

Sorry, I should have explained that my code is basically an echo server test too, just more complex because JS is involved.

If I do this from Linux:
$ nc 192.168.1.99 4242
it works fine, and I can go back and forth echoing data.

It's when I do this that it fails:
$ echo "foo" | nc 192.168.1.99 4242

That is, something about the socket getting closed so quickly after the data is sent. When I look at the packet trace, I see:
Client SYN (Linux)
Server SYN/ACK (Zephyr)
Client ACK
Client PSH/ACK w/ the data "foo"
Client FIN/ACK
Server ACK to data
Server FIN/ACK
Client ACK

I tried doing an extra unpaired net_context_ref in my code. It gets farther, but then after the above packet trace I see ack_timeout() get called after a few seconds and unref the context again itself. So it still goes to zero and gets marked as not "in use". I guess it doesn't see that last Client ACK for some reason?

@zephyrbot zephyrbot added priority: medium Medium impact/importance bug area: Networking bug The issue is a bug, or the PR is fixing a bug labels Sep 23, 2017
@tbursztyka
Copy link
Collaborator

Can you reproduce that on latest master? Since your report, there had been many fixes applied on TCP.

@laperie
Copy link
Collaborator

laperie commented Dec 4, 2017

Closing as fixed. Please reopen if the bug reappears (it shoudn't)

@laperie laperie closed this as completed Dec 4, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Networking bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug
Projects
None yet
Development

No branches or pull requests

4 participants