Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PPP does not shut-down the link properly #67627

Closed
RobMeades opened this issue Jan 15, 2024 · 11 comments · Fixed by #67731
Closed

PPP does not shut-down the link properly #67627

RobMeades opened this issue Jan 15, 2024 · 11 comments · Fixed by #67731
Assignees
Labels
area: Networking bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug

Comments

@RobMeades
Copy link

RobMeades commented Jan 15, 2024

Transferred here from https://discord.com/channels/720317445772017664/733037635194585148/1196453166590476439.

PR #61147 made a change to the PPP driver so that it calls net_if_carrier_off() inside ppp_stop():

static int ppp_stop(const struct device *dev)
{
        struct ppp_driver_context *context = dev->data;

        net_if_carrier_off(context->iface);
        context->modem_init_done = false;
        return 0;
}

Unfortunately, this means the PPP shut-down process cannot work correctly, since fsm.c ppp_send_pkt() checks that the carrier is up in order to send packets, packets which need to be exchanged in order to run through the shut-down process with the PPP peer.

	if (!net_if_is_carrier_ok(iface)) {
		return -ENETDOWN;
	}

So after calling net_if_down() on a PPP link you end up with something like this (noting my debug prints preceded by ###):

13:12:44 ### EVENT 0xd0010001
13:12:44 D: (net_mgmt): [0x20001790] phase RUNNING (4) => TERMINATE (5) (lcp_close():166)
13:12:44 D: (net_mgmt): [LCP/0x200017c8] Current state OPENED (9)
13:12:44 D: (net_mgmt): [IPCP/0x20001898] Current state OPENED (9)
13:12:45 D: (net_mgmt): [IPCP/0x20001898] state OPENED (9) => STARTING (1) (ppp_fsm_lower_down():282)
13:12:45 D: (net_mgmt): [0x20001790] Proto IPv4 (0x0021) down (0)
13:12:45 D: (net_mgmt): [IPCP/0x20001898] Current state STARTING (1)
13:12:45 D: (net_mgmt): [IPCP/0x20001898] state STARTING (1) => INITIAL (0) (ppp_fsm_close():240)
13:12:45 D: (net_mgmt): [PAP/0x20001970] Current state OPENED (9)
13:12:45 D: (net_mgmt): [PAP/0x20001970] state OPENED (9) => STARTING (1) (ppp_fsm_lower_down():282)
13:12:45 D: (net_mgmt): [0x20001790] phase TERMINATE (5) => DEAD (0) (ppp_link_down():136)
13:12:45 D: (net_mgmt): [0x20001790] phase DEAD (0) => ESTABLISH (1) (lcp_down():182)
13:12:45 ### [LCP/0x200017c8] fsm.c, entered ppp_send_pkt()
13:12:45 ### [LCP/0x200017c8] fsm.c, ppp_send_pkt() checks net_if_is_carrier_ok() and it returns false
13:12:45 D: (net_mgmt): [LCP/0x200017c8] state OPENED (9) => CLOSING (4) (terminate():215)
13:12:45 ### EVENT 0xd0010003
...
13:12:54 D: (sysworkq): [LCP/0x200017c8] Current state CLOSING (4)
13:12:54 D: (sysworkq): ### [LCP/0x200017c8]  fsm.c, entered ppp_send_pkt()
13:12:55 D: (sysworkq): ### [LCP/0x200017c8] fsm.c, ppp_send_pkt() checks net_if_is_carrier_ok() and it returns false
13:13:04 D: (sysworkq): [LCP/0x200017c8] Current state CLOSING (4)
13:13:04 D: (sysworkq): [LCP/0x200017c8] state CLOSING (4) => CLOSED (2) (ppp_fsm_timeout():136)
13:13:04 D: (sysworkq): [0x20001790] phase ESTABLISH (1) => DEAD (0) (ppp_link_terminated():123)
13:13:04 D: (sysworkq): [0x20001790] Link terminated

The link just times out and the PPP peer knows nothing about it.

If I comment out the call to net_if_carrier_off() in ppp_stop() then a proper link termination, with the peer, is performed.

To Reproduce

  • Make a PPP connection to something.
  • Try to terminate that connection and watch the process happen (e.g. with debug on) so that you can tell the difference between a timed-out termination and a true peer-to-peer termination.

Impact
The PPP peer is left hanging, which mean it won't start up again the next time you try to connect to it unless you've power- cycled or reset it. In other words, once you have disconnected PPP you can never connect it again.

Version

  • Zephry 3.4.99 (as arrives with nRFConnect 2.5.0), but headrev Zephyr seems actually worse, in that it now also calls uart_rx_disable() for the asynchronous UART case (which I am using)...!?
@RobMeades RobMeades added the bug The issue is a bug, or the PR is fixing a bug label Jan 15, 2024
@jukkar
Copy link
Member

jukkar commented Jan 15, 2024

@bjarki-trackunit please also take a look if you have time.

@bjarki-andreasen
Copy link
Collaborator

I don't believe the net_if_down() or net_if_up() should directly control the carrier, which is what is happening here (ppp_stop is called from net_if_l2(iface)->enable(iface, false) in net_if.c.

Maybe @jhirsi can expand on the issue?

@jhirsi
Copy link
Contributor

jhirsi commented Jan 16, 2024

@RobMeades, a good catch! I was able to repro that with NCS modem_shell as well when initiating a close from modem_shell side by using ppp down mosh command. For me it is fine if carrier_on/off are both removed from "ppp uart driver" (ppp.c). There was a double ppp carrier ON when NET_EVENT_IF_DOWN after a PR #59124 but we are nowadays counting on NET_EVENT_PPP_PHASE_RUNNING/NET_EVENT_PPP_PHASE_DEAD on modem_shell side (instead of carrier on/off).

@bjarki-andreasen
Copy link
Collaborator

bjarki-andreasen commented Jan 16, 2024

We seem to agree on the issue :) The solution should be to remove net_if_carrier_on() and net_if_carrier_off() calls from the ppp_start() and ppp_stop() functions, since the flag NET_IF_LOWER_UP (set/cleared with net_if_carrier_on/off) is set by default. See:

.flags = {BIT(NET_IF_LOWER_UP)}, \
and
.flags = {BIT(NET_IF_LOWER_UP)}, \

We clear the flag in the net if init function used by cellular modems, see

static void modem_ppp_ppp_api_init(struct net_if *iface)
{
const struct device *dev = net_if_get_device(iface);
struct modem_ppp *ppp = (struct modem_ppp *)dev->data;
net_ppp_init(iface);
net_if_flag_set(iface, NET_IF_NO_AUTO_START);
net_if_carrier_off(iface);
if (ppp->init_iface != NULL) {
ppp->init_iface(iface);
}
ppp->iface = iface;
}

@RobMeades
Copy link
Author

That sounds great: while we're here, headrev Zephyr's ppp_stop() also has this line in it:

#if defined(CONFIG_NET_PPP_ASYNC_UART)
	uart_rx_disable(context->dev);
#endif

Since I am using the asynchronous UART interface this would cause the same problem: could it be removed also?

@bjarki-andreasen
Copy link
Collaborator

@tomi-font Hi, can you help with this issue? I think you added the call to uart_rx_disable(context->dev); last month, it may be related to this issue :)

@rlubos
Copy link
Contributor

rlubos commented Jan 16, 2024

My 2 cents here. For me the PPP driver does not seem to be the problem here (if the PPP L2 tells the driver to stop it's operation by calling stop() - what's wrong with disabling underlying UART peripheral?), but rather the fact that PPP L2 disables the driver, while still expecting from it to operate (to perform PPP teardown).

It's probably even more confusing, given the PPP interface management logic is scattered between enable() (https://github.com/zephyrproject-rtos/zephyr/blob/main/subsys/net/l2/ppp/ppp_l2.c#L220) and net_mgmt registered handler (https://github.com/zephyrproject-rtos/zephyr/blob/main/subsys/net/l2/ppp/ppp_l2.c#L373), but to me it seems that PPP enable() should not call the driver's stop() rightaway, but rather initiate the teardown process (which seems to currently be handled by the net_mgmt handler), and eventually shut down the driver once done (maybe as a part of the cleanup mentioned here: https://github.com/zephyrproject-rtos/zephyr/blob/main/subsys/net/l2/ppp/link.c#L121, but maybe there is a better place to do that, I'm far from being PPP expert).

@henrikbrixandersen
Copy link
Member

@rlubos, @jukkar Please assign a priority to this issue.

@jukkar jukkar added the priority: medium Medium impact/importance bug label Jan 17, 2024
@carlescufi
Copy link
Member

@bjarki-trackunit can you take a look at Robert's comment?

@bjarki-andreasen
Copy link
Collaborator

@carlescufi I'm working on a solution which makes everyone happy :) Give me a couple of hours ;)

@bjarki-andreasen
Copy link
Collaborator

bjarki-andreasen commented Jan 18, 2024

@rlubos

The PR #67731 should explain itself, but I will address your comment here for clarity :)

For me the PPP driver does not seem to be the problem here (if the PPP L2 tells the driver to stop it's operation by calling stop() - what's wrong with disabling underlying UART peripheral?), but rather the fact that PPP L2 disables the driver, while still expecting from it to operate (to perform PPP teardown).

Agreed, when disabling the interface, the L2 PPP will now block until PPP is dead, before calling the stop function, making it safe to disable the UART (or power down a modems radio ;))

It's probably even more confusing, given the PPP interface management logic is scattered between enable() ...

The management logic still uses both the call to enable() and the net_mgmt callback to manage the L2 PPP instance after #67731 . The change here is that calling net_if_down() is now synchronous. We need the net mgmt callback to monitor the state of the carrier. For modems, the application will often enable the network interface before the modem has registered to a network, or is even powered on. In this state, the carrier is down. Once the modem has started and is registered to a network, it signals the carrier is ready, and the L2 PPP interface reacts to this and starts PPP.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Networking bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants