Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net: l2: ppp: ppp uart usage fixes #61147

Merged
merged 1 commit into from
Aug 9, 2023

Conversation

jhirsi
Copy link
Contributor

@jhirsi jhirsi commented Aug 4, 2023

This fixes 3 issues that came within PR #59124 for ppp uart usage.

Earlier start/stop of ppp was done at enable() but that was removed in PR #59124.
Now putting enable/disable() back and putting start/stop there.

Additionally, there was a double ppp carrier ON when NET_EVENT_IF_DOWN. For that net_if_carrier_on/off is set in uart ppp.c driver. Also, worth to be mentioned that after PR #59124 there is no ppp carrier off when lcp is disconnected, for workaround that change, application that needs that information: should use ppp dead/running events instead.

rlubos
rlubos previously approved these changes Aug 4, 2023
Copy link
Collaborator

@bjarki-andreasen bjarki-andreasen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In short, if you are using ppp.c, then you need to start it manually in your driver, before you call net_if_carrier_on() on the network interface, (or net_if_up() as is done from within the gsm_ppp.c driver). If you need an event when LCP has established the data link, add the NET_EVENT_PPP_PHASE_RUNNING and NET_EVENT_PPP_PHASE_DEAD events where you have moved the CARRIER_ON/OFF events.

This should solve your issues without adding complexity by mixing NET_PPP and NET_L2_PPP or changing the meaning of the CARRIER_ON/OFF events :)

@bjarki-andreasen bjarki-andreasen added the dev-review To be discussed in dev-review meeting label Aug 4, 2023
This fixes 3 issues that came within PR zephyrproject-rtos#59124 for ppp uart usage.

Earlier start/stop of ppp was done at enable() but that
was removed in PR zephyrproject-rtos#59124. Now putting enable/disable() back and
putting start/stop there.
Additionally, there was a double ppp carrier ON when NET_EVENT_IF_DOWN.
For that net_if_carrier_on/off is set in uart ppp.c driver.
Also, maybe worth to be mentioned that after PR zephyrproject-rtos#59124 there is no
ppp carrier off when lcp is disconnected, for workaround that change,
application should use ppp dead/running events.

Signed-off-by: Jani Hirsimäki <[email protected]>
Copy link
Contributor

@rlubos rlubos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jhirsi for the time you put into investigating this.

@bjarki-trackunit Please revisit, it seems now that the changes should be not be affecting your use case, while making net/ppp.c driver usable again.

@bjarki-andreasen
Copy link
Collaborator

bjarki-andreasen commented Aug 8, 2023

The fundamental issue with this PR is that it is trying to make the L2 PPP subsystem adhere to some downstream code, instead of updating the downstream code to adhere to the L2 PPP subsystem.

ppp.c is not part of the L2 PPP subsystem, it is simply one complex partial driver which depends on some other device driver to actually set up the carrier. Calling start() and stop() is not the responsibility of the L2 PPP subsystem, hence why it no longer does.

Additionally, the carrier is not LCP, when LCP is dead, it does not mean the carrier is lost. If some downstream code is misusing the event, that again, should be fixed there, not altered in the L2 PPP subsystem which "accurately" fires the event (the event is not actually indicating the carrier status, it was just kept there for backwards compatibility).

The changes in this PR simply do not adhere to the documentation and functionality of a L2 network interface and subsystem...

If you could instead create an issue, where you accurately document how you are using the L2 PPP subsystem and which issues you are seeing, showing some logs, code examples etc. We may be able figure out how to align your code with the L2 PPP subsystem :)

@rlubos
Copy link
Contributor

rlubos commented Aug 8, 2023

I'm sorry, but I almost totally disagree with your statement.

The fundamental issue with this PR is that it is trying to make the L2 PPP subsystem adhere to some downstream code, instead of updating the downstream code to adhere to the L2 PPP subsystem.

It's not "some downstream" code we want to adhere to, we want the PPP L2 to make use of the APIs defined in struct ppp_api as it was originally intended to do, and which was erroneously broken.

ppp.c is not part of the L2 PPP subsystem,

ppp.c implements a network device which runs PPP L2. Yes, it is not part of the PPP L2 (I don't see why would it be), but it does implement the driver API which L2 is is supposed to use to interact with the device.

it is simply one complex partial driver which depends on some other device driver to actually set up the carrier.

This is simply not correct. Before changes in #59124, Zephyr was able to establish PPP link with Windows/Linux host w/o any other action needed, other than bringing the network interface up. This is no longer the case, because from some reason it is assumed that some other module is supposed to do the work that L2's enable() used to do (and in my opinion - still should).

Calling start() and stop() is not the responsibility of the L2 PPP subsystem, hence why it no longer does.

I don't see where does it requirement come from. We have enable() implemented in various L2s, so that it can do whatever is needed (including interaction with the driver) to make the interface ready to use once it's been brought up. I don't understand why does PPP L2 would need to behave differently.

Additionally, the carrier is not LCP, when LCP is dead, it does not mean the carrier is lost. If some downstream code is misusing the event, that again, should be fixed there, not altered in the L2 PPP subsystem which "accurately" fires the event (the event is not actually indicating the carrier status, it was just kept there for backwards compatibility).

That's correct, and that's why we're no longer pushing for any changes related to how and when PPP_CARRIER events are generated. We're switching to RUNNING/DEAD events instead (although they also seem to have their own issues, like DEAD firing several times during link negotiation).

The changes in this PR simply do not adhere to the documentation and functionality of a L2 network interface and subsystem...

Can you provide us with the link to the documentation, where the behaviors that we "break" in the current proposal are described? Otherwise, it's hard to relate.

If you could instead create an issue, where you accurately document how you are using the L2 PPP subsystem and which issues you are seeing, showing some logs, code examples etc. We may be able figure out how to align your code with the L2 PPP subsystem :)

For reference: https://github.com/nrfconnect/sdk-nrf/blob/main/samples/cellular/modem_shell/src/ppp/ppp_ctrl.c
The code brings the PPP interface up, and communicated with host over raw socket. Very simple, yet no longer possible.

@bjarki-andreasen
Copy link
Collaborator

bjarki-andreasen commented Aug 8, 2023

Thank you for the reference.

The documentation is in this segment Network interface state management

To summarize; enable() is called when a network interface is administratively brought up, not when it becomes operational.

struct net_l2 {
	...

	/**
	 * This function is used to enable/disable traffic over a network
	 * interface. The function returns <0 if error and >=0 if no error.
	 */
	int (*enable)(struct net_if *iface, bool state);

	...
};

If you tie the starting and stopping of the L2 driver to enable(), it will only be enabled if net_if_up() or net_if_down() is called. However, if the carrier is brought down, say a modem looses registration to the network, neither net_if_up() or net_if_down() is called, net_if_carrier_off() is.

Once net_if_carrier_off() is called, LCP is terminated, and the L2 driver has to work to bring up the carrier again. Once the carrier is back, net_if_carrier_on() is called and the network interface becomes operational again. At this point, LCP will reinitialize and most likely fail...

Here is the issue: start() and stop() were never called, to reinitialize ppp.c, leaving it in an unknown state.

The only part of the system which knows when the carrier was acquired, lost, and reacquired is the L2 driver, which is why it is the only logical place to keep that logic.

I can not summarize the issue simpler than this. This PR provides a change which re-implements a fundamentally non-functional mechanism (while also adding net_if_carrier_on() and net_if_carrier_off() to ppp.c which results in the net_if_up() call mandating that the carrier is also on, which it can not know...)`

If we really want to keep the start and stop:
add start here

ppp_open(ctx);

and stop here
ppp_close(ctx);

you do not need to add anything in addition to this. the carrier is set to on by default, so net_if_up() and net_if_down() will change the operational state unless the carrier is manually set to off using net_if_carrier_off() at some point.

@rlubos
Copy link
Contributor

rlubos commented Aug 9, 2023

The documentation is in this segment Network interface state management

Good, I'm pretty well aware of that part of the documentation, as I wrote it.

To summarize; enable() is called when a network interface is administratively brought up, not when it becomes operational.

And that's when it's supposed to be called, there's no confusion about that. Application brings the interface up (admin up), and the L2 starts the underlying device:

struct ppp_api {
...
	/** Start the device */
	int (*start)(const struct device *dev);

	/** Stop the device */
	int (*stop)(const struct device *dev);
}

Don't you think that calling start()/stop() on the underlying network device after you enter/exit OPERATIONAL UP state on the interface, isn't logically sound? The interface should already be up and running, ready to transmit packets, yet you expect us to call the function which "Starts the device" then.

If you tie the starting and stopping of the L2 driver to enable(), it will only be enabled if net_if_up() or net_if_down() is called.

Is that wrong? If you bring the interface down, you stop the underlying device. And if you bring it up, you start it. What's wrong with that?

And it's not that you "tie" the API calls to enable() only - if your particular network device needs to be restarted in certain cases, you are still free to do so elsewhere.

Once net_if_carrier_off() is called, LCP is terminated, and the L2 driver has to work to bring up the carrier again. Once the carrier is back, net_if_carrier_on() is called and the network interface becomes operational again. At this point, LCP will reinitialize and most likely fail...

Here is the issue: start() and stop() were never called, to reinitialize ppp.c, leaving it in an unknown state.

The only part of the system which knows when the carrier was acquired, lost, and reacquired is the L2 driver, which is why it is the only logical place to keep that logic.

So if your particular modem driver needs to reboot the modem, do it, keep the logic there. But ppp.c is not a driver that has any modem awareness, nor a driver that needs reinitialization when link is lost (start() merely configures the UART). And most of all, the driver is generic, it allows to communicate between any two devices over PPP, not necessarily a modem.

This PR provides a change which re-implements a fundamentally non-functional mechanism

Sorry, but even after reading all your comments here and in your previous PR that introduced the regression, I still don't understand what's non-fucntional about enable() in terms of PPP L2.

(while also adding net_if_carrier_on() and net_if_carrier_off() to ppp.c which results in the net_if_up() call mandating that the carrier is also on, which it can not know...)`

I think we need to reconsider what a carrier means in terms of ppp.c. Yes, the driver is generic and it doesn't know that kind of device will be on the other end of the link. But the PHY (UART) is ready to transmit packets as soon as the UART driver is configured, and that's what carrier status should represent. So I see nothing wrong with the current change (in terms of ppp.c only).

@rlubos rlubos requested a review from jukkar August 9, 2023 07:48
@rlubos
Copy link
Contributor

rlubos commented Aug 9, 2023

Adding @jukkar to the discussion, as the original author of the PPP subsys.

Copy link
Member

@jukkar jukkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@bjarki-andreasen bjarki-andreasen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last retort, the UART being initialized is not the carrier being ready, what if the UART is not physically connected to anything?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Networking dev-review To be discussed in dev-review meeting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants