-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TCP: Use RunOnTCPIP() for the LwIP TCP EndPoint #36962
base: master
Are you sure you want to change the base?
Conversation
PR #36962: Size comparison from 1b4c56c to 5256657 Increases above 0.2%:
Full report (14 builds for cc13x4_26x4, cc32xx, nrfconnect, qpg, stm32, tizen)
|
PR #36962: Size comparison from 1b4c56c to 267b09d Increases above 0.2%:
Full report (69 builds for bl602, bl702, bl702l, cc13x4_26x4, cc32xx, cyw30739, efr32, esp32, linux, nrfconnect, nxp, psoc6, qpg, stm32, telink, tizen)
|
@wqx6 please fix build errors and add a |
PR #36962: Size comparison from f8d457a to cca01f2 Increases above 0.2%:
Full report (69 builds for bl602, bl702, bl702l, cc13x4_26x4, cc32xx, cyw30739, efr32, esp32, linux, nrfconnect, nxp, psoc6, qpg, stm32, telink, tizen)
|
src/inet/TCPEndPointImplLwIP.cpp
Outdated
System::Layer & lSystemLayer = listenEP->GetSystemLayer(); | ||
TCPEndPointImplLwIP * listenEP = static_cast<TCPEndPointImplLwIP *>(arg); | ||
TCPEndPointImplLwIP * conEP = nullptr; | ||
System::LayerFreeRTOS & lSystemLayer = static_cast<System::LayerFreeRTOS &>(listenEP->GetSystemLayer()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this cast OK? Is it really the case that any time we are using LwIP we are using LayerFreeRTOS?
It really seems like the right thing to do is for any work that needs to happen on the Matter queue to actually happen on the Matter queue, async. If we need to snapshot some data for that, we snapshot that data....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a declared argument in system.gni
chip_system_config_use_lwip = chip_with_lwip && current_os == "freertos"
And we have no such scenario on which we are using LwIP for other OSs. This cast seems OK for now.
It really seems like the right thing to do is for any work that needs to happen on the Matter queue to actually happen on the Matter queue, async. If we need to snapshot some data for that, we snapshot that data....
Yes, I have tried to use ScheduleLamba
to post the EndPoint allocating to Matter queue, but ScheduleLamba
is a async function and we have to use the allocated endpoint immediately after the ScheduleLamba
. Also I have tried to post all the following actions in the LwIPHandleIncomingConnection
to Matter queue but encountered another issue. --- If we use ScheduleLamba
for the whole LwIPHandleIncomingConnection
, the function will ends immediately and the LwIP will finish three-way handshake and start receives TCP packets, the peer will also start sending TCP packets, but the arg and recv function is not set at that time(as the set process is post to Matter queue). This will result the timeout for TCP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have moved the function back to System::Layer
but added a macro for the RunWithMatterContextLock()
function so that it will only be available when the system has lock. And also change here to System::Layer
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a declared argument in system.gni
Just for future reference: that's an argument with a default value. Builds can set the argument to a non-default value; that's why it's an argument. For example, config/openiotsdk/chip-gn/args.gni
does that. Or you can do it via command-line.
It sounds like there are bigger architectural problems here, in that lwip is delivering notifications on some random thread but expecting them to do various work synchronously that we just can't easily do synchronously... For now I guess the locking is OK, but we should make the failure modes clear... I'll comment on the PR with details.
PR #36962: Size comparison from f8d457a to a331147 Increases above 0.2%:
Full report (69 builds for bl602, bl702, bl702l, cc13x4_26x4, cc32xx, cyw30739, efr32, esp32, linux, nrfconnect, nxp, psoc6, qpg, stm32, telink, tizen)
|
@wqx6 do we know why this costs 2K of flash? that is a lot |
src/system/SystemPacketBuffer.cpp
Outdated
#endif | ||
// Set ther length and total length of the head packet buffer and all the chained packet buffers to 0 | ||
// as we don't put any data in them. And set the ref to 1 for all the buffers. | ||
for (PacketBuffer * pktBuf = lPacket; pktBuf != nullptr; pktBuf = pktBuf->ChainedBuffer()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So... I am quite sure that various callers of PacketBufferHandle::New
expect to get back a non-chained buffer and will fail if that is not the case. Other failure modes just in this function, if for some reason we have a chained buffer: what if aReservedSize is larger than the size of the head buffer? Then we have a corrupt lPacket->payload
...
I would suggest splitting this piece out of the TCP changes, since it's not directly related, so we can figure out what should actually happen here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the platforms which don't use lwip, the allocated packet buffer will never have a chained buffer, but for platforms using LwIP, the pbuf_alloc()
will allocate pbuf with chained pbufs if the allocated size is too large, and in our current code the chained pbufs will never be released which causes memory leak.
Agreed that we should split the change, reverted it in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point is not that we don't need this change, but rather that we probably need a complete rethink of how we use packetbuffers if New
can ever create a chained buffer. It will break crypto code, it will break all sorts of stuff....
Thank you for pulling this part out.
PR #36962: Size comparison from 3f62505 to 637fa24 Increases above 0.2%:
Full report (65 builds for bl602, bl702, bl702l, cc13x4_26x4, cc32xx, cyw30739, efr32, esp32, linux, nrfconnect, psoc6, qpg, stm32, telink, tizen)
|
src/platform/PlatformLockSupport.cpp
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
having the header separated from the CPP is not ok - please make sure they are joined.
If you have trouble due to dependencies, that is a layering issue that has to be fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there is an issue of dependency cycle. The core
depends on system
and the platform
depends on the core
so if we reside the cpp with the header, it will call functions from platform
lib but be used in system
lib.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also separated the PlatformEventSupport.h and PlatformEventSupport.cpp, should we also reside them together?
I have trouble on residing them, could you please provide a solution for it?
src/platform/PlatformLockSupport.cpp
Outdated
|
||
using namespace ::chip::DeviceLayer; | ||
|
||
#if !CHIP_SYSTEM_CONFIG_NO_LOCKING |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please make these decisions in build.gn: include the CPP file only if platform locking exists.
Depending on usage, we may also consider creating a separate target of locking-not-available-do-not-use
(or some other name) that contains these files in case of locking not available, so that GN fails to complie if people try to use locking when no locking available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the macro in this file and this file will be include when chip_system_config_locking != "none"
in gn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Request changes on header and cpp file needing to reside together.
src/platform/PlatformLockSupport.cpp
Outdated
/* this file behaves like a config.h, comes first */ | ||
#include <platform/internal/CHIPDeviceLayerInternal.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this got copy/pasted, but is that include needed in any way here? I would think PlatformManager.h and PlatformLockSupport.h include whatever headers they depend on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NXP(FreeRTOS) build CI fails if removing this line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the failure, exactly?
Co-authored-by: Boris Zbarsky <[email protected]>
This reverts commit 99f823c.
src/system/PlatformLockSupport.h
Outdated
#pragma once | ||
|
||
/** | ||
* The declared functions in this header are defined in src/platform/PlatformLockSupport.cpp because of dependency cycle issues. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not good. Layering issue has to be fixed.
This header does not have any dependencies that I can see, so why is it not in platform then?
We need to figure out what depends on what else properly.
@andy31415 @bzbarsky-apple I made the following change which doesn't need to add the
Could you please review it? |
PR #36962: Size comparison from ce47e22 to d5069a3 Increases above 0.2%:
Full report (72 builds for bl602, bl702, bl702l, cc13x4_26x4, cc32xx, cyw30739, efr32, esp32, linux, nrfconnect, nxp, psoc6, qpg, stm32, telink, tizen)
|
Changes
RunOnTCPIP()
inTCPEndPointImplLwIP
instead ofLOCK_TCPIP_CORE()/UNLOCK_TCPIP_CORE()
.NewEndPoint()
which is required to be called in Matter context. And the new endpoint will be used immediately after it is created. so we cannot useScheduleLambda()
. This PR creates a new functionRunOnMatterContext()
for it.Testing
chip_inet_config_enable_tcp_endpoint
.