Fixed : retransmission of discarded segments starts at beginning of new block #546

Open
wants to merge 4 commits into
base: master

samsamfire
Contributor

@samsamfire samsamfire commented Nov 6, 2024

Hi,

I noticed this issue while working on another CANopen package.
The block upload retransmission does not work correctly.
When a client does not receive a sub-block correctly, it sends an end-of-sub-block response containing the sequence number of the last segment it received in order (ackseq).
The client should ignore all the frames the server sent between ackseq and blksize (this is currently the case).
However, the server then resends the missed frames (between ackseq and blksize) at the beginning of a new block, i.e. starting at seqno == 1.
This is difficult to test within the library because there is no SDO server supporting block transfer, but I have tested the fix against another implementation and it works.

@codecov-commenter

codecov-commenter commented Nov 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 71.84%. Comparing base (ffbd10f) to head (ab4d150).
Report is 1 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #546      +/-   ##
==========================================
+ Coverage   71.36%   71.84%   +0.47%     
==========================================
  Files          26       26              
  Lines        3129     3129              
  Branches      480      480              
==========================================
+ Hits         2233     2248      +15     
+ Misses        765      752      -13     
+ Partials      131      129       -2     
Files with missing lines Coverage Δ
canopen/sdo/client.py 75.27% <100.00%> (+3.25%) ⬆️

@samsamfire
Contributor Author

Hello,

What are your thoughts?

@acolomb
Collaborator

acolomb commented Nov 26, 2024

Sorry, didn't find the time to look at it yet. Will do soon when possible.

@acolomb
Collaborator

acolomb commented Jan 13, 2025

I've tried to wrap my head around this part of the standard, but I really cannot judge from just reading it whether this is more correct. Sorry, I have very limited hands-on experience with SDO block transfers, so not easy to see what's going on. So I'm hesitant about merging the change without further understanding what it actually fixes.

Could you maybe try to record a bus log which triggers this condition, from a correctly behaving client? Then we could add that as an expected message exchange in the test_sdo.py file and validate in a test case. There are lots of examples there which do validate the generated CAN objects, so this should be well testable.

@samsamfire
Contributor Author

samsamfire commented Jan 13, 2025

Hi,

I completely agree that a test should be added. The problem is that we currently don't have an SDO server supporting block transfer. I also don't think we could see this from the CAN frames alone, because the protocol exchange itself is correct: what happens is that some frames are ignored on the client side, which results in a wrong CRC at the end of the transfer.

Let me try to re-phrase what the current problem is and add an example.
When doing an SDO block upload, the server sends blocks of data, each with a predefined size. For this example I'll take 127, which is the maximum block size.
Each block is composed of frames (segments) that all start with a sequence number going from 1 to 127. The client expects all 127 frames to arrive, in order from 1 to 127, before it validates the block.
If a segment is lost or arrives out of order, the client acknowledges the last "good" segment.
So if a client sees segments arriving in the order
1, 2, 3, 4, 6, 5, 7, 8, it replies as soon as it detects the problem, acknowledging the last good segment, which is 4.
This means that the SDO server should retransmit all the unacknowledged segments, starting with segment 5.
Failure to do so results in an incorrect CRC at the end of the block transfer.
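
To make the "last good segment" rule concrete, here is a minimal standalone sketch. The helper name `last_good_seqno` is made up for this illustration and is not part of the canopen package; it only models the in-order check described above.

```python
def last_good_seqno(received_seqnos):
    """Return the last sequence number that arrived in order, counting from 1.

    Hypothetical helper for illustration only: it stops at the first gap or
    out-of-order segment, which is the value the client would acknowledge.
    """
    ackseq = 0
    for seqno in received_seqnos:
        if seqno != ackseq + 1:
            break  # gap or reordering detected, keep the last good value
        ackseq = seqno
    return ackseq


# Sequence from the example above: segment 6 arrives before segment 5.
print(last_good_seqno([1, 2, 3, 4, 6, 5, 7, 8]))
```

With the sequence from the example, the client would acknowledge segment 4, so everything from segment 5 onwards has to be sent again.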

The current implementation of retransmit looks like this:

    def _retransmit(self):
        logger.info("Only %d sequences were received. Requesting retransmission",
                    self._ackseq)
        end_time = time.time() + self.sdo_client.RESPONSE_TIMEOUT
        self._ack_block()
        while time.time() < end_time:
            response = self.sdo_client.read_response()
            res_command, = struct.unpack_from("B", response)
            seqno = res_command & 0x7F
            if seqno == self._ackseq + 1:
                # We should be back in sync
                self._ackseq = seqno
                return response
        self._error = True
        self.sdo_client.abort(0x05040000)
        raise SdoCommunicationError("Some data were lost and could not be retransmitted")

The client waits until a segment arrives whose sequence number is the last good sequence number plus one (self._ackseq + 1) before it starts accepting segments again. However, this is wrong, because the SDO server resends the discarded segments at the start of a new block, numbered from 1.
Simplified example of what is happening:

SERVER

[TX] 1...
[TX] 2...
[TX] 3...
[TX] 4...
[TX] 6... ==> Wrong seqno received (can be client or server's fault)
[TX] 5...
... Can continue sending rest of block

CLIENT

[TX] 4... ==> Last good segment is 4


SERVER

[TX] 1... ==> This corresponds to data of seqno "5" of previous block
[TX] 2...
[TX] 3...
[TX] 4... ==> This is where the current implementation considers to be back in sync, which is wrong.
...
[TX] 127...

CLIENT

[TX] 127 ==> Complete block received successfully

I hope this makes things clearer.
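
The exchange above can also be modelled in a few lines of standalone Python. This is only a sketch under stated assumptions: `split_segments`, `server_retransmit` and `client_resync` are hypothetical helpers written for this comment (not functions from the canopen package), the block size is shortened to 10 for readability, and the CRC is computed with `binascii.crc_hqx(data, 0)` on the assumption that it matches the CRC-16-CCITT used for block transfers.

```python
import binascii

SEGMENT_SIZE = 7  # data bytes carried by one block segment


def split_segments(data):
    """Split the object data into 7-byte segment payloads."""
    return [data[i:i + SEGMENT_SIZE] for i in range(0, len(data), SEGMENT_SIZE)]


def server_retransmit(payloads, ackseq, blksize):
    """Model of the server: segments after `ackseq` are sent again in a NEW
    block, so their sequence numbers start from 1 again."""
    pending = payloads[ackseq:ackseq + blksize]
    return list(enumerate(pending, start=1))


def client_resync(frames, first_expected):
    """Accept frames in order starting at `first_expected`; ignore the rest."""
    expected = first_expected
    accepted = []
    for seqno, payload in frames:
        if seqno == expected:
            accepted.append(payload)
            expected += 1
    return b"".join(accepted)


data = bytes(range(70))          # 10 segments of 7 bytes each
payloads = split_segments(data)
ackseq = 4                       # segments 1-4 of the first block were good
good_part = b"".join(payloads[:ackseq])
frames = server_retransmit(payloads, ackseq, blksize=10)

# Waiting for seqno == ackseq + 1 drops the first four retransmitted segments.
old_result = good_part + client_resync(frames, first_expected=ackseq + 1)
# Restarting at seqno == 1 keeps every retransmitted segment.
new_result = good_part + client_resync(frames, first_expected=1)

expected_crc = binascii.crc_hqx(data, 0)
print(len(old_result), binascii.crc_hqx(old_result, 0) == expected_crc)
print(len(new_result), binascii.crc_hqx(new_result, 0) == expected_crc)
```

In this model, waiting for `ackseq + 1` silently loses the data carried by seqno 1 to 4 of the new block, whereas restarting at seqno 1 reassembles the complete object.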

@samsamfire samsamfire force-pushed the sdo-block-upload-retransmit-fix branch from 4725972 to 5341142 Compare January 15, 2025 10:41
…entation but will fail with an invalid CRC without fix for discarded segments.
@samsamfire samsamfire force-pushed the sdo-block-upload-retransmit-fix branch from 5341142 to 33aa620 Compare January 15, 2025 10:52
@samsamfire
Contributor Author

Hello,

I've added a test for SDO block transfer retransmission; this took me a bit of time.
The test passes with this fix but fails with an invalid CRC on the current implementation, because the client ignores some segments when it shouldn't, as discussed previously.
