Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race condition in USB CDC transmit #1945

Closed
wants to merge 1 commit into from

Conversation

pbrook
Copy link
Contributor

@pbrook pbrook commented Mar 21, 2014

This patch fixes a race that can cause data corruption/loss when sending data over the USB CDC serial port. Exact details in commit message.

The window for the race condition is quite small, and the CDC code very slow (byte writes), so the hardware double buffering means this occurs extremely rarely for most sketches.

I have however seen it occur in real life.

If the Start of Frame interrupt triggers just after the call
to USB_SendSpace in USB_Send then we can get data loss.
When the first bank is full and the second partially full,
the SOF handler will release the second bank via USB_Flush.
Data is then lost due to overflow as USB_Send continues writing data
to the now-closed bank.

Fix this by re-checking the FIFO status inside LockEP, immediately before
doing the data write.

Signed-off-by: Paul Brook <[email protected]>
@embmicro
Copy link
Contributor

This bug was causing a lot of problems for me. I was using the Arduino as a high speed USB->Serial converter and it was dropping a lot of data. It now seems to be working great.

@jantje
Copy link

jantje commented Mar 22, 2014

I cant get my yun leonardo to talk reliable over USB to my yun linino on serial speeds above 9500.
Big bulks of data fail nearly 100% for sure. Could that be related?

@pbrook
Copy link
Contributor Author

pbrook commented Mar 24, 2014

The Yun problems could be related, though IIUC that uses the UARTs (not USB) to connect the two chips, without any flow control. If so that's your most likely cause of data loss.

It's worth noting that even if overall throughput is sufficient, USB busses tend to experience high latency and intermittent stalls. This and high interrupt latency in the CDC RX path can induce UART buffer overflows much sooner than you might expect.

@jantje
Copy link

jantje commented Mar 24, 2014

Thanks for the answer.
I'm using both the bridge (UART if I understood correctly) and serial (USB) to communicate between leonardo and lininio. So I guess your saying I'm likely to suffer this problem.

@cmaglie
Copy link
Member

cmaglie commented Mar 25, 2014

Thanks for the patch!
May you provide a test sketch that trigger the bug? Is it reproducible?
C

@pbrook
Copy link
Contributor Author

pbrook commented Mar 26, 2014

Unfortunately it's quite hard to reproduce with the current code. The host PC usually receives and processes the data much faster than we can send it (and we cannot fill a 64-byte packet in a single frame), so the FIFO never gets more than half full. Triggering the bug requires careful control of (or very bad luck with!) the host USB stack.

I can reproduce consistently with one of the following changes:
either (a) Replace EP_DOUBLE_64 with EP_SINGLE_64 in USBCore.cpp:InitEndpoints to disable double buffering
or (b) Apply the block write patch: #1941

The following code then exhibits somewhere between 0.1% and 0.3% data loss. i.e. a missing character every few lines.

void setup() {
  Serial.begin(9600); // CDC virtual port, baud rate ignored
  while(1) {
    char i;
    // Block write a 64-byte USB packet of data
    for (i = 0; i < 8; i++) Serial.print("12345678");
    // Partially fill the next packet a byte at a time
    for (i = 0; i < 26; i++) Serial.write('a'+i);
    Serial.write('\n');
    // Wait a frame and a bit. Should be sufficient for the USB FIFO
    // to drain, and SOF to occur at a slightly different point
    delayMicroseconds(1000+random(100));
  }
}

@cmaglie
Copy link
Member

cmaglie commented Apr 1, 2014

Thanks, I've checked all the patches #1941 #1945 #1953 and together they works great!
After applying #1941 I was able to consistently reproduce the race-condition with the attached sketch, and I can confirm that is fixed by #1945.

I'll keep the three pull request still open for a short while for some more testing, but I'm confident that they are ready to be merged.

void burstSend() {
  // Send a burst of 1.000.000 bytes of data
  for (uint16_t i=0; i<20000; i++) {
    // 25 bytes
    Serial.print("0123456789abcdeABCDE=)(/&");
    // 25 bytes
    Serial.print("qwertyuiopasdfghjklzQWERT");
  }
}

C

@cmaglie
Copy link
Member

cmaglie commented May 30, 2014

Manually merged here:
13c0db5

Thank you!

@cmaglie cmaglie closed this May 30, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants