Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tasynchttpserver_transferencoding.nim is flaky #17456

Closed
timotheecour opened this issue Mar 22, 2021 · 4 comments · Fixed by #18052
Closed

tasynchttpserver_transferencoding.nim is flaky #17456

timotheecour opened this issue Mar 22, 2021 · 4 comments · Fixed by #18052

Comments

@timotheecour
Copy link
Member

timotheecour commented Mar 22, 2021

see https://dev.azure.com/nim-lang/255dfe86-e590-40bb-a8a2-3c0295ebdeb1/_apis/build/builds/14578/logs/83 from #17209, on osx cpp CI.

the recently added tests/stdlib/tasynchttpserver_transferencoding.nim (from #16678) is flaky (ie, there is a bug)

/cc @vabresto

Current Output

on: hostOS: macosx, hostCPU: amd64, int: 8, float: 8, cpuEndian: littleEndian, cwd: /Users/runner/work/1/s

2021-03-22T10:55:48.1058030Z megatest:processing: tests/statictypes/tstackmatrix.nim
2021-03-22T10:55:48.1058430Z (M: 3, N: 3, fp: ...)
2021-03-22T10:55:48.1058830Z megatest:processing: tests/stdlib/concurrency/tatomics.nim
2021-03-22T10:55:48.1059270Z megatest:processing: tests/stdlib/t14139.nim
2021-03-22T10:55:48.1059680Z megatest:processing: tests/stdlib/t7686.nim
2021-03-22T10:55:48.1060090Z megatest:processing: tests/stdlib/t9091.nim
2021-03-22T10:55:48.1060430Z test AObj
2021-03-22T10:55:48.1060810Z megatest:processing: tests/stdlib/tasynchttpserver_transferencoding.nim
2021-03-22T10:55:48.1061510Z tasynchttpserver_transferencoding.nim(48) tasynchttpserver_transferencoding
2021-03-22T10:55:48.1062010Z assertions.nim(68)       failedAssertImpl
2021-03-22T10:55:48.1062380Z assertions.nim(58)       raiseAssert
2021-03-22T10:55:48.1062740Z fatal.nim(53)            sysFatal
2021-03-22T10:55:48.1063250Z Error: unhandled exception: tasynchttpserver_transferencoding.nim(48, 10) `sanity`gensym0`  [AssertionDefect]
2021-03-22T10:55:48.1063580Z 
2021-03-22T10:55:48.1063880Z FAIL: megatest execution failed
2021-03-22T10:56:39.0593650Z �[32mPASS: �[36mlib/pure/algorithm.nim cpp                                  �[34m ( 0.86 sec)�[0m

Additional Information

1.5.1 c8dda86

links

originally reported in timotheecour#544
as noted there:

this means that handler was not actually called in this case.
potential causes:

@vabresto
Copy link
Contributor

I think this is the second or third time I've heard someone run into an issue with those tests, so clearly they're flaky - most likely a rare race condition. I haven't used Nim's async much since writing them, so I don't know what the best way to resolve is, and I don't really have the bandwidth to look into it.

Does it make sense to just remove the tests? Or run them as an extended test set?

@vabresto
Copy link
Contributor

vabresto commented May 5, 2021

@timotheecour I've got some bandwidth to take a look at this issue now if you have any guidance/suggestions on what kind of solution you'd like to see or any exploratory work that might be useful. Thanks!

@timotheecour
Copy link
Member Author

timotheecour commented May 6, 2021

ya this keeps happening, and happens more lately for some reason, it's one of the most flaky tests.

Before talking about a solution, a good next step would be:

  • identifying on which platforms this happens (from disable tmacro_highlight on i386, refs #17945  #17949, at least happens on osx)
  • writing a PR which modifies the code so that the problem happens more reliably (ideally 100%), and ideally so that it can be reproduced locally instead of in CI; that way it should be easier to investigate and ultimately fix

as for the fix, it probably involves replacing callSoon with something else.

Note that 1 sure way to fix this would be to start the server in another process (not hard to do), but we should still find a non-buggy way to start the server in the same process; but maybe this (using 2 processes instead of 1) would be a good temporary solution to fix the test

@vabresto
Copy link
Contributor

vabresto commented May 6, 2021

Going to use this thread to document findings; maybe someone more knowledgeable than I am can help out if I'm missing something.

First observation is that commenting out waitFor sleepAsync(10) causes a 100% test failure rate. I tested locally on OSX (running this file 10 times), and with a sleep of both 1 and 10 I had 0 failures. I'm not sure if the conclusion is that the sleep duration is the issue, but I think without it there's no way to make the thread yield so the server can process the request.

Next, I tried to re-write the test case based on what tasynchttpserver.nim is doing, it looks like this:

import httpclient, asynchttpserver, asyncdispatch, asyncfutures
import net

import std/nativesockets

proc genTest(input, expected: string): Future[(string, bool)] {.async.} =
  var sanity = false
  proc handler(request: Request) {.async.} =
      doAssert(request.body == expected)
      doAssert(request.headers.hasKey("Transfer-Encoding"))
      doAssert(not request.headers.hasKey("Content-Length"))
      sanity = true
      await request.respond(Http200, "Good")

  proc request(server: AsyncHttpServer): Future[AsyncResponse] {.async.} =
    echo "hit 3a"
    let client = newAsyncHttpClient()
    echo "hit 3b"
    let  clientResponse = await client.request("http://localhost:64123/")
    echo "hit 3c"
    server.close()
    echo "hit 3d"
    return clientResponse

  echo "hit 1"

  let server = newAsyncHttpServer()

  echo "hit 2"

  discard server.serve(Port(64123), handler)

  echo "hit 3"

  let response = waitFor(request(server))

  echo "hit 4"

  let  body = waitFor(response.body)

  echo "hit 5"
    
  return (body, sanity)

block:
  echo "test"
  const expected = "hello=world"
  const input = ("b\r\n" &
                 "hello=world\r\n" &
                 "0\r\n" &
                 "\r\n")
  let res = genTest(input, expected)
  res.addCallback(proc(f: Future[(string, bool)]) = (
    let (body, sanity) = f.read
    echo "body: ", body, " sanity: ", sanity
  ))

when running, I get the following output:

test
hit 1
hit 2
hit 3
hit 3a
hit 3b

There are no errors or other indication of why the program stopped, and I'm kind of stumped. waitFor doesn't do anything complicated, so I don't understand what might cause this to happen. Is the server being closed somehow causing the future to fail, but no error is shown? Not entirely sure how to progress from here ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants