-
Notifications
You must be signed in to change notification settings - Fork 317
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[core] Do Not Report HTTP Requests Over 5 Seconds as Errors on Node 20 #3841
Conversation
Overall package sizeSelf size: 5.62 MB Dependency sizes
🤖 This report was automatically generated by heaviest-objects-in-the-universe |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #3841 +/- ##
===========================================
- Coverage 85.06% 71.75% -13.32%
===========================================
Files 228 226 -2
Lines 9363 9351 -12
Branches 33 33
===========================================
- Hits 7965 6710 -1255
- Misses 1398 2641 +1243 ☔ View full report in Codecov by Sentry. |
BenchmarksBenchmark execution time: 2023-12-01 19:25:18 Comparing candidate commit f9ae615 in PR branch Found 0 performance improvements and 0 performance regressions! Performance is the same for 517 metrics, 15 unstable metrics. |
|
||
app.get('/user', async (req, res) => { | ||
await new Promise(resolve => { | ||
setTimeout(resolve, 6 * 1000) // over 5s default |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you search the codebase for usage of clock.tick()
and try to use it here?
In the current state these tests will add 6s of time, but clock.tick might be able to expedite that.
I know it works with setTimeout. I don't know if it works with the inner socket timeout stuff (like, is that handled in C?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried changing it to
app.get('/user', async (req, res) => {
clock.tick(6000)
res.status(200).send()
})
For the test I added that checks custom agent timeouts still attach errors, and it failed because the test timed out without ever passing (so the error wasn't attached) - I'm guessing the inner socket timeout clocks don't play nicely with this? It looks like the timeout event didn't trigger.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does something like this work:
app.get('/user', (req, res) => {
setTimeout(()=>{res.status(200).send()}, 6 * 1000)
clock.tick(6000)
})
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still ends up failing with that too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, looks like it only affects JS timers and Date:
https://github.com/sinonjs/fake-timers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess there's no real way around it then...
I suspect these 3 tests for each of 2 modules (http, https) running at 6s will increase test run time by ~36s.
|
||
ctx.req = req | ||
|
||
// tracked to accurately discern custom request socket timeout | ||
let customRequestTimeout = false | ||
req.setTimeout = function () { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the timeout is set directly on the underlying socket?
#3841) [core] Do Not Report HTTP Requests Over 5 Seconds as Errors on Node 20
#3841) [core] Do Not Report HTTP Requests Over 5 Seconds as Errors on Node 20
#3841) [core] Do Not Report HTTP Requests Over 5 Seconds as Errors on Node 20
#3841) [core] Do Not Report HTTP Requests Over 5 Seconds as Errors on Node 20
#3841) [core] Do Not Report HTTP Requests Over 5 Seconds as Errors on Node 20
#3841) [core] Do Not Report HTTP Requests Over 5 Seconds as Errors on Node 20
#3841) [core] Do Not Report HTTP Requests Over 5 Seconds as Errors on Node 20
#3841) [core] Do Not Report HTTP Requests Over 5 Seconds as Errors on Node 20
#3841) [core] Do Not Report HTTP Requests Over 5 Seconds as Errors on Node 20
#3841) [core] Do Not Report HTTP Requests Over 5 Seconds as Errors on Node 20
// conditions for no error: | ||
// 1. not using a custom agent instance with custom timeout specified | ||
// 2. no invocation of `req.setTimeout` | ||
if (!args.options.agent?.options.timeout && !customRequestTimeout) return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this crashes the whole process if args.options.agent?.options
is undefined with error :
TypeError: Cannot read properties of undefined (reading 'timeout')
at HttpClientPlugin.error (/app/node_modules/dd-trace/packages/datadog-plugin-http/src/client.js:125:39)
at /app/node_modules/dd-trace/packages/dd-trace/src/plugins/tracing.js:73:22
at Subscription._handler (/app/node_modules/dd-trace/packages/dd-trace/src/plugins/plugin.js:14:9)
at Channel.publish (node:diagnostics_channel:141:9)
at req.emit (/app/node_modules/dd-trace/packages/datadog-instrumentations/src/http/client.js:101:30)
at Socket.emitRequestTimeout (node:_http_client:847:9)
at Object.onceWrapper (node:events:631:28)
at Socket.emit (node:events:529:35)
at Socket.emit (node:domain:552:15)
at Socket.emit (/app/node_modules/dd-trace/packages/datadog-instrumentations/src/net.js:61:25)
at Socket._onTimeout (node:net:598:8)
at listOnTimeout (node:internal/timers:569:17)
at process.processTimers (node:internal/timers:512:7)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is an open github issue as well.
#3939
We are facing issue of CrashLoopBackOff very often due to this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this PR do?
Makes sure timeouts recorded by the global agent above Node 20 that exceed the new 5s threshold are not reported as errors. Does this by checking three things before tagging timeouts as errors:
options.agent.options.timeout
) is not in playreq.setTimeout
) is not in playThis check was to ensure default timeouts do not happen with versions before. However, since there hasn't been a default timeout (http.globalAgent.options.timeout
is not setInfinity
) since Node 13, and the current supported tracer versions only support above Node 14, this check is not necessary.Motivation
Fixes #3515
Node 20 introduced a change to the global agent that changed its timeout to 5s (PR). On our end, this changed how errors are attached on timeouts, since supported Node versions for the tracer prior to Node 20 had no default timeout, any HTTP request above 5s that did not have a custom timeout specified would not be marked as erroneous. However, that behavior has changed with Node 20, and successful HTTP calls over 5s are marked as errors because they timeout (even with
keepAlive: true
). This PR changes that by checking the conditions mentioned above. Now, errors are not recorded if the timeout happens when there is no custom agent with a custom timeout, or ifreq.setTimeout
hasn't been invoked. If either have been, the timeout is still recorded as an error.For
req.setTimeout
- it is "wrapped" only for the sake of seeing if it is invoked. Otherwise, it is difficult to discern ifreq.socket.timeout
was set by the global agent or by the user. If set by the user, any timeout errors should be recorded. If set by the global agent, they should not. If the user hasreq.setTimeout(5000)
, thenreq.socket.timeout === 5000
. However, if they don't set it, thenreq.socket.timeout === 5000
regardless.Security
Datadog employees:
@DataDog/security-design-and-guidance
.Unsure? Have a question? Request a review!