Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core] Do Not Report HTTP Requests Over 5 Seconds as Errors on Node 20 #3841

Merged
merged 4 commits into from
Dec 1, 2023

Conversation

sabrenner
Copy link
Collaborator

@sabrenner sabrenner commented Dec 1, 2023

What does this PR do?

Makes sure timeouts recorded by the global agent above Node 20 that exceed the new 5s threshold are not reported as errors. Does this by checking three things before tagging timeouts as errors:

  1. A custom agent with a custom timeout (options.agent.options.timeout) is not in play
  2. A custom request timeout (via req.setTimeout) is not in play
  3. http.globalAgent.options.timeout is not set This check was to ensure default timeouts do not happen with versions before. However, since there hasn't been a default timeout (Infinity) since Node 13, and the current supported tracer versions only support above Node 14, this check is not necessary.

Motivation

Fixes #3515

Node 20 introduced a change to the global agent that changed its timeout to 5s (PR). On our end, this changed how errors are attached on timeouts, since supported Node versions for the tracer prior to Node 20 had no default timeout, any HTTP request above 5s that did not have a custom timeout specified would not be marked as erroneous. However, that behavior has changed with Node 20, and successful HTTP calls over 5s are marked as errors because they timeout (even with keepAlive: true). This PR changes that by checking the conditions mentioned above. Now, errors are not recorded if the timeout happens when there is no custom agent with a custom timeout, or if req.setTimeout hasn't been invoked. If either have been, the timeout is still recorded as an error.

For req.setTimeout - it is "wrapped" only for the sake of seeing if it is invoked. Otherwise, it is difficult to discern if req.socket.timeout was set by the global agent or by the user. If set by the user, any timeout errors should be recorded. If set by the global agent, they should not. If the user has req.setTimeout(5000), then req.socket.timeout === 5000. However, if they don't set it, then req.socket.timeout === 5000 regardless.

Security

Datadog employees:

  • If this PR touches code that signs or publishes builds or packages, or handles credentials of any kind, I've requested a review from @DataDog/security-design-and-guidance.
  • This PR doesn't touch any of that.

Unsure? Have a question? Request a review!

Copy link

github-actions bot commented Dec 1, 2023

Overall package size

Self size: 5.62 MB
Deduped: 61.61 MB
No deduping: 62.37 MB

Dependency sizes

name version self size total size
@datadog/native-iast-taint-tracking 1.6.4 16.43 MB 16.44 MB
@datadog/native-appsec 5.0.0 15.16 MB 15.17 MB
@datadog/pprof 4.0.1 9.32 MB 10.16 MB
protobufjs 7.2.4 2.74 MB 6.52 MB
@datadog/native-iast-rewriter 2.2.1 2.27 MB 2.36 MB
@opentelemetry/core 1.14.0 872.87 kB 1.47 MB
@datadog/native-metrics 2.0.0 898.77 kB 1.3 MB
@opentelemetry/api 1.4.1 780.32 kB 780.32 kB
import-in-the-middle 1.4.2 41.4 kB 704.79 kB
pprof-format 2.0.7 588.12 kB 588.12 kB
msgpack-lite 0.1.26 201.16 kB 281.59 kB
opentracing 0.14.7 194.81 kB 194.81 kB
semver 7.5.4 93.4 kB 123.8 kB
@datadog/sketches-js 2.1.0 109.9 kB 109.9 kB
lodash.sortby 4.7.0 75.76 kB 75.76 kB
lru-cache 7.14.0 74.95 kB 74.95 kB
ipaddr.js 2.1.0 60.23 kB 60.23 kB
ignore 5.2.4 51.22 kB 51.22 kB
int64-buffer 0.1.10 49.18 kB 49.18 kB
istanbul-lib-coverage 3.2.0 29.34 kB 29.34 kB
lodash.uniq 4.5.0 25.01 kB 25.01 kB
limiter 1.1.5 23.17 kB 23.17 kB
dc-polyfill 0.1.2 22.77 kB 22.77 kB
retry 0.13.1 18.85 kB 18.85 kB
lodash.kebabcase 4.1.1 17.75 kB 17.75 kB
node-abort-controller 3.1.1 16.89 kB 16.89 kB
lodash.pick 4.4.0 16.33 kB 16.33 kB
jest-docblock 29.7.0 8.99 kB 12.76 kB
crypto-randomuuid 1.0.0 11.18 kB 11.18 kB
path-to-regexp 0.1.7 6.78 kB 6.78 kB
koalas 1.0.2 6.47 kB 6.47 kB
methods 1.1.2 5.29 kB 5.29 kB
module-details-from-path 1.0.3 4.47 kB 4.47 kB

🤖 This report was automatically generated by heaviest-objects-in-the-universe

Copy link

codecov bot commented Dec 1, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (cdebb59) 85.06% compared to head (f9ae615) 71.75%.
Report is 5 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff             @@
##           master    #3841       +/-   ##
===========================================
- Coverage   85.06%   71.75%   -13.32%     
===========================================
  Files         228      226        -2     
  Lines        9363     9351       -12     
  Branches       33       33               
===========================================
- Hits         7965     6710     -1255     
- Misses       1398     2641     +1243     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@pr-commenter
Copy link

pr-commenter bot commented Dec 1, 2023

Benchmarks

Benchmark execution time: 2023-12-01 19:25:18

Comparing candidate commit f9ae615 in PR branch sabrenner/http-default-timeout with baseline commit cdebb59 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 517 metrics, 15 unstable metrics.

@sabrenner sabrenner marked this pull request as ready for review December 1, 2023 19:30
@sabrenner sabrenner requested review from a team as code owners December 1, 2023 19:30
@sabrenner sabrenner requested a review from jbertran December 1, 2023 19:30

app.get('/user', async (req, res) => {
await new Promise(resolve => {
setTimeout(resolve, 6 * 1000) // over 5s default
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you search the codebase for usage of clock.tick() and try to use it here?

In the current state these tests will add 6s of time, but clock.tick might be able to expedite that.

I know it works with setTimeout. I don't know if it works with the inner socket timeout stuff (like, is that handled in C?)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried changing it to

app.get('/user', async (req, res) => {
  clock.tick(6000)
  res.status(200).send()
})

For the test I added that checks custom agent timeouts still attach errors, and it failed because the test timed out without ever passing (so the error wasn't attached) - I'm guessing the inner socket timeout clocks don't play nicely with this? It looks like the timeout event didn't trigger.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does something like this work:

app.get('/user', (req, res) => {
  setTimeout(()=>{res.status(200).send()}, 6 * 1000)
  clock.tick(6000)
})

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still ends up failing with that too

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, looks like it only affects JS timers and Date:
https://github.com/sinonjs/fake-timers

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess there's no real way around it then...

I suspect these 3 tests for each of 2 modules (http, https) running at 6s will increase test run time by ~36s.

@sabrenner sabrenner merged commit 2933961 into master Dec 1, 2023
@sabrenner sabrenner deleted the sabrenner/http-default-timeout branch December 1, 2023 20:41

ctx.req = req

// tracked to accurately discern custom request socket timeout
let customRequestTimeout = false
req.setTimeout = function () {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the timeout is set directly on the underlying socket?

khanayan123 pushed a commit that referenced this pull request Dec 12, 2023
#3841)

[core] Do Not Report HTTP Requests Over 5 Seconds as Errors on Node 20
khanayan123 pushed a commit that referenced this pull request Dec 13, 2023
#3841)

[core] Do Not Report HTTP Requests Over 5 Seconds as Errors on Node 20
khanayan123 pushed a commit that referenced this pull request Dec 14, 2023
#3841)

[core] Do Not Report HTTP Requests Over 5 Seconds as Errors on Node 20
khanayan123 pushed a commit that referenced this pull request Dec 14, 2023
#3841)

[core] Do Not Report HTTP Requests Over 5 Seconds as Errors on Node 20
khanayan123 pushed a commit that referenced this pull request Jan 2, 2024
#3841)

[core] Do Not Report HTTP Requests Over 5 Seconds as Errors on Node 20
khanayan123 pushed a commit that referenced this pull request Jan 2, 2024
#3841)

[core] Do Not Report HTTP Requests Over 5 Seconds as Errors on Node 20
khanayan123 pushed a commit that referenced this pull request Jan 2, 2024
#3841)

[core] Do Not Report HTTP Requests Over 5 Seconds as Errors on Node 20
khanayan123 pushed a commit that referenced this pull request Jan 2, 2024
#3841)

[core] Do Not Report HTTP Requests Over 5 Seconds as Errors on Node 20
khanayan123 pushed a commit that referenced this pull request Jan 2, 2024
#3841)

[core] Do Not Report HTTP Requests Over 5 Seconds as Errors on Node 20
khanayan123 pushed a commit that referenced this pull request Jan 2, 2024
#3841)

[core] Do Not Report HTTP Requests Over 5 Seconds as Errors on Node 20
// conditions for no error:
// 1. not using a custom agent instance with custom timeout specified
// 2. no invocation of `req.setTimeout`
if (!args.options.agent?.options.timeout && !customRequestTimeout) return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this crashes the whole process if args.options.agent?.options is undefined with error :

TypeError: Cannot read properties of undefined (reading 'timeout')
    at HttpClientPlugin.error (/app/node_modules/dd-trace/packages/datadog-plugin-http/src/client.js:125:39)
    at /app/node_modules/dd-trace/packages/dd-trace/src/plugins/tracing.js:73:22
    at Subscription._handler (/app/node_modules/dd-trace/packages/dd-trace/src/plugins/plugin.js:14:9)
    at Channel.publish (node:diagnostics_channel:141:9)
    at req.emit (/app/node_modules/dd-trace/packages/datadog-instrumentations/src/http/client.js:101:30)
    at Socket.emitRequestTimeout (node:_http_client:847:9)
    at Object.onceWrapper (node:events:631:28)
    at Socket.emit (node:events:529:35)
    at Socket.emit (node:domain:552:15)
    at Socket.emit (/app/node_modules/dd-trace/packages/datadog-instrumentations/src/net.js:61:25)
    at Socket._onTimeout (node:net:598:8)
    at listOnTimeout (node:internal/timers:569:17)
    at process.processTimers (node:internal/timers:512:7)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is an open github issue as well.
#3939

We are facing issue of CrashLoopBackOff very often due to this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Screenshot 2024-02-01 at 10 39 16 AM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NodeJS 20: dd-trace reports HTTP requests with duration above 5 seconds as errors
4 participants