Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Express instrumentation results in routes with no stack, crashing Ghost #2271

Closed
Zirak opened this issue Jun 8, 2024 · 9 comments · Fixed by #2294
Closed

Express instrumentation results in routes with no stack, crashing Ghost #2271

Zirak opened this issue Jun 8, 2024 · 9 comments · Fixed by #2294
Labels
bug Something isn't working has:reproducer This bug/feature has a minimal reproduction provided pkg:instrumentation-express priority:p1 Bugs which cause problems in end-user applications such as crashes, data inconsistencies

Comments

@Zirak
Copy link
Contributor

Zirak commented Jun 8, 2024

What happened?

Steps to Reproduce

Given the following index.js, which imitates Ghost's routing:

const http = require('http');
const express = require('express');

// https://github.com/TryGhost/Ghost/blob/fefb9ec395df8695d06442b6ecd3130dae374d94/ghost/core/core/frontend/web/site.js#L31
let router;
const SiteRouter = (req, res, next) => router(req, res, next);
function setupSiteApp() {
    const siteApp = express('site');

    // fascimile of ParentRouter
    router = express.Router();
    router.use('/:slug', (_1, _2, next) => {
        console.log('application code - I will never trigger');
        next()
    });
    router.param('slug', triggerBug);

    Object.setPrototypeOf(SiteRouter, router)
    siteApp.use(SiteRouter);

    return siteApp;
}

// this is the bug-inducing function, ParentRouter#_respectDominantRouter
function triggerBug(req, res, next, slug) {
    // ParentRouter#getSiteRouter
    const siteRouter = req.app._router.stack.find((router) => {
        return router.name === 'SiteRouter';
    });

    console.log('stack:', siteRouter.handle.stack);
    // this should throw
    siteRouter.handle.stack.every;
    // ...and app code will never trigger
    next();
}

// now that setup is done, try to trigger the bug
const app = setupSiteApp();
app.listen(5000, () => {
    console.log('listening');
    http.get('http://localhost:5000/foo', (res) => {
        res.resume();
    });
});

Run twice:

# without otel
node index.js
# with otel
env NODE_OPTIONS='--require @opentelemetry/auto-instrumentations-node/register' node index.js

Expected Result

Both runs print a router stack like:

❯ node index.js
listening
stack: [
  Layer {
    handle: [Function (anonymous)],
    name: '<anonymous>',
    params: { slug: 'foo' },
    path: '/foo',
    keys: [ [Object] ],
    regexp: /^\/(?:([^\/]+?))\/?(?=\/|$)/i {
      fast_star: false,
      fast_slash: false
    },
    route: undefined
  }
]

Actual Result

When running with otel, the router is undefined, causing shenanigans and a server 500:

❯ env NODE_OPTIONS='--require @opentelemetry/auto-instrumentations-node/register' OTEL_LOG_LEVEL=none node index.js
listening
stack: undefined
TypeError: Cannot read property 'every' of undefined
    at triggerBug (/home/zirak/work/ghost-every-siterouter-bug/index.js:33:29)
    at paramCallback (/home/zirak/work/ghost-every-siterouter-bug/node_modules/express/lib/router/index.js:415:7)
    at param (/home/zirak/work/ghost-every-siterouter-bug/node_modules/express/lib/router/index.js:395:5)
    at Function.process_params (/home/zirak/work/ghost-every-siterouter-bug/node_modules/express/lib/router/index.js:421:3)
    at next (/home/zirak/work/ghost-every-siterouter-bug/node_modules/express/lib/router/index.js:280:10)
    at Function.handle (/home/zirak/work/ghost-every-siterouter-bug/node_modules/express/lib/router/index.js:175:3)
    at router (/home/zirak/work/ghost-every-siterouter-bug/node_modules/express/lib/router/index.js:47:12)
    at SiteRouter (/home/zirak/work/ghost-every-siterouter-bug/index.js:6:40)
    at /home/zirak/work/ghost-every-siterouter-bug/node_modules/@opentelemetry/instrumentation-express/build/src/instrumentation.js:214:37
    at Layer.handle [as handle_request] (/home/zirak/work/ghost-every-siterouter-bug/node_modules/express/lib/router/layer.js:95:5)

Additional Details

It's also possible to replicate this with Ghost itself. Running Ghost with opentelemetry and visiting http://localhost:2368/about/ results in a 500.

I've yet to successfully debug exactly where this originates.

OpenTelemetry Setup Code

No response

package.json

No response

Relevant log output

No response

@Zirak Zirak added bug Something isn't working triage labels Jun 8, 2024
@Zirak
Copy link
Contributor Author

Zirak commented Jun 9, 2024

Debugging it further, this happens because the express instrumentation wraps the layer.handle function:

Express adds properties to the layer.handle function:

https://github.com/expressjs/express/blob/61421a8c0c2abf011868d90df93813992e3c7563/lib/router/index.js#L46-L58

By returning a new wrapped function, these properties are lost. I verified by placing a breakpoint in the otel code above around the wrap call, verifying:

original.stack // [Layer]
wrapped.stack // undefined
wrapped.__original.stack // [Layer]

Right now just to solve this internally, I've decided to hack around it by having the wrapped function inherit the original one, here's a patch-package diff:

diff --git a/node_modules/@opentelemetry/instrumentation-express/build/src/instrumentation.js b/node_modules/@opentelemetry/instrumentation-express/build/src/instrumentation.js
index 3e850bd..8b6eaf1 100644
--- a/node_modules/@opentelemetry/instrumentation-express/build/src/instrumentation.js
+++ b/node_modules/@opentelemetry/instrumentation-express/build/src/instrumentation.js
@@ -123,7 +123,7 @@ class ExpressInstrumentation extends instrumentation_1.InstrumentationBase {
         this._wrap(layer, 'handle', (original) => {
             if (original.length === 4)
                 return original;
-            return function (req, res) {
+            const wrapped = function (req, res) {
                 (0, utils_1.storeLayerPath)(req, layerPath);
                 const route = req[internal_types_1._LAYERS_STORE_PROPERTY]
                     .filter(path => path !== '/' && path !== '/*')
@@ -209,6 +209,22 @@ class ExpressInstrumentation extends instrumentation_1.InstrumentationBase {
                 }
                 return result;
             };
+
+            // express sets multiple properties on the handler *function*. By
+            // returning a new function, we eliminate possible access to the
+            // previously set properties. As a workaround, set the original as
+            // the wrapped's prototype. This allows `get`s to go through. It's
+            // not foolprood, since `set`s will still only apply to the wrapped
+            // function.
+            //
+            // For a concrete example, try accessing:
+            //
+            // original.stack
+            //   vs.
+            // wrapped.stack
+            Object.setPrototypeOf(wrapped, original);
+
+            return wrapped;
         });
     }
     _getSpanName(info, defaultName) {

Right now it's working, but it's a little iffy. I'm more than open to PR this, lmk what you think.

Thanks

@pichlermarc
Copy link
Member

Hi @Zirak thanks for the detailed report. Since this is related to the express instrumentation, I'll transfer this issue to the contrib repo.

@pichlermarc pichlermarc transferred this issue from open-telemetry/opentelemetry-js Jun 10, 2024
@pichlermarc pichlermarc added priority:p1 Bugs which cause problems in end-user applications such as crashes, data inconsistencies pkg:instrumentation-express and removed triage labels Jun 10, 2024
@pichlermarc
Copy link
Member

@JamieDanielson would you mind taking a look at this? 🙂

@JamieDanielson
Copy link
Member

At a glance, this seems like it will be resolved with the changes in #2137 , which was created to fix #1950 . The short description includes this note: "The patched function now exposes same properties proxying the ones from the original handle function." This update will be available in the next release.

@JamieDanielson
Copy link
Member

Hmm... actually I'm testing out the changes and it doesn't seem to resolve this issue. When running this repro app in debug mode, I can see the stack on siteRouter.handle Prototype. When running in debug mode with the auto-instrumentations register command, that stack is still missing... but as mentioned above, it does exist on __original. This will need a PR, and I am also curious if this bit of detail may be necessary for the initial fix to fully work in its intended scenario as well.

@Zirak thank you again for the detailed report and reproducer. If you are open to it, a PR would be appreciated on this one!

@JamieDanielson JamieDanielson added the has:reproducer This bug/feature has a minimal reproduction provided label Jun 10, 2024
@JamieDanielson
Copy link
Member

JamieDanielson commented Jun 10, 2024

If you are open to it, a PR would be appreciated on this one!

In some local testing I was able to add Object.setPrototypeOf(patched, original); right after the Object.keys(original).forEach... and before the new return patched; and seems to resolve:

  Object.keys(original).forEach(key => {
      Object.defineProperty(patched, key, {
          get() {
              return original[key];
          },
          set(value) {
              original[key] = value;
          },
      });
  });
  Object.setPrototypeOf(patched, original);
  return patched;

@Zirak
Copy link
Contributor Author

Zirak commented Jun 12, 2024

Thanks for the follow-up, and talk about timing with the ongoing express patch!

Since it was released I able to install @opentelemetry/[email protected] and reproduce that the bug is still happening. After some debugging I think I understand why: These properties are not original's own properties, they belong to its prototype. Putting a conditional breakpoint in the newer Object.keys(original)... line (I put a conditional breakpoint on original.stack), we can see:

> original.stack
[Layer]

> Object.keys(original)
[]

> original.hasOwnProperty('stack')
false

> Object.getPrototypeOf(original).hasOwnProperty('stack')
true

> Object.keys(Object.getPrototypeOf(original))
['params', '_params', 'caseSensitive', 'mergeParams', 'strict', 'stack']

Express internals confuse me all the way to the 7th circle, so there might be flows in which these keys do exist on original and not on its prototype - I'm not 100% sure when, since I couldn't reproduce them in local testing. @david-luna when you wrote #2137 were you able to reproduce "outside of the lab"? I'm curious what flows reach this behaviour.

The reason my (naive) setPrototypeOf works is that it wholesale defers all getters to the prototype. As @david-luna correctly points out in the patch PR, this isn't ideal since it doesn't defer setters.

Seeing the above, I have three suggestions in mind:

  1. Use inheritance and leave setters to be a future problem (nothing can possibly go wrong, right?)
  2. Copy keys from the original prototype in addition to original
  3. Instead of returning a normal wrapped function, return a Proxy which defers getters/setters to original, and apply to wrapped

3 is I think most "pure" solution, in that it emulates existing behaviour, but comes with two tradeoffs:

  1. Proxies are slower than regular objects (could be better since I last benchmarked a bazillion years ago)
  2. I have no idea how this will behave in the larger SDK, like unwrapping

@david-luna
Copy link
Contributor

@Zirak @JamieDanielson

OTEL's handler patching occurs when siteApp.use(SiteRouter); is being called. So it is patching an object which has no stack property on its own but in the prototype. So in #2137 there is the wrong assumption that a handler will have its properties as own and used Object.keys to iterate over them.

So here's my 2cts on the options:

  1. with inheritance we have the properties available but we will have a similar situation as described in this comment when discussing between Object.defineProperty and copy
  2. copying properties is similar to 1 (check the linked comment). Also in this scenario all props were moved to the prototype but noting prevents another lib to do the same and move the properties even further in the prototype chain. We could walk up the chain to gather all properties on use for..in
  3. I agree is a good fit. Regarding perf the I think the use API is for adding middlewares before starting the app so if there is a perf impact that would be at startup time

I have no idea how this will behave in the larger SDK, like unwrapping

This is a good question. When unwrapping the handlers are still the patched ones. I wonder if we should keep track of them ti patch/unpatch accordingly 🤔

@david-luna
Copy link
Contributor

Note: To get a callable function with the tracing logic we need to create the proxy using the patched function as target

const patched = funtcion (req, res) { ... }; 

return new Proxy(patched, {
  get (target, prop) { return original[prop]; },
  set (target, prop, val) { original[prop] = val; },
});

otherwise the returned function would have no tracing logic at all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working has:reproducer This bug/feature has a minimal reproduction provided pkg:instrumentation-express priority:p1 Bugs which cause problems in end-user applications such as crashes, data inconsistencies
Projects
None yet
4 participants