ioredis clash #1327

Closed
a-reuss opened this issue Sep 12, 2024 · 16 comments

@a-reuss

a-reuss commented Sep 12, 2024

Problem Description

In our application, we use ioredis as a short-term database. Since updating Instana to version 3.18.0, which changed the interaction between the Instana collector and Redis (used as Instana's internal data store), our traces can no longer be correctly correlated with traces recorded in the rest of the system.

Short, Self Contained Example

Although the necessary header is correctly passed through all services involved, the trace IDs in our subsystem no longer match those in the rest of the system. Is it possible that the ioredis connection Instana uses is misconfigured? The most likely candidate is the newly introduced code block starting at line 40 in packages/core/src/tracing/instrumentation/database/ioredis.js:

clusterConnectionString = this.startupNodes.map(node => `${node.host}:${node.port}`).join(',');
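
For readers following along, here is a minimal sketch of what the quoted line computes. It assumes each entry in startupNodes is an object with host and port properties. The function name buildClusterConnectionString and the sample addresses are hypothetical and are not taken from the collector's source.

```javascript
// Hypothetical helper mirroring the quoted line; not the collector's actual code.
// Assumes every startup node is an object of the form { host, port }.
function buildClusterConnectionString(startupNodes) {
  return startupNodes.map(node => `${node.host}:${node.port}`).join(',');
}

// Sample nodes, made up for illustration.
const startupNodes = [
  { host: '10.0.0.1', port: 6379 },
  { host: '10.0.0.2', port: 6380 }
];

const clusterConnectionString = buildClusterConnectionString(startupNodes);
console.log(clusterConnectionString); // 10.0.0.1:6379,10.0.0.2:6380
```

With more than one startup node, the result is a comma-separated list rather than a single host:port pair.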

Additionally, we see issues with parent spans that might come from the change at line 84 in packages/core/src/tracing/instrumentation/database/ioredis.js, where 'multi' commands are now handled differently than before. We now see error messages of the following kind, which we did not see before:
Cannot start an intermediate span (Foo.bar()) as this requires an active entry (or intermediate) span as parent. But the currently active span is an exit span:
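
The rule stated in that error message can be illustrated with a toy model. This is only a sketch of the constraint as the message words it. The span kinds as strings, the sample span names, and the function startIntermediateSpan are hypothetical and do not reflect the collector's internal API or how it reacts to the violation.

```javascript
// Toy model of the parent rule described by the error message.
// All names here are hypothetical, not the collector's internals.
const ENTRY = 'entry';
const INTERMEDIATE = 'intermediate';
const EXIT = 'exit';

function startIntermediateSpan(name, activeSpan) {
  // An intermediate span needs an active entry or intermediate span as parent.
  const parentOk =
    activeSpan != null &&
    (activeSpan.kind === ENTRY || activeSpan.kind === INTERMEDIATE);
  if (!parentOk) {
    // The toy model throws; the real collector's reaction is not modeled here.
    throw new Error(
      `Cannot start an intermediate span (${name}) without an active entry or intermediate parent`
    );
  }
  return { name, kind: INTERMEDIATE, parent: activeSpan };
}

const entrySpan = { name: 'GET /items', kind: ENTRY, parent: null };
const exitSpan = { name: 'redis multi', kind: EXIT, parent: entrySpan };

// Allowed: the active span is an entry span.
const ok = startIntermediateSpan('Foo.bar()', entrySpan);

// Rejected: the active span is an exit span, as in the reported message.
let rejected = false;
try {
  startIntermediateSpan('Foo.bar()', exitSpan);
} catch (err) {
  rejected = true;
}
console.log(ok.kind, rejected);
```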

We do not know whether the two issues are related or independent of each other.

Node.js Version

20.x

package.json

{
  "devDependencies": {
    "@aws-sdk/client-ssm": "3.624.0",
    "@cucumber/cucumber": "10.8.0",
    "@eslint/eslintrc": "3.1.0",
    "@eslint/js": "9.8.0",
    "@types/async-retry": "1.4.8",
    "@types/http-cache-semantics": "4.0.4",
    "@types/jest": "29.5.12",
    "@types/js-yaml": "4.0.9",
    "@types/jsonwebtoken": "9.0.6",
    "@types/koa": "2.15.0",
    "@types/koa__router": "12.0.4",
    "@types/koa-bodyparser": "4.3.12",
    "@types/koa-etag": "3.0.3",
    "@types/koa-mount": "4.0.5",
    "@types/koa-router": "7.4.8",
    "@types/koa-static": "4.0.4",
    "@types/luxon": "3.4.2",
    "@types/mustache": "4.2.5",
    "@types/node": "20.14.14",
    "@types/proxy-addr": "2.0.3",
    "@types/selenium-webdriver": "4.1.24",
    "@types/semver": "7.5.8",
    "@types/traverse": "0.6.37",
    "@types/uuid": "10.0.0",
    "@typescript-eslint/eslint-plugin": "8.0.1",
    "@typescript-eslint/parser": "8.0.1",
    "@typescript-eslint/rule-tester": "8.0.1",
    "aws-sdk-client-mock": "4.0.1",
    "cucumber-html-reporter": "6.0.0",
    "eslint": "9.8.0",
    "eslint-config-prettier": "9.1.0",
    "eslint-plugin-custom-qnoten-rules": "file:./eslint-rules",
    "eslint-plugin-prettier": "5.2.1",
    "globals": "15.9.0",
    "jest": "29.7.0",
    "jest-mock": "29.7.0",
    "mustache": "4.2.0",
    "prettier": "3.3.3",
    "rimraf": "6.0.1",
    "selenium-webdriver": "4.23.0",
    "ts-jest": "29.2.3",
    "ts-node": "10.9.2",
    "tsconfig-paths": "4.2.0",
    "typescript": "5.5.4"
  },
  "dependencies": {
    "@apidevtools/swagger-parser": "10.1.0",
    "@aws-sdk/client-kms": "3.624.0",
    "@aws-sdk/client-s3": "3.624.0",
    "@mob/cwb-datalake-event-emitter": "1.0.38",
    "@omniweb/talo-logging": "1.0.0-hash3edde8d9b3eac2ee6fc1f12015295589c01c175875d216f150851c169536dfe8",
    "ajv": "8.17.1",
    "async": "3.2.5",
    "async-retry": "1.3.3",
    "awilix": "10.0.2",
    "awilix-koa": "10.1.0",
    "axios": "1.7.4",
    "axios-cache-interceptor": "1.5.3",
    "axios-retry": "4.5.0",
    "fast-xml-parser": "4.4.1",
    "hpagent": "1.2.0",
    "http-cache-semantics": "4.1.1",
    "ioredis": "5.4.1",
    "js-yaml": "4.1.0",
    "jsonwebtoken": "9.0.2",
    "jwks-rsa": "3.1.0",
    "koa": "2.15.3",
    "koa-bodyparser": "4.4.1",
    "koa-etag": "4.0.0",
    "koa-mount": "4.0.0",
    "koa-router": "12.0.1",
    "koa-static": "5.0.0",
    "luxon": "3.5.0",
    "oas3-chow-chow": "3.0.1",
    "openapi-types": "12.1.3",
    "proxy-addr": "2.0.7",
    "rate-limiter-flexible": "5.0.3",
    "semver": "7.6.3",
    "traverse": "0.6.9",
    "uuid": "10.0.0",
    "xss": "1.0.15"
  }
}

package-lock.json

{
  "requires": true,
  "packages": {
    "": {
      "name": "qnoten",
      "version": "0.0.0",
      "license": "UNLICENSED",
      "dependencies": {
        "@apidevtools/swagger-parser": "10.1.0",
        "@aws-sdk/client-kms": "3.624.0",
        "@aws-sdk/client-s3": "3.624.0",
        "@mob/cwb-datalake-event-emitter": "1.0.38",
        "@omniweb/talo-logging": "1.0.0-hash3edde8d9b3eac2ee6fc1f12015295589c01c175875d216f150851c169536dfe8",
        "ajv": "8.17.1",
        "async": "3.2.5",
        "async-retry": "1.3.3",
        "awilix": "10.0.2",
        "awilix-koa": "10.1.0",
        "axios": "1.7.4",
        "axios-cache-interceptor": "1.5.3",
        "axios-retry": "4.5.0",
        "fast-xml-parser": "4.4.1",
        "hpagent": "1.2.0",
        "http-cache-semantics": "4.1.1",
        "ioredis": "5.4.1",
        "js-yaml": "4.1.0",
        "jsonwebtoken": "9.0.2",
        "jwks-rsa": "3.1.0",
        "koa": "2.15.3",
        "koa-bodyparser": "4.4.1",
        "koa-etag": "4.0.0",
        "koa-mount": "4.0.0",
        "koa-router": "12.0.1",
        "koa-static": "5.0.0",
        "luxon": "3.5.0",
        "oas3-chow-chow": "3.0.1",
        "openapi-types": "12.1.3",
        "proxy-addr": "2.0.7",
        "rate-limiter-flexible": "5.0.3",
        "semver": "7.6.3",
        "traverse": "0.6.9",
        "uuid": "10.0.0",
        "xss": "1.0.15"
      },
      "devDependencies": {
        "@aws-sdk/client-ssm": "3.624.0",
        "@cucumber/cucumber": "10.8.0",
        "@eslint/eslintrc": "3.1.0",
        "@eslint/js": "9.8.0",
        "@types/async-retry": "1.4.8",
        "@types/http-cache-semantics": "4.0.4",
        "@types/jest": "29.5.12",
        "@types/js-yaml": "4.0.9",
        "@types/jsonwebtoken": "9.0.6",
        "@types/koa": "2.15.0",
        "@types/koa__router": "12.0.4",
        "@types/koa-bodyparser": "4.3.12",
        "@types/koa-etag": "3.0.3",
        "@types/koa-mount": "4.0.5",
        "@types/koa-router": "7.4.8",
        "@types/koa-static": "4.0.4",
        "@types/luxon": "3.4.2",
        "@types/mustache": "4.2.5",
        "@types/node": "20.14.14",
        "@types/proxy-addr": "2.0.3",
        "@types/selenium-webdriver": "4.1.24",
        "@types/semver": "7.5.8",
        "@types/traverse": "0.6.37",
        "@types/uuid": "10.0.0",
        "@typescript-eslint/eslint-plugin": "8.0.1",
        "@typescript-eslint/parser": "8.0.1",
        "@typescript-eslint/rule-tester": "8.0.1",
        "aws-sdk-client-mock": "4.0.1",
        "cucumber-html-reporter": "6.0.0",
        "eslint": "9.8.0",
        "eslint-config-prettier": "9.1.0",
        "eslint-plugin-custom-qnoten-rules": "file:./eslint-rules",
        "eslint-plugin-prettier": "5.2.1",
        "globals": "15.9.0",
        "jest": "29.7.0",
        "jest-mock": "29.7.0",
        "mustache": "4.2.0",
        "prettier": "3.3.3",
        "rimraf": "6.0.1",
        "selenium-webdriver": "4.23.0",
        "ts-jest": "29.2.3",
        "ts-node": "10.9.2",
        "tsconfig-paths": "4.2.0",
        "typescript": "5.5.4"
      }
    },

a-reuss added the bug label Sep 12, 2024
kirrg001 self-assigned this Sep 12, 2024

@kirrg001
Contributor

Hey @a-reuss !

Thanks for your report!
I'd suggest downgrading the version for now.

We will take a look now.

If possible, could you please share parts of your application log?
You can also set INSTANA_DEBUG=true.

Thanks so much!

@kirrg001
Contributor

@a-reuss Are you using a cluster or a single connection?

@kirrg001
Contributor

I will revert the multi/pipeline change. We had assumed the previous behavior was a bug.

@a-reuss
Author

a-reuss commented Sep 12, 2024

@kirrg001 do you have an idea how long it will take for the updated version to be available?

@kirrg001
Contributor

I am already working on the revert. We can release it tomorrow.

@Xeroxxx

Xeroxxx commented Sep 12, 2024

Hello,
Thank you, @a-reuss. We have the same issue and thought it was on our side.

@kirrg001 thanks for looking into this.

kirrg001 added a commit that referenced this issue Sep 12, 2024
refs #1327
refs #1292

We thought that this is a bug. The tests were not really clear.
We revert the multi/pipeline handling for now.

Raised https://jsw.ibm.com/browse/INSTA-14540 for further investigation.
@kirrg001
Contributor

Raised PR #1328

Hopefully it resolves the problem 💁‍♀️

@a-reuss
Author

a-reuss commented Sep 12, 2024

Thanks, we will check and give feedback

kirrg001 added a commit that referenced this issue Sep 12, 2024
refs #1327
refs #1292

We thought that this is a bug. The tests were not really clear.
We revert the multi/pipeline handling for now.

Raised https://jsw.ibm.com/browse/INSTA-14540 for further investigation.
@kirrg001
Contributor

We have successfully shipped 3.18.1 (except for the AWS Lambda Layer China regions cn-north-1 and cn-northwest-1).
Hopefully it resolves your issues 🙏

If the hotfix does not resolve the problem, please downgrade to v3.17.1 and we will investigate with more time next week.

Furthermore: Could you please share some more information about the use case / example code (if you can)?
That would be helpful.

"our traces cannot be correctly related to traces recorded in the rest of the system"
"the trace ids between our subsystem and the rest of the system do not match anymore"

Thank you so much!

@a-reuss
Author

a-reuss commented Sep 13, 2024

@kirrg001, I will send details to your email address from my corporate account (not the account I use here) for reasons of compliance.
We are still testing the 3.18.1 image.

@a-reuss
Author

a-reuss commented Sep 17, 2024

Hi, @kirrg001. As written in the mail, we see that the problem persists with 3.18.1. Do you have a proposal for how to proceed? As recommended, we pinned version 3.17.1, but that is only viable until a potential bug or CVE forces us to upgrade. Thanks so far!

@kirrg001
Contributor

Thanks for sharing! I am working on another release.
We assume it's either connected to the different connection string (it was an IP address before and we changed it to the cluster string) or caused by the new multi span for clusters.

@kirrg001
Contributor

We have released 3.18.2.
Could you please test and report back? Thank you so much

@a-reuss
Author

a-reuss commented Sep 17, 2024

Branch is created, waiting for pipeline

@a-reuss
Author

a-reuss commented Sep 17, 2024

It's alive! ;)

@a-reuss
Author

a-reuss commented Sep 17, 2024

Thank you.
