Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AWS SDK instrumentation logs entire DynamoDB statement, including potentially sensitive values #1053

Closed
monken opened this issue Jun 9, 2022 · 10 comments
Labels
bug Something isn't working stale

Comments

@monken
Copy link

monken commented Jun 9, 2022

What version of OpenTelemetry are you using?

OpenTelemetry Lambda extension, version: v0.1.0

What version of Node are you using?

NodeJS 16

What did you do?

Use the Lambda Layer for instrumentation

What did you expect to see?

I expected not to see the whole DynamoDB statement to be part of the trace. This should not be the default. The statement will most likely include very sensitive information that you don't want to leak to your observability toolchain.

Example:

{
  "metadata": {
      "default": {
          "aws.service.api": "DynamoDB",
          "aws.signature.version": "v4",
          "aws.service.name": "DynamoDB",
          "db.statement": "{\"TableName\":\"table-name\",\"FilterExpression\":\"begins_with(GSI1PK,:a)\",\"ExpressionAttributeValues\":{\":a\":\"U:abcabcabc:\"}}",
          "rpc.service": "DynamoDB",
          "aws.service.identifier": "dynamodb",
          "db.system": "dynamodb",
          "aws.request.id": "B115QKDKVVJF66Q9ASUAAJG",
          "db.operation": "Scan",
          "rpc.system": "aws-api",
          "db.name": "table-name"
      }
  }
}

What did you see instead?

I saw the entire DynamoDB statement, unscrubbed.

Additional context

Culprit is https://github.com/open-telemetry/opentelemetry-js-contrib/blob/main/plugins/node/opentelemetry-instrumentation-aws-sdk/src/services/dynamodb.ts#L39 where the entire DynamoDB statement is serialized and added to the trace.

@monken monken added the bug Something isn't working label Jun 9, 2022
@monken monken changed the title AWS SDK instrumentation logs entire DynamoDB request, including potentially sensitive values AWS SDK instrumentation logs entire DynamoDB statement, including potentially sensitive values Jun 9, 2022
@blumamir
Copy link
Member

We should probably add a dynamoDbStatementSerializerand supply a safe default that records only non-sensitive values, similar to #1052 .

Would you be willing to contribute a PR to do that?

@monken
Copy link
Author

monken commented Jun 10, 2022

@blumamir I can take a crack at it. I thought this was an officially supported library by AWS (they definitely promote it) but it seems like this is mostly supported by Dynatrace and other companies.

@blumamir
Copy link
Member

@NathanielRN @willarmiros trivikr are from AWS and were involved in the past with this instrumentation

@NathanielRN
Copy link
Contributor

I've seen our X-Ray AWS SDK sanitize trace at times like in this PR aws/aws-xray-sdk-go#228 where we sanitized the HTTP URL query and we seem to have done efforts to sanitize the query in the X-Ray Go SDK in the past?

Other than that, we don't have precedence for it, so we wouldn't have too much bias when proposed ideas come around 🙂

@monken
Copy link
Author

monken commented Jun 11, 2022

Trying to come up with a good sanitization strategy per API call. I suggest an allow-list approach where we maintain a list of attributes that are safe to capture.

Only capture the command input for the following actions:

  • ExecuteStatement
  • BatchExecuteStatement
  • PutItem
  • BatchWriteItem
  • GetItem
  • BatchGetItem
  • Query
  • Scan
  • UpdateItem
  • DeleteItem
  • BatchWriteItem
  • ExecuteTransaction
  • TransactWriteItems
  • TransactGetItems

Always include the following attributes if present:

  • ConsistentRead
  • ExpressionAttributeNames
  • FilterExpression
  • IndexName
  • KeyConditionExpression
  • Limit
  • ProjectionExpression
  • ReturnConsumedCapacity
  • ReturnItemCollectionMetrics
  • ReturnValues
  • ReturnValuesOnConditionCheckFailure
  • ScanIndexForward
  • Segment
  • Select
  • TableName
  • TotalSegments
  • UpdateExpression

This means the following attributes will be excluded (and future attributes):

  • AttributesToGet (legacy)
  • AttributeUpdates (legacy)
  • ConditionalOperator (legacy)
  • ExclusiveStartKey
  • Expected (legacy)
  • ExpressionAttributeValues
  • Item
  • Key
  • KeyConditions (legacy)
  • NextToken
  • QueryFilter (legacy)
  • ScanFilter (legacy)
  • Statement

@blumamir
Copy link
Member

@monken Thank you for taking the time to construct this list and for bringing up this issue.
I don't have a lot of experience with DyanmoDB, but it sounds good to me. Maybe the AWS folks can supply additional input, but anyhow any improvement is welcomed and I'll be happy to see and review a PR that implements it.

My suggestion is to implement this behavior as default and to allow easy hooking via instrumentation configuration so users can customize which properties they want to capture per service / command.

If you have time, it could also be interesting to take a look at how aws-sdk instrumentations in other languages capture the db.statement for dynamo.

@monken
Copy link
Author

monken commented Jun 15, 2022

The aws-xray-sdk doesn't capture anything except the TableName from the request.

@github-actions
Copy link
Contributor

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days.

@github-actions github-actions bot added the stale label Aug 15, 2022
@github-actions
Copy link
Contributor

This issue was closed because it has been stale for 14 days with no activity.

@ithompson-gp
Copy link

Has there been any update on this with regard the AWS instrumentation exposing DynamoDB queries as an attribute?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working stale
Projects
None yet
Development

No branches or pull requests

4 participants