Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GCS error handling, example, and documentation #397

Merged
merged 6 commits into from
Dec 8, 2020

Conversation

mdisibio
Copy link
Contributor

@mdisibio mdisibio commented Dec 7, 2020

What this PR does:
This PR addresses the silent errors encountered here: #379 The root cause was the consumption of the iterator in getting tenant and block list for GCS backend. An error state with permissions could cause the loop to run infinitely and polling the blocklist would never complete. This behavior is cautioned about here: https://github.com/googleapis/google-cloud-go/blob/master/storage/bucket.go#L1183

This PR includes many updates, which will be explained here:

  • Added an example compose to exercise the GCS backend using https://github.com/fsouza/fake-gcs-server
    • This requires adding a new gcs backend option endpoint:string to redirect api traffic.
    • This requires adding a new gcs backend option insecure:bool to disable authentication for the api using option.WithoutAuthentication(), which allows the storage client to run without error.
    • This requires an update to the GCS storage api client due to a header parsing bug here: googleapis/google-cloud-go@b580ccd#diff-e553a431afedc9d7ba6cb6997862fcf5f951aa3c28b552ad58537ebe797825dc It seems not to be a problem for real GCS, but is required to use gcs-fake-server.
  • gcs-fake-server can run in http mode and the storage client can theoretically do that as well, however in practice it does not work (details below). Therefore the gcs-fake-server is running in https mode.
    • This requires adding a new gcs backend option to disable TLS certificate checks due to self-signed cert.
    • The storage client checks for the presence of STORAGE_EMULATOR_HOST environment variable and when present configures the api to speak to the host using http. However the storage client then loses the api prefix configured in the endpoint ("/storage/v1/") upon the first file write, breaking subsequent api calls. See code here: https://github.com/googleapis/google-cloud-go/blob/master/storage/writer.go#L129
  • GCS backend was updated to check for bucket existence on startup. This helps to fail fast if there is a problem with permissions or infrastructure. Previously it wouldn't detect the error until the first block was written, or block list was polled.
  • Added new GCS configuration documentation page, including the new options.

Other implementation notes:

  • gcs/instrumentation.go which provides gcs request metrics, which simplified to only wrap a given RoundTripper, and the creation of the transport was moved gcs.go
  • GCS backend creation: updated for changes needed to disable tls cert checks, override endpoint, check bucket existence.
  • Disabling tls checks requires making a clone of http.DefaultTransport which is then edited
  • Some other error handling was tightened up:
    • Check error on writer.Close
    • errors.Wrap other errors for better troubleshooting

Which issue(s) this PR fixes:
Fixes #379

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

… gcs backend. Add gcs example using fake-gcs-server and add necessary settings. Add documentation page for GCS configuration. Update gcs client library reference for header parsing bug fix
Copy link
Member

@joe-elliott joe-elliott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you have to run make vendor-check and then I had a few questions. Otherwise looks good.

Copy link
Contributor

@achatterjee-grafana achatterjee-grafana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added few copy-edit suggestions.

Comment on lines 10 to 20
trace:
backend: gcs # store traces in gcs
s3:
bucket_name: tempo # store traces in this bucket
chunk_buffer_size: 10485760 # optional. buffer size for reads. default = 10MiB
endpoint: https://storage.googleapis.com/storage/v1/ # optional. api endpoint override
insecure: false # optional. Set to true to disable authentication
# and certificate checks.
project_id: project # optional if bucket already exists. Project ID
# to use when creating bucket on first run.
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
trace:
backend: gcs # store traces in gcs
s3:
bucket_name: tempo # store traces in this bucket
chunk_buffer_size: 10485760 # optional. buffer size for reads. default = 10MiB
endpoint: https://storage.googleapis.com/storage/v1/ # optional. api endpoint override
insecure: false # optional. Set to true to disable authentication
# and certificate checks.
project_id: project # optional if bucket already exists. Project ID
# to use when creating bucket on first run.
```
trace:
backend: gcs # Store traces in gcs.
s3:
bucket_name: tempo # Store traces in this bucket.
chunk_buffer_size: 10485760 # Optional, buffer size for reads. Default = 10MiB
endpoint: https://storage.googleapis.com/storage/v1/ # Optional, API endpoint override.
insecure: false # Optional, set to true to disable authentication
# and certificate checks.
project_id: project # Optional, if bucket already exists. Project ID
# to use when creating a bucket on the first run.

http_listen_port: 3100

distributor:
receivers: # this configuration will listen on all ports and protocols that tempo is capable of.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
receivers: # this configuration will listen on all ports and protocols that tempo is capable of.
receivers: # This configuration will listen on all ports and protocols that tempo is capable of listening to.

Comment on lines +8 to +12
jaeger: # the receives all come from the OpenTelemetry collector. more configuration information can
protocols: # be found there: https://github.com/open-telemetry/opentelemetry-collector/tree/master/receiver
thrift_http: #
grpc: # for a production deployment you should only enable the receivers you need!
thrift_binary:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
jaeger: # the receives all come from the OpenTelemetry collector. more configuration information can
protocols: # be found there: https://github.com/open-telemetry/opentelemetry-collector/tree/master/receiver
thrift_http: #
grpc: # for a production deployment you should only enable the receivers you need!
thrift_binary:
jaeger: # Receives all traces? coming from the OpenTelemetry collector. More configuration information can
protocols: # be found here: https://github.com/open-telemetry/opentelemetry-collector/tree/master/receiver
thrift_http: #
grpc: # For a production deployment you should only enable the receivers you need!
thrift_binary:


storage:
trace:
backend: gcs # backend configuration to use
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
backend: gcs # backend configuration to use
backend: gcs # The backend configuration to use.

trace:
backend: gcs # backend configuration to use
wal:
path: /tmp/tempo/wal # where to store the the wal locally
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
path: /tmp/tempo/wal # where to store the the wal locally
path: /tmp/tempo/wal # Path to store the wal locally.

wal:
path: /tmp/tempo/wal # where to store the the wal locally
bloom_filter_false_positive: .05 # bloom filter false positive rate. lower values create larger filters but fewer false positives
index_downsample: 10 # number of traces per index record
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
index_downsample: 10 # number of traces per index record
index_downsample: 10 # Number of traces per index record.

insecure: true
project_id: test
pool:
max_workers: 100 # the worker pool mainly drives querying, but is also used for polling the blocklist
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
max_workers: 100 # the worker pool mainly drives querying, but is also used for polling the blocklist
max_workers: 100 # The worker pool mainly drives querying but it is also used for polling the blocklist.

Comment on lines +39 to +40
bloom_filter_false_positive: .05 # bloom filter false positive rate. lower values create larger filters but fewer false positives
index_downsample: 10 # number of traces per index record
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bloom_filter_false_positive: .05 # bloom filter false positive rate. lower values create larger filters but fewer false positives
index_downsample: 10 # number of traces per index record
bloom_filter_false_positive: .05 # The bloom filter false positive rate. Lower values create larger filters but fewer false positives.
index_downsample: 10 # Number of traces per index record.

@joe-elliott
Copy link
Member

For the yaml comments I personally prefer lower case and brevity to grammatically correct sentences. my .02

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

compactor without GCS permissions fail silently
3 participants