[Metrics Rewrite] implement monitored resource mapping #252

Merged

aabmass
Contributor

@aabmass aabmass commented Dec 22, 2021

Implements monitored resource mapping in a new file, along with a bunch of tests. Integration tests will come in a separate PR.

There are a lot of breaking changes here relative to the previous OpenCensus (OC) mapping; I still need to update the markdown.

@aabmass aabmass requested review from a team and kjordy December 22, 2021 00:46
@aabmass aabmass marked this pull request as ready for review December 22, 2021 00:53
exporter/collector/monitoredresource.go
// resource keys for a given monitored resource type. For entries with multiple OTel
// resource keys, the keys' values will be coalesced in order until there is a non-empty
// value.
monitoredResourceMappings = map[string]map[string][]string{
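The code comment above describes a coalescing rule: when a monitored resource label is backed by several OTel resource keys, the first key with a non-empty value wins. The sketch below illustrates that rule under stated assumptions: resource attributes are flattened to a `map[string]string`, and `mappings`, `coalesce`, and `labelsFor` are hypothetical names with a trimmed-down `generic_node` entry, not the exporter's actual code.

```go
package main

import "fmt"

// mappings is a hypothetical, trimmed-down table in the same shape as
// monitoredResourceMappings: monitored resource type -> label -> ordered OTel keys.
var mappings = map[string]map[string][]string{
	"generic_node": {
		"location":  {"cloud.availability_zone", "cloud.region"},
		"namespace": {"service.namespace"},
		"node_id":   {"host.id", "host.name"},
	},
}

// coalesce returns the value of the first key in keys whose value is
// non-empty, or "" if none of the keys has a value.
func coalesce(attrs map[string]string, keys []string) string {
	for _, k := range keys {
		if v := attrs[k]; v != "" {
			return v
		}
	}
	return ""
}

// labelsFor builds the label set for mrType by coalescing each label's key list.
func labelsFor(mrType string, attrs map[string]string) map[string]string {
	labels := make(map[string]string)
	for label, keys := range mappings[mrType] {
		labels[label] = coalesce(attrs, keys)
	}
	return labels
}

func main() {
	// No availability zone and no host.id: location falls back to the
	// region, and node_id falls back to host.name.
	attrs := map[string]string{
		"cloud.region": "us-central1",
		"host.name":    "vm-1",
	}
	fmt.Println(labelsFor("generic_node", attrs))
}
```

Ordering the keys from most to least specific (zone before region) is what lets a single table express "prefer the zone, but accept the region".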
Contributor

How is this intended to interact with user configuration? Should we allow users to continue performing monitored resource (MR) mapping via config as well, and if so, is there a way for that to override this behavior?

Do we expect that to solely be via map[string]map[string][]string manipulation of defaults w/ config?

Contributor Author

Do we expect that to solely be via map[string]map[string][]string manipulation of defaults w/ config?

That would work, except that we also need a field for "discriminating" the incoming resource, i.e. assigning it to one of the entries in the map. For the default logic here this is cloud.platform, but we also have special cases for differentiating k8s_{container,pod,node,cluster} and for the fallbacks.
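The discriminating step described in the reply above, choosing which entry of the mapping applies before any labels are filled in, might look roughly like this. It is a sketch only: `pickResourceType` is a hypothetical name, the `cloud.platform` values are the OTel semantic-convention values for GCE and GKE, and the exact k8s precedence and the `generic_task` fallback condition are assumptions rather than the merged logic.

```go
package main

import "fmt"

// pickResourceType is a hypothetical discriminator that chooses a monitored
// resource type from OTel resource attributes (flattened to strings).
func pickResourceType(attrs map[string]string) string {
	switch attrs["cloud.platform"] {
	case "gcp_compute_engine":
		return "gce_instance"
	case "gcp_kubernetes_engine":
		// Assumed precedence: the most specific k8s type whose
		// identifying attribute is present.
		switch {
		case attrs["k8s.container.name"] != "":
			return "k8s_container"
		case attrs["k8s.pod.name"] != "":
			return "k8s_pod"
		case attrs["k8s.node.name"] != "":
			return "k8s_node"
		default:
			return "k8s_cluster"
		}
	}
	// Fallbacks when no platform-specific type applies (condition assumed).
	if attrs["service.name"] != "" && attrs["service.instance.id"] != "" {
		return "generic_task"
	}
	return "generic_node"
}

func main() {
	examples := []map[string]string{
		{"cloud.platform": "gcp_compute_engine", "host.id": "123"},
		{"cloud.platform": "gcp_kubernetes_engine", "k8s.cluster.name": "c", "k8s.pod.name": "p"},
		{"service.name": "checkout", "service.instance.id": "i-1"},
		{"host.name": "laptop"},
	}
	for _, attrs := range examples {
		fmt.Println(pickResourceType(attrs)) // gce_instance, k8s_pod, generic_task, generic_node
	}
}
```

This is the part a plain `map[string]map[string][]string` cannot express on its own, which is why config-only manipulation of the defaults would not be sufficient.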

We could provide a full escape hatch: a function-valued config option that lets users (probably first-party) override the whole mapping in a custom build if needed.

But for now, can we punt on this until we know the use cases better?
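The escape hatch floated in the reply above could take a shape like the following. Everything here is hypothetical: `Config`, `MapMonitoredResource`, `MonitoredResource`, `defaultMapping`, and `resolve` are illustrative names, not part of the exporter's configuration.

```go
package main

import "fmt"

// MonitoredResource is a hypothetical stand-in for the monitored resource
// type and labels written with each time series.
type MonitoredResource struct {
	Type   string
	Labels map[string]string
}

// Config sketches the proposed escape hatch. Because the field is a Go
// function, it cannot be set from collector YAML; a custom build that
// constructs the exporter in code would set it.
type Config struct {
	// MapMonitoredResource, if non-nil, replaces the built-in mapping entirely.
	MapMonitoredResource func(attrs map[string]string) MonitoredResource
}

// defaultMapping stands in for the built-in mapping logic.
func defaultMapping(attrs map[string]string) MonitoredResource {
	return MonitoredResource{
		Type:   "generic_node",
		Labels: map[string]string{"node_id": attrs["host.name"]},
	}
}

// resolve uses the override when one is configured and the default otherwise.
func resolve(cfg Config, attrs map[string]string) MonitoredResource {
	if cfg.MapMonitoredResource != nil {
		return cfg.MapMonitoredResource(attrs)
	}
	return defaultMapping(attrs)
}

func main() {
	attrs := map[string]string{"host.name": "vm-1", "my.custom.key": "abc"}

	fmt.Println(resolve(Config{}, attrs).Type) // generic_node

	custom := Config{MapMonitoredResource: func(a map[string]string) MonitoredResource {
		return MonitoredResource{Type: "generic_task", Labels: map[string]string{"task_id": a["my.custom.key"]}}
	}}
	fmt.Println(resolve(custom, attrs).Type) // generic_task
}
```

An all-or-nothing override keeps the default path simple. The trade-off is that users who only want to tweak one label must reimplement the whole mapping, which is one reason to wait for concrete use cases.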

exporter/collector/monitoredresource.go (outdated)
instanceID: {semconv.AttributeHostID},
},
k8sContainer: {
location: {semconv.AttributeCloudAvailabilityZone},

There's a comment in the spec from @dashpole about the value of this field for regional clusters. I don't quite understand how resource detection would work for regional clusters, but this may need to use zone first and fall back to region when zone isn't populated. Though I'm good with starting with only zone for now until the resource detection piece is figured out.

Contributor Author

Though I'm good with starting with only zone for now until the resource detection piece is figured out.

Let's do this for now, but I'll leave this comment unresolved.
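The concern in this thread can be made concrete by comparing the merged zone-only key list for the k8s_container location with the zone-then-region alternative. The sketch assumes, hypothetically, that resource detection reports only `cloud.region` for a regional cluster; as the thread notes, that detection behavior is not yet settled. `firstNonEmpty`, `zoneOnly`, and `zoneThenRegion` are illustrative names.

```go
package main

import "fmt"

const (
	zoneKey   = "cloud.availability_zone"
	regionKey = "cloud.region"
)

var (
	// zoneOnly mirrors the k8s_container location mapping as merged.
	zoneOnly = []string{zoneKey}
	// zoneThenRegion is the alternative discussed for regional clusters.
	zoneThenRegion = []string{zoneKey, regionKey}
)

// firstNonEmpty returns the first non-empty value among keys, or "".
func firstNonEmpty(attrs map[string]string, keys []string) string {
	for _, k := range keys {
		if v := attrs[k]; v != "" {
			return v
		}
	}
	return ""
}

func main() {
	// Hypothetical regional cluster: detection found a region but no zone.
	regional := map[string]string{regionKey: "us-central1"}
	fmt.Printf("zone only:         %q\n", firstNonEmpty(regional, zoneOnly))       // ""
	fmt.Printf("zone, then region: %q\n", firstNonEmpty(regional, zoneThenRegion)) // "us-central1"
}
```

Under that assumption, the zone-only mapping yields an empty location, while appending the region key would fill it in without changing the result for zonal clusters.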

exporter/collector/monitoredresource_test.go
taskID: {semconv.AttributeServiceInstanceID},
},
genericNode: {
location: {semconv.AttributeCloudAvailabilityZone, semconv.AttributeCloudRegion},

The spec says that on the final fallthrough, the location should be populated with "global" if the semantic convention attributes aren't populated. This is because writing with the location set to an empty string is an error. Do you think we need to handle that case?

Contributor

I think we can't use global if we default to workload.googleapis.com, right? Do we need to update the spec?

Contributor Author

Fixed for generic_task/generic_node. For the other resources the location should really be present, so I'll leave it empty there to surface the error.


Setting the value of the "location" label to "global" is still allowed under workload.googleapis.com.
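The outcome of this thread can be sketched as follows. Location is coalesced from zone, then region. Only generic_task and generic_node then fall back to "global", which the last comment notes is still allowed under workload.googleapis.com. Other types keep an empty location so the write error surfaces. `locationFor` is a hypothetical helper, not the exporter's function.

```go
package main

import "fmt"

// locationFor coalesces zone, then region. Only the generic types fall back
// to "global"; every other type keeps "" so the write fails visibly.
func locationFor(mrType string, attrs map[string]string) string {
	for _, k := range []string{"cloud.availability_zone", "cloud.region"} {
		if v := attrs[k]; v != "" {
			return v
		}
	}
	if mrType == "generic_task" || mrType == "generic_node" {
		return "global"
	}
	return ""
}

func main() {
	fmt.Printf("%q\n", locationFor("generic_node", map[string]string{}))                               // "global"
	fmt.Printf("%q\n", locationFor("generic_task", map[string]string{"cloud.region": "europe-west1"})) // "europe-west1"
	fmt.Printf("%q\n", locationFor("k8s_container", map[string]string{}))                              // ""
}
```

Limiting the default to the generic types means a misconfigured GKE or GCE resource fails loudly instead of silently landing in "global".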

@aabmass aabmass merged commit d61bf83 into GoogleCloudPlatform:col-exporter-rewrite Dec 22, 2021
@aabmass aabmass deleted the monitored-resource branch December 22, 2021 22:42
dashpole added a commit that referenced this pull request Feb 2, 2022
* Skip all fixture tests (#239)

* Initial structure for new pdata metrics exporter (#238)

* [Metrics Rewrite] add outline with todos for fragmenting work (#240)

* [Metrics Rewrite] attribute to label mapping (#243)

[Metrics Rewrite] attribute to label mapping

* [Metrics Rewrite] support for pdata Sum points (#242)

* [Metrics Rewrite] support for pdata Sum points

* update breaking-changes.md

* use concatenation instead of sprintf

* [Metrics Rewrite] support for pdata Gauge points (#244)

* Add logic to translate metric descriptors and initial flow (#247)

* Fixes from merge.

* Fix tests.

* Clean up test cases, re-disable integration tests.

* Add summary descriptors and label descriptors.

* Fix lint issues.

* Some fixes from review.

* Remove metric import.

* Fixes from review.
- Update default config method
- Simplify some of my lack-of-go expertise.

* Add unit test for metric domains.

* Fixes from review.

* Add breaking changes.

* Fixes from review.

* Update context to be TODO.

* Add support for exponential histograms and exemplars. (#251)

* Add support for exponential histograms and exemplars.

* Fixes from review.

* Fixes from review.

* Fixes from discussion.

* [Metrics Rewrite] implement monitored resource mapping (#252)

* [Metrics Rewrite] implement monitored resource mapping

* review fixes

* [Metrics Rewrite] update breaking-changes.md for monitored resource (#255)

* Add summary mapping to exporter. (#249)

* Add config to call `CreateServiceTimeSeries` (#259)

* Initial implementation of create service time series.

* Add a test case for create service timeseries.

* Add logic to auto-detect project id if not configured.

* Fix from code review

* Fix resource to be one that has retention policy for integration tests.

* Add support for histogram to metrics exporter. (#258)

BUG=210164184

* Re-enable ops-agent self-metric integration test. (#260)

* [Metrics Rewrite] add ExponentialHistogram fixture (#257)

* [Metrics Rewrite] add ExponentialHistogram fixture

* make tests deterministic

* few last changes

* close channel instead of sending a message

* Enable ops agent host metric integration test. (#264)

- There is a bug in upstream agent-metric-processor that sets incorrect units on usage metrics (GoogleCloudPlatform/opentelemetry-operations-collector#72)
- We update the expectations for inclusion of units in CreateTimeSeries
- We disable metric descriptors (for now). Given the bug in agent-metric-processor, ops-agent will likely need the upstream fix first.

* add a feature gate, which defaults to false, for using the re-written exporter (#267)

* Enable Basic integration tests (#266)

* Enable basic counter test.

* Enable delta counter metrics.

- Note: Delta counters are NOW fake-delta (i.e. cumulatives with limited time windows)

* Enable non-monotonic-sum integration test.

* Re-enable summary integration test and fix design issues in summary translation.

- Summary exports percentiles, not quantiles
- Percentiles should include similar double precision in the string.

* Fix recordfixtures script to use featuregate (#270)

* Skip already seen attribute keys when creating LabelDescriptors (#272)

* Reenable GKE metrics agent fixtures (#271)

* Update breaking-changes.md for googlecloudmonitoring/point_count self observability (#277)

* Move logging to use zap-logger and set up self-observability to match collector expectations. (#275)

* Enable metric prefix integration tests. (#274)

* enable workloadapis prefix integration test.

* update unknown domain metrics expect.

* Add instrumentationLibraryToLabels method to metrics exporter. (#253)

* Add instrumentationLibraryToLabels method to metrics exporter.

BUG=https://b.corp.google.com/issues/210164355

* Remove custom_metrics_domains behaviour from metrics-exporter.

* Remove dependency on go.opentelemetry.io/collector (#279)

* remove dependency on go.opentelemetry.io/collector

* add ocgrpc metrics to exporters' self-obs metrics (#280)

* Use OC stackdriver exporter to capture self observability metrics as GCM protos (#282)

* Capture ocgrpc self observability metrics (#283)

* make integrationtest not internal (#285)

* Remove internal/ prefix for integrationtest (#288)

* Add batching support to metrics-exporter. (#286)

* Add batching support to metrics-exporter.

* Retry when we fail to write metric descriptors.

* Re-enable workload metrics integration tests (#278)

* update header year for new files (#296)

* Document new CreateMetricDescriptor behavior (#294)

* reenable disabled metrics test (#299)

Co-authored-by: Aaron Abbott
Co-authored-by: Josh Suereth
Co-authored-by: Thomas Barker
Co-authored-by: Punya Biswal