multitenant: expose consumption metrics #67873

Merged: 2 commits, Jul 27, 2021

Conversation

RaduBerinde
Member

This PR is for the top two commits. The rest is #67768.

util: add AggGaugeFloat64

This commit adds a float aggregate gauge. We also add support for
incrementing a GaugeFloat64 (required for the aggregate gauge).

Release note: None
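
(For illustration only, a minimal sketch of the aggregate-gauge idea under assumed names; the real AggGaugeFloat64 is wired into CockroachDB's metric registry rather than implemented like this:)

```go
// Minimal sketch of an aggregate float gauge, using hypothetical names; the
// real AggGaugeFloat64 lives in CockroachDB's metric packages.
package main

import (
	"fmt"
	"sync"
)

// aggGaugeFloat64 keeps one float value per child (here, per tenant) and
// reports the sum across all children.
type aggGaugeFloat64 struct {
	mu       sync.Mutex
	children map[string]float64
}

func newAggGaugeFloat64() *aggGaugeFloat64 {
	return &aggGaugeFloat64{children: map[string]float64{}}
}

// Update sets a child's value, like GaugeFloat64.Update for a single gauge.
func (g *aggGaugeFloat64) Update(child string, v float64) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.children[child] = v
}

// Inc bumps a child's value by a delta; this is the "incrementing a
// GaugeFloat64" capability the commit message mentions.
func (g *aggGaugeFloat64) Inc(child string, delta float64) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.children[child] += delta
}

// Sum is the aggregate value exported alongside the per-child values.
func (g *aggGaugeFloat64) Sum() float64 {
	g.mu.Lock()
	defer g.mu.Unlock()
	var s float64
	for _, v := range g.children {
		s += v
	}
	return s
}

func main() {
	g := newAggGaugeFloat64()
	g.Update("tenant-10", 1.5)
	g.Inc("tenant-11", 0.25)
	fmt.Printf("sum across children: %.2f\n", g.Sum())
}
```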

multitenant: expose consumption metrics

This commit adds per-tenant consumption metrics. Each node presents
the latest consumption value that it knows about, for each tenant. The
higher-level logic will take the Max value across all nodes.

The metrics are:

  • tenant.consumption.request_units
  • tenant.consumption.read_requests
  • tenant.consumption.read_bytes
  • tenant.consumption.write_requests
  • tenant.consumption.write_bytes
  • tenant.consumption.sql_pods_cpu_seconds

Note that these are aggregate metrics, meaning that we export a
separate value for each tenant, and also a sum across all tenants. The
sums are not meaningful in this case and should not be used.

Release note: None
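
(A hedged sketch of what these metrics might look like as a Go struct, and of the "take the Max across nodes" aggregation described above; the type, field, and function names are assumptions for illustration, not the actual tenantcostserver code:)

```go
// Sketch only: each node reports the latest cumulative totals it knows
// about, so the higher-level collector keeps the max per tenant across
// nodes; the largest report is the freshest one.
package main

import "fmt"

// tenantConsumption mirrors the exported metric names above.
type tenantConsumption struct {
	RequestUnits      float64 // tenant.consumption.request_units
	ReadRequests      uint64  // tenant.consumption.read_requests
	ReadBytes         uint64  // tenant.consumption.read_bytes
	WriteRequests     uint64  // tenant.consumption.write_requests
	WriteBytes        uint64  // tenant.consumption.write_bytes
	SQLPodsCPUSeconds float64 // tenant.consumption.sql_pods_cpu_seconds
}

func maxVal[T float64 | uint64](a, b T) T {
	if a > b {
		return a
	}
	return b
}

// maxAcrossNodes keeps, per tenant, the largest value reported by any node.
func maxAcrossNodes(perNode []map[uint64]tenantConsumption) map[uint64]tenantConsumption {
	out := map[uint64]tenantConsumption{}
	for _, node := range perNode {
		for id, c := range node {
			cur := out[id]
			cur.RequestUnits = maxVal(cur.RequestUnits, c.RequestUnits)
			cur.ReadRequests = maxVal(cur.ReadRequests, c.ReadRequests)
			cur.ReadBytes = maxVal(cur.ReadBytes, c.ReadBytes)
			cur.WriteRequests = maxVal(cur.WriteRequests, c.WriteRequests)
			cur.WriteBytes = maxVal(cur.WriteBytes, c.WriteBytes)
			cur.SQLPodsCPUSeconds = maxVal(cur.SQLPodsCPUSeconds, c.SQLPodsCPUSeconds)
			out[id] = cur
		}
	}
	return out
}

func main() {
	node1 := map[uint64]tenantConsumption{10: {RequestUnits: 500, ReadRequests: 40}}
	node2 := map[uint64]tenantConsumption{10: {RequestUnits: 750, ReadRequests: 60}}
	fmt.Printf("tenant 10: %+v\n", maxAcrossNodes([]map[uint64]tenantConsumption{node1, node2})[10])
}
```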

@RaduBerinde requested a review from a team as a code owner on July 21, 2021 19:49
@cockroach-teamcity
Member

This change is Reviewable

@RaduBerinde force-pushed the partial-with-metrics branch from d246e9b to 5a51d2f on July 22, 2021 05:03
Contributor

@ajwerner left a comment


I think this is a "fine" approach because it's pragmatic and doesn't necessarily increase the cardinality of metrics in the system by a ton. The counter argument is that most tenants are likely to not have many ranges (at least, today they are likely to have many ranges) and that means that only a small number of nodes in the host cluster are likely to house the data for a tenant. However, ever tenant is going to round-robin over the host cluster nodes so most host cluster nodes will at some point have this usage data for each tenant. At some point this time series cardinality is going to hurt.

Another approach is not to read this directly from this service, but rather to put up a separate service in the operator that serves this data by polling the table or creating a changefeed over it. I think that approach would be better.

Reviewed 3 of 34 files at r1, 3 of 4 files at r3, 8 of 21 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @kernfeld-cockroach and @RaduBerinde)


pkg/ccl/multitenantccl/tenantcostserver/metrics.go, line 40 at r4 (raw file):

		syncutil.Mutex
		// tenantMetrics stores the tenantMetrics for all tenants that have
		// send TokenBucketRequests to this node.

nit: s/send/sent/


pkg/ccl/multitenantccl/tenantcostserver/metrics.go, line 42 at r4 (raw file):

		// send TokenBucketRequests to this node.
		// TODO(radu): add garbage collection to remove inactive tenants.
		tenantMetrics map[roachpb.TenantID]tenantMetrics

What's your vision here? Timestamps? Have we considered a radically different approach for these metrics: what if we built a separate service that serves a Prometheus endpoint and polls the table or creates a changefeed? That way we'd dramatically reduce the cardinality of the data set from (every node in the cluster) * tenants to just a constant * tenants.
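
(For context, a minimal sketch of the mutex-guarded per-tenant map the quoted snippet describes, using stand-in types and a hypothetical get-or-create helper rather than the actual metrics.go code:)

```go
// Sketch only: the shape of a mutex-guarded per-tenant metrics map like the
// one in the quoted snippet. roachpb.TenantID and the metric fields are
// replaced with stand-ins; getTenantMetrics is a hypothetical helper.
package main

import "sync"

type tenantID uint64 // stand-in for roachpb.TenantID

type tenantMetrics struct {
	requestUnits float64 // illustrative; the real struct holds metric objects
}

type metrics struct {
	mu struct {
		sync.Mutex
		// tenantMetrics stores the metrics for all tenants that have sent
		// TokenBucketRequests to this node. Entries are never removed, which
		// is what the GC TODO is about.
		tenantMetrics map[tenantID]*tenantMetrics
	}
}

// getTenantMetrics returns the metrics for a tenant, creating them on first use.
func (m *metrics) getTenantMetrics(id tenantID) *tenantMetrics {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.mu.tenantMetrics == nil {
		m.mu.tenantMetrics = map[tenantID]*tenantMetrics{}
	}
	tm, ok := m.mu.tenantMetrics[id]
	if !ok {
		tm = &tenantMetrics{}
		m.mu.tenantMetrics[id] = tm
	}
	return tm
}

func main() {
	var m metrics
	m.getTenantMetrics(10).requestUnits += 123
}
```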

Member Author

@RaduBerinde left a comment


I like your high-level idea, but I don't like the operational overhead of adding an extra component that is deployed separately just to do this one thing. I'd also want to avoid having an external component depend on the particulars of the system table.

Could we achieve the same thing with just code in crdb? Perhaps we could use the jobs infrastructure to have one forever-running job that is executed by a single node and gets picked up by another node on failure (similar to what we do for automatic table statistics).

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @kernfeld-cockroach and @RaduBerinde)

Contributor

@ajwerner left a comment


Yeah, we could do something like that. I thought about it some more, and AndyK indicated that he doesn't expect host clusters to grow larger than ~50 nodes. Given that, let's just move forward with this until it becomes a problem. I think one new histogram in crdb will create more time series in Prometheus than this does.
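
(A rough worked example of that trade-off, with all numbers assumed for illustration: a ~50-node host cluster exporting the 6 consumption metrics for T tenants yields at most 50 × 6 × T = 300 × T time series, versus 6 × T for a single polling exporter, while one histogram with, say, 60 buckets exported from all 50 nodes is about 3,000 series regardless of the tenant count.)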

Reviewed 1 of 34 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @kernfeld-cockroach and @RaduBerinde)


pkg/ccl/multitenantccl/tenantcostserver/metrics.go, line 42 at r4 (raw file):

Previously, ajwerner wrote…

What's your vision here? Timestamps? Have we considered a radically different approach for these metrics: what if we built a separate service that serves a Prometheus endpoint and polls the table or creates a changefeed? That way we'd dramatically reduce the cardinality of the data set from (every node in the cluster) * tenants to just a constant * tenants.

Still curious to understand the vision here. I don't think it matters all that much though. I think just a periodic loop would be fine.

Member Author

@RaduBerinde left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @kernfeld-cockroach and @RaduBerinde)


pkg/ccl/multitenantccl/tenantcostserver/metrics.go, line 42 at r4 (raw file):

Previously, ajwerner wrote…

Still curious to understand the vision here. I don't think it matters all that much though. I think just a periodic loop would be fine.

Yeah, timestamps are what I had in mind. I'm still thinking about it. In practice, I'd think it wouldn't be the end of the world if we didn't GC at all, or only GC'd deleted tenants.

One issue is whether we want the system to always export the metrics for all tenants, even currently inactive ones. I believe that is not currently a requirement (because Prometheus will "remember" the last seen value), but it feels shoddy. A dedicated component like the one you suggested would fix that. But yeah, I want to merge this basic thing for now so CC can do some development on their side.

@kernfeld-cockroach
Contributor

I believe that is not currently a requirement (because Prometheus will "remember" the last seen value), but it feels shoddy.

FWIW I can't think of a reason this would be bad on the CC side. Operationally we care about these values to limit the tenant's usage, but if the tenant isn't active then there will be no usage to limit. We'll feed the data into billing, but it seems pretty safe to assume that missing data is all 0's.

@RaduBerinde
Member Author

FWIW I can't think of a reason this would be bad on the CC side. Operationally we care about these values to limit the tenant's usage, but if the tenant isn't active then there will be no usage to limit. We'll feed the data into billing, but it seems pretty safe to assume that missing data is all 0's.

My concern here is about a tenant that consumed some stuff at the beginning of the month and then went inactive for a week. If the metric for this tenant is not in the crdb system, will Prometheus still remember the last consumption value that it saw a week ago? Because assuming 0 would be incorrect.

@kernfeld-cockroach
Contributor

My concern here is about a tenant that consumed some stuff at the beginning of the month and then went inactive for a week. If the metric for this tenant is not in the crdb system, will Prometheus still remember the last consumption value that it saw a week ago? Because assuming 0 would be incorrect.

  1. I believe that Prometheus will remember the consumed value
  2. If the usage controller has ever run for that tenant, then the usage controller should also remember that value
  3. We have decided that it is okay to slightly under-charge users for usage, and this seems slight enough to not worry too much about

@RaduBerinde
Member Author

Cool, thanks. I buy 1, and to some extent 2 (there is a certain fragility if we relied only on it). Regarding 3, the tenant's usage before the inactivity period could be arbitrarily high, and they would still eventually be billed for it, just not in the right billing period.

@kernfeld-cockroach
Contributor

Regarding 3, the tenant's usage before the inactivity period could be arbitrarily high, and they would still eventually be billed for it, just not in the right billing period.

Oof, good point. I didn't think of that.

@RaduBerinde force-pushed the partial-with-metrics branch from 5a51d2f to ea52aff on July 26, 2021 00:06
@RaduBerinde force-pushed the partial-with-metrics branch from ea52aff to ac7ff37 on July 26, 2021 17:03
@RaduBerinde
Member Author

bors r+

@craig
Contributor

craig bot commented Jul 27, 2021

Build failed:

@RaduBerinde
Member Author

bors r+

@craig
Contributor

craig bot commented Jul 27, 2021

Build succeeded:

@craig (bot) merged commit c60d970 into cockroachdb:master on Jul 27, 2021
@RaduBerinde deleted the partial-with-metrics branch on July 28, 2021 20:32