server: require admin role to access node status #67067

knz · 2021-06-30T13:20:09Z

Release note (security update): The node status retrieval endpoints
over HTTP (/_status/nodes, /_status/nodes/<N> and the web UI
/#/reports/nodes) have been updated to require the admin role from
the requesting user. This ensures that operational details such as
network addresses and command-line flags do not leak to unprivileged
users.
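As a rough illustration of what this release note describes, here is a minimal `net/http` sketch of an admin-gated node status endpoint. Everything in it is an assumption made for illustration: the `NodeStatus` fields, the `adminUsers` set and especially the `X-Demo-User` header, which stands in for real session authentication and is not how CockroachDB identifies HTTP users.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// NodeStatus is a hypothetical, trimmed-down node descriptor. The real
// descriptor carries much more, including network addresses and flags.
type NodeStatus struct {
	NodeID  int      `json:"node_id"`
	Address string   `json:"address"`
	Args    []string `json:"args"`
}

// adminUsers stands in for a real role lookup (hypothetical).
var adminUsers = map[string]bool{"root": true}

// requestUser is a placeholder for session authentication: it reads the user
// name from a made-up header. This is NOT how CockroachDB authenticates.
func requestUser(r *http.Request) string {
	return r.Header.Get("X-Demo-User")
}

// nodesHandler returns node status only to authenticated users with the admin role.
func nodesHandler(w http.ResponseWriter, r *http.Request) {
	user := requestUser(r)
	if user == "" {
		http.Error(w, "not authenticated", http.StatusUnauthorized)
		return
	}
	if !adminUsers[user] {
		// The fix: authenticated but unprivileged users no longer see node details.
		http.Error(w, "this operation requires the admin role", http.StatusForbidden)
		return
	}
	nodes := []NodeStatus{{NodeID: 1, Address: "10.0.0.1:26257", Args: []string{"start", "--join=10.0.0.2:26257"}}}
	w.Header().Set("Content-Type", "application/json")
	_ = json.NewEncoder(w).Encode(nodes) // write errors are ignored in this sketch
}

// statusFor sends an in-memory GET /_status/nodes as the given user and
// returns the resulting HTTP status code.
func statusFor(user string) int {
	req := httptest.NewRequest(http.MethodGet, "/_status/nodes", nil)
	if user != "" {
		req.Header.Set("X-Demo-User", user)
	}
	rec := httptest.NewRecorder()
	nodesHandler(rec, req)
	return rec.Code
}

func main() {
	for _, u := range []string{"", "alice", "root"} {
		fmt.Printf("user=%q -> HTTP %d\n", u, statusFor(u))
	}
}
```

The anonymous, non-admin and admin requests get 401, 403 and 200 respectively; only the last one sees the addresses and flags.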

nihalpednekar

LGTM from Bulk

bdarnell · 2021-07-06T17:53:18Z

pkg/server/status.go

+}
+
+// ListNodesInternal is a helper function for the benefit of SQL exclusively.
+// It skips the privilege check, assuming that SQL is doing privilege checking already.


Why can't SQL call the internal method with a context that has the appropriate user set? This seems to just invite the two methods to diverge in their access controls, or for one of them to be called inappropriately.
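For readers without the diff at hand, the public/internal split under review can be sketched roughly as below. Only the name `ListNodesInternal` comes from the diff; the `User` type, the context key, `requireAdmin` and all signatures are hypothetical simplifications, not the actual `pkg/server` code.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// NodeStatus is a hypothetical, trimmed-down node descriptor.
type NodeStatus struct {
	NodeID  int
	Address string
}

// User is a hypothetical authenticated identity carried in the context.
type User struct {
	Name    string
	IsAdmin bool
}

type userKey struct{}

// withUser attaches the authenticated user to ctx.
func withUser(ctx context.Context, u User) context.Context {
	return context.WithValue(ctx, userKey{}, u)
}

var errNotAdmin = errors.New("this operation requires the admin role")

// requireAdmin fails closed: a missing user is treated like a non-admin one.
func requireAdmin(ctx context.Context) error {
	u, ok := ctx.Value(userKey{}).(User)
	if !ok || !u.IsAdmin {
		return errNotAdmin
	}
	return nil
}

type statusServer struct{ nodes []NodeStatus }

// Nodes is the externally reachable entry point: check first, then delegate.
func (s *statusServer) Nodes(ctx context.Context) ([]NodeStatus, error) {
	if err := requireAdmin(ctx); err != nil {
		return nil, err
	}
	return s.ListNodesInternal(ctx)
}

// ListNodesInternal performs no privilege check. Every caller is responsible
// for deciding whether its own caller may see the full descriptors.
func (s *statusServer) ListNodesInternal(_ context.Context) ([]NodeStatus, error) {
	return append([]NodeStatus(nil), s.nodes...), nil
}

func demoServer() *statusServer {
	return &statusServer{nodes: []NodeStatus{
		{NodeID: 1, Address: "10.0.0.1:26257"},
		{NodeID: 2, Address: "10.0.0.2:26257"},
	}}
}

// userCtx builds a context for a user with or without the admin role.
func userCtx(name string, admin bool) context.Context {
	return withUser(context.Background(), User{Name: name, IsAdmin: admin})
}

func main() {
	s := demoServer()
	_, err := s.Nodes(userCtx("alice", false))
	fmt.Println("Nodes as alice:", err)
	internal, _ := s.ListNodesInternal(userCtx("alice", false))
	fmt.Println("ListNodesInternal as alice:", len(internal), "nodes")
}
```

The second call illustrates the reviewer's concern: nothing at the `ListNodesInternal` call site signals that a privilege boundary is being skipped.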

What is "the appropriate user"? All the accesses from SQL need to retrieve the full node descriptors, so "the appropriate user" would be admin.

I mean, if any SQL user would inject the admin user into that context, it may as well not inject anything and not get a privilege check on the other side.

Also I was hoping to have this PR not become a discussion about the proper API design for what's exposed to SQL. That's a problem we know we have already (separate issue - #60584) and I wish this PR to remain backportable.

What is the privilege checking that we assume the SQL layer is doing? The one spot I checked was just doing RequireAdminUser, so it could use the public method.

> I mean, if any SQL user would inject the admin user into that context, it may as well not inject anything and not get a privilege check on the other side.

I disagree - I think there's a difference between calling an "internal" no-access-control method and synthesizing a node context in order to call one that does do access control. In the latter case it's clear at the call site that you're crossing security boundaries and need to restrict what gets returned to your caller. In the former case it's less clear and it would be easier to simply return the sensitive data verbatim.
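The alternative described here, where there is a single checked accessor and the caller explicitly synthesizes a privileged context, might look roughly like this sketch. The `principal` type, `elevateToNode` and `liveNodeIDs` are hypothetical names chosen for illustration; the point is that the elevation is visible at the call site, and the helper that elevates narrows what it hands back.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// NodeStatus is a hypothetical, trimmed-down node descriptor.
type NodeStatus struct {
	NodeID  int
	Address string
}

// principal is a hypothetical caller identity: a SQL user (possibly with
// the admin role) or the node itself.
type principal struct {
	name    string
	isAdmin bool
	isNode  bool
}

type principalKey struct{}

func withPrincipal(ctx context.Context, p principal) context.Context {
	return context.WithValue(ctx, principalKey{}, p)
}

var errDenied = errors.New("node status requires the admin role or a node principal")

// requireAdminOrNode fails closed when no principal is present.
func requireAdminOrNode(ctx context.Context) error {
	p, ok := ctx.Value(principalKey{}).(principal)
	if !ok || !(p.isAdmin || p.isNode) {
		return errDenied
	}
	return nil
}

type statusServer struct{ nodes []NodeStatus }

// Nodes is the only accessor, and it always checks privileges.
func (s *statusServer) Nodes(ctx context.Context) ([]NodeStatus, error) {
	if err := requireAdminOrNode(ctx); err != nil {
		return nil, err
	}
	return append([]NodeStatus(nil), s.nodes...), nil
}

// elevateToNode makes the privilege escalation explicit (and greppable).
func elevateToNode(ctx context.Context) context.Context {
	return withPrincipal(ctx, principal{name: "node", isNode: true})
}

// liveNodeIDs is a hypothetical SQL-facing helper callable by any SQL user.
// Because it elevates, it restricts its output to node IDs: no addresses.
func liveNodeIDs(ctx context.Context, s *statusServer) ([]int, error) {
	nodes, err := s.Nodes(elevateToNode(ctx))
	if err != nil {
		return nil, err
	}
	ids := make([]int, 0, len(nodes))
	for _, n := range nodes {
		ids = append(ids, n.NodeID)
	}
	return ids, nil
}

func demoServer() *statusServer {
	return &statusServer{nodes: []NodeStatus{
		{NodeID: 1, Address: "10.0.0.1:26257"},
		{NodeID: 2, Address: "10.0.0.2:26257"},
	}}
}

// sqlUserCtx builds a context for an unprivileged SQL user.
func sqlUserCtx(name string) context.Context {
	return withPrincipal(context.Background(), principal{name: name})
}

func main() {
	s := demoServer()
	ctx := sqlUserCtx("alice")
	_, err := s.Nodes(ctx)
	fmt.Println("direct call:", err)
	ids, err := liveNodeIDs(ctx, s)
	fmt.Println("via elevating helper:", ids, err)
}
```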

Look at the situation from my perspective:

- The current code, before this PR, does not do any authorization.
- This PR makes the current situation better by enforcing authorization for external accesses via HTTP.
- You're now advocating for a richer internal API beyond what this PR is doing, in a way that adds complexity which we aim to resolve when we fix #60584 ("sql: improper dependency on the RPC servers from SQL").

Is this extension strictly needed to solve the missing HTTP authorization?

My thinking is that you've added new surface area (the new ListNodesInternal method) that is always risky to use; every call site of this method must be audited to make sure it handles the results appropriately. The smallest change to fix the missing http authorization check is to simply add the missing check. This may break some things that were relying on the lack of auth, so we fix those (and only those) by making them explicitly elevate privileges.

Maybe it's more expedient to keep the status quo of unauthenticated access for non-HTTP entry points, but I'm worried that it sets us up for only partially solving the problem and we'll have to follow up by patching another one of these later. I'll OK this patch as long as you've manually verified each of the call sites of ListNodesInternal.

Let's enumerate the ways the Nodes() function could be called previously:

1. As an RPC call via gRPC. For this, all the RPC methods are already authenticated; they also already require a node principal. This is still true today.
2. As an HTTP API call. For this, all the HTTP endpoints are authenticated, but this one specifically accepted any SQL user, not just those with the admin role. This is the bug being fixed.
3. As direct Go calls from other packages inside the process. These were not RPCs, nor meant to be subject to any authentication/authorization (and they are performed with process-level privilege, which is the highest they can be).

This PR proposes to fix the problem at point 2, without introducing additional complexity at point 3.
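The three entry points enumerated above, and the check each one gets once this PR lands, can be summarized as a small policy function. This is a simplified model, not CockroachDB code: `entryPoint`, `caller` and `authorizeNodes` are hypothetical names, and the gRPC rule is reduced to "node principal required" as stated in the list.

```go
package main

import (
	"errors"
	"fmt"
)

// entryPoint enumerates the ways Nodes() can be reached (hypothetical names).
type entryPoint int

const (
	viaGRPC entryPoint = iota
	viaHTTP
	viaDirectGoCall
)

func (e entryPoint) String() string {
	switch e {
	case viaGRPC:
		return "gRPC"
	case viaHTTP:
		return "HTTP"
	case viaDirectGoCall:
		return "direct Go call"
	}
	return "unknown"
}

// caller describes the authenticated principal, if any.
type caller struct {
	isNode  bool // node principal (node-to-node RPC)
	isAdmin bool // SQL user holding the admin role
}

var (
	errNeedNode  = errors.New("node principal required")
	errNeedAdmin = errors.New("admin role required")
	errUnknown   = errors.New("unknown entry point")
)

// authorizeNodes models the policy after this PR: gRPC needs a node
// principal (unchanged), HTTP now needs the admin role (the fix), and
// direct Go calls are not checked at all.
func authorizeNodes(ep entryPoint, c caller) error {
	switch ep {
	case viaGRPC:
		if !c.isNode {
			return errNeedNode
		}
		return nil
	case viaHTTP:
		if !c.isAdmin {
			return errNeedAdmin
		}
		return nil
	case viaDirectGoCall:
		// No check: runs with process-level privilege, so each call site
		// must decide what it may pass on to its own caller.
		return nil
	}
	return errUnknown // fail closed on anything unexpected
}

func main() {
	nonAdmin := caller{}
	for _, ep := range []entryPoint{viaGRPC, viaHTTP, viaDirectGoCall} {
		fmt.Printf("%-15s non-admin SQL user: %v\n", ep, authorizeNodes(ep, nonAdmin))
	}
}
```

The unchecked third case is exactly what the rest of this thread debates.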

In fact, coming back to your sentence:

> you've added new surface area (the new ListNodesInternal method) that is always risky to use; every call site of this method must be audited to make sure it handles the results appropriately

That is incorrect. I merely renamed the existing surface area "Nodes() as an internal Go method" to ListNodesInternal(). It's the same that was there before.

I'd like to point out that:

- The PR does not add a new RPC endpoint for the new function ListNodesInternal. It is not listed in a .proto file and is only available as a Go function to call from other packages.
- We do not know of any privilege issues stemming today from direct Go calls of RPC handler methods from other packages without going through gRPC and its authentication system. Are there any? Is that a problem we need to fix?

> as direct Go calls from other packages inside the process. These were not RPCs, nor meant to be subject to any authentication/authorization (and they are performed with process-level privilege, which is the highest they can be).

I don't agree that they were not meant to be subject to any authentication/authorization. Historically we have not done a good job of this, but we would ideally have privilege checks as close to the code that accesses sensitive data as possible.

> we do not know of any privilege issues stemming today from direct Go calls of RPC handler methods from other packages without going through gRPC and its authentication system. Are there any? Is that a problem we need to fix?

We don't know of any other issues with ListNodesInternal today. But now that we've identified that this data is sensitive and we've missed auth checks on other paths, it's worth looking at the other ones to make sure they're covered.

I have filed a follow-up issue to do this investigation: #67938

Do you think this investigation should block this PR?

As of today, I have checked that only the nodes endpoint is used by SQL (and requires node privilege). I have not checked whether the KV endpoints or other endpoint handlers are used anywhere.

(My opinion remains that this investigation is not on the critical path to fixing the current vulnerability.)

If you want something specific to happen, please spell out the specific actions you'd like me to take.

bdarnell

LGTM

knz · 2021-07-22T17:35:28Z

TFYR

bors r=bdarnell

craig · 2021-07-22T19:52:28Z

Build failed (retrying...):

GitHub CI (Cockroach)

craig · 2021-07-22T22:10:55Z

Build succeeded:

GitHub CI (Cockroach)

knz requested review from bdarnell, itsbilal and aaron-crl June 30, 2021 13:20

This was referenced Jun 30, 2021

release-21.1: server: require admin role to access node status #67068

Merged

release-20.2: server: require admin role to access node status #67069

Merged

release-20.1: server: require admin role to access node status #67070

Merged

knz force-pushed the 20210630-status branch from 7c25e0b to fc7441b July 1, 2021 15:30

knz requested review from a team, nihalpednekar and jordanlewis and removed request for a team and nihalpednekar July 1, 2021 15:30

knz force-pushed the 20210630-status branch 2 times, most recently from 2422218 to 9f4950b July 5, 2021 14:09

knz mentioned this pull request Jul 5, 2021

unique bazel CI failure with a test timeout in colexec #67220

Closed

nihalpednekar approved these changes Jul 6, 2021

bdarnell reviewed Jul 6, 2021

knz mentioned this pull request Jul 22, 2021

server: audit the direct uses of RPC handler functions from other packages to check for missing authz #67938

Open

knz force-pushed the 20210630-status branch from 9f4950b to 3618a80 July 22, 2021 16:30

bdarnell approved these changes Jul 22, 2021

craig bot merged commit 0119b2b into cockroachdb:master Jul 22, 2021

knz deleted the 20210630-status branch July 23, 2021 10:06

rafiss mentioned this pull request Mar 17, 2022

Allow visibility of hostnames/IP addresses of nodes for non-admin users #77665

Closed
