| authors | state |
|---|---|
| Gavin Frazar ([email protected]) | implemented |
Required approvers:

- Engineering: @r0mant && @smallinsky && @tigrato
- Product: @klizhentas || @xinding33
- Security: @reedloden || @jentfoo
Auto-Discovery shall name discovered resources such that other resources of the same kind are unlikely to have the same name.
In particular, discovered cloud resource names shall include uniquely identifying metadata such as region, account ID, or sub-type name.
`tsh` sub-commands shall allow users to use a prefix of the resource name when the prefix unambiguously identifies a resource.
Additionally, `tsh` sub-commands shall support using label selectors to unambiguously select a single resource.
This RFD does not apply to SSH server instance discovery, since servers are already identified within the Teleport cluster by a UUID.
Multiple discovery agents can discover resources with identical names. For example, customers have had databases with the same name in different AWS regions or accounts. When a name collision occurs, users can access only one of the colliding databases.
Name collisions can be avoided by adding other resource metadata to the resource name.
Since discovered resource names will be longer and more tedious to use, we should support resource name prefixes and label matching in `tsh`, Teleport Connect, and the web UI for better UX.
Relevant issue:
Discovered database and kube cluster names shall have a lowercase suffix appended to them that includes:

- Name of the AWS matcher type: `eks`, `rds`, `rdsproxy`, `redshift`, `redshift-serverless`, `elasticache`, `memorydb` (as of writing this RFD).
  - Additionally, the RDS subtype `rds-aurora` is used to distinguish RDS instances from RDS Aurora clusters.
- AWS region
- AWS account ID
All of these AWS resource types require a unique name within an AWS account and region.
By including the region and account ID, resources of the same kind in different AWS accounts or regions avoid name collisions with each other.
By including the Teleport matcher type in the name, resources of different sub-kinds also avoid name collisions.
Together, these properties ensure that discovered AWS resource names do not collide.
The reason for including `eks` in kube cluster names, even though this is the only "kind" of kube cluster we discover in AWS, is to further distinguish the cluster from clusters in other clouds, although this isn't strictly necessary.
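The suffix rule above can be sketched in Go. This is a minimal, illustrative sketch: `makeAWSDiscoveredName` is a hypothetical helper, not Teleport's actual implementation, and it assumes the segment order used in the example that follows (matcher type, region, account ID).

```go
package main

import (
	"fmt"
	"strings"
)

// makeAWSDiscoveredName is a hypothetical helper (not Teleport's actual
// function) that appends the "<matcher type>-<region>-<account ID>" suffix
// to a discovered AWS resource name. Only the suffix is lowercased; the
// original resource name is kept as-is.
func makeAWSDiscoveredName(name, matcherType, region, accountID string) string {
	suffix := strings.Join([]string{matcherType, region, accountID}, "-")
	return name + "-" + strings.ToLower(suffix)
}

func main() {
	// An RDS instance and an RDS Aurora cluster, both named "foo", in
	// different regions and accounts get distinct Teleport names.
	fmt.Println(makeAWSDiscoveredName("foo", "rds", "us-west-1", "111111111111"))
	fmt.Println(makeAWSDiscoveredName("foo", "rds-aurora", "us-west-2", "222222222222"))
}
```

Running it prints `foo-rds-us-west-1-111111111111` and `foo-rds-aurora-us-west-2-222222222222`: the same source name yields distinct names because the matcher type, region, and account ID differ.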
Example:

```yaml
discovery_service:
  enabled: true
  aws:
    - types: ["eks", "rds", "redshift"]
      regions: ["us-west-1", "us-west-2"]
      assume_role_arn: "arn:aws:iam::111111111111:role/DiscoveryRole"
      external_id: "123abc"
      tags:
        "*": "*"
    - types: ["eks", "rds", "redshift"]
      regions: ["us-west-1", "us-west-2"]
      assume_role_arn: "arn:aws:iam::222222222222:role/DiscoveryRole"
      external_id: "456def"
      tags:
        "*": "*"
```
If the discovery service is configured like the above, the discovery agent will discover AWS EKS clusters and AWS RDS and Redshift databases in the `us-west-1` and `us-west-2` AWS regions, in AWS accounts `111111111111` and `222222222222`.
Now suppose that an EKS cluster, RDS database, and Redshift database all named `foo` exist in both regions in both AWS accounts.
If the discovery service applies the new naming convention, the discovered resources should be named:

```
foo-eks-us-west-1-111111111111
foo-eks-us-west-2-111111111111
foo-eks-us-west-1-222222222222
foo-eks-us-west-2-222222222222
foo-rds-us-west-1-111111111111
foo-rds-us-west-2-111111111111
foo-rds-us-west-1-222222222222
foo-rds-us-west-2-222222222222
foo-redshift-us-west-1-111111111111
foo-redshift-us-west-2-111111111111
foo-redshift-us-west-1-222222222222
foo-redshift-us-west-2-222222222222
```
This naming convention does not violate our database name validation regex, `^[a-z]([-a-z0-9]*[a-z0-9])?$`, and does not violate our kube cluster name validation regex, `^[a-zA-Z0-9._-]+$`.
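To sanity-check the claim above, the example names can be run through the two quoted regexes with Go's standard `regexp` package, which uses RE2 syntax and accepts both patterns unchanged. The name lists are copied from the example; nothing else is assumed.

```go
package main

import (
	"fmt"
	"regexp"
)

// The validation regexes quoted in this RFD.
var (
	dbNameRe   = regexp.MustCompile(`^[a-z]([-a-z0-9]*[a-z0-9])?$`)
	kubeNameRe = regexp.MustCompile(`^[a-zA-Z0-9._-]+$`)
)

func main() {
	// Discovered database names must satisfy the database regex.
	dbNames := []string{
		"foo-rds-us-west-1-111111111111",
		"foo-redshift-us-west-2-222222222222",
	}
	// Discovered EKS cluster names must satisfy the kube cluster regex.
	kubeNames := []string{
		"foo-eks-us-west-1-111111111111",
		"foo-eks-us-west-2-222222222222",
	}
	for _, n := range dbNames {
		fmt.Printf("db   %-40s valid=%t\n", n, dbNameRe.MatchString(n))
	}
	for _, n := range kubeNames {
		fmt.Printf("kube %-40s valid=%t\n", n, kubeNameRe.MatchString(n))
	}
}
```

The same check also shows why an uppercase first character, or a slash-separated Azure resource ID, is rejected by the database regex.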
Azure resources have a resource ID that uniquely identifies the resource, e.g.:
`/subscriptions/00000000-1111-2222-3333-444444444444/resourceGroups/<group name>/providers/<provider name>/<name>`
We could use this ID as the database name, but it is unnecessarily verbose.
It would also fail to match our database name validation regex, `^[a-z]([-a-z0-9]*[a-z0-9])?$`.
Additionally, all of the Azure databases that Teleport currently supports, except Redis, require globally unique names (within the same type of database), because Azure assigns a DNS name:

- Redis: `<name>.redis.cache.windows.net`
- Redis Enterprise: `<name>.redisenterprise.cache.azure.net`
- SQL Server: `<name>.database.windows.net`
- Postgres: `<name>.postgres.database.azure.com`
- MySQL: `<name>.mysql.database.azure.com`
MySQL/Postgres server names must be unique among both single-server and flexible-server instances.
Therefore, we can form a uniquely identifying name among Azure databases just by adding the kind of matcher to the resource name. However, AKS kube clusters do not require globally unique names; they only need to be unique within the same resource group in the same subscription.
Additionally, resource group names may contain characters that are not valid in Teleport database/kube names, so we must either omit the resource group name in those cases or perform some kind of string transform. If we include the resource region, it will serve as a heuristic to avoid name collision when resource group names contain invalid characters. Including resource region will also be consistent with the other cloud naming schemes.
To make the naming convention consistent, and to "future-proof" it, the naming convention will be to append a suffix that includes:

- Name of the Azure matcher type: `aks`, `mysql`, `postgres`, `redis`, `sqlserver` (as of writing this RFD).
  - Additionally, `redis-enterprise` will be used as a subtype for Redis Enterprise databases to distinguish them from non-enterprise Redis databases.
- Azure region
- Azure resource group name
  - Resource group names may contain characters that we do not allow in database or kube cluster names. The resource group name should be checked for invalid characters and dropped from the name suffix if it is invalid. This is only a heuristic, but any approach here will be a heuristic, and this is the simplest string transform we can do, which avoids confusing users with strange resource group names they don't recognize.
- Azure subscription ID
  - Subscription IDs only contain letters, digits, and hyphens.
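The Azure variant, including the "drop the resource group if it is invalid" heuristic, might look like the following sketch. The helper name is hypothetical. The segment order (region, matcher type, resource group, subscription ID) is taken from the examples in this RFD. The validity check is an assumption: the group is lowercased (Azure resource group names are case-insensitive) and kept only if it uses the characters the database name regex allows.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// validSegment is an assumed check: a suffix segment may only use the
// characters allowed by the database name regex (lowercase letters,
// digits, and hyphens).
var validSegment = regexp.MustCompile(`^[-a-z0-9]+$`)

// makeAzureDiscoveredName is a hypothetical helper. The resource group is
// dropped when it contains characters that are invalid in database or
// kube cluster names, per the heuristic described above.
func makeAzureDiscoveredName(name, region, matcherType, resourceGroup, subscriptionID string) string {
	parts := []string{name, region, matcherType}
	if group := strings.ToLower(resourceGroup); validSegment.MatchString(group) {
		parts = append(parts, group)
	}
	parts = append(parts, strings.ToLower(subscriptionID))
	return strings.Join(parts, "-")
}

func main() {
	const sub = "11111111-1111-1111-1111-111111111111"
	fmt.Println(makeAzureDiscoveredName("foo", "eastus", "aks", "group1", sub))
	// The invalid group name is dropped from the suffix.
	fmt.Println(makeAzureDiscoveredName("foo", "eastus", "aks", "weird-)(-group-name", sub))
}
```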
Example:

```yaml
discovery_service:
  enabled: true
  azure:
    - types: ["aks", "mysql", "postgres"]
      regions: ["eastus"]
      subscriptions:
        - "11111111-1111-1111-1111-111111111111"
        - "22222222-2222-2222-2222-222222222222"
      resource_groups: ["group1", "group2", "weird-)(-group-name"]
      tags:
        "*": "*"
```
If the discovery service is configured like the above, the discovery agent will discover Azure AKS kube clusters, Azure MySQL databases, and Azure PostgreSQL databases.
Now suppose that four AKS kube clusters named `foo` exist, one in each combination of resource group (`group1`, `group2`) and subscription ID, and that a MySQL database and a Postgres database, both named `foo`, exist in the `1111..` subscription and `group1`.
If the discovery service applies the new naming convention, the discovered resources should be named:

```
foo-eastus-aks-group1-11111111-1111-1111-1111-111111111111
foo-eastus-aks-group2-11111111-1111-1111-1111-111111111111
foo-eastus-aks-group1-22222222-2222-2222-2222-222222222222
foo-eastus-aks-group2-22222222-2222-2222-2222-222222222222
foo-eastus-mysql-group1-11111111-1111-1111-1111-111111111111
foo-eastus-postgres-group1-11111111-1111-1111-1111-111111111111
```
If resources exist within the Azure resource group `weird-)(-group-name`, then we simply drop the resource group name from the resource name:

```
foo-eastus-aks-11111111-1111-1111-1111-111111111111
foo-eastus-aks-22222222-2222-2222-2222-222222222222
foo-eastus-mysql-11111111-1111-1111-1111-111111111111
...
```
Unfortunately, this would allow name collisions across resource groups.
Alternatively, we could apply a transformation to the resource group name to make it valid.
For example, base64 encode it, make the string lowercase, and replace the `[+/=]` characters with valid characters, maybe even truncating the result (another heuristic, although less likely to cause name collisions):

```
$ echo "weird-)(-group-name" | base64 | sed 's#[+/=]#x#g' | tr '[:upper:]' '[:lower:]' | cut -c1-8
d2vpcmqt
$ echo "other-weird-)(-group-name" | base64 | sed 's#[+/=]#x#g' | tr '[:upper:]' '[:lower:]' | cut -c1-8
b3rozxit
```

```
foo-eastus-aks-d2vpcmqt-11111111-1111-1111-1111-111111111111
foo-eastus-aks-b3rozxit-11111111-1111-1111-1111-111111111111
...
```
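The alternative transform can be expressed in Go with `encoding/base64`. This sketch mirrors the shell pipeline above; `encodeGroupName` is a hypothetical name. One caveat: 8 base64 characters encode exactly the first 6 bytes of the input, so after truncating to 8 characters, any two resource group names that share their first 6 characters (e.g. `weird-a` and `weird-b`) map to the same segment. For the same reason, the trailing newline that `echo` adds in the shell version does not affect the truncated result for these inputs.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// encodeGroupName mirrors the shell pipeline: base64-encode the group name,
// replace '+', '/', and '=' with 'x', lowercase, and keep the first n chars.
func encodeGroupName(group string, n int) string {
	enc := base64.StdEncoding.EncodeToString([]byte(group))
	enc = strings.NewReplacer("+", "x", "/", "x", "=", "x").Replace(enc)
	enc = strings.ToLower(enc)
	if len(enc) > n {
		enc = enc[:n]
	}
	return enc
}

func main() {
	fmt.Println(encodeGroupName("weird-)(-group-name", 8))       // d2vpcmqt
	fmt.Println(encodeGroupName("other-weird-)(-group-name", 8)) // b3rozxit
}
```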
Each database name will be unique, since `foo` must be globally unique among all Azure MySQL databases and globally unique among all Azure Postgres databases.
Even if a new database type is added that doesn't have this globally unique name property, the resource group name and subscription ID will avoid name collisions, and the databases will be distinguished from databases in other clouds. If the resource group name has invalid characters, the Azure region makes name collisions less likely.
Likewise, the discovered AKS clusters will avoid colliding with other kube clusters in Azure or other clouds.
This naming convention does not violate our database name validation regex, `^[a-z]([-a-z0-9]*[a-z0-9])?$`, and does not violate our kube cluster name validation regex, `^[a-zA-Z0-9._-]+$`.
GCP discovery currently supports discovering only GKE kube clusters.
GKE cluster names are unique within the same GCP project ID and location/zone.
The discovery naming convention for GKE clusters shall be to append a suffix to the cluster name that includes:

- Name of the Teleport GCP matcher type: `gke`
- GCP project ID
  - These can be custom, but only consist of lowercase letters, digits, and hyphens.
- GCP location
Example:

```yaml
gcp:
  - types: ["gke"]
    locations: ["us-west1", "us-west2"]
    tags:
      "*": "*"
    project_ids: ["my-project"]
```
If the discovery service is configured like the above, the discovery agent will discover GCP GKE kube clusters in `my-project` in the `us-west1` and `us-west2` locations.
Now suppose GKE clusters named `foo` exist in each location.
If the discovery service applies the new naming convention, the discovered resources should be named:

```
foo-gke-us-west1-my-project
foo-gke-us-west2-my-project
```
This naming convention avoids name collisions between GKE clusters and does not collide with discovered AWS/Azure clusters.
This naming convention does not violate our kube cluster name validation regex, `^[a-zA-Z0-9._-]+$`.
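For completeness, the GKE rule reduces to a one-line join. `makeGKEDiscoveredName` is a hypothetical helper, and the segment order (matcher type, location, project ID) is taken from the example names above. The sketch also checks the result against the kube cluster regex.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// kubeNameRe is the kube cluster name validation regex quoted in this RFD.
var kubeNameRe = regexp.MustCompile(`^[a-zA-Z0-9._-]+$`)

// makeGKEDiscoveredName is a hypothetical helper producing
// "<name>-gke-<location>-<project ID>".
func makeGKEDiscoveredName(name, location, projectID string) string {
	return strings.Join([]string{name, "gke", location, projectID}, "-")
}

func main() {
	for _, loc := range []string{"us-west1", "us-west2"} {
		n := makeGKEDiscoveredName("foo", loc, "my-project")
		fmt.Println(n, kubeNameRe.MatchString(n))
	}
}
```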
Users will be frustrated if they are forced to type out verbose resource names when using `tsh`.
To avoid this poor UX, sub-commands should support prefix resource names, label matching, or using a predicate expression to select a resource.
The same UX should apply to all `tsh` sub-commands that take a resource name argument. These commands shall support the `tsh <sub-command> [--labels key1=val1,key2=val2,...] [--query <predicate>] [name | prefix]` syntax:
- `tsh db login`
- `tsh db logout`
- `tsh db connect`
- `tsh db env`
- `tsh db config`
- `tsh kube login`
- `tsh app login`
- `tsh app logout`
- `tsh app config`
- `tsh proxy db`
- `tsh proxy kube`
- `tsh proxy app`
To support prefix names, we add a new predicate expression function, `hasPrefix`, and change the `tsh` API calls to use `hasPrefix(name, "<prefix>")` rather than the current predicate expression `name == "<name>"`.
The `--query` flag provides the full power of the predicate language, which includes label matching.
The `--labels` flag provides a less powerful, but more convenient, notation for selecting a resource by matching labels.
We already support both of these CLI features, as either a flag or a positional arg, in other `tsh` commands, e.g. `tsh db ls --query="..." key1=val1,key2=val2,...`.
When `--query` is used along with a positional arg for the resource name or prefix, we will need to combine the two as a single predicate expression. For example, `tsh db connect --query='labels.env == "prod"' foo-db` will be combined into the predicate expression `hasPrefix(name, "foo-db") && (labels.env == "prod")`.
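Combining the positional prefix with `--query` can be sketched as below. `makePredicate` is hypothetical. It assumes the predicate language accepts Go-style double-quoted string literals, so `strconv.Quote` is used to escape any quotes or backslashes in the prefix.

```go
package main

import (
	"fmt"
	"strconv"
)

// makePredicate is a hypothetical helper that combines a positional name
// prefix with an optional --query predicate into one expression.
func makePredicate(prefix, query string) string {
	var expr string
	if prefix != "" {
		// Quote the prefix so it is a valid string literal (assumes Go-style
		// string literal escaping in the predicate language).
		expr = fmt.Sprintf("hasPrefix(name, %s)", strconv.Quote(prefix))
	}
	switch {
	case query == "":
		return expr
	case expr == "":
		return query
	default:
		// Parenthesize the user's query so its own operators keep their grouping.
		return expr + " && (" + query + ")"
	}
}

func main() {
	fmt.Println(makePredicate("foo-db", `labels.env == "prod"`))
}
```

Wrapping the user's query in parentheses keeps its grouping intact: a query like `a || b` becomes `hasPrefix(...) && (a || b)` rather than `hasPrefix(...) && a || b`, which would bind differently.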
To illustrate the new UX for `tsh` sub-commands, here is an example using `tsh db connect` to select a database (the same applies for other commands):
```
$ tsh db ls
Name   Description         Allowed Users       Labels                      Connect
------ ------------------- ------------------- --------------------------- -------
bar    RDS instance in ... [*]                 account-id=123456789012,region=us-west-1,env=dev,...
bar    RDS instance in ... [*]                 account-id=123456789012,region=us-west-2,env=dev,...
foo    RDS instance in ... [*]                 account-id=123456789012,region=us-west-1,env=prod,...

# connect by prefix name
$ tsh db connect --db-user=alice --db-name=postgres foo
#...connects to "foo-rds-us-west-1-123456789012" by prefix...

# ambiguous prefix name is an error
$ tsh db connect --db-user=alice --db-name=postgres bar
error: ambiguous database name could match multiple databases:
Name                           Description               Protocol Type URI                                                   Allowed Users Labels                                                                                                                                    Connect
------------------------------ ------------------------- -------- ---- ----------------------------------------------------- ------------- ----------------------------------------------------------------------------------------------------------------------------------------- -------
bar-rds-us-west-1-123456789012 RDS instance in us-west-1 postgres rds  bar.abcdefghijklmnop.us-west-1.rds.amazonaws.com:5432 [*]           account-id=123456789012,endpoint-type=instance,engine-version=13.10,engine=postgres,env=dev,region=us-west-1,teleport.dev/origin=dynamic
bar-rds-us-west-2-123456789012 RDS instance in us-west-2 postgres rds  bar.abcdefghijklmnop.us-west-2.rds.amazonaws.com:5432 [*]           account-id=123456789012,endpoint-type=instance,engine-version=13.10,engine=postgres,env=dev,region=us-west-2,teleport.dev/origin=dynamic

Hint: try addressing the database by its full name or by matching its labels (ex: tsh db connect key1=value1,key2=value2).
Hint: use `tsh db ls -v` or `tsh db ls --format=[yaml | json]` to list all databases with verbose details.

# resolve the error by connecting with an unambiguous prefix
$ tsh db connect --db-user=alice --db-name=postgres bar-rds-us-west-2
#...connects to "bar-rds-us-west-2-123456789012" by prefix...

# or connect by label(s) using --labels
$ tsh db connect --db-user=alice --db-name=postgres --labels region=us-west-2
#...connects to "bar-rds-us-west-2-123456789012" by matching region label...

# or connect by label(s) in a --query predicate
$ tsh db connect --db-user=alice --db-name=postgres --query 'labels.region == "us-west-2"'
#...connects to "bar-rds-us-west-2-123456789012" by matching region label...

# ambiguous label(s) match is also an error
$ tsh db connect --db-user=alice --db-name=postgres --query 'labels.region == "us-west-1"'
error: ambiguous database query matches multiple databases:
Name                           Description               Protocol Type URI                                                   Allowed Users Labels                                                                                                                                    Connect
------------------------------ ------------------------- -------- ---- ----------------------------------------------------- ------------- ----------------------------------------------------------------------------------------------------------------------------------------- -------
bar-rds-us-west-1-123456789012 RDS instance in us-west-1 postgres rds  bar.abcdefghijklmnop.us-west-1.rds.amazonaws.com:5432 [*]           account-id=123456789012,endpoint-type=instance,engine-version=13.10,engine=postgres,env=dev,region=us-west-1,teleport.dev/origin=dynamic
foo-rds-us-west-1-123456789012 RDS instance in us-west-1 postgres rds  foo.abcdefghijklmnop.us-west-1.rds.amazonaws.com:5432 [*]           account-id=123456789012,endpoint-type=instance,engine-version=13.10,engine=postgres,env=prod,region=us-west-1,teleport.dev/origin=dynamic

Hint: try addressing the database by its full name or by matching its labels (ex: tsh db connect key1=value1,key2=value2).
Hint: use `tsh db ls -v` or `tsh db ls --format=[yaml | json]` to list all databases with verbose details.

# resolve the error by using either more specific labels or adding a prefix name
$ tsh db connect --db-user=alice --db-name=postgres --query 'labels.region == "us-west-1"' foo
#...connects to "foo-rds-us-west-1-123456789012" by prefix and label...
$ tsh db connect --db-user=alice --db-name=postgres --query 'labels.region == "us-west-1" && labels.env == "prod"'
#...connects to "foo-rds-us-west-1-123456789012" by multiple labels...
```
Both the web UI and Teleport Connect already support searching for substrings in resource names and labels.
Searching by substring is a "fuzzier" kind of search than prefix-based name search (like the prefix-based search this RFD proposes for `tsh`), so it is more likely to match more than one resource.
However, GUI UX is fundamentally different from CLI UX: users can search and then interactively select from multiple matching resources.
So this kind of search is appropriate for the web UI and Teleport Connect, but not for `tsh`.
Both the web UI and Teleport Connect also support label-based searching with the predicate language, e.g.:
`labels["env"] == "dev" && labels["region"] == "us-west-1"`
Therefore, no UX changes are required for these user interfaces.
No security concerns I can think of.
If the Teleport Discovery service is upgraded, but `tsh` is not, then we may break backwards compatibility with user automation scripts, and/or frustrate users with long names they must type fully, since their `tsh` does not have the UX improvements.
Solution: backport the `tsh` UX changes to prior versions and reserve changes to the Teleport Discovery naming scheme for v14.
This way users can continue to type the old names of discovered resources and connect by prefix match.
The `tsh` UX changes will add a new predicate expression function, `hasPrefix`, to the server-side predicate resource parser.
If a user has a newer `tsh` version than the server, then `hasPrefix` may not be supported by the server, and `tsh` will get an error.
To avoid issues, we can make `tsh` fall back to listing resources without a predicate expression and filtering the results by matching the name prefix.
N/A
We should test that discovering multiple resources with identical names does not suffer name collisions.
Set up identically named RDS databases and kube clusters in different AWS regions, and a discovery agent to discover them.
Check that the resources in each region are discovered and differentiated by region in their name.