6.0 Test Plan #5654
Comments
@r0mant I have OSS trusted clusters; that's what I see after a root cluster upgrade (but not the leaf).
@klizhentas Good catch, this looks like it's because the version check that determines the "old proxy" and sets up caches needs to be updated to account for the new version. Submitted a PR: #5709. cc @russjones
Submitted #5693 which fixes
@russjones tested upgrades from 5.1.2 to 6.0, both OSS and enterprise, with and without trusted clusters. I will write a migration guide.
#5766 needs to be reviewed, approved, merged, and backported.
ETCD (non-IOT): 1k, 10K, and soak test results (charts omitted)

Scaling up and down with 1K IOT nodes with DynamoDB: notes and soak test results (charts omitted)
ETCD (500 Trusted Clusters)

Note the initial drop in resource consumption around the 21:48 mark. That is the point at which the 500 trusted clusters were taken offline. As you can see, the proxies failed to clean up ~25% of the related goroutines and ~60% of the related heap memory. The second drop around the 21:58 mark is when I manually deleted the

In order to determine whether the heap memory increase was an ongoing leak or a one-time capacity increase, I cycled the clusters two more times. After the second cycle, resting (post rc deletion) memory increased by another ~10%. After the third cycle, it returned to the approximate amount seen after the first cycle. I interpret this to mean that whatever the problem is, it isn't as simple as an ever-growing set of clusters being held somewhere in memory (on the plus side, churn is unlikely to cause memory use to grow indefinitely).

After the initial cycling, attempting to ssh into root cluster nodes started to result in

edit: Looks like this is not a 6.0 regression, but rather an issue that we missed in 5.0. Requires further investigation to discover the root cause, but the errors are definitely being triggered by
Soak test with Teleport 5.0 and new bench changes
It seems like 6.0 mostly improved in the 99th percentile: 6.0 soak test results (charts omitted)
ETCD (IOT): 1k, 10k, and soak test results (charts omitted)
Manual Testing Plan
Below are the items that should be manually tested with each release of Teleport.
These tests should be run against both a fresh install of the version to be released and an upgrade from the previous version of Teleport.
Adding nodes to a cluster @russjones
Labels @xacrimon
Trusted Clusters @awly
RBAC @quinqu
Make sure that invalid and valid attempts are reflected in audit log.
Users @fspmarshall @awly
With every user combination, try to log in and sign up with an invalid second factor and an invalid password to see how the system reacts.
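The second-factor scenarios in this section are controlled by the auth service configuration. A minimal sketch of the relevant `teleport.yaml` fragment, assuming defaults elsewhere (values here are illustrative):

```yaml
# teleport.yaml on the auth server -- illustrative fragment
auth_service:
  enabled: yes
  authentication:
    type: local
    # switch between "on", "optional" and "off" for the scenarios below
    second_factor: on
```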
- `tsh mfa add`
- `tsh mfa add`
- `tsh mfa ls`
- `tsh mfa rm`
- `tsh mfa rm` of the last device with `second_factor: on` in `auth_service` should fail
- `tsh mfa` commands are still hidden
- with `second_factor: optional` in `auth_service`, it should succeed
- `tsh mfa add`
- `tsh mfa` commands are still hidden

Backends @andrejtokarcik
Session Recording @benarent
Audit Log @a-palchikov
- `scp` commands are recorded

Interact with a cluster using `tsh` @Joerger
These commands should ideally be tested in both recording and non-recording modes, as they are implemented in different ways.

Interact with a cluster using `ssh` @webvictim
Make sure to test both recording and regular proxy modes.
Interact with a cluster using the Web UI @russjones
Combinations @xacrimon
For some manual testing, many combinations need to be tested. For example, for
interactive sessions the 12 combinations are below.
Teleport with EKS/GKE @awly
Teleport with multiple Kubernetes clusters @quinqu @awly
Note: you can use GKE or EKS or minikube to run Kubernetes clusters.
Minikube is the only caveat: it's not reachable publicly, so don't run a proxy there.
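Each per-cluster check in this section follows roughly the same command sequence. A sketch, with the proxy address, user, cluster, and pod names as placeholders:

```shell
# log in to the root proxy (address and user are placeholders)
tsh login --proxy=teleport.example.com --user=alice

# the expected Kubernetes cluster(s) should appear in the listing
tsh kube ls

# switch kubectl credentials to one of the listed clusters
tsh kube login example-cluster

# basic sanity checks against the selected cluster
kubectl get nodes
kubectl exec -it $SOME_POD -- sh
```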
- `tsh login`, check that `tsh kube ls` has your cluster, run `kubectl get nodes` and `kubectl exec -it $SOME_POD -- sh`
- `tsh login`, check that `tsh kube ls` has your cluster, run `kubectl get nodes` and `kubectl exec -it $SOME_POD -- sh`
- `tsh login`, check that `tsh kube ls` has your cluster, run `kubectl get nodes` and `kubectl exec -it $SOME_POD -- sh`
- `tsh login`, check that `tsh kube ls` has both clusters, `tsh kube login` to switch clusters, run `kubectl get nodes` and `kubectl exec -it $SOME_POD -- sh` on the new cluster
- `tsh login`, check that `tsh kube ls` has all clusters

Teleport with FIPS mode @russjones
Migrations @fspmarshall @russjones
SSH should work for both main and old clusters
SSH should work
Command Templates
When interacting with a cluster, the following command templates are useful:
OpenSSH
Teleport
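The template blocks themselves did not survive extraction; as a hedged sketch, typical equivalents look like the following (proxy host, port, and node names are assumptions):

```shell
# OpenSSH: hop through the Teleport proxy's SSH port (3023 by default)
ssh -o "ProxyCommand ssh -p 3023 %r@teleport.example.com -s proxy:%h:%p" user@node

# Teleport: the tsh equivalents
tsh login --proxy=teleport.example.com
tsh ssh user@node
tsh scp file.txt user@node:/tmp/
```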
Teleport with SSO Providers @andrejtokarcik @benarent
Teleport Plugins @benarent
WEB UI @alex-kovoy @kimlisa
Main
For main, test with an admin role that has access to all resources.
Top Nav
Side Nav
- Collapse has icon `>`, and expand has icon `v`
Servers aka Nodes
- `Add Server` button renders the dialogue set to the `Automatically` view
- `Regenerate Script` regenerates the token value in the bash command
- `Manually` tab renders manual steps
- `Automatically` tab renders the bash command

Applications
- `Add Application` button renders the dialogue
- `Generate Script`: the bash command is rendered
- `Regenerate` button regenerates the token value in the bash command

Active Sessions
Audit log
- `Session Ended` event icon takes the user to the session player
- `details` button

Access Requests
- A role (`allow-roles`): allows you to see the Role screen and ssh into all nodes.
- A role (`allow-users`): the role session expires in 4 minutes, allows you to see the Users screen, and denies access to all nodes.
- A role (`default`)
- Verify that `default` is assigned
- Verify that `allow-roles` and `allow-users` are listed
- Verify that assuming `allow-roles` allows you to see the Roles screen and ssh into nodes
- After assuming `allow-roles`, verify that assuming `allow-users` allows you to see the Users screen and denies access to nodes
- Verify that roles revert to `default`, and that requests that are not expired and are approved are assumable again

Users
Auth Connectors
Auth Connectors Card Icons
Roles
Managed Clusters
Help & Support
Access Request Waiting Room
Strategy Reason
Create the following role (with a `request_prompt` setting):
- Send a request; the pending dialogue renders
- Run `tctl requests approve <request-id>`; the dashboard is rendered

Strategy Always
With the previous role you created from Strategy Reason, change `request_access` to `always`:
- Run `tctl requests approve <request-id>`; the dashboard is rendered
- Run `tctl requests deny <request-id>`; the access denied dialogue is rendered

Strategy Optional
With the previous role you created from Strategy Reason, change `request_access` to `optional`:
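A sketch of the role these Strategy sections describe, assuming the 6.0 `request_access`/`request_prompt` role options (the role name, prompt text, and requestable role are made up):

```yaml
kind: role
version: v3
metadata:
  name: strategy-test
spec:
  options:
    # one of: reason | always | optional, depending on the scenario
    request_access: reason
    request_prompt: "Explain why you need access"
  allow:
    request:
      roles: ['admin']
```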
Account
Terminal
Node List Tab
Session Tab
$ sudo apt-get install mc
$ mc
Session Player
Invite Form
Login Form
RBAC
Create a role with no `allow.rules` defined:
- `Add Server` button in the Server view
- `Add Application` button in the Applications view
- `Nodes` and `Apps` are listed under the `options` button in `Manage Clusters`

Add the following under `spec.allow.rules` to enable read access to the audit log:
- `Audit Log` and `Session Recordings` are accessible

Add the following to enable read access to recorded sessions.
Add the following to enable read access to the roles.
Add the following to enable read access to the auth connectors.
Add the following to enable read access to users.
Add the following to enable read access to trusted clusters:
- Verify that a user can access the "Trust" screen
- Verify that a user cannot create/delete/update a trusted cluster.

Enterprise users have read/create access to `access_request`, despite the resource setting.
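The read-access rules added step by step above can be sketched as follows; the exact per-step resource lists were not captured here, so this grouping is an assumption:

```yaml
spec:
  allow:
    rules:
      # read access to the audit log and recorded sessions
      - resources: ['event', 'session']
        verbs: ['list', 'read']
      # read access to roles, auth connectors, users and trusted clusters
      - resources: ['role', 'auth_connector', 'user', 'trusted_cluster']
        verbs: ['list', 'read']
```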
Performance/Soak Test @fspmarshall @a-palchikov @quinqu
Using the `tsh bench` tool, perform the soak and benchmark tests on the following configurations:
- Cluster with 10K nodes in normal (non-IOT) mode with ETCD
- Cluster with 10K nodes in normal (non-IOT) mode with DynamoDB
- Cluster with 1K IOT nodes with ETCD
- Cluster with 1K IOT nodes with DynamoDB
- Cluster with 500 trusted clusters with ETCD
- Cluster with 500 trusted clusters with DynamoDB
Soak Tests @fspmarshall @a-palchikov @quinqu
Run a 4-hour soak test with a mix of interactive/non-interactive sessions:
- Observe Prometheus metrics for goroutines, open files, RAM, CPU, and timers, and make sure there are no leaks
Breaking load tests @fspmarshall @a-palchikov @quinqu
Load the system to capacity with `tsh bench` and publish the maximum number of concurrent sessions for both interactive and non-interactive `tsh bench` loads.
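A sketch of the bench invocations, assuming the `tsh bench` flags available around 6.0 (proxy login, rate, duration, and target are placeholders):

```shell
# non-interactive load: run a trivial command at a fixed request rate
tsh bench --duration=4h --rate=10 alice@node01 ls

# interactive load: drive each session through a PTY
tsh bench --duration=4h --rate=10 --interactive alice@node01 ls
```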
Teleport with Cloud Providers
AWS @Joerger @webvictim
GCP @webvictim
IBM @webvictim
Application Access @russjones @r0mant
- Verify that `debug_app: true` works.
- Verify that the application is accessible at `name.rootProxyPublicAddr` as well as `publicAddr`.
- Verify that the application is accessible at `name.rootProxyPublicAddr`.
- Verify that `app.session.start` and `app.session.chunk` events are created in the Audit Log.
- Verify that `app.session.chunk` points to a 5 minute session archive with multiple `app.session.request` events inside.
- Verify that `tsh play <chunk-id>` can fetch and print a session chunk archive.
- `tsh play --format=json /path/to/tar`. Seems there are tsh play improvements for Application Access (#4943) to support fetching from the server.

Database Access @r0mant
- Verify that `db.session.start` is emitted when you connect.
- Verify that `db.session.end` is emitted when you disconnect.
- Verify that `db.session.query` is emitted when you execute a SQL query.
- Verify that `tsh db ls` shows only databases matching the role's `db_labels`.
- Verify that connections are restricted by the role's `db_users`.
- Verify that database names are restricted by the role's `db_names`.
- Verify that `db.session.start` is emitted when a connection attempt is denied.
is emitted when connection attempt is denied.The text was updated successfully, but these errors were encountered: