Architecture revision #3093

Merged
merged 84 commits into from
Oct 22, 2019
e1c9619
wip: first draft quickstart & structure reorg
one000mph Oct 4, 2019
3d86cf4
minor tweaks and typo fix
one000mph Oct 4, 2019
b6a02f8
reorg into sections, complete quickstart
one000mph Oct 7, 2019
37ccdef
add files & update Integrations header
one000mph Oct 7, 2019
4eeb837
wip: starting tsh documentation
one000mph Oct 9, 2019
1c71585
start Node page, rm redundant section from architecture
one000mph Oct 9, 2019
73eceff
add audit/reply guide
one000mph Oct 10, 2019
90c924e
Merge pull request #4 from andyet/andyet/1st-revision
one000mph Oct 10, 2019
581c6c0
update toc intro links
one000mph Oct 10, 2019
8f22780
separate cli docs to branch
one000mph Oct 10, 2019
992264f
rm trailing whitespace
one000mph Oct 10, 2019
66345a6
tiny changes to concept drafts, links and whitespace formatting
one000mph Oct 10, 2019
56c53b7
complete draft node page
one000mph Oct 10, 2019
c7b7ebf
complete draft proxy page
one000mph Oct 10, 2019
1c84844
revise user chapter
one000mph Oct 10, 2019
ab1ba6a
add user mapping table
one000mph Oct 10, 2019
fe98dcf
updates to auth,basics,nodes pages
one000mph Oct 14, 2019
7371f6b
update TODOs in users
one000mph Oct 14, 2019
a5f3af4
update TODOs in users
one000mph Oct 14, 2019
a46267e
Revert "First draft of Quickstart, Concept Basics & Structure Reorg"
one000mph Oct 15, 2019
f408cd5
Merge pull request #20 from andyet/revert-4-andyet/1st-revision
one000mph Oct 15, 2019
4af8eed
wip: first draft quickstart & structure reorg
one000mph Oct 4, 2019
929a9ad
revert yaml toc
one000mph Oct 15, 2019
dc266aa
revert admin guide
one000mph Oct 15, 2019
e0a2a92
rm cli-docs
one000mph Oct 15, 2019
e61c62f
rm basic-concepts
one000mph Oct 15, 2019
ec46e03
rm all concept docs
one000mph Oct 15, 2019
19a284d
rm config doc
one000mph Oct 15, 2019
199ae4e
rm prod, revised quickstart, and installation
one000mph Oct 15, 2019
2ef6bf6
rm ui dashboard graphic
one000mph Oct 15, 2019
a9480d0
wip: first draft quickstart & structure reorg
one000mph Oct 4, 2019
faaf748
use 4.1 build dir
one000mph Oct 15, 2019
d9aea82
add newline eof
one000mph Oct 15, 2019
e1b7f12
wip: starting tsh documentation
one000mph Oct 9, 2019
45dc730
start Node page, rm redundant section from architecture
one000mph Oct 9, 2019
2c4bcf1
add audit/reply guide
one000mph Oct 10, 2019
eeac06b
update toc intro links
one000mph Oct 10, 2019
46d9529
rm trailing whitespace
one000mph Oct 10, 2019
525c9e9
tiny changes to concept drafts, links and whitespace formatting
one000mph Oct 10, 2019
c2ce00b
complete draft node page
one000mph Oct 10, 2019
d1f862c
complete draft proxy page
one000mph Oct 10, 2019
87636e3
revise user chapter
one000mph Oct 10, 2019
4010f85
add user mapping table
one000mph Oct 10, 2019
80cf59e
updates to auth,basics,nodes pages
one000mph Oct 14, 2019
ba5e0ee
update TODOs in users
one000mph Oct 14, 2019
012a2a1
update TODOs in users
one000mph Oct 14, 2019
0beabd9
wip: first draft quickstart & structure reorg
one000mph Oct 4, 2019
49e9d7c
add newline eof
one000mph Oct 15, 2019
67940fb
checkout missing files from base branch
one000mph Oct 15, 2019
519836d
merge origin
one000mph Oct 15, 2019
a625076
update links to existing docs
one000mph Oct 15, 2019
8f071dd
add auth diagrams
one000mph Oct 16, 2019
bd726a5
format user guide
one000mph Oct 16, 2019
c6ada73
format basics guide
one000mph Oct 16, 2019
8c10b4c
update architecture guide with diagrams, merge with basics guide/
one000mph Oct 16, 2019
4b29128
update architecture guide with diagrams, merge with basics guide/
one000mph Oct 16, 2019
0a59458
update more concepts list
one000mph Oct 16, 2019
5c18680
update auth more concepts list
one000mph Oct 16, 2019
e2aa0ba
add session recording section to node guide
one000mph Oct 16, 2019
8b15012
Merge branch 'andyet/concepts-temp' into andyet/concepts
one000mph Oct 16, 2019
e62c060
format proxy guide
one000mph Oct 16, 2019
e40ca9e
add audit log section to auth guide
one000mph Oct 16, 2019
8e4ec5b
update page title
one000mph Oct 16, 2019
4434fa0
add node diagrams
one000mph Oct 16, 2019
6866bd0
add storage backend section to auth
one000mph Oct 16, 2019
2e4a4cc
update diagrams, prettier with shadows instead of borders
one000mph Oct 16, 2019
0f73b56
user mappings
one000mph Oct 16, 2019
4d654ea
update proxy ssh diagram
one000mph Oct 16, 2019
12e3ed5
rename to architecture section
one000mph Oct 16, 2019
3facf18
update folder name to architecture
one000mph Oct 16, 2019
a7acdcb
fix typos in auth guide
one000mph Oct 17, 2019
735df22
format node guide, fix typpos
one000mph Oct 17, 2019
ecb4461
format overview guide, fix typpos
one000mph Oct 17, 2019
5c9634c
format proxy guide, fix typpos
one000mph Oct 17, 2019
e5abd70
fix typos in user guide
one000mph Oct 17, 2019
efc2cc3
add full name to More Concepts, user guide
one000mph Oct 17, 2019
a70357d
format auth guide
one000mph Oct 17, 2019
af5b3ee
Merge remote-tracking branch 'grav/master'
one000mph Oct 20, 2019
58cc4bd
merge master
one000mph Oct 20, 2019
1841dbf
docs: Fix broken link to backup instructions (#3064)
Pluggi Oct 8, 2019
f3ded79
Removed hardcoded aws access_keys (#3072)
benarent Oct 18, 2019
17731ca
Clarified IAM docs section from the OSS version + added policy exampl…
aelkugia Oct 18, 2019
1f8fd92
Merge remote-tracking branch 'grav/master'
one000mph Oct 22, 2019
7a4e340
resolve conflicts with current master
one000mph Oct 22, 2019
18 changes: 11 additions & 7 deletions docs/4.1.yaml
@@ -25,17 +25,10 @@ extra:
pages:
- Documentation:
- Introduction: intro.md
- CLI Docs: cli-docs.md
- Quick Start Guide: guides/quickstart.md
- Architecture: architecture.md
- User Manual: user-manual.md
- Admin Manual: admin-guide.md
- FAQ: faq.md
- Teleport Enterprise:
- Introduction: enterprise.md
- Quick Start Guide: quickstart-enterprise.md
- RBAC: ssh_rbac.md
- Single sign-on (SSO): ssh_sso.md
- Guides:
- AWS: aws_oss_guide.md
- Installation: installation.md
@@ -46,3 +39,14 @@ pages:
- OIDC: oidc.md
- Trusted Clusters: trustedclusters.md
- Kubernetes Guide: kubernetes_ssh.md
- Architecture:
- Architecture Overview: architecture/overview.md
- Teleport Users: architecture/users.md
- Teleport Nodes: architecture/nodes.md
- Teleport Auth: architecture/auth.md
- Teleport Proxy: architecture/proxy.md
- Enterprise Guides:
- Introduction: enterprise.md
- Quick Start Guide: quickstart-enterprise.md
- RBAC: ssh_rbac.md
- Single sign-on (SSO): ssh_sso.md
290 changes: 290 additions & 0 deletions docs/4.1/architecture/auth.md
@@ -0,0 +1,290 @@
# Teleport Auth

This document covers the Teleport Authentication Service and Certificate
Management. It explains how Users and Nodes are identified and granted access to
Nodes and Services.

[TOC]

## Authentication vs. Authorization

Teleport Auth handles both authentication and authorization. These topics are
related but distinct, and they are often discussed jointly as "Auth".

**Authentication** is proving an identity: "I say I am Bob, and I really am Bob.
See, look: I have Bob's purple hat." The job of an Authentication system is to
define the criteria by which users must prove their identity. Is having a purple
hat enough to show that a person is Bob? Maybe, maybe not. To identify users and
nodes, Teleport Auth requires them to present a cryptographically signed
certificate issued by the Teleport Auth Certificate Authority.

**Authorization** is determining access to something: "Bob has a purple hat, but
also a debit card and the correct PIN code. Bob can access a bank account with
the number 814000001344. Can Bob get $20 out of the ATM?" The ATM's
Authentication system would validate Bob's PIN code, while the Authorization
system would use a stored mapping from Bob to Account 814000001344 to decide
whether Bob could withdraw cash. Authorization defines and determines the
permissions that users have within a system, such as access to cash within a
banking system or data in a filesystem. Before users are granted access to
nodes, the Auth Service checks their identity against a stored mapping in a
database.

![Authentication and Authorization](../img/authn_authz.svg)
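
To make the split concrete, here is a toy Python sketch of the Bob/ATM example above: authentication answers "is this really Bob?" and authorization answers "may Bob use this account?". Everything in it (the `PINS` and `ACCOUNTS` tables and the function names) is hypothetical and only illustrates the order of the two checks; Teleport itself authenticates with signed certificates, not PINs.

```python
import hmac

# Hypothetical credential store (authentication data): proof of identity.
PINS = {"bob": "4921"}
# Hypothetical permission mapping (authorization data): identity -> accounts.
ACCOUNTS = {"bob": {"814000001344"}}


def authenticate(user: str, pin: str) -> bool:
    """Authentication: does the caller really hold this user's credential?"""
    expected = PINS.get(user)
    # compare_digest avoids leaking how many characters matched via timing.
    return expected is not None and hmac.compare_digest(expected, pin)


def authorize(user: str, account: str) -> bool:
    """Authorization: is this (already authenticated) identity mapped to the account?"""
    return account in ACCOUNTS.get(user, set())


def withdraw(user: str, pin: str, account: str) -> str:
    # Authentication always comes first; authorization only makes sense
    # once we know who the caller is.
    if not authenticate(user, pin):
        return "denied: not authenticated"
    if not authorize(user, account):
        return "denied: not authorized"
    return "ok"
```

A wrong PIN fails at the first step, while the right PIN against someone else's account fails at the second.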

## SSH Certificates

One can think of an SSH certificate as a "permit" issued and time-stamped by a
trusted authority. In this case the authority is the Auth Server's Certificate
Authority. A certificate contains four important pieces of data:

1. List of principals (identities) this certificate belongs to.
2. Signature of the certificate authority that issued it.
3. The expiration date, also known as "time-to-live" or simply TTL.
4. Additional data, such as the node role, stored as a certificate extension.
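
The four pieces of data listed above can be modeled as a simple record. This is a simplified sketch, not the real wire format: OpenSSH certificates carry both `valid_after` and `valid_before` timestamps plus other fields, and the `role` extension key, the 12-hour TTL, and the placeholder signature below are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class Certificate:
    # 1. Identities (principals) the certificate belongs to.
    principals: list[str]
    # 2. Signature of the issuing CA (opaque placeholder bytes in this sketch).
    signature: bytes
    # 3. Expiration: the issue time plus the TTL.
    valid_before: datetime
    # 4. Additional signed data, e.g. the node role (key name is hypothetical).
    extensions: dict[str, str] = field(default_factory=dict)

    def expired(self, now: datetime) -> bool:
        return now >= self.valid_before


issued = datetime(2019, 10, 22, 12, 0, tzinfo=timezone.utc)
cert = Certificate(
    principals=["bob"],
    signature=b"<ca-signature>",
    valid_before=issued + timedelta(hours=12),  # hypothetical 12h TTL
    extensions={"role": "node"},
)
```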

## Authentication in Teleport

Teleport uses SSH certificates to authenticate nodes and users within a cluster.

There are two CAs operating inside the Auth Server because nodes and users each
need their own certificates. <!--TODO: Why?-->

* The **Node CA** issues certificates which identify a node (i.e. host, server,
computer). These certificates are used to add new nodes to a cluster and
identify connections coming from the node.
* The **User CA** issues certificates which identify a User. These certificates
are used to authenticate users when they try to connect to a cluster node.
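
One way to picture the two CAs is that each holds its own signing key, so a certificate is only accepted where the matching CA is expected. The sketch below is a simplified model under a loudly stated assumption: HMAC-SHA256 from the Python standard library stands in for the asymmetric (public-key) signature a real SSH CA produces, and the payload strings are hypothetical.

```python
import hashlib
import hmac
import secrets

# Two independent CA keys, one per certificate type. HMAC is only a stand-in
# for the private-key SSH signature a real CA would produce.
CA_KEYS = {"node": secrets.token_bytes(32), "user": secrets.token_bytes(32)}


def sign(ca: str, payload: bytes) -> bytes:
    return hmac.new(CA_KEYS[ca], payload, hashlib.sha256).digest()


def verify(ca: str, payload: bytes, signature: bytes) -> bool:
    """Accept the certificate only if the CA expected for this context signed it."""
    return hmac.compare_digest(sign(ca, payload), signature)


# Hypothetical certificate bodies.
node_payload = b"principals=node-1;role=node"
user_payload = b"principals=bob;roles=admin"
node_sig = sign("node", node_payload)
user_sig = sign("user", user_payload)
```

A node certificate presented where a user certificate is expected fails verification, because it was signed with the other CA's key.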

### Issuing Node Certificates

Node Certificates identify a node within a cluster and establish the permissions
of the node to access other Teleport services. The presence of a signed
certificate on a node makes it a cluster member.

![Node Joins Cluster](../img/node_join.svg)

1. To join a cluster for the first time, a node must present a "join token" to
the auth server. The token can be static (configured via config file) or a
dynamic, single-use token generated by [`tctl nodes
add`](../cli-docs/#tctl-nodes-add).

    !!! tip "Token TTL":
        When using dynamic tokens, their default time to live (TTL) is 15
        minutes, but it can be reduced (not increased) via the
        [`tctl nodes add --ttl`](../cli-docs/#tctl-nodes-add) flag.

2. When a new node joins the cluster, the auth server generates a new
public/private keypair for the node and signs its certificate. This node
certificate contains the node's role(s) (`proxy`, `auth` or `node`) as a
certificate extension (opaque signed string).
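
The join-token rules in step 1 can be modeled roughly as follows. This is a sketch under assumptions, not Teleport's implementation: the in-memory `tokens` dict, the function names, and the choice to consume a dynamic token even when it has already expired are hypothetical. Only the static/dynamic split, single use, and the 15-minute default that can be reduced but not increased come from the text above. Step 2 (issuing the node certificate) is left out.

```python
import secrets
import time
from typing import Dict, Optional

DEFAULT_TOKEN_TTL = 15 * 60  # seconds: the default dynamic-token TTL stated above

# Hypothetical token store: token -> absolute expiry, or None for a static token.
tokens: Dict[str, Optional[float]] = {"static-token-from-config": None}


def add_dynamic_token(ttl: float = DEFAULT_TOKEN_TTL, now: Optional[float] = None) -> str:
    """Issue a single-use token; the TTL may be shortened but never extended."""
    if ttl > DEFAULT_TOKEN_TTL:
        raise ValueError("token TTL can only be reduced, not increased")
    now = time.time() if now is None else now
    token = secrets.token_hex(16)
    tokens[token] = now + ttl
    return token


def redeem(token: str, now: Optional[float] = None) -> bool:
    """Return True if a node presenting `token` may join the cluster."""
    now = time.time() if now is None else now
    if token not in tokens:
        return False
    expiry = tokens[token]
    if expiry is None:
        return True  # static token from the config file: reusable
    del tokens[token]  # dynamic token: consumed on first use, valid or not
    return now < expiry
```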

### Using Node Certificates

![Node Authorization](../img/node_cluster_auth.svg)

All nodes in a cluster can connect to the [Auth Server's API](#auth-api),
<!--Docs about this--> implemented as an HTTP REST service running over the SSH
tunnel. This API connection is authenticated with the node certificate, and the
encoded role is checked to enforce access control. For example, a client
connection using a certificate with only the `node` role won't be able to add
or delete users. That connection would only be authorized to get the auth
servers registered in the cluster.
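
The role check described above can be sketched as a lookup from the role encoded in the node certificate to the API calls it permits. The role names `node`, `proxy`, and `auth` come from the text; the operation names and which role gets which operation (beyond `node` being limited to fetching auth servers) are hypothetical.

```python
from typing import Iterable

# Hypothetical mapping from a certificate role to the Auth API calls it allows.
ROLE_PERMISSIONS = {
    "node": {"get_auth_servers"},
    "proxy": {"get_auth_servers", "get_nodes"},
    "auth": {"get_auth_servers", "get_nodes", "upsert_user", "delete_user"},
}


def check_access(cert_roles: Iterable[str], operation: str) -> None:
    """Raise PermissionError unless a role encoded in the certificate allows `operation`."""
    roles = list(cert_roles)
    allowed = set()
    for role in roles:
        allowed |= ROLE_PERMISSIONS.get(role, set())
    if operation not in allowed:
        raise PermissionError(f"roles {sorted(roles)} may not call {operation}")
```

Because the role is part of the signed certificate, a node cannot grant itself extra permissions by editing it.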

### Issuing User Certificates

![Client obtains new certificate](../img/cert_invalid.svg)

The Auth Server uses its User CA to issue user certificates. User certificates
are stored on a user's machine in the `~/.tsh/<proxy_host>` directory, and are
also held by the system's SSH agent if one is running.

1. To get permission to join a cluster for the first time, a user must provide
their username, password, and 2nd-factor token. Users can log in with [`tsh
login`](../cli-docs/#tsh-login) or via the Web UI. The Auth Server checks the
username and password against its identity storage and verifies the 2nd-factor
token.

2. If the correct credentials were offered, the Auth Server generates a
signed certificate and returns it to the client. On the client, certificates are
stored in `~/.tsh` by default. If the client uses the [Web
UI](./proxy/#web-ui-to-ssh), the signed certificate is associated with a
secure websocket session.

In addition to the user's identity, user certificates also contain user roles and
SSH options, like "permit-agent-forwarding" <!--TODO: link to config/set options
here-->.

This additional data is stored as a certificate extension and is protected by
the CA signature.
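
The login steps above can be sketched as follows. This is a simplified model, not Teleport's code, and several parts are assumptions: the 2nd factor is taken to be a time-based one-time password (TOTP, RFC 6238, as produced by authenticator apps), passwords are stored as PBKDF2 hashes, and the `USERS` record, the `admin` role, and the 12-hour `ttl_hours` default are hypothetical. Instead of signing, `login` returns the fields a CA would sign into the certificate. `permit-agent-forwarding` is a standard OpenSSH certificate extension name.

```python
import hashlib
import hmac
import os
import struct


def totp(secret: bytes, at: float, digits: int = 6, step: int = 30) -> str:
    """RFC 6238 time-based one-time password using HMAC-SHA1."""
    counter = int(at // step)
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


# Hypothetical identity storage holding one user.
SALT = os.urandom(16)
USERS = {
    "bob": {
        "pw_hash": hash_password("correct horse", SALT),
        "otp_secret": b"12345678901234567890",
        "roles": ["admin"],
    }
}


def login(user: str, password: str, otp: str, now: float, ttl_hours: int = 12) -> dict:
    """Check password and 2nd factor, then return the fields a CA would sign."""
    record = USERS.get(user)
    if record is None or not hmac.compare_digest(
        record["pw_hash"], hash_password(password, SALT)
    ):
        raise PermissionError("invalid username or password")
    if not hmac.compare_digest(totp(record["otp_secret"], now), otp):
        raise PermissionError("invalid 2nd-factor token")
    return {
        "principals": [user],
        "roles": record["roles"],  # carried as a certificate extension
        "extensions": ["permit-agent-forwarding"],
        "valid_before": now + ttl_hours * 3600,
    }


def cert_dir(proxy_host: str) -> str:
    """Client-side certificate location described above: ~/.tsh/<proxy_host>."""
    return os.path.join(os.path.expanduser("~"), ".tsh", proxy_host)
```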

### Using User Certificates

![Client offers valid certificate](../img/user_auth.svg)

When a client requests access to a cluster node, the Auth Server first checks
that a certificate exists and hasn't expired. If it has expired, the client must
re-authenticate with their username, password, and 2nd factor. If the
certificate is still valid, the Auth Server validates the certificate's
signature.

If the signature is valid, the client is granted access to the cluster. From
here, the [Proxy Server](./proxy/#connecting-to-a-node) establishes a connection
between the client and the node.
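
The order of checks described above can be written down as a small decision function. It is a sketch: the certificate is a plain dict with a hypothetical `valid_before` field, signature verification is passed in as a callback (`verify_sig`, also hypothetical), and the returned strings simply name the next action.

```python
import time
from typing import Callable, Optional


def check_user_cert(cert: Optional[dict], verify_sig: Callable[[dict], bool],
                    now: Optional[float] = None) -> str:
    """Mirror the order of checks described above and return the next action."""
    now = time.time() if now is None else now
    if cert is None:
        return "login"   # no certificate: authenticate with password + 2nd factor
    if now >= cert["valid_before"]:
        return "login"   # expired: the user must re-authenticate
    if not verify_sig(cert):
        return "reject"  # not signed by the User CA
    return "grant"       # hand off to the proxy to connect to the node
```

Checking expiry before the signature means an expired certificate always leads back to the login flow.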

## Certificate Rotation

By default, all user certificates have an expiration date, also known as time to
live (TTL). This TTL can be configured by a Teleport administrator. Node
certificates issued by an Auth Server, however, are valid indefinitely by default.

Teleport supports certificate rotation, i.e. the process of invalidating all
previously-issued certificates for nodes _and_ users regardless of their TTL.
Certificate rotation is triggered by [`tctl auth
rotate`](../cli-docs/#tctl-auth). When this command is invoked by a Teleport
administrator on one of the cluster's Auth Servers, the following happens:

1. A new certificate authority (CA) key is generated.
2. The old CA will be considered valid _alongside_ the new CA for some period of
time. This period of time is called a _grace period_. <!--TODO: Link to
config/defaults.-->
3. During the grace period, all previously issued certificates will be
considered valid, assuming their TTL hasn't expired.
4. After the grace period is over, the certificates issued by the old CA are no
longer accepted.

This process is performed twice: once for the node CA and once for the user CA.
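
The trust rules during rotation can be sketched as follows. The model is deliberately simplified: CA keys are represented by string IDs, the grace period length is an arbitrary parameter, and it ignores how Teleport actually distributes the new CA to nodes and clients. It only captures steps 1 to 4 above: after rotation, a certificate is trusted if it is unexpired and was issued either by the new CA, or by the old CA while the grace period is still running.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CAState:
    current: str                    # ID of the active CA key
    previous: Optional[str] = None  # old CA key, trusted only during the grace period
    grace_ends: float = 0.0


def rotate(state: CAState, new_key: str, now: float, grace_period: float) -> CAState:
    """Steps 1-2: switch to a new CA key and keep trusting the old one for a while."""
    return CAState(current=new_key, previous=state.current, grace_ends=now + grace_period)


def trusted(state: CAState, issuer: str, cert_valid_before: float, now: float) -> bool:
    """Steps 3-4: the certificate must be unexpired AND issued by a currently trusted CA."""
    if now >= cert_valid_before:
        return False
    if issuer == state.current:
        return True
    return issuer == state.previous and now < state.grace_ends
```

Per the text, the same procedure would run once for the node CA and once for the user CA.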

Take a look at the [Certificate Guide](../admin-guide/#certificate-rotation) to
learn how to do certificate rotation in practice.

## Auth API

<!--TODO: Can we say more about this, abstract of routes provided-->

Clients can also connect to the auth API through the Teleport proxy to use a
limited subset of the API to discover the member nodes of the cluster.

## Auth State

The Auth service maintains state using a database of users, credentials,
certificates, and audit logs. The default storage location is
`/var/lib/teleport` or an [admin-configured storage
destination](../admin-guide/#high-availability).

There are three types of data stored by the auth server:

* **Cluster State:** The auth server stores its own keys in the cluster state
  storage. All of the cluster's dynamic configuration is stored there as well,
  including:
    * Node membership information and online/offline status for each node.
    * List of active sessions.
    * List of locally stored users.
    * RBAC configuration (roles and permissions).
    * Other dynamic configuration.
* **Audit Log:** When users log into a Teleport cluster, execute remote
  commands, and log out, that activity is recorded in the audit log. More on
  this in the [Audit Log section below](#audit-log).
* **Recorded Sessions:** When Teleport users launch remote shells via the
  `tsh ssh` command, their interactive sessions are recorded and stored by the
  auth server. Each recorded session is a file which is saved in
  `/var/lib/teleport` by default, but can also be saved in external storage,
  like an AWS S3 bucket.

## Audit Log

The Teleport auth server keeps the audit log of SSH-related events that take
place on any node within a Teleport cluster. Each node in a cluster emits audit
events and submits them to the auth server. The events recorded include:

* successful user logins
* node IP addresses
* session time
* session IDs

!!! warning "Compatibility Warning":
    Because all SSH events like `exec` or `session_start` are reported by the
    Teleport node service, they will not be logged if you are using the OpenSSH
    `sshd` daemon on your nodes.

Only an SSH server can report what's happening to the Teleport auth server.
The audit log is a JSON file which is by default stored on the auth server's
filesystem under `/var/lib/teleport/log`. The format of the file is documented
in the [Admin Manual](../admin-guide/#audit-log).

Teleport users are encouraged to export the events into external, long-term
storage.
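
Exporting usually starts with reading the log and selecting events. The sketch below assumes one JSON object per line and uses hypothetical field names (`event`, `user`, `sid`); the event types `session_start` and `exec` are the ones mentioned above. Check the Admin Manual for the actual file format. On a real auth server you would read files under `/var/lib/teleport/log` instead of the in-memory sample, and the upload to external storage is left out.

```python
import io
import json
from typing import Iterable, Iterator, List, Set, TextIO


def read_events(stream: TextIO) -> Iterator[dict]:
    """Yield one event per non-empty line (assumes a JSON-lines layout)."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def select_for_export(events: Iterable[dict], kinds: Set[str]) -> List[dict]:
    """Pick the events to ship to long-term storage (the sink itself is omitted)."""
    return [e for e in events if e.get("event") in kinds]


# Hypothetical sample log contents.
sample = io.StringIO(
    '{"event": "user_login", "user": "bob"}\n'
    '{"event": "session_start", "user": "bob", "sid": "abc"}\n'
    "\n"
    '{"event": "exec", "user": "bob", "command": "uptime"}\n'
)
selected = select_for_export(read_events(sample), {"session_start", "exec"})
```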

!!! info "Deployment Considerations":
    If multiple Teleport auth servers are used to service the same cluster (HA
    mode), a network file system must be used for `/var/lib/teleport/log` to
    allow them to combine all audit events into the same audit log.
    [Learn how to deploy Teleport in HA Mode.](../admin-guide#high-availability)

## Recording Proxy Mode

In this mode, the proxy terminates (decrypts) the SSH connection using the
certificate supplied by the client via SSH agent forwarding and then establishes
its own SSH connection to the final destination server, effectively becoming an
authorized "man in the middle". This allows the proxy server to forward SSH
session data to the auth server to be recorded, as shown below:

![recording-proxy](../img/recording-proxy.svg?style=grv-image-center-lg)

The recording proxy mode, although _less secure_, was added to allow Teleport
users to enable session recording for OpenSSH servers running `sshd`, which is
helpful when gradually transitioning large server fleets to Teleport.

We consider the "recording proxy mode" to be less secure for two reasons:

1. It grants additional privileges to the Teleport proxy. In the default mode,
the proxy stores no secrets and cannot "see" the decrypted data. This makes a
proxy less critical to the security of the overall cluster. But if an
attacker gains physical access to a proxy node running in the "recording"
mode, they will be able to see the decrypted traffic and client keys stored
in the proxy's process memory.
2. Recording proxy mode requires SSH agent forwarding. Agent forwarding is
required because without it, a proxy will not be able to establish the 2nd
connection to the destination node.

However, there are advantages to proxy-based session recording too. When
sessions are recorded at the nodes, a root user can add iptables rules to
prevent session logs from reaching the Auth Server. With sessions recorded at
the proxy, users with root privileges on nodes have no way of disabling the
audit.
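
The difference between the two recording modes can be summarized as a small decision function. It is a toy model, not Teleport configuration: the mode labels `"node"` and `"proxy"` and the returned fields are hypothetical. It encodes what the text states: in the default mode the node records and the proxy only relays encrypted traffic, while in recording proxy mode the proxy sees plaintext and cannot open the second SSH connection without a forwarded agent.

```python
from typing import Optional


def plan_session(mode: str, forwarded_agent: Optional[object]) -> dict:
    """Decide who records a session and check the recording proxy's precondition."""
    if mode == "node":
        # Default: the proxy relays encrypted bytes; the Teleport node records.
        return {"recorder": "node", "proxy_sees_plaintext": False}
    if mode == "proxy":
        if forwarded_agent is None:
            # Without agent forwarding the proxy cannot open the 2nd SSH connection.
            raise RuntimeError("recording proxy mode requires SSH agent forwarding")
        # The proxy terminates the client connection and opens its own to the node.
        return {"recorder": "proxy", "proxy_sees_plaintext": True}
    raise ValueError(f"unknown recording mode: {mode}")
```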

See the [admin guide](../admin-guide#recorded-sessions) to learn how to turn on the
recording proxy mode.

## Storage Back-Ends

Different types of cluster data can be configured with different storage
back-ends as shown in the table below:

Data Type | Supported Back-ends | Notes
-----------------|---------------------------|---------
Cluster state | `dir`, `etcd`, `dynamodb` | Multi-server (HA) configuration is only supported using `etcd` and `dynamodb` back-ends.
Audit Log Events | `dir`, `dynamodb` | If `dynamodb` is used for the audit log events, `s3` back-end **must** be used for the recorded sessions.
Recorded Sessions| `dir`, `s3` | `s3` is mandatory if `dynamodb` is used for the audit log.

!!! tip "Note":
    Teleport splits the audit log events and the recorded sessions into
    different back-ends because of the nature of the data. A recorded session
    is a compressed binary stream (blob), while an event is a well-defined JSON
    structure. `dir` works well enough for both in small deployments, but large
    clusters require specialized data stores: S3 is perfect for uploading
    session blobs, while DynamoDB or `etcd` are better suited to store the
    cluster state.

The combination of DynamoDB + S3 is especially popular among AWS users because
it allows them to run Teleport clusters completely devoid of local state.
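
The rules in the storage table above can be checked mechanically. In this sketch the dictionary keys (`cluster_state`, `audit_events`, `recorded_sessions`) and the `ha` flag are hypothetical names, not Teleport configuration keys; the supported back-ends and the two constraints (HA requires `etcd` or `dynamodb`, and `dynamodb` audit events require `s3` sessions) come from the table.

```python
from typing import Dict, List

SUPPORTED = {
    "cluster_state": {"dir", "etcd", "dynamodb"},
    "audit_events": {"dir", "dynamodb"},
    "recorded_sessions": {"dir", "s3"},
}


def validate(backends: Dict[str, str], ha: bool = False) -> List[str]:
    """Return the problems with a back-end choice (an empty list means it is valid)."""
    errors = []
    for kind, allowed in SUPPORTED.items():
        if backends.get(kind) not in allowed:
            errors.append(f"{kind}: unsupported backend {backends.get(kind)!r}")
    if ha and backends.get("cluster_state") not in {"etcd", "dynamodb"}:
        errors.append("HA cluster state requires etcd or dynamodb")
    if backends.get("audit_events") == "dynamodb" and backends.get("recorded_sessions") != "s3":
        errors.append("dynamodb audit events require s3 for recorded sessions")
    return errors
```

The all-AWS combination passes even in HA mode: `validate({"cluster_state": "dynamodb", "audit_events": "dynamodb", "recorded_sessions": "s3"}, ha=True)` returns an empty list.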

!!! tip "NOTE":
    For high availability in production, a Teleport cluster can be serviced
    by multiple auth servers running in sync. Check [HA
    configuration](../admin-guide/#high-availability) in the Admin Guide.


## More Concepts

* [Architecture Overview](./overview)
* [Teleport Users](./users)
* [Teleport Nodes](./nodes)
* [Teleport Proxy](./proxy)