Flat keyspace directory backend. #2062
Conversation
lib/auth/api.go
Outdated
@@ -39,7 +39,7 @@ type AccessPoint interface {
	GetNamespace(name string) (*services.Namespace, error)

	// GetServers returns a list of registered servers
-	GetNodes(namespace string) ([]services.Server, error)
+	GetNodes(namespace string, skipValidation bool) ([]services.Server, error)
let's make this a functional argument to keep the API consistent
so the signature will look like this:
GetNodes(namespace, SkipValidation())
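A rough sketch of what that functional-option style could look like; the option type, helper names, and stand-in Server type below are illustrative, not the final Teleport API:

package main

import "fmt"

// Server is a stand-in for services.Server in this sketch.
type Server struct{ Name string }

// GetNodesOption is a functional option for GetNodes (illustrative name).
type GetNodesOption func(*getNodesConfig)

type getNodesConfig struct {
	skipValidation bool
}

// SkipValidation disables schema validation of the returned servers.
func SkipValidation() GetNodesOption {
	return func(c *getNodesConfig) { c.skipValidation = true }
}

// GetNodes applies any options, then returns the servers in the namespace.
// Keeping the extra behavior in variadic options means existing callers,
// e.g. GetNodes("default"), keep compiling unchanged.
func GetNodes(namespace string, opts ...GetNodesOption) ([]Server, error) {
	var cfg getNodesConfig
	for _, opt := range opts {
		opt(&cfg)
	}
	servers := []Server{{Name: "node-1"}, {Name: "node-2"}}
	if !cfg.skipValidation {
		// Schema validation of each server would happen here.
	}
	return servers, nil
}

func main() {
	nodes, err := GetNodes("default", SkipValidation())
	fmt.Println(len(nodes), err)
}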
var out []backend.Item

// Get a list of all buckets in the backend.
files, err := ioutil.ReadDir(path.Join(bk.rootDir))
you already know the list of bucket files to open here in advance, why do a full scan?
At this point, the actual bucket to open is not known. Take for example a backend with the following items in it:
/roles/admin/params
/roles/dev/params
/namespaces/default/nodes/123
If the bucket is /namespaces/default/nodes, then GetItems does not need to do a full directory scan and can return the exact bucket (or return trace.NotFound if it does not exist).
However, if the caller asks for /roles (which is what GetRoles does), that's when the directory scan needs to occur to perform a prefix match, then extract the first matching bucket and return it as a key.
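As a sketch of those two paths (exact bucket hit vs. prefix scan), assuming bucket paths are encoded into flat file names under the root directory; flatten and findBucketFiles are hypothetical helper names, not the code in this PR:

package dir

import (
	"io/ioutil"
	"os"
	"path/filepath"
	"strings"

	"github.com/gravitational/trace"
)

// flatten encodes a bucket path like /namespaces/default/nodes into a flat
// file name (the actual encoding used by the backend is omitted here).
func flatten(bucket string) string {
	return strings.Replace(strings.Trim(bucket, "/"), "/", "%", -1)
}

// findBucketFiles locates the bucket files for a requested bucket. An exact
// match is returned without scanning the directory; otherwise the directory
// is scanned for buckets whose flattened names share the prefix.
func findBucketFiles(rootDir, bucket string) ([]string, error) {
	prefix := flatten(bucket)
	exact := filepath.Join(rootDir, prefix)
	if _, err := os.Stat(exact); err == nil {
		return []string{exact}, nil
	}
	files, err := ioutil.ReadDir(rootDir)
	if err != nil {
		return nil, trace.ConvertSystemError(err)
	}
	var matches []string
	for _, fi := range files {
		if strings.HasPrefix(fi.Name(), prefix) {
			matches = append(matches, filepath.Join(rootDir, fi.Name()))
		}
	}
	if len(matches) == 0 {
		return nil, trace.NotFound("bucket %q not found", bucket)
	}
	return matches, nil
}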
lib/backend/dir/impl.go
Outdated
// If the bucket item is expired, update the bucket, and don't include
// it in the output.
if bk.isExpired(v) {
	delete(b.items, k)
I would prefer this logic with itemsUpdated and delete operations to be encapsulated in the bucket, so b.deleteItem(k) should handle the internal property itemsUpdated.
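Something along these lines, where the bucket owns its dirty flag (a sketch using the deleteItem name from the comment; bucketItem is the type already used by this backend):

// bucket mirrors the in-memory representation used by this backend (sketch).
type bucket struct {
	items        map[string]bucketItem
	itemsUpdated bool
}

// deleteItem removes a key from the bucket and records that the bucket has
// been modified, so callers no longer touch items or itemsUpdated directly.
func (b *bucket) deleteItem(key string) {
	if _, ok := b.items[key]; !ok {
		return
	}
	delete(b.items, key)
	b.itemsUpdated = true
}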
// backend and see if any match the prefix. If any match the prefix return
// trace.BadParameter, otherwise return the original error. This is
// consistent with our DynamoDB implementation.
files, er := ioutil.ReadDir(path.Join(bk.rootDir))
we should not do a full scan of the bucket in this case, why not try to open the files with partial prefix instead?
This is because GetVal has to return trace.BadParameter when specifically requesting a bucket. Take for example a backend with a single item in it:
/foo/bar/baz
If the bucket requested was /foo/bar/baz, then we can just check for that particular bucket and return trace.BadParameter if we are unable to open a bucket.
However, if the caller asks for /foo, there is no way to differentiate a bucket that does not exist at all from the caller requesting a bucket that does exist (in which case we would return trace.BadParameter) without scanning the directory.
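A sketch of how GetVal could pick between trace.NotFound and trace.BadParameter; keyIsBucket is a hypothetical helper that reuses the flatten idea from the earlier sketch:

// keyIsBucket reports whether the requested key actually names a bucket,
// either exactly or as a prefix of an existing bucket, in which case GetVal
// should return trace.BadParameter instead of trace.NotFound.
func keyIsBucket(rootDir, key string) (bool, error) {
	prefix := flatten(key)
	files, err := ioutil.ReadDir(rootDir)
	if err != nil {
		return false, trace.ConvertSystemError(err)
	}
	for _, fi := range files {
		if strings.HasPrefix(fi.Name(), prefix) {
			return true, nil
		}
	}
	return false, nil
}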
lib/backend/dir/impl.go
Outdated
// Update items in bucket.
for _, e := range newItems {
	b.items[e.Key] = bk.itemWithTTL(e.Value, e.TTL)
this should be setItem, which updates the internal property itemsUpdated
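The matching write-side sketch, keeping itemsUpdated internal to the bucket just like deleteItem above:

// setItem inserts or overwrites a key in the bucket and marks the bucket as
// modified, so the flush logic knows it has to be written back to disk.
func (b *bucket) setItem(key string, item bucketItem) {
	b.items[key] = item
	b.itemsUpdated = true
}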
lib/backend/dir/impl.go
Outdated
// If the items were updated, write them out to disk.
if b.itemsUpdated {
	err = writeOut(b.file, b.items)
if you failed to write, still close the bucket, otherwise you will get a deadlock situation on a busy file system
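One way to guarantee that is to defer the release of the bucket before attempting the write. In this sketch, closeBucket is a stand-in for whatever closes the file and drops the lock, while writeOut and b.file come from the code under review:

// flushAndClose writes the bucket back out if it was modified, but always
// releases the bucket afterwards so a failed write cannot leave the file
// locked and deadlock later callers.
func flushAndClose(b *bucket) error {
	defer closeBucket(b)
	if !b.itemsUpdated {
		return nil
	}
	if err := writeOut(b.file, b.items); err != nil {
		return trace.Wrap(err)
	}
	return nil
}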
lib/backend/dir/impl.go
Outdated
// readIn will read in the bucket and return a map of keys. The second return
// value returns true or false to indicate if the file was empty or not.
func readIn(f *os.File) (map[string]bucketItem, error) {
readBucket
lib/backend/dir/impl.go
Outdated
// writeOut will truncate the file and write out the items to the file f.
func writeOut(f *os.File, items map[string]bucketItem) error {
	// Marshal items to disk format.
	bytes, err := json.Marshal(items)
writeBucket
"github.com/ghodss/yaml" | ||
"github.com/json-iterator/go" |
dependency is not vendored
I was actually surprised by this as well, turns out it's already vendored.
Force-pushed from 064e1ae to 3a592d2
@russjones does this require a migration?
yes, that's why it will go to 2.7
@kontsevoy Yes, it's in
all backends to support bulk insertion. Added UpsertNodes endpoint, which is used by the state cache to speed up GetNodes.
Force-pushed from 3a592d2 to ce1c747
Purpose
tsh ssh (and other commands like tsh ls) take a long time on large clusters. For example, on a 4k node cluster with a local etcd instance as the backend and auth, proxy, and node all running in the same process, tsh ssh can take over 6 seconds. These numbers are larger for real world deployments where auth, proxy, and node run on different machines, as does etcd. This is due to two reasons:
1. During login, tsh calls GetNodes several times. Each time GetNodes is called, it validates the schema of the services.Server objects it returns. In the example 4k cluster, this in and of itself takes up to 4 seconds. The solution to this is to make schema validation optional in calls to GetNodes.
2. The remaining second is spent by the state cache re-upserting 4k items back into the dir backend. For example, writing 600 bytes to a new file takes about 200 us, but writing 600 bytes to an already-open file takes about 10 us. To reduce the time spent re-upserting 4k items into the dir backend, the dir backend needs to support bulk insertion.
Fixes to these two problems should reduce login time to between 500 ms and 1 second for 4k clusters.
Implementation
For the first problem, the API for GetNodes(namespace string) was updated to GetNodes(namespace string, skipValidation bool), allowing the caller to control whether schema validation is run. In addition, json-iterator was used to speed up marshalling and unmarshalling.

For the second problem, the directory backend was re-written to support a flat keyspace. Once support for a flat keyspace was added, the directory backend could store buckets whose content is a list of keys. Backend operations then simply read in all the keys in a bucket, operate on the map of items in the bucket, and re-write it back out. This reduces the time an upsert takes from 200 us to 10 us per item, which for 4k nodes reduces total time from 800 ms to 40 ms.
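To make the bucket read-modify-write pattern concrete, a simplified upsert over an already-open bucket file might look like the sketch below; readBucket and writeBucket follow the naming suggested in review, and locking and TTL handling are omitted:

// upsertItems reads the whole bucket, applies the new items to the in-memory
// map, and writes the bucket back out once. Inserting N items costs one read
// and one write of an open file instead of N separate file creations.
func upsertItems(f *os.File, newItems map[string]bucketItem) error {
	items, err := readBucket(f)
	if err != nil {
		return trace.Wrap(err)
	}
	for k, v := range newItems {
		items[k] = v
	}
	return trace.Wrap(writeBucket(f, items))
}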
Related Issues
Fixes #2061