
Flat keyspace directory backend. #2062

Merged
merged 1 commit into master from rjones/dir-perf on Jul 13, 2018

Conversation

russjones
Contributor

Purpose

tsh ssh (and other commands like tsh ls) take a long time on large clusters. For example, on a 4k node cluster with a local etcd instance as the backend and auth, proxy, and node all running in the same process, tsh ssh can take over 6 seconds. These numbers are larger for real-world deployments where auth, proxy, node, and etcd each run on different machines.

This is due to two reasons:

  1. During login, tsh calls GetNodes several times. Each time GetNodes is called, it validates the schema of every services.Server it returns. In the example 4k cluster, this alone takes up to 4 seconds. The solution is to make schema validation optional in calls to GetNodes.

  2. The remaining second is spent by the state cache re-upserting 4k items into the dir backend. For example, writing 600 bytes to a new file takes about 200 us, but writing 600 bytes to an already open file takes about 10 us. To reduce the time spent re-upserting 4k items into the dir backend, the dir backend needs to support bulk insertion.

Fixes for these two problems should reduce login time to between 500 ms and 1 second for 4k clusters.

Implementation

For the first problem, the API for GetNodes(namespace string) was updated to GetNodes(namespace string, skipValidation bool), allowing the caller to control whether schema validation is run. In addition, json-iterator was used to speed up marshaling and unmarshaling.
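
As a rough illustration of the json-iterator change (not the exact wiring in this PR; the `server` type below is a hypothetical stand-in for services.Server), the library is typically dropped in as a replacement for encoding/json:

```go
package main

import (
	"fmt"

	jsoniter "github.com/json-iterator/go"
)

// json is a drop-in replacement for encoding/json: same API, configured for
// standard-library compatibility, but backed by json-iterator's faster codec.
var json = jsoniter.ConfigCompatibleWithStandardLibrary

// server is a hypothetical stand-in for the resources being (un)marshaled.
type server struct {
	Kind     string            `json:"kind"`
	Metadata map[string]string `json:"metadata"`
}

func main() {
	data := []byte(`{"kind":"node","metadata":{"name":"node-1"}}`)

	// Unmarshal goes through json-iterator instead of encoding/json.
	var s server
	if err := json.Unmarshal(data, &s); err != nil {
		panic(err)
	}

	// Marshal does too; the call sites stay unchanged.
	out, err := json.Marshal(s)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```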

For the second problem, the directory backend was re-written to support a flat keyspace. Once support for a flat keyspace was added, the directory backend could store buckets whose content is a list of keys. Backend operations then become reading in all the keys in a bucket, operating on the map of items in memory, and writing the bucket back out. This reduces the time an upsert takes from 200 us to 10 us per item, which for 4k nodes reduces the total time from 800 ms to 40 ms.
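
A minimal sketch of that read/modify/write cycle, using hypothetical names (bucketItem, upsertItems) and plain encoding/json rather than the PR's actual helpers:

```go
package dir

import (
	"encoding/json"
	"io/ioutil"
	"os"
	"time"
)

// bucketItem is a simplified stored value with an optional expiry.
type bucketItem struct {
	Value   []byte    `json:"value"`
	Expires time.Time `json:"expires,omitempty"`
}

// upsertItems opens the bucket file once, folds the whole batch into the
// decoded map, and writes the map back out in a single pass. Each additional
// item then costs an in-memory map update plus its share of one file write,
// instead of a separate file creation per key.
func upsertItems(path string, newItems map[string]bucketItem) error {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_RDWR, 0600)
	if err != nil {
		return err
	}
	defer f.Close()

	// Read in the existing bucket contents; an empty file means an empty map.
	items := map[string]bucketItem{}
	data, err := ioutil.ReadAll(f)
	if err != nil {
		return err
	}
	if len(data) != 0 {
		if err := json.Unmarshal(data, &items); err != nil {
			return err
		}
	}

	// Apply the batch to the in-memory map.
	for key, item := range newItems {
		items[key] = item
	}

	// Truncate and write the updated map back out.
	out, err := json.Marshal(items)
	if err != nil {
		return err
	}
	if err := f.Truncate(0); err != nil {
		return err
	}
	if _, err := f.Seek(0, 0); err != nil {
		return err
	}
	_, err = f.Write(out)
	return err
}
```

Reusing one open bucket file across the batch is where the 200 us (new file) versus 10 us (open file) per-item difference described above comes from.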

Related Issues

Fixes #2061

@russjones russjones requested a review from klizhentas July 11, 2018 01:08
lib/auth/api.go Outdated
@@ -39,7 +39,7 @@ type AccessPoint interface {
GetNamespace(name string) (*services.Namespace, error)

// GetServers returns a list of registered servers
GetNodes(namespace string) ([]services.Server, error)
GetNodes(namespace string, skipValidation bool) ([]services.Server, error)
Contributor

let's make this a functional argument to keep the API consistent

so the signature will look like this:

GetNodes(namespace, SkipValidation())
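
A sketch of what that functional-argument shape could look like (the option and helper names here are hypothetical, not necessarily the ones the PR ends up using):

```go
package auth

// GetNodesOption is a functional argument to GetNodes.
type GetNodesOption func(*getNodesOptions)

// getNodesOptions collects the settings the options can toggle.
type getNodesOptions struct {
	skipValidation bool
}

// SkipValidation returns an option that disables schema validation of the
// servers returned by GetNodes.
func SkipValidation() GetNodesOption {
	return func(o *getNodesOptions) {
		o.skipValidation = true
	}
}

// applyGetNodesOptions folds the variadic options into a config struct; a
// GetNodes implementation applies these and only validates the schema when
// skipValidation is false.
func applyGetNodesOptions(opts ...GetNodesOption) getNodesOptions {
	var o getNodesOptions
	for _, opt := range opts {
		opt(&o)
	}
	return o
}
```

Callers that want the default behavior keep writing GetNodes(namespace), while the state cache can call GetNodes(namespace, SkipValidation()).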

var out []backend.Item

// Get a list of all buckets in the backend.
files, err := ioutil.ReadDir(path.Join(bk.rootDir))
Contributor

you know here in advance the list of bucket files to open, why do a full scan?

russjones (Contributor Author) commented Jul 11, 2018

At this point, the actual bucket to open is not known. Take for example a backend with the following items in it:

/roles/admin/params
/roles/dev/params
/namespaces/default/nodes/123

If the bucket is /namespaces/default/nodes, then GetItems does not need to do a full directory scan and can return the exact bucket (or return trace.NotFound if it does not exist).

However, if the caller asks for /roles (which is what GetRoles does), that's when the directory scan needs to occur to perform a prefix match, then extract the first matching bucket and return it as a key.
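
A simplified sketch of those two paths, with hypothetical flatten/readBucket helpers standing in for the real encoding and parsing:

```go
package dir

import (
	"encoding/json"
	"io/ioutil"
	"os"
	"path/filepath"
	"strings"
)

// Backend is a simplified stand-in for the flat-keyspace dir backend: every
// bucket is one file directly under rootDir, named after its flattened path.
type Backend struct {
	rootDir string
}

// flatten encodes a bucket path such as ["roles", "admin"] into a flat
// on-disk file name (the real encoding in the PR may differ).
func flatten(bucket []string) string {
	return strings.Join(bucket, "%2F")
}

// readBucket parses one bucket file into its key/value map.
func readBucket(path string) (map[string][]byte, error) {
	data, err := ioutil.ReadFile(path)
	if err != nil {
		return nil, err
	}
	items := map[string][]byte{}
	if err := json.Unmarshal(data, &items); err != nil {
		return nil, err
	}
	return items, nil
}

// GetItems returns all items under the given path. An exact bucket such as
// /namespaces/default/nodes is read directly; a partial prefix such as /roles
// falls back to scanning rootDir and prefix-matching the flattened names.
func (bk *Backend) GetItems(bucket []string) (map[string][]byte, error) {
	exact := filepath.Join(bk.rootDir, flatten(bucket))
	if _, err := os.Stat(exact); err == nil {
		return readBucket(exact)
	}

	files, err := ioutil.ReadDir(bk.rootDir)
	if err != nil {
		return nil, err
	}
	out := map[string][]byte{}
	prefix := flatten(bucket)
	for _, fi := range files {
		if fi.IsDir() || !strings.HasPrefix(fi.Name(), prefix) {
			continue
		}
		items, err := readBucket(filepath.Join(bk.rootDir, fi.Name()))
		if err != nil {
			return nil, err
		}
		for k, v := range items {
			out[k] = v
		}
	}
	return out, nil
}
```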

// If the bucket item is expired, update the bucket, and don't include
// it in the output.
if bk.isExpired(v) {
delete(b.items, k)
Contributor

I would prefer this logic with itemsUpdated and the delete operations to be encapsulated in the bucket, so

b.deleteItem(k) should handle the internal property itemsUpdated
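
A small sketch of that encapsulation, with a simplified bucket type (names hypothetical):

```go
package dir

import "time"

// bucketItem is a simplified stored value with an optional expiry.
type bucketItem struct {
	Value   []byte
	Expires time.Time
}

// bucket holds the decoded items of one bucket file plus a dirty flag that
// decides whether the bucket gets written back out on close.
type bucket struct {
	items        map[string]bucketItem
	itemsUpdated bool
}

// setItem stores an item and marks the bucket dirty.
func (b *bucket) setItem(key string, item bucketItem) {
	b.items[key] = item
	b.itemsUpdated = true
}

// deleteItem removes an item (for example an expired one) and marks the
// bucket dirty, so callers never touch items or itemsUpdated directly.
func (b *bucket) deleteItem(key string) {
	delete(b.items, key)
	b.itemsUpdated = true
}
```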

// backend and see if any match the prefix. If any match the prefix return
// trace.BadParameter, otherwise return the original error. This is
// consistent with our DynamoDB implementation.
files, er := ioutil.ReadDir(path.Join(bk.rootDir))
Contributor

we should not do a full scan of the bucket in this case, why not try to open the files with partial prefix instead?

russjones (Contributor Author) commented Jul 11, 2018

This is because GetVal has to return trace.BadParameter when specifically requesting a bucket. Take for example a backend with a single item in it:

/foo/bar/baz

If the bucket requested was /foo/bar/baz then we can just check for that particular bucket and return trace.BadParameter if we are unable to open a bucket.

However, if the caller asks for /foo, there is no way to differentiate a bucket that does not exist at all from the caller requesting a bucket that does exist (in which case we would return trace.BadParameter) without scanning the directory.
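
A sketch of that fallback classification, using a hypothetical classifyMissingKey helper operating on flattened file names (the error constructors are the ones from github.com/gravitational/trace):

```go
package dir

import (
	"io/ioutil"
	"path/filepath"
	"strings"

	"github.com/gravitational/trace"
)

// classifyMissingKey runs after the exact bucket file for the requested value
// failed to open. It scans rootDir and checks whether any bucket name extends
// the requested flattened path: if one does, the caller asked for a bucket
// rather than a key, which is a BadParameter (matching the DynamoDB backend);
// otherwise the key simply does not exist.
func classifyMissingKey(rootDir, flatKey string) error {
	files, err := ioutil.ReadDir(rootDir)
	if err != nil {
		return trace.ConvertSystemError(err)
	}
	for _, fi := range files {
		if strings.HasPrefix(fi.Name(), flatKey) {
			return trace.BadParameter("%q is a bucket, not a key", filepath.Base(flatKey))
		}
	}
	return trace.NotFound("key %q is not found", flatKey)
}
```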


// Update items in bucket.
for _, e := range newItems {
b.items[e.Key] = bk.itemWithTTL(e.Value, e.TTL)
Contributor

this should be setItem that updates internal property itemsUpdated


// If the items were updated, write them out to disk.
if b.itemsUpdated {
err = writeOut(b.file, b.items)
Contributor

if you fail to write, still close the bucket, otherwise you will get a deadlock situation on a busy file system
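
A sketch of that ordering, always closing the bucket file even when the write-out fails (hypothetical flushAndClose helper, plain encoding/json):

```go
package dir

import (
	"encoding/json"
	"os"
)

// flushAndClose writes the bucket back out if it was modified, but closes the
// file unconditionally: if a failed write left the file (and its lock) open,
// later operations on the same bucket would deadlock on a busy file system.
func flushAndClose(f *os.File, items map[string][]byte, itemsUpdated bool) error {
	// Runs regardless of what happens below; the write error, if any, is
	// still what gets returned to the caller.
	defer f.Close()

	if !itemsUpdated {
		return nil
	}
	data, err := json.Marshal(items)
	if err != nil {
		return err
	}
	if err := f.Truncate(0); err != nil {
		return err
	}
	if _, err := f.Seek(0, 0); err != nil {
		return err
	}
	_, err = f.Write(data)
	return err
}
```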


// readIn will read in the bucket and return a map of keys to items.
func readIn(f *os.File) (map[string]bucketItem, error) {
Contributor

readBucket

// writeOut will truncate the file and write out the items to the file f.
func writeOut(f *os.File, items map[string]bucketItem) error {
// Marshal items to disk format.
bytes, err := json.Marshal(items)
Contributor

writeBucket

"github.com/ghodss/yaml"
"github.com/json-iterator/go"
Contributor

dependency is not vendored

Contributor Author

I was actually surprised by this as well; it turns out it's already vendored.

@russjones russjones force-pushed the rjones/dir-perf branch 2 times, most recently from 064e1ae to 3a592d2 on July 11, 2018 at 22:53
@kontsevoy
Contributor

@russjones does this require a migration?

@klizhentas
Contributor

yes, that's why it will go to 2.7

@russjones
Contributor Author

russjones commented Jul 13, 2018

@kontsevoy Yes, it's in lib/backend/dir/migrate.go. It's relatively safe, but there is a scenario where it could fail if the backend directories were poorly chosen.

all backends to support bulk insertion. Added UpsertNodes endpoint,
which is used by the state cache to speed up GetNodes.
@russjones russjones merged commit 71ffacd into master Jul 13, 2018
@russjones russjones deleted the rjones/dir-perf branch July 13, 2018 20:46