Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gemini: lazy generation of partition keys #200

Merged
merged 17 commits into from
Sep 6, 2019
Merged

Conversation

dahankzter
Copy link
Contributor

@dahankzter dahankzter commented Aug 27, 2019

Lazy partition key generation reintroduced to avoid out of memory issues.

Fixes: #200
Fixes: #199

@dahankzter dahankzter requested a review from penberg August 27, 2019 15:41
@penberg penberg changed the title gemini: lazy generation of partiion keys gemini: lazy generation of partition keys Sep 2, 2019
source.go Outdated
}

// giveOld returns the supplied value for later reuse unless the value
//is empty in which case it removes the corresponding token from the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Space needed after the comment marker.

oneStdDev = 0.341
*/
stdDistMean = math.MaxUint64 / 2
oneStdDev = 0.341 * math.MaxUint64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change looks unrelated, and let's not add commented out code to the tree.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably skip it although I think there is some other code that depends on it. Yes I only left the code there while coding as an easy reminder.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we introduced the TokenIndex type, we can keep this change. But please remove the commented out constants.

@@ -418,8 +428,8 @@ func init() {
rootCmd.Flags().IntVarP(&maxRetriesMutate, "max-mutation-retries", "", 2, "Maximum number of attempts to apply a mutation")
rootCmd.Flags().DurationVarP(&maxRetriesMutateSleep, "max-mutation-retries-backoff", "", 10*time.Millisecond, "Duration between attempts to apply a mutation for example 10ms or 1s")
rootCmd.Flags().Uint64VarP(&pkBufferReuseSize, "partition-key-buffer-reuse-size", "", 2000, "Number of reused buffered partition keys")
rootCmd.Flags().Uint64VarP(&distributionSize, "distribution-size", "", 1000000, "Number of partition keys each worker creates")
rootCmd.Flags().StringVarP(&partitionKeyDistribution, "partition-key-distribution", "", "uniform", "Specify the distribution from which to draw partition keys, supported values are currently uniform|normal|exponential")
rootCmd.Flags().Uint64VarP(&distributionSize, "distribution-size", "", math.MaxUint64, "Number of partition keys each worker creates")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change looks unrelated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll check it.

schema.go Outdated
@@ -756,6 +758,9 @@ func (s *Schema) genSinglePartitionQuery(t *Table, source *Source, r *rand.Rand,
if !ok {
return nil
}
if len(values.Value) != len(t.PartitionKeys) {
fmt.Printf("values=%d, pk=%d\n", len(values.Value), len(t.PartitionKeys))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like left-over debugging printout.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed.

@penberg
Copy link
Contributor

penberg commented Sep 3, 2019

I would like to get rid of the lock sharding commit from this pull request, and defer the optimization. I think a lockless set is probably a cleaner approach, but I would like lazy generation to be merged first.

Henrik Johansson added 5 commits September 3, 2019 16:19
The contract that the gemini.DistributionFunc adheres to has thus far
been underspecified. This commits attempts to formalize what it is
expected to return and why.
for j, table := range schema.Tables {
g := generators[j]
delta := math.MaxUint64 % actors
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the intent of delta?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's going to be removed. It is left from an intermediate state when I thought the partitions mattered here as well.

Henrik Johansson added 2 commits September 4, 2019 14:04
Hand wiring goroutine termination was messy. Using the package
gopkg.in/tomb.v2 makes it much simpler, safer and nicer.

This commit fixes a deadlock that happened sometimes diring heavy
operations and external shutdown.
@dahankzter dahankzter merged commit fd0b7ca into master Sep 6, 2019
@dahankzter dahankzter deleted the lazy_data_generation branch September 6, 2019 09:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Command error: fatal error: runtime: cannot allocate memory
2 participants