Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve feature loading performance by changing UUID generator #2698

Closed
jkronegg opened this issue Feb 24, 2023 · 6 comments · Fixed by #2703
Closed

Improve feature loading performance by changing UUID generator #2698

jkronegg opened this issue Feb 24, 2023 · 6 comments · Fixed by #2703
Assignees

Comments

@jkronegg
Copy link
Contributor

jkronegg commented Feb 24, 2023

👓 What did you see?

The Cucumber Java event bus uses UUID as event identifiers. These UUIDs are generated using the JDK java.util.UUID.randomUUID().
This kind of generators known to be slow (e.g. https://stackoverflow.com/questions/14532976/performance-of-random-uuid-generation-with-java-7-or-java-6). On my project (~450 test scenarios, ~50 rules) this UUID generator takes about 25% of the feature file loading time on the IntelliJ profiler flame graph (~2% of the total CPU time).

This generator is used for each each rule and each scenario step line : that corresponds to ~100'000 calls for my project.

✅ What did you expect to see?

Using a faster UUID generator would improve the feature file loading performance. For example, https://github.com/cowtowncoder/java-uuid-generator provides a time sequence UUID generator which is much faster.

From what I've seen, the solution could be very simple as the EventBus is instanciated in io.cucumber.core.runtime.Runtime.Builder with UUID::randomUUID : we need to change this generator to get a good performance improvement. Note that other calls to UUID.randomUUID() are also made elsewhere in the project but I didn't saw big performance impact.

📦 Which tool/library version are you using?

Cucumber Java 7.11.1

🔬 How could we reproduce it?

Steps to reproduce the behavior:

  1. Run IntelliJ profiler on a project with about 50 rules with cucumber-java 7.11.1.
  2. Look for the loadFeatures block on the flame graph: on the top is the java.util.UUID.randomUUID() call
    uuid_flame_graph.zip

📚 Any additional context?

N/A

@jkronegg
Copy link
Contributor Author

jkronegg commented Mar 2, 2023

The following librairies have been selected as alternatives to generate the UUIDs:

Library Licence Benchmarked by creator Latest commit
JDK java.util.UUID.randomUUID() (OpenJDK 17) ? ? may 2022
https://github.com/cowtowncoder/java-uuid-generator (JUG) Apache License 2.0. yes jan 2023
https://github.com/f4b6a3/uuid-creator MIT License yes oct 2022

As Cucumber has a MIT License project, all three alternatives are allowed from the license point of view.

The following libraries have not been selected as potential replacements:

  • The Apache commons-id library was not selected because it looks unmaintened (latest build in 2006) and has no official release (it's still in sandbox mode).
  • The eaio-uuid library was not selected because it looks unmaintened (latest commit in 2018) and is not documented. Moreover, it generates it's own com.eaoi.uuid.UUID class, not the JVM java.util.UUID, which would make integration harder.

Regarding thread-safety, all selected alternatives are thread-safe:

I benchmarked the UUID librairies with some parameters that are known to provide better performance.

I also tested the generators in realworld conditions on a project with ~450 test scenarios and ~50 rules, I passes a Supplier<UUID> instance which measures the time passed to generate the UUIDs:

public class UuidSuplier implements Supplier<UUID> {
  static AtomicLong time = new AtomicLong(0);

    @Override
    public UUID get() {
        long t0 = System.nanoTime();
        UUID uuid = ... // generate the UUID
        long t1 = System.nanoTime();
        System.out.println(time.addAndGet(t1-t0)); // displays total time in ns
        return uuid;
    }
}

These are the results:

Library generator type JMH benchmark Millions ops/s (higher is better) real-world performance; time to generate all UUIDs [ms] (lower is better)
JUG time-based epoch (Type 7) 1.1 268
JDK random (Type 4) 1.1 189
uuidCreator time-based (Type 1) 10.0 99
uuidCreator time-based epoch (Type 7/2) 34.5 77
JUG time-based (Type 1) too fast to be benchmarked 12
new UUID(Thread.currentThread().getId(), atomicLong.incrementAndGet()) thread-safe counter-based 133 2.5

To summarise, by using the JUG library with time-based Type 1 UUID generator, the gain is about 177 ms on a real project (about 5% of the Cucumber test scenario execution on the project).

@mpkorstanje, @mattwynne : are you ok with the principle of replacing java.util.UUID.randomUUID() by a JUG time-based UUID ? If yes, I'll propose a PR.

@mpkorstanje mpkorstanje transferred this issue from cucumber/cucumber-expressions Mar 2, 2023
@mpkorstanje
Copy link
Contributor

Unfortunately I haven't had time to consider this yet.

@mpkorstanje
Copy link
Contributor

mpkorstanje commented Mar 5, 2023

Unfortunately I've not been able to give this the consideration this is due just yet. Never the less this are my quick consideration:

Based on this I reckon the best course of action would be to

  1. Retain the random UUID generator as the default
  2. Refactor the default UUID generator into a implementation that is loaded via SPI.
  3. Provide a property to select other UUID generators via SPI (note: this should work with cucumber.properties, junit-platform.properties, @CucumberOption, environment variables and the CLI).

For reference look at the current ObjectFactoryServiceLoader implementation.

If you are willing to write a pull request for this, I would be happy to accept it.

@jkronegg
Copy link
Contributor Author

jkronegg commented Mar 6, 2023

I think a time-based UUID will work (there is no collision risk when using multithreading in the same JVM because the generators are have a time-based component and a counter component).

But I agree on the other arguments:

Configuring the UUID generator using SPI is a good solution to delegate the choice to the user. But as the user will need to add specific configuration, it's not guaranteed that many people will use this feature.

@mpkorstanje do you think it is worth doing it now, or is it better to invest development time on #2035 / #2279 ?

@mpkorstanje
Copy link
Contributor

mpkorstanje commented Mar 6, 2023

I think a time-based UUID will work (there is no collision risk when using multithreading in the same JVM because the generators are have a time-based component and a counter component).

Reports may be generated on different JVMs at the same time. A typical example would be one suite that tests against Firefox and another against Safari. The exact browser is configured through a property. These are then executed concurrently on different Gitlab runners.

There are of course solutions for this as well but the advantage of the random UUID is that they'll work in all situations.

But as the user will need to add specific configuration, it's not guaranteed that many people will use this feature.

That is rather unfortunate indeed. Though technically the cost is rather low. So in my estimation it evens out.

@mpkorstanje do you think it is worth doing it now, or is it better to invest development time on #2035 / #2279 ?

As a maintainer I would generally favour the long term approach. But the 5% improvement you mentioned is something that can be achieved in the short term and mostly without my input. So I think it mostly depends on your willingness and interests.

@jkronegg
Copy link
Contributor Author

jkronegg commented Mar 8, 2023

Ok, I'll do it.

@jkronegg jkronegg self-assigned this Mar 23, 2023
mpkorstanje pushed a commit that referenced this issue Apr 6, 2023
The cucumber event bus UUID generator can now be configured through a SPI
using a property from:
- command-line
- environment variables
- system properties
- `cucumber.properties`
- `junit-platform.properties`
- `@CucumberOptions`

Two UUID generators are provided:

An event has a UUID. The UUID generator can be configured using the
`cucumber.uuid-generator` property:

- name: io.cucumber.core.eventbus.RandomUuidGenerator*
  features: Thread-safe, collision-free, multi-jvm
  performance (Millions UUID/second):  ~1
  Typical usage example: Reports may be generated on different JVMs at
     the same time. A typical example would be one suite that tests
     against Firefox and another against Safari. The exact browser is
     configured through a property. These are then executed concurrently
     on different Gitlab runners.
- name: io.cucumber.core.eventbus.IncrementingUuidGenerator  
  features: Thread-safe, collision-free, single-jvm
  performance (Millions UUID/second): ~130
  Typical usage example: Reports are generated on a single JVM                                                                                                                                                                                                                                          |

Performance on real projects depends on the size (e.g. on a project with
3095 Given, 445 When, 1177 Then, 445 Scenarios, 48 Rules, the `IncrementingUuidGenerator`
improves the global performance by about 3-5% and the feature file 
parsing performance of about 37%).

The `IncrementingUuidGenerator` is a very simple UUID generator based 
on `AtomicLong` counters. It is thread-safe and collision-free within
a single JVM.

The `RandomUuidGenerator` is the usual `UUID.randomUUID()` generator.

Fixes: #2698
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants