Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Event broker performance improvement #407

Merged
merged 1 commit into from
Oct 23, 2019
Merged

Event broker performance improvement #407

merged 1 commit into from
Oct 23, 2019

Conversation

gabriel-marinkovic
Copy link
Contributor

@gabriel-marinkovic gabriel-marinkovic commented Oct 23, 2019

I removed the somewhat arbitrary WithTimeout() call in the emit() function.

As far as I can see, this serves no real purpose. It doesn't matter if event callbacks "expire" after 30 seconds or not, because the callbacks will get canceled regardless:

  1. when the user wants to close a cdp.Page, event broker's context gets canceled, which means all event callbacks get canceled as well.
  2. a cdp-related error occurs, in which case the event callbacks themselves will process the error.

This timeout context served as a "just in case" measure that isn't necessary and is detrimental to performance.

Some sites require you to create a huge amount of elements (1000+ per second). During 15 minutes of scraping, because of this WithTimeout() call, my app allocated over 4.5M contexts which require 4.5M of allocations, mutex acquisitons, and goroutine creations each.

This change improved the performance of my sites by 1-2 orders of magnitude.

@codecov
Copy link

codecov bot commented Oct 23, 2019

Codecov Report

Merging #407 into master will increase coverage by <.1%.
The diff coverage is 100%.

@@           Coverage Diff            @@
##           master    #407     +/-   ##
========================================
+ Coverage    39.3%   39.4%   +<.1%     
========================================
  Files         222     222             
  Lines        9094    9092      -2     
========================================
+ Hits         3577    3580      +3     
+ Misses       5157    5154      -3     
+ Partials      360     358      -2

@ziflex
Copy link
Member

ziflex commented Oct 23, 2019

Hey, thanks for the PR!
This change tracking feature was an early design decision that favors caching over CDP calls. That design had led to this broker.

I wonder if this tracking brings any value and maybe it would be better just have snapshots.

@gabriel-marinkovic
Copy link
Contributor Author

Interesting, could you elaborate a bit? I'd like to contribute to that change. We could possibly move away from this broker entirely, and each function that wants to listen to events should create its own cdp.*EventClient. These objects are very cheap to allocate (especially if element.client.DOM.Enable() has already been called.).

Also, even If we decide to make further changes to the broker, could you merge this change? The timeout context adds nothing but cpu and memory overhead that grows exponentially with the number of elements. I currently have to maintaint a separate fork to circumwent this.

@ziflex ziflex merged commit 874b743 into MontFerret:master Oct 23, 2019
@ziflex
Copy link
Member

ziflex commented Oct 24, 2019

So, the idea behind this broker was to provide a mechanism that would allow to have “live” bindings that would reflect any changes made to a given node in a browser.
But the reality shows that this feature might be unnecessary and even overwhelming.
That’s said, I haven’t made any decisions yet. For now it’s just thoughts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants