
Trivuele batch 2: RequestStarter simplification and fixes #777

Open
wants to merge 2 commits into next from trivuele-batch-2

Conversation

ArneBab (Contributor) commented Jun 4, 2022

Code sent on FMS; not written by me, but reviewed by me. Needs input from @toad.

@ArneBab ArneBab changed the title Trivuele batch 2 Trivuele batch 2 WIP Jun 4, 2022
@ArneBab ArneBab force-pushed the trivuele-batch-2 branch from a411b7f to ad422a7 Compare June 4, 2022 22:24
@ArneBab ArneBab changed the title Trivuele batch 2 WIP Trivuele batch 2 Jun 4, 2022
The RequestStarter logic was subtly broken in multiple ways:
- A bad interaction between RTT and window estimates would cause your
  node to oscillate between starting far more requests than the
  network can handle and almost stalling entirely for minutes.
  The speed difference could be a factor of 100 from one minute to the next.
- Nodes patched with the infamous RequestStarter patches would cause
  your node to throttle down on number of requests started to their
  advantage. This could potentially be exploited by an attacker as well.
- The window estimation would attempt to throttle the speed to target
  a certain ratio of requests succeeding without getting dropped due
  to rejected-overload. This would avoid meltdown of the network, but
  the target was not hard-coded and depended on average network speed
  among other things, in practice ending up allowing 30% of requests
  to fail.
- The window estimation was changed according to an AIMD schedule.
  This would have encouraged nodes on the network to agree on a speed
  collectively, but the logic deviated from AIMD in undocumented ways,
  for example by making the incremental step multiplicative. It is
  unclear if this would actually cause any agreement on the network.
- All code was completely undocumented, had many seemingly arbitrarily
  chosen constants, and multiple lines of code with no effect at all.
- There are other ways to design the estimation of an ideal request
  starting speed that avoid most of the above problems, but they would
  still require selecting some arbitrary constants, so they are not
  obviously better than replacing all the logic with a single constant.
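For reference, the textbook AIMD scheme that the old code deviated from adds a fixed increment on success and multiplies the window by a factor below one on congestion. A minimal illustrative sketch, not the actual fred code; the class name, constants, and event names are hypothetical:

```java
// Illustrative sketch of textbook AIMD (additive increase,
// multiplicative decrease). All names and constants are hypothetical.
public class AimdWindow {
    static final double INCREMENT = 1.0;        // additive step (assumed value)
    static final double DECREASE_FACTOR = 0.5;  // multiplicative cut (assumed value)

    double window = 1.0;

    // On a successful request: add a constant, independent of window size.
    void onSuccess() {
        window += INCREMENT;
    }

    // On congestion (e.g. rejected-overload): multiply by a factor < 1,
    // never dropping below a window of one outstanding request.
    void onRejectedOverload() {
        window = Math.max(1.0, window * DECREASE_FACTOR);
    }
}
```

The key property is that the increase step is constant, not proportional to the window; making the increment effectively multiplicative (as the old code did) changes the convergence behavior that AIMD is known for.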

The whole thing can be viewed as a single network constant: the number of
requests per second and per peer that we can start to cause an ideal load on
the network, assuming the distribution of request types and of idle and busy
nodes looks like it does today. The biggest advantage of this method is
that it causes very even starting of requests everywhere in the network,
with no oscillating speed-up that chokes parts of the space in turns. This
may enable faster speeds with fewer rejected-overloads and peer backoffs.
There are other advantages to this simple method as well. If a future
release does something that improves or worsens the rejected-overload ratio or
the average request completion time, it will show clearly in the stats. The
network will not try to conceal the change.

The constant was tuned to an empirical value matching today's average
request starting speed. This may need fine-tuning in future releases if
the rejected-overload situation improves and higher speeds become possible.
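The constant-rate idea described above can be sketched as follows. The class, the method names, and the constant's value are all hypothetical illustrations, not the values merged in this PR:

```java
// Sketch of a fixed requests-per-second-per-peer rate, as described in
// the commit message. The constant's value here is made up.
public class ConstantRequestRate {
    // Empirically tuned network constant (hypothetical value).
    static final double REQUESTS_PER_SECOND_PER_PEER = 0.5;

    // Total request starting rate scales linearly with connected peers.
    static double requestsPerSecond(int connectedPeers) {
        return REQUESTS_PER_SECOND_PER_PEER * connectedPeers;
    }

    // Even spacing between request starts, in milliseconds.
    static long delayBetweenRequestsMillis(int connectedPeers) {
        return Math.round(1000.0 / requestsPerSecond(connectedPeers));
    }
}
```

With 20 connected peers and the assumed constant, this would start 10 requests per second, one every 100 ms, which is what produces the "very even starting of requests" the description mentions. The review note's caveat applies: because peer count grows only logarithmically with bandwidth at the low end, a linear per-peer rate can overload slow nodes.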

Made all stats average over the whole session, but not persistent. This
makes them more useful for judging the state of the network.

=== Review note ===

This is not perfect, because at lower speeds the peer count is the
logarithm of the speed, so with this change slow peers may be sending
more requests than is right for their bandwidth.

But the previous logic was broken, because the static increment was
divided by the window size, so it was actually an increase by a
fraction of the window size, which is not how AIMD works.

Merging even though this might need more tuning, because it is an
improvement.

- Arne
@ArneBab ArneBab force-pushed the trivuele-batch-2 branch from ad422a7 to 984ad3f Compare June 4, 2022 22:25
@ArneBab ArneBab changed the title Trivuele batch 2 Trivuele batch 2: RequestSender simplification Jul 2, 2022
@ArneBab ArneBab changed the title Trivuele batch 2: RequestSender simplification Trivuele batch 2: RequestStarter simplification Jul 2, 2022
@ArneBab ArneBab changed the title Trivuele batch 2: RequestStarter simplification Trivuele batch 2: RequestStarter simplification and fixes Jul 2, 2022
ArneBab (Contributor, Author) commented Jul 16, 2022

@toad can you have a look at this?
