runtime error: integer divide by zero #186
Comments
On vacation right now but I'll have a look later. That looks like an edge case I've missed. |
No worries, thanks for the heads-up. Enjoy the rest of your vacation! |
You've indeed found a bug, and it is in code for which I don't even have a test... How embarrassing. What happens is that the weighting algorithm, which pre-computes the distribution of the target URLs based on weight, allocates only 100 slots (= 100%), and when you create more than a couple hundred instances the rounding errors ensure that no instance gets a slot. This then trips the random number function, which never expected to be called with zero. Why would anybody create more than 100 instances anyway? ;) Time to refactor. Not a biggie, but I need to add a test first. For now, stay <= 100 instances per prefix. I've got an ugly patch which you can apply if this is an urgent production issue. Otherwise, a fix is coming soon. Thanks for finding this. |
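To make the rounding problem described above concrete, here is a minimal sketch. It is not fabio's actual code: `slotsPerTarget` and `ringLen` are hypothetical helpers, and the sketch assumes each target's share of a 100-slot ring is rounded to a whole number of slots. Under that assumption the ring ends up empty once there are more than 200 equally weighted targets.

```go
package main

import (
	"fmt"
	"math"
)

// slotsPerTarget is a hypothetical helper (not taken from fabio). It splits
// a ring of ringSize slots evenly across n targets and rounds each target's
// share to a whole number of slots.
func slotsPerTarget(n, ringSize int) int {
	if n <= 0 {
		return 0
	}
	share := float64(ringSize) / float64(n)
	return int(math.Round(share))
}

// ringLen returns the total number of slots handed out to n targets.
func ringLen(n, ringSize int) int {
	return n * slotsPerTarget(n, ringSize)
}

func main() {
	const ringSize = 100 // 100 slots = 100%
	for _, n := range []int{50, 100, 150, 250} {
		fmt.Printf("targets=%d slots/target=%d ring=%d\n",
			n, slotsPerTarget(n, ringSize), ringLen(n, ringSize))
	}
	// Once the ring is empty, picking a slot with an expression such as
	// `i % len(ring)` panics with "integer divide by zero".
}
```

In this sketch 250 targets each round down to zero slots, so the ring has length zero and any modulo by its length fails at run time.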
Thanks for diving into this, no ugly patching needed. :) This was found while testing with 100+ nodes. But don't hesitate to ping me if I can help with testing. |
Thanks for posting this bug :) I'm going to be rolling out a single urlprefix to more than 100 nodes, relatively soon (production). |
@magiconair Any idea when this might be resolved? I know you've been all over the place :) We will be rolling out to a couple hundred nodes sometime in the Jan/Feb timeframe. Cheers |
One of the next things on my list.
|
@killcity I'll have a look today whether there is a quick workaround. My guess is that changing |
@killcity Unfortunately, the really quick fix doesn't work. On the upside, I've found that I actually do have a test for the code. Just in a place where I didn't expect it :) I'll keep looking. |
@magiconair Thanks! |
When fabio was presented with more than 200 targets for a single route, a rounding error caused no target to receive traffic, which subsequently led to a division by zero error in a code path which did not expect an empty list of targets.

This change modifies the distribution algorithm as follows:

* The case where all targets receive an equal amount of traffic now has a fast path which just returns the list of targets.
* For the other case, a larger ring of 10,000 slots is always used to achieve a certain amount of accuracy. In addition, all targets that have a non-zero weight will use at least one slot to work around rounding issues and the fact that some servers would not receive traffic although they are alive and healthy.

This should allow an unlimited number of instances for a single route.
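The two cases in the commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged fabio code: `buildRing` is a hypothetical function, weights are assumed to be fractions between 0 and 1, and the ring is represented as a slice of target indices.

```go
package main

import "fmt"

// ringSize is the number of slots used for weighted distribution,
// following the 10,000 slots mentioned in the commit message.
const ringSize = 10000

// buildRing is a hypothetical sketch. It returns one target index per slot.
// Assumption: weights[i] is the fraction of traffic for target i.
func buildRing(weights []float64) []int {
	if len(weights) == 0 {
		return nil
	}

	// Fast path: all targets have the same weight, so the ring is
	// simply the list of targets.
	equal := true
	for _, w := range weights[1:] {
		if w != weights[0] {
			equal = false
			break
		}
	}
	if equal {
		ring := make([]int, len(weights))
		for i := range ring {
			ring[i] = i
		}
		return ring
	}

	// Weighted path: every target with a non-zero weight gets at least
	// one slot, so rounding can never starve a healthy target.
	var ring []int
	for i, w := range weights {
		if w <= 0 {
			continue
		}
		n := int(w * ringSize)
		if n < 1 {
			n = 1
		}
		for j := 0; j < n; j++ {
			ring = append(ring, i)
		}
	}
	return ring
}

func main() {
	fmt.Println(len(buildRing(make([]float64, 500))))           // equal weights
	fmt.Println(len(buildRing([]float64{0.5, 0.49999, 0.00001}))) // mixed weights
}
```

With equal weights the ring length equals the number of targets, so it is non-empty for any positive target count. In the weighted case, a target whose computed share is less than one slot is still given a single slot.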
@killcity I've pushed a change which should fix the issue and which also adds a fast-path for the default case of all targets receiving equal amounts of traffic. The new code guarantees that a target with a computed |
Tests fail on Travis but pass locally. I'll have a look tomorrow. |
Tests are timing out. Parsing 2500 routes is somewhat slow. Need to check whether the new parser will be faster. |
Test added a nice stress test for |
Merged to master |
Hi,
I've run into a bug. When running a service consisting of around 100 containers this error does not occur, but when scaling up to several hundred it does. Other services with a different urlprefix continue to work.