-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OPTIONS pings are sent out all at once #1132
Comments
It's not clear to me the best way of fixing this - one could loop round the user agents and calculate a new random ping_expires value for each one. However then instead of one database update you'd be doing one for each user agent - potentially thousands all at once, which could have a performance hit, but once FreeSWITCH had done that once, then it'll be doing fewer updates each time. Instead one could break up the list gradually - say set the new ping_expires for half the updated user agents by adding the random offset, and half by subtracting it. I'm not sure getting the database to apply the option is viable - the random() function in PostgreSQL and sqlite work differently (I don't think FreeSWITCH has a layer to iron out different database sql syntaxes). |
This issue has started causing a problem as we've increased the number of phones on the system. When 160 phones all do OPTIONS pings at the same time, a handful of pings fail (and are not sent) with a message in the log:
This is a problem if by random chance the same phone fails the options pings a few times in a row, then its registration gets cancelled. I have found a workaround solution though which side-steps the problem mentioned above of any change in the SQL statements in FreeSWITCH need to be compatible between the various databases used. I set the FreeSWITCH options ping time to something long (10 minutes in my case): Then I have a systemctl timer running the following SQL every minute: Doing this means the OPTIONS ping times ignore what FreeSWITCH sets for them because they get overridden by the systemctl timer. |
I came here because I have an endpoint behind a SonicWALL device that I do not control, and UDP timeout is 30 seconds. By default, pings will come out to 20-40 seconds, which exceed the timeout limit for the SonicWALL, despite the fact that I coded 30 seconds for ping interval. The way FS-6400 was done seems backwards to me. Instead of using a random interval each time, only the FIRST packet needs to be randomized. After that, each packet should come at the exact interval specified, but they all won't come at once since the initial starting time is random. I would love to contribute to an easy fix like this but the complete lack of basic documentation of this project and it's underlying functions has me scratching my head of where to start. Grepping the project has proven unhelpful so far. QUESTIONS:
|
@yois615 The function that queries the db to decide whether to send out the pings is this one: sofia_reg_check_ping_expire.
|
@MartinJPaterson - See my edit above, sorry |
@yois615 np. My comment also updated. |
Thanks. For the first question, I know where it's set in XML. How does that XML value appear in the SQL value ping_expires the first time? If we can set the randomization in the initial value of ping_expires when the extension registers , with the max value being ping_mean_interval (which the name is not appropriate after what I'm suggesting), then the result will be randomization that will persist. |
I got it. It's right here in line 954
There's no >0 condition so it will be assigned from the profile. We can add a condition that the random value should only be assigned if ping_expires = 0 |
The SQL used here has to be valid across the several db platforms used as the back-end storage in FS, which precludes using database functions in the sql to generate the random offset, so it has to be done in code (as it is now). The downside is that this function could be updating many registrations at once (1000's potentially) and to set a different value for each ping_expires would require an sql update for each separately, whereas now there's only one select and one update. Even if you do the update only for registrations when ping_expires is 0, that could still be many db queries (say a whole building's worth of phones turn on at the same time after a power cut). Whether this increase in db traffic is actually a problem, I have no idea, but it's certainly a big increase. |
You're saying that this isn't a good idea?
|
In FS-6400, the attempt was made to randomize OPTIONS packets to be sent at a random interval. The same random interval is applied to all endpoints so this doesn't work. Furthermore, rounding withing the code will ultimately make the ping times converge over time. This commmit will apply the random interval only during registration of the endpoint. All subsequent pings will be incremented with the actual value of ping-mean-interval. Fixes signalwire#1132, Fixes signalwire#1133
In FS-6400, the attempt was made to randomize OPTIONS packets to be sent at a random interval. The same random interval is applied to all endpoints so this doesn't work. Furthermore, rounding withing the code will ultimately make the ping times converge over time. This commmit will apply the random interval only during registration of the endpoint. All subsequent pings will be incremented with the actual value of ping-mean-interval. Fixes signalwire#1132, Fixes signalwire#1133
No. The behaviour of the |
Thanks. I abandoned that idea and have an open PR where the initial randomization is done during registration. That way we don't hit the database any more than the flood of registrations will, and each registration gets its own random time. All subsequent pings go out exactly at the interval. After testing I realized that all registrations during the same second get the same random number. This is because srand() is called during each execution. I have a commit in my repo to address this if you want to take a look. It's linked in #1133. |
In FS-6400, the attempt was made to randomize OPTIONS packets to be sent at a random interval. The same random interval is applied to all endpoints so this doesn't work. Furthermore, rounding within the code, as well as reseeding with srand() on each run will ultimately make the ping times converge over time. Once the times converge, they will not separate since the reseeding will cause the same random number to apply to each registration. This commmit will apply the random interval only during initial registration and update of registration. All subsequent pings will be incremented with the actual value of ping-mean-interval. (This parameter name is no longer accurate, and would be better named ping-max-interval). srand() has been moved to the start of the worker thread, and all repeat calls have been removed, so that each call of rand(), even during the same second, generates a different random number. Fixes signalwire#1132, Fixes signalwire#1133
In FS-6400, the attempt was made to randomize OPTIONS packets to be sent at a random interval. The same random interval is applied to all endpoints so this doesn't work. Furthermore, rounding within the code, as well as reseeding with srand() on each run will ultimately make the ping times converge over time. Once the times converge, they will not separate since the reseeding will cause the same random number to apply to each registration. This commmit will apply the random interval only during initial registration and update of registration. All subsequent pings will be incremented with the actual value of ping-mean-interval. (This parameter name is no longer accurate, and would be better named ping-max-interval). srand() has been moved to the start of the worker thread, and all repeat calls have been removed, so that each call of rand(), even during the same second, generates a different random number. Fixes signalwire#1132, Fixes signalwire#1133
In FS-6400, the attempt was made to randomize OPTIONS packets to be sent at a random interval. The same random interval is applied to all endpoints so this doesn't work. Furthermore, rounding within the code, as well as reseeding with srand() on each run will ultimately make the ping times converge over time. Once the times converge, they will not separate since the reseeding will cause the same random number to apply to each registration. This commmit will apply the random interval only during initial registration and update of registration. All subsequent pings will be incremented with the actual value of ping-mean-interval. (This parameter name is no longer accurate, and would be better named ping-max-interval). srand() has been moved to the start of the worker thread, and all repeat calls have been removed, so that each call of rand(), even during the same second, generates a different random number. Fixes signalwire#1132, Fixes signalwire#1133
In FS-6400, the attempt was made to randomize OPTIONS packets to be sent at a random interval. The same random interval is applied to all endpoints so this doesn't work. Furthermore, rounding within the code, as well as reseeding with srand() on each run will ultimately make the ping times converge over time. Once the times converge, they will not separate since the reseeding will cause the same random number to apply to each registration. This commmit will apply the random interval only during initial registration and update of registration. All subsequent pings will be incremented with the actual value of ping-mean-interval. (This parameter name is no longer accurate, and would be better named ping-max-interval). srand() has been moved to the start of the worker thread, and all repeat calls have been removed, so that each call of rand(), even during the same second, generates a different random number. Fixes signalwire#1132, Fixes signalwire#1133
In FS-6400, the attempt was made to randomize OPTIONS packets to be sent at a random interval. The same random interval is applied to all endpoints so this doesn't work. Furthermore, rounding within the code, as well as reseeding with srand() on each run will ultimately make the ping times converge over time. Once the times converge, they will not separate since the reseeding will cause the same random number to apply to each registration. This commmit will apply the random interval only during initial registration and update of registration. All subsequent pings will be incremented with the actual value of ping-mean-interval. (This parameter name is no longer accurate, and would be better named ping-max-interval). srand() has been moved to the start of the worker thread, and all repeat calls have been removed, so that each call of rand(), even during the same second, generates a different random number. Fixes signalwire#1132, Fixes signalwire#1133
In FS-6400, the attempt was made to randomize OPTIONS packets to be sent at a random interval. The same random interval is applied to all endpoints so this doesn't work. Furthermore, rounding within the code, as well as reseeding with srand() on each run will ultimately make the ping times converge over time. Once the times converge, they will not separate since the reseeding will cause the same random number to apply to each registration. This commmit will apply the random interval only during initial registration and update of registration. All subsequent pings will be incremented with the actual value of ping-mean-interval. (This parameter name is no longer accurate, and would be better named ping-max-interval). srand() has been moved to the start of the worker thread, and all repeat calls have been removed, so that each call of rand(), even during the same second, generates a different random number. Fixes signalwire#1132, Fixes signalwire#1133
In FS-6400, the attempt was made to randomize OPTIONS packets to be sent at a random interval. The same random interval is applied to all endpoints so this doesn't work. Furthermore, rounding within the code, as well as reseeding with srand() on each run will ultimately make the ping times converge over time. Once the times converge, they will not separate since the reseeding will cause the same random number to apply to each registration. This commmit will apply the random interval only during initial registration and update of registration. All subsequent pings will be incremented with the actual value of ping-mean-interval. (This parameter name is no longer accurate, and would be better named ping-max-interval). srand() has been moved to the start of the worker thread, and all repeat calls have been removed, so that each call of rand(), even during the same second, generates a different random number. Fixes signalwire#1132, Fixes signalwire#1133
In FS-6400, the attempt was made to randomize OPTIONS packets to be sent at a random interval. The same random interval is applied to all endpoints so this doesn't work. Furthermore, rounding within the code, as well as reseeding with srand() on each run will ultimately make the ping times converge over time. Once the times converge, they will not separate since the reseeding will cause the same random number to apply to each registration. This commmit will apply the random interval only during initial registration and update of registration. All subsequent pings will be incremented with the actual value of ping-mean-interval. (This parameter name is no longer accurate, and would be better named ping-max-interval). srand() has been moved to the start of the worker thread, and all repeat calls have been removed, so that each call of rand(), even during the same second, generates a different random number. Fixes signalwire#1132, Fixes signalwire#1133
Describe the bug
When you turn on the configuration to send out options pings (all-reg-options-ping and similar) the first set of OPTIONS pings get sent out all at once. There is code to randomise the interval between one OPTIONS ping and the next, unfortunately the same random offset is applied to all the user agents who are due an OPTIONS ping. So from the point of view of a single user agent, the OPTIONS pings arrive at the randomly-varied intervals specified, but FreeSWITCH still sends out all the OPTIONS pings at the same time.
The code (sofia_reg_check_ping_expire() in sofia_reg.c) checks the ping_expires field in the sip_registrations table to identify user agents whose ping is about to expire. It then sends OPTIONS pings to those destinations. Then it chooses the next ping_expires time by adding the configured time +/- a random amount. (This was the subject of FS-6400.) It then updates those registrations with the new ping_expires time with a single sql update, meaning they all get the same value.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The randomisation in the timing of sending OPTIONS pings was added with FS-6400 and the clear intention was to spread out sending of OPTIONS pings over time to reduce load on FreeSWTICH processing the replies and to be nicer to an upstream sbc or proxy.
Package version or git hash
The text was updated successfully, but these errors were encountered: