[MEGATHREAD] ECONNECT errors and unresponsive plugin #72

Closed
chase9 opened this issue May 14, 2021 · 40 comments

@chase9
Member

chase9 commented May 14, 2021

Overview

I'm creating a megathread for us to pool thoughts and spread information. For the past few months, more and more people have been experiencing errors in their logs to the effect of:

Error: GET https://www.alarm.com/web/api/devices/sensors?idsxxxxx failed, reason: read ECONNRESET
    at /homebridge/node_modules/homebridge-node-alarm-dot-com/node_modules/node-alarm-dot-com/dist/index.js:470:15
    at processTicksAndRejections (internal/process/task_queues.js:95:5)
    at async Promise.all (index 1)
    at async Promise.all (index 0)

This error may or may not be accompanied by your accessories becoming unresponsive.

Why is this happening?

I believe Alarm.com has started rate-limiting calls to its APIs. This is completely fair, and likely meant to prevent DDoS attacks, but it means our plugin is having trouble because of how "chatty" it is.

What can I do right now?

My testing has led me to believe that they're limiting people's IP addresses once a threshold of requests has been reached. This means you can (likely) get around the restriction by doing two things:

  1. Do this first! Increase the values of authTimeoutMinutes and pollTimeoutSeconds. I would try setting authTimeoutMinutes to 60 and pollTimeoutSeconds to 300. This will make the plugin less chatty, at the cost of making it slower to reflect external changes. Doing this may help prevent you from getting banned again.
  2. Restart your home's internet modem. For most people, this will get you a new external IP address so that you're no longer banned.
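
To get a feel for how much the first step cuts traffic, here is a rough back-of-the-envelope sketch. It assumes each poll fires a fixed number of requests and each re-authentication costs one; the function name, the `requestsPerPoll` figure, and the "chatty" starting values are all hypothetical and are not taken from the plugin or its defaults.

```typescript
// Rough estimate only: the real plugin may batch or skip requests differently.
interface PollingConfig {
  authTimeoutMinutes: number; // how often the plugin logs in again
  pollTimeoutSeconds: number; // how often device state is refreshed
}

// requestsPerPoll is a hypothetical figure (e.g. one call per device type).
function estimateRequestsPerHour(cfg: PollingConfig, requestsPerPoll: number): number {
  const pollsPerHour = 3600 / cfg.pollTimeoutSeconds;
  const loginsPerHour = 60 / cfg.authTimeoutMinutes;
  return pollsPerHour * requestsPerPoll + loginsPerHour;
}

// Hypothetical "chatty" settings versus the values suggested above.
const chatty = estimateRequestsPerHour({ authTimeoutMinutes: 10, pollTimeoutSeconds: 60 }, 4);
const relaxed = estimateRequestsPerHour({ authTimeoutMinutes: 60, pollTimeoutSeconds: 300 }, 4);

console.log(chatty, relaxed); // 246 49
```

Under these assumptions the suggested values cut the hourly request count to roughly a fifth, which is the trade-off against responsiveness described in step 1.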

How can we fix this long-term?

Assuming this is the actual cause of the problem, the fix would be increasing the defaults for the above values. Since this will reduce how responsive the plugin is, we will need to build dynamic code so local device changes are properly reflected. Unfortunately I'm not sure if there's any way around making the plugin less responsive to external changes.

We should also at some point comb through the code to see if it's feasible to reduce the number of API calls it makes. This may help prevent people from getting banned.

I'm sorry to the people this has inconvenienced.

@chase9 chase9 added the bug Something isn't working label May 14, 2021
@chase9 chase9 self-assigned this May 14, 2021
@chase9 chase9 pinned this issue May 14, 2021
@ifeign

ifeign commented May 14, 2021

I reinstalled to give your advice a try, but I have been unable to change my public IP, so I'm now experiencing a new issue: only my panel and garage door are being discovered, they both work though.

@DMBlakeley
Contributor

I changed the values for authTimeoutMinutes and pollTimeoutSeconds to the recommended values and have let it run for the last 2 days. I am consistently getting the following error every 5 minutes and I am not able to control Alarm.com:

[5/16/2021, 7:42:14 AM] [Security System] Error: GET https://www.alarm.com/web/api/systems/systems/2736084 failed: [object Object]
    at /usr/local/lib/node_modules/homebridge-node-alarm-dot-com/node_modules/node-alarm-dot-com/dist/index.js:470:15
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:95:5)
    at async Promise.all (index 0)

Also have same issue as @ifeign where I cannot change my public IP as this seems to be locked to my modem's MAC address.

Question, if I have been blocked by Alarm.com and I cannot change my public IP am I basically out of luck using this plugin?

@chase9
Member Author

chase9 commented May 17, 2021

In the latest beta (1.7.2-beta.4) I added some randomness to the device refresh, so whatever value you set will have between 0 and 5 minutes added. I don't think any one thing will solve this problem, but eventually we'll have enough variance to not get caught.
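
For anyone curious what that kind of jitter looks like, here is a minimal sketch. It is not the plugin's actual code: the function name and parameters are hypothetical, and it only assumes the behaviour described above (the configured interval plus a random 0 to 5 minutes).

```typescript
// Hypothetical helper; the plugin's real implementation may differ.
function nextPollDelayMs(
  pollTimeoutSeconds: number,
  maxJitterMinutes: number = 5,
  random: () => number = Math.random, // injectable so the jitter can be tested
): number {
  const baseMs = pollTimeoutSeconds * 1000;
  // random() returns a value in [0, 1), so the jitter stays below the maximum.
  const jitterMs = random() * maxJitterMinutes * 60 * 1000;
  return Math.round(baseMs + jitterMs);
}

// With pollTimeoutSeconds = 300 the next poll lands 5 to 10 minutes out.
console.log(nextPollDelayMs(300));
```

A poll loop would call this before each refresh and schedule the next one with the returned delay. Keeping every term in milliseconds in one place also makes unit slips in the jitter easier to spot.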

Eventually your MAC address will switch, and if your mobile app still works we can assume there's a way around this. The issue is that I don't have enough time or energy to troubleshoot this forever...

@ifeign

ifeign commented May 17, 2021

For those of us with Qolsys panels, which is probably quite a few, addressing my issue would fix polling of sensors, meaning potentially less server load - you’d just have to poll when something like a garage door or lock is used, vs constantly checking sensor status #71

@anthonyb82

Chase, first and foremost, thank you so much for all the work you have done on this plugin. It has been one of my favorites on homebridge but I can’t imagine the headache it’s caused you since this error.

I was wondering, is it possible to strip the plug-in down to just having the ability to set the arm state of the system and not pull in all of the accessories? Having the arm state connected to “good night” scenes, away from home geofencing etc is absolutely awesome. My thought was perhaps it would only need to log in when an arm/disarm command is sent.

@knuckleheadsmiff

I changed the values for authTimeoutMinutes and pollTimeoutSeconds to the recommended values and have let it run for the last 2 days. I am consistently getting the following error every 5 minutes and I am not able to control Alarm.com:

[5/16/2021, 7:42:14 AM] [Security System] Error: GET https://www.alarm.com/web/api/systems/systems/2736084 failed: [object Object]
    at /usr/local/lib/node_modules/homebridge-node-alarm-dot-com/node_modules/node-alarm-dot-com/dist/index.js:470:15
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:95:5)
    at async Promise.all (index 0)

I too am now seeing this after changing the settings as suggested.

(I only noticed seeing this in my logs when I recently upgraded to a much newer version of nodejs.)

@DMBlakeley
Contributor

I will add to the thanks for this plug-in. It worked flawlessly for quite some time which is pretty amazing!

I have Homebridge running on a Mac Mini. What I find really strange is that I can login to alarm.com using Safari on this Mac, however, the alarm-dot-com plugin running on the same Mac fails at login.

@ngori
Collaborator

ngori commented May 18, 2021

@chase9 Hi Chase, I installed beta 9 this morning. I was able to control a door lock 1 time before getting the following:

[5/18/2021, 10:15:57 AM] [Security System] Error: GET https://www.alarm.com/web/api/systems/systems/5349741 failed: [object Object]
    at /usr/lib/node_modules/homebridge-node-alarm-dot-com/node_modules/node-alarm-dot-com/dist/index.js:464:15
    at processTicksAndRejections (internal/process/task_queues.js:97:5)
    at async Promise.all (index 0)

Plugin and all devices have become unresponsive now. Not sure how but ADC appears to be detecting plugin connections vs app or web interface connections almost immediately.

@DMBlakeley
Contributor

Hi Chase, similar behavior as @ngori. Installed beta 9, configured and rebooted. On initial boot, login was successful returning registered panel and devices as well as initial state (motion detected). First sample loop returned the ECONNECT error and devices no longer updated.

I only have an alarm system with no additional devices such as lights or a garage door. Logins through the webpage and app occur without issue.

@dkolb

dkolb commented May 21, 2021

Hello. I just wanted to drop back by and report on this comment mentioning the Home Assistant plugin.

uvjustin/alarmdotcom and uvjustin/pyalarmdotcomajax also suffer from this issue for me. The difference is, as best I can tell, this integration only supports arming and disarming so it kinda brute forces the operation. That probably explains the lack of complaints on those repositories.

raise client_error(req.connection_key, exc) from exc
aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host www.alarm.com:443 ssl:default [Connection reset by peer]
2021-05-20 22:08:01 WARNING (MainThread) [homeassistant.helpers.entity] Update of alarm_control_panel.alarm_com is taking over 10 seconds
2021-05-20 22:08:07 ERROR (MainThread) [pyalarmdotcomajax.pyalarmdotcomajax] Can not load state data from Alarm.com
2021-05-20 22:08:31 WARNING (MainThread) [homeassistant.helpers.entity] Update of alarm_control_panel.alarm_com is taking over 10 seconds
2021-05-20 22:08:37 ERROR (MainThread) [pyalarmdotcomajax.pyalarmdotcomajax] Can not load state data from Alarm.com

I think we can pretty conclusively say we are triggering some sort of defensive DDoS protection on their side. Anyway, now to uninstall HA from my k8s cluster. :-D

@ifeign

ifeign commented May 22, 2021

It’s very unlikely to work, but has anyone tried reaching out to the Alarm.com dev team? This guy, for example https://www.linkedin.com/in/akshaybaviskar

@ngori
Collaborator

ngori commented May 22, 2021

I believe we had a former ADC dev assisting for a while. To your point it's worth a shot though.

Couple other thoughts/brainstorms:

I thought there was a version of an ADC plugin at one point doing a screen scrape. Very inefficient vs the APIs but possibly an option that wouldn't get blocked.

The other thought was routing this plug-in's traffic through a VPN service and varying the endpoints. If the denial is just based on IP (this seems possible, given there has been limited success with new WAN IPs), this might work. If ADC is rate limiting at the account level, it wouldn't help. Not sure how feasible it would be to implement, though, given you would need a commercial VPN service.

@jfmach

jfmach commented May 25, 2021

Hello,

Bit of feedback the other way around: I use 1.7.1, I have 30+ sensors/detectors and while I have a lot of the ECONNRESET errors, generally speaking the system is working and reporting very well.

I have authTimeoutMinutes set to 30 and I didn't set pollTimeoutSeconds so it's set to whatever the default is.

I do have an issue with the arming/disarming of the system but I worked around it by creating dummy switches (homebridge-dummy). Instead of arming/disarming from the homebridge-node-alarm-dot-com alarm accessory, my automations interact with the dummy switches. The dummy switches automatically turn off after 30 seconds and I have additional automations that turn the alarm on (or off) when the dummy switches are turned on AND off.

That works well. Often it doesn't change the alarm state when the dummy switch is turned on, but it does work after 30 seconds when it turns off.

So it seems to me that the problem is that something somewhere needs to be refreshed, and it's refreshed by the initial attempt at changing the status of the alarm.

If I can help troubleshoot this let me know.

Great plugin for me, thanks for all the hard work!

@ifeign

ifeign commented May 31, 2021

Has any progress been made on this? Just my luck that I moved into a house that came with an Alarm.com system only for this to happen lol

@Elder-HVAC-Man

@jfmach , how do you control the alarm on and off states with dummy switches? That would solve this problem for me. Thanks.

@ifeign

ifeign commented Jun 4, 2021

So, I decided to give the plugin another try. As of this writing, my alarm system, lock and garage door work. BUT I don't have any door, window or motion sensors. This isn't the biggest deal, but it's a little strange. I cleared my accessory cache and it didn't change anything. @chase9 any suggestions?

In a way, this is a slight blessing - I am working on setting up Home Assistant and the local Qolsys plugin I've mentioned previously. This would give me realtime sensors, allowing me to exclude everything in the Homebridge plugin except my lock and garage door.

EDIT: Nvm, I was using a different login from my primary account and had it set to "limited device access", which I guess blocks contact sensors. Changing it to "full control" revealed my sensors.

@DMBlakeley
Contributor

I unloaded the plugin a couple of weeks ago. Retried yesterday and initially thought that the issue had cleared on the Alarm.com end. Within 30 minutes the ECONNECT errors returned and the plugin would not respond. I started looking over the code just for understanding. Since the webpage and iOS app work just fine, I'm wondering if there is a problem in the way the ‘node-fetch’ node_module queries the Alarm.com servers.

@ifeign

ifeign commented Jun 13, 2021

I unloaded the plugin a couple of weeks ago. Retried yesterday and initially thought that the issue had cleared on the Alarm.com end. Within 30 minutes the ECONNECT errors returned and the plugin would not respond.

Are you using the beta release? It’s been almost perfect for me. Sure, things are a little slower due to the reduced polling, but it’s been 90% reliable. I haven't seen ECONNECT errors, but have seen other random timeouts that eventually resolved themselves

@DMBlakeley
Contributor

Yes, I am using the beta. What do you have your sampling settings at? Default or higher?

@ifeign

ifeign commented Jun 13, 2021 via email

@DMBlakeley
Contributor

Thanks. I have homebridge running on an M1 Mac Mini. Generated a new Safari mfaCookie for the Mini and reloaded beta-9 of the plug-in with default values. Will see if errors are generated and if so impact on functionality.

@ifeign

ifeign commented Jun 13, 2021

Thanks. I have homebridge running on an M1 Mac Mini. Generated a new Safari mfaCookie for the Mini and reloaded beta-9 of the plug-in with default values. Will see if errors are generated and if so impact on functionality.

IDK if it helps, but I made a secondary alarm.com account specifically to use with this plugin and to reduce server activity from my primary account - added bonus is you can revoke that account should you suspect any security breach

@DMBlakeley
Contributor

I thought of giving that a try. With the logging level at 4, I am getting the following error:

[6/13/2021, 4:35:18 PM] [Security System] Error: GET https://www.alarm.com/web/api/systems/systems/2736084 failed: [object Object]
    at /usr/local/lib/node_modules/homebridge-node-alarm-dot-com/node_modules/node-alarm-dot-com/dist/index.js:464:15
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:95:5)
    at async Promise.all (index 0)

Is this a problem or not?

@ifeign

ifeign commented Jun 13, 2021

That looks like an error I’ve been getting that seems to make the plugin unresponsive briefly. It seems to resolve itself, though; usually a child bridge restart (or homebridge restart if you haven’t set up child bridges) fixes it if I’m not patient enough.

Side note, make sure your secondary account has full control, at least in my case, contact sensors wouldn’t show up until I did so, even though everything else did

@DMBlakeley
Contributor

Set up a secondary account just for homebridge with full control. All sensors are visible.

Do you recommend running plugin in a child bridge?

@ifeign

ifeign commented Jun 13, 2021

Set up a secondary account just for homebridge with full control. All sensors are visible.

Do you recommend running plugin in a child bridge?

I like the convenience of child bridges: I can restart one platform without restarting the whole server. You’ll have to re-add all your related accessories if you do move to a child bridge, and the more bridges you have, the more RAM they use. Thankfully I’ve got an 8 GB Pi 4 because I’ve isolated almost everything lol, but I have had a multitude of child bridges running fine on a Pi 3

@DMBlakeley
Contributor

Using the secondary account I am still occasionally getting the same error, but interaction with Alarm.com seems to be working. Will run with this configuration for now. I would like to try out child bridges but will leave that for another day.

@nflute

nflute commented Jun 14, 2021

I have no knowledge of coding but hope this will help or give some clue.
I reinstalled v1.7.2-beta.4 (the first beta that supported 2FA) and I have had no error messages so far.

@DMBlakeley
Contributor

DMBlakeley commented Jun 15, 2021

This is a great observation. I loaded v1.7.2-beta.4 and did not observe error messages. Looking more closely at the overnight log I find a couple occurrences of my previously reported error but no ECONNECT errors. Log is showing that security system is polling every 10 minutes. Much different behavior than beta.9.

@scottleestrange

This still happens

@ifeign

ifeign commented Jun 15, 2021

On which release? I’m kinda wondering if some of the people who rolled back to the earlier beta will get flagged by the server and go back to econnect errors

@DMBlakeley
Contributor

I am trying to see if I can start with beta-4 and add in the code changes stepwise to see where the problem occurs.

The first item I found was that the randomized timer was, I believe, 10x higher than planned, resulting in a very long polling interval. I have submitted a comment on this code change.

@DMBlakeley
Contributor

DMBlakeley commented Jun 15, 2021

I believe I found the issue with beta-9 but want to do more testing to make sure.

UPDATE - I have submitted a pull request with the polling randomizing factor correction. With the change, beta.9 has been running well. A couple of errors in 24 hours but none that impacted plug-in operation. As the default values are being used, the plug-in is also reasonably responsive. Beta.6 and higher also handle caching and restoring of sensors much better than beta.4 did, which results in a much cleaner startup of homebridge.

@scottleestrange

scottleestrange commented Jun 16, 2021 via email

@ifeign

ifeign commented Jun 16, 2021

Secondary accounts were more of a theory anyway. My main reason for using one is that if my homebridge passwords ever leak, I don't leak the info from my primary account.

Have you asked your provider to lift the limitation? Seems kinda odd they’d restrict that in the first place

@chase9
Member Author

chase9 commented Aug 18, 2021

Hi all,

The changes from @DMBlakeley have been merged into the latest beta (Beta 11). Could you please update to the latest beta and let me know if you run into any problems? I've been running it for a day now and have seen the plugin successfully recover from an ECONNECT error.

@anthonyb82

anthonyb82 commented Aug 18, 2021 via email

@Elder-HVAC-Man

How do I go about loading and installing the latest beta (Beta 11) into Homebridge?

@DMBlakeley
Contributor

If you are using Homebridge UI for plugin configuration, go to the Plugins page and select the "wrench" symbol and then "Install previous version". You will find beta.11 as an option.

@chase9 chase9 linked a pull request Aug 22, 2021 that will close this issue
@chase9 chase9 removed a link to a pull request Aug 22, 2021
@chase9 chase9 mentioned this issue Aug 22, 2021
@chase9 chase9 closed this as completed Aug 22, 2021
@chase9 chase9 unpinned this issue Aug 22, 2021