Fix sharded federation sender sometimes using 100% CPU.
We pull all destinations requiring catchup from the DB in batches.
However, if all those destinations get filtered out (due to the
federation sender being sharded), then the `last_processed` destination
doesn't get updated, and we keep requesting the same set repeatedly.
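
For illustration, here is a minimal, self-contained sketch of the failure mode and of the fix; `fetch_destinations_after` and `should_handle` are hypothetical stand-ins for the real store query and shard check, not the actual Synapse API.

from typing import List, Optional

ALL_DESTINATIONS = ["a.example", "b.example", "c.example"]


def fetch_destinations_after(last_processed: Optional[str], limit: int) -> List[str]:
    # Stand-in for the DB query that returns destinations needing catch-up,
    # in order, starting after `last_processed`.
    remaining = [
        d for d in ALL_DESTINATIONS if last_processed is None or d > last_processed
    ]
    return remaining[:limit]


def should_handle(destination: str) -> bool:
    # Stand-in for the shard check. Returning False for everything mimics the
    # case where every destination in the batch belongs to another instance.
    return False


def wake_destinations_needing_catchup() -> None:
    last_processed: Optional[str] = None
    while True:
        batch = fetch_destinations_after(last_processed, limit=25)
        if not batch:
            break

        # The fix: advance `last_processed` using the *unfiltered* batch, so
        # the next DB query always moves past it.
        last_processed = batch[-1]

        batch = [d for d in batch if should_handle(d)]

        # Before the fix, `last_processed` was only set as the loop variable
        # below, so an empty filtered batch left it unchanged and the same
        # destinations were fetched from the DB over and over.
        for destination in batch:
            print("waking", destination)

With `last_processed` taken from the unfiltered batch, the loop terminates even when this instance handles none of the destinations; previously it would spin on the same query indefinitely.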
erikjohnston committed Apr 8, 2021
1 parent 48d44ab commit 3a569fb
Showing 2 changed files with 5 additions and 2 deletions.
changelog.d/9770.bugfix (1 change: 1 addition, 0 deletions)
@@ -0,0 +1 @@
+Fix bug where sharded federation senders could get stuck repeatedly querying the DB in a loop, using lots of CPU.
synapse/federation/sender/__init__.py (6 changes: 4 additions, 2 deletions)
@@ -734,16 +734,18 @@ async def _wake_destinations_needing_catchup(self) -> None:
                 self._catchup_after_startup_timer = None
                 break
 
+            last_processed = destinations_to_wake[-1]
+
             destinations_to_wake = [
                 d
                 for d in destinations_to_wake
                 if self._federation_shard_config.should_handle(self._instance_name, d)
             ]
 
-            for last_processed in destinations_to_wake:
+            for destination in destinations_to_wake:
                 logger.info(
                     "Destination %s has outstanding catch-up, waking up.",
                     last_processed,
                 )
-                self.wake_destination(last_processed)
+                self.wake_destination(destination)
                 await self.clock.sleep(CATCH_UP_STARTUP_INTERVAL_SEC)
