Reread config and restart all services on SIGHUP #1949

roman-khimov · 2021-05-04T19:55:04Z

It's more tricky than #1892, because most of the configuration just can't be changed (all of protocol-level configuration and many application-level things), so we need to carefully read new configuration, check that nothing critical has changed and then shutdown/startup all services as configured. See #1945 also.

roman-khimov · 2022-07-04T16:04:00Z

We've got the following services:

RPC
Has a lot of configuration options including TLS keys/certificates.
Pprof
Can be enabled/disabled and configured for some address/port.
Prometheus metrics
Can be enabled/disabled and configured for some address/port.
Oracle
Has a lot of service-specific options, including a list of other oracle nodes.
Notary
Can be enabled/disabled, configured for some address/port, has a wallet configuration.
State validation
Can be enabled/disabled, configured for some address/port, has a wallet configuration.
Consensus
Unfortunately, it's an implicit service in configuration at the moment (in that it doesn't have a section of its own). It only has a wallet configuration (which enables it at the same time).

In general we might want to do something with them either because the configuration has changed or because we suspect that some runtime operational problem can be fixed by dropping the service state. The behavior at the moment is to reread the config and restart RPC service on SIGHUP. Originally we wanted to restart all of the other services as well, but:

restarting RPC can affect its clients. They must handle this appropriately, but still we may want to avoid restarting it in some cases
restarting all of the services at once doesn't differ much from a complete node restart
restarting the node shouldn't be a problem in most of the cases (so changing nothing at all is an option too)
most of the services don't have a lot of things to change in their configuration sections, so they're not likely to require restarts
consensus restart is dangerous in general, so it should be handled separately from other services (it's also the one where restarts are not likely to be driven by configuration changes)
theoretically there are other things that can be controlled externally in a node (flushing transaction/notary pools, dropping P2P connections)
we've only got HUP, USR1 and USR2 if we're to follow our current control model which is not a lot

What I'd probably do is:

reread config and restart RPC, Pprof, Prometheus (client-oriented services) on HUP
reread config and restart Oracle/Notary/SV (network-oriented services) on USR1
reread config and restart Consensus (special one) on USR2

But this uses all of the "normal" signals and maybe is an overkill. And there is always an option of full node restart which can solve any configuration/operation issue.

@anatoly-bogatyrev, @realloc, thoughts, opinions?

realloc · 2022-07-18T15:15:24Z

From the current field usage experience, we need a way to restart Consensus without disturbing RPC and WS connections giving first priority to reads/testinvoke calls in the following cases:

Stuck consensus service
Increase number of CNs
CN replacement

We also need a way to restart RPC without disturbing other services on TLS certificates update.

The proposed scheme with HUP/USR1/USR2 signals is good for a start.

In future it would be also nice to have this wrapped into neo-go CLI. Like neo-go service <svc> [restart|status] or have the full control API like we do in neofs-node.

roman-khimov · 2022-07-20T15:48:17Z

Like neo-go service [restart|status] or have the full control API like we do in neofs-node.

That requires either some RPC extensions (with all of the associated access restriction problems), or a separate control channel (like Unix socket) which is something that we'd like to avoid as much as possible.

Fix #1949.

Fix #1949. Also drop wallet from the ServerConfig since it's not used in any meaningful way after this change.

roman-khimov modified the milestones: v0.96.1, v0.96.2 Jul 20, 2021

roman-khimov modified the milestones: v0.97.0, v0.97.1, v0.97.2 Aug 2, 2021

roman-khimov modified the milestones: v0.97.2, v0.97.3 Aug 18, 2021

roman-khimov modified the milestones: v0.97.3, v0.97.4 Sep 13, 2021

roman-khimov modified the milestones: v0.98.0, v0.98.1 Nov 12, 2021

roman-khimov modified the milestones: v0.98.1, v0.98.2 Jan 17, 2022

AnnaShaleva mentioned this issue Feb 8, 2022

CLI: configuration reload on SIGHUP is not supported on Windows #2352

Open

roman-khimov modified the milestones: v0.98.2, v0.98.3 Feb 18, 2022

roman-khimov modified the milestones: v0.99.0, v0.99.1 Apr 7, 2022

roman-khimov mentioned this issue Apr 22, 2022

Add StartWhenSynchronized option for RPC server #2445

Merged

roman-khimov modified the milestones: v0.99.1, v0.99.2, v0.99.3 Jun 9, 2022

AnnaShaleva mentioned this issue Jun 28, 2022

Review and refactor services start and shutdown processes #2568

Closed

roman-khimov modified the milestones: v0.99.3, v0.99.2 Jul 11, 2022

roman-khimov added a commit that referenced this issue Jul 27, 2022

cli: restart consensus service on USR2

abeb3e8

Fix #1949.

roman-khimov mentioned this issue Jul 27, 2022

Fancy service restart #2612

Merged

roman-khimov modified the milestones: v0.99.2, v0.99.1 Jul 27, 2022

roman-khimov added a commit that referenced this issue Jul 27, 2022

cli: restart consensus service on USR2

7ec1989

Fix #1949.

roman-khimov modified the milestones: v0.99.1, v0.99.2 Jul 28, 2022

roman-khimov added a commit that referenced this issue Jul 28, 2022

cli: restart consensus service on USR2

00f4151

Fix #1949.

roman-khimov added a commit that referenced this issue Aug 2, 2022

cli: restart consensus service on USR2

5a7fa2d

Fix #1949. Also drop wallet from the ServerConfig since it's not used in any meaningful way after this change.

roman-khimov closed this as completed in #2612 Aug 2, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reread config and restart all services on SIGHUP #1949

Reread config and restart all services on SIGHUP #1949

roman-khimov commented May 4, 2021

roman-khimov commented Jul 4, 2022

realloc commented Jul 18, 2022

roman-khimov commented Jul 20, 2022

Reread config and restart all services on SIGHUP #1949

Reread config and restart all services on SIGHUP #1949

Comments

roman-khimov commented May 4, 2021

roman-khimov commented Jul 4, 2022

realloc commented Jul 18, 2022

roman-khimov commented Jul 20, 2022