Fancy service restart #2612

Merged
merged 14 commits into from
Aug 2, 2022

roman-khimov
Member

This fixes #1949. It works like this:

No config changes:

2022-07-27T11:32:46.376+0300    INFO    signal received {"name": "hangup"}
2022-07-27T11:32:46.378+0300    INFO    shutting down RPC server        {"endpoint": "[::]:10332"}
2022-07-27T11:32:46.379+0300    INFO    starting rpc-server     {"endpoint": ":10332"}
2022-07-27T11:32:46.379+0300    INFO    shutting down service   {"service": "Prometheus", "endpoint": ":2112"}
2022-07-27T11:32:46.380+0300    INFO    service hasn't started since it's disabled      {"service": "Pprof"}
2022-07-27T11:32:46.381+0300    INFO    service is running      {"service": "Prometheus", "endpoint": ":2112"}

Enabled/disabled services:

2022-07-27T11:33:18.936+0300    INFO    signal received {"name": "hangup"}
2022-07-27T11:33:18.938+0300    INFO    shutting down RPC server        {"endpoint": "[::]:10332"}
2022-07-27T11:33:18.940+0300    INFO    RPC server is not enabled
2022-07-27T11:33:18.940+0300    INFO    shutting down service   {"service": "Prometheus", "endpoint": ":2112"}
2022-07-27T11:33:18.940+0300    INFO    service is running      {"service": "Pprof", "endpoint": ":2113"}
2022-07-27T11:33:18.941+0300    INFO    service is running      {"service": "Prometheus", "endpoint": ":2112"}
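The logs above show a stop-then-maybe-start pattern. Prometheus is restarted even when nothing changed, Pprof starts once it is enabled, and the RPC server is only shut down once it is disabled. A minimal sketch of that behaviour, assuming a hypothetical `service` type with a `running` flag (not the node's actual implementation):

```go
package main

import "fmt"

// service is a hypothetical stand-in for a config-toggled node service
// such as the RPC server, Prometheus or Pprof.
type service struct {
	name    string
	running bool
}

// reconfigure applies a (possibly unchanged) enabled flag: a running
// instance is always stopped first, and a new one is started only if the
// reloaded config enables it. It returns the events for logging.
func (s *service) reconfigure(enabled bool) []string {
	var events []string
	if s.running {
		s.running = false
		events = append(events, s.name+": stopped")
	}
	if !enabled {
		return append(events, s.name+": disabled, not started")
	}
	s.running = true
	return append(events, s.name+": started")
}

func main() {
	prometheus := &service{name: "Prometheus", running: true}
	pprof := &service{name: "Pprof"}
	// Both services are enabled in the reloaded config.
	for _, s := range []*service{prometheus, pprof} {
		for _, e := range s.reconfigure(true) {
			fmt.Println(e)
		}
	}
}
```

Always restarting a running instance keeps the logic simple: there is no need to diff individual service settings, only to know whether the service should run.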

2022-07-27T11:36:39.794+0300    INFO    signal received {"name": "user defined signal 1"}
2022-07-27T11:36:45.840+0300    INFO    starting oracle service
2022-07-27T11:36:45.840+0300    INFO    stopping state validation service
2022-07-27T11:36:50.678+0300    INFO    persisted to disk       {"blocks": 1, "keys": 261, "headerHeight": 1926393, "blockHeight": 1926393, "took": "95.264089ms"}
2022-07-27T11:36:51.232+0300    INFO    starting state validation service

2022-07-27T11:46:26.781+0300    INFO    signal received {"name": "user defined signal 2"}
2022-07-27T11:46:27.109+0300    INFO    starting consensus service
2022-07-27T11:46:27.109+0300    INFO    initializing dbft       {"height": 1926428, "view": 0, "index": -1, "role": "WatchOnly"}
2022-07-27T11:46:33.106+0300    INFO    received PrepareRequest {"validator": 0, "tx": 0}
2022-07-27T11:46:33.158+0300    INFO    received PrepareResponse        {"validator": 2}
2022-07-27T11:46:33.179+0300    INFO    received PrepareResponse        {"validator": 4}
2022-07-27T11:46:33.203+0300    INFO    received PrepareResponse        {"validator": 1}
2022-07-27T11:46:33.222+0300    INFO    received PrepareResponse        {"validator": 6}
2022-07-27T11:46:33.230+0300    INFO    received PrepareResponse        {"validator": 3}
2022-07-27T11:46:33.250+0300    INFO    received Commit {"validator": 4}
2022-07-27T11:46:33.334+0300    INFO    received Commit {"validator": 2}
2022-07-27T11:46:33.342+0300    INFO    received Commit {"validator": 3}
2022-07-27T11:46:33.363+0300    INFO    received Commit {"validator": 6}
2022-07-27T11:46:34.835+0300    INFO    received PrepareResponse        {"validator": 5}
2022-07-27T11:46:34.929+0300    INFO    received Commit {"validator": 5}
2022-07-27T11:46:34.929+0300    INFO    approving block {"height": 1926428, "hash": "56e51d1399f7e784b697e51623ee0f16e0b0c21c46da7d2a9d01025758f2e9ce", "tx_count": 0, "merkle": "0000000000000000000000000000000000000000000000000000000000000000", "prev": "4ce420b948da9de482d9bb12dd86b54387c53bad013bfc0739b3d54f6ffe0791"}
2022-07-27T11:46:34.932+0300    INFO    initializing dbft       {"height": 1926429, "view": 0, "index": -1, "role": "WatchOnly"}

2022-07-27T11:47:08.517+0300    INFO    signal received {"name": "user defined signal 2"}
2022-07-27T11:47:08.518+0300    INFO    stopping consensus service
2022-07-27T11:47:08.662+0300    INFO    persisted to disk       {"blocks": 1, "keys": 23, "headerHeight": 1926430, "blockHeight": 1926430, "took": "9.627537ms"}
2022-07-27T11:47:08.871+0300    INFO    starting consensus service
2022-07-27T11:47:08.871+0300    INFO    initializing dbft       {"height": 1926431, "view": 0, "index": -1, "role": "WatchOnly"}

ProtocolConfiguration must remain the same; any error means that the signal
is ignored.
Most of the settings can't be changed; only service settings can.
Also fix addresses if needed and store this new configuration.
roman-khimov force-pushed the fancy-service-restart branch from 8b97407 to 09ffed8 on July 27, 2022 09:30
codecov bot commented Jul 27, 2022

Codecov Report

Merging #2612 (9b0ea2c) into master (1ff588a) will decrease coverage by 0.16%.
The diff coverage is 60.20%.

@@            Coverage Diff             @@
##           master    #2612      +/-   ##
==========================================
- Coverage   84.51%   84.35%   -0.17%     
==========================================
  Files         299      300       +1     
  Lines       37882    38073     +191     
==========================================
+ Hits        32016    32115      +99     
- Misses       4458     4547      +89     
- Partials     1408     1411       +3     
Impacted Files                              Coverage Δ
pkg/network/payload/extensible.go           100.00% <ø> (ø)
pkg/network/server_config.go                100.00% <ø> (ø)
cli/server/server.go                         70.14% <18.75%> (-10.24%) ⬇️
pkg/core/blockchain.go                       81.86% <50.00%> (-0.50%) ⬇️
pkg/services/oracle/request.go               57.99% <50.00%> (ø)
pkg/network/server.go                        73.17% <58.13%> (-0.73%) ⬇️
pkg/services/oracle/broadcaster/oracle.go    48.27% <66.66%> (ø)
pkg/services/rpcsrv/server.go                76.96% <72.72%> (-0.20%) ⬇️
pkg/services/metrics/metrics.go              68.42% <75.00%> (+5.92%) ⬆️
pkg/core/native/oracle.go                    73.87% <82.35%> (+1.68%) ⬆️
... and 18 more


roman-khimov force-pushed the fancy-service-restart branch from c5dc439 to 6a85a63 on July 27, 2022 10:00
Comment on lines +537 to +540
orc, _ := o.Module.Load().(*OracleService)
if orc == nil || *orc == nil {
Contributor

Can we have a race condition where all oracle services are restarted simultaneously and some requests are skipped?

Member Author

Probably. The only way to solve it is to fetch the currently active requests on service start; I'll take a look at it.

Member Author

roman-khimov, Jul 28, 2022

The same problem currently exists for all requests that are already active before the node starts: they're completely ignored.

Or not exactly; it's much more involved.

Member Author

Fixed.
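The fix discussed in this thread (fetching currently active requests on service start) can be sketched like this: `Start` seeds the queue from chain state, so requests that appear while the service is stopped or restarting are picked up anyway. `requestSource`, `staticSource` and the `oracle` type are hypothetical stand-ins, not neo-go's API.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// requestSource is a hypothetical view of the oracle requests that are
// currently pending on chain.
type requestSource interface {
	PendingRequests() []uint64
}

// staticSource is a fixed list of pending request IDs for the demo.
type staticSource []uint64

func (s staticSource) PendingRequests() []uint64 { return s }

// oracle is a hypothetical, much simplified oracle service.
type oracle struct {
	mu      sync.Mutex
	pending map[uint64]bool // nil until Start
}

// Start seeds the work queue from chain state, so requests made while the
// service was stopped or restarting are not skipped.
func (o *oracle) Start(src requestSource) {
	o.mu.Lock()
	defer o.mu.Unlock()
	o.pending = make(map[uint64]bool)
	for _, id := range src.PendingRequests() {
		o.pending[id] = true
	}
}

// AddRequest handles a request seen in a new block. It's a no-op before
// Start, which is safe because Start rereads everything still pending;
// duplicates are harmless since pending is a set.
func (o *oracle) AddRequest(id uint64) {
	o.mu.Lock()
	defer o.mu.Unlock()
	if o.pending != nil {
		o.pending[id] = true
	}
}

// Pending returns the queued request IDs in ascending order.
func (o *oracle) Pending() []uint64 {
	o.mu.Lock()
	defer o.mu.Unlock()
	ids := make([]uint64, 0, len(o.pending))
	for id := range o.pending {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	return ids
}

func main() {
	var o oracle
	o.AddRequest(1)                // arrives while stopped: dropped here...
	o.Start(staticSource{1, 5, 3}) // ...but picked up from chain state on start
	o.AddRequest(3)                // duplicate, harmless
	o.AddRequest(7)
	fmt.Println(o.Pending()) // [1 3 5 7]
}
```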

roman-khimov
Member Author

OK, this can wait for 0.98.2: too many things are touched and too many questions are still open.

The only thing rpcsrv needs is the AddResponse callback.
This allows enabling/disabling the service and changing nodes, keys and other
settings. Unfortunately, atomic.Value doesn't allow Store(nil), so we have to
store a pointer there that can point to a nil interface.
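The pointer-to-interface trick mirrors the `o.Module.Load().(*OracleService)` check quoted in the review. A minimal self-contained sketch follows; the `OracleService` interface, its `AddRequest` method and the `dispatch` helper are simplified assumptions for illustration.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// OracleService is a hypothetical, simplified service interface.
type OracleService interface {
	AddRequest(id uint64)
}

type printService struct{}

func (printService) AddRequest(id uint64) { fmt.Println("handling request", id) }

// Oracle keeps the currently attached service in an atomic.Value.
// Value.Store(nil) panics and every Store must use the same concrete type,
// so we always store a *OracleService; "no service" is a pointer to a nil
// interface rather than a nil value.
type Oracle struct {
	Module atomic.Value
}

// SetService attaches s, or detaches the current service when s is nil.
func (o *Oracle) SetService(s OracleService) {
	o.Module.Store(&s) // &s is never nil, even if s is
}

// dispatch forwards a request if a service is attached and reports whether it did.
func (o *Oracle) dispatch(id uint64) bool {
	orc, _ := o.Module.Load().(*OracleService)
	if orc == nil || *orc == nil {
		return false // nothing stored yet, or service disabled
	}
	(*orc).AddRequest(id)
	return true
}

func main() {
	var o Oracle
	fmt.Println(o.dispatch(1)) // false: nothing stored yet
	o.SetService(printService{})
	o.dispatch(2) // prints "handling request 2"
	o.SetService(nil)
	fmt.Println(o.dispatch(3)) // false: service detached
}
```

Both nil checks are needed: `orc == nil` covers a Value that was never stored to, and `*orc == nil` covers an explicitly detached service.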
It's a bit special since it's _always_ present to catch stateroots from the
network.
Fix #1949. Also drop wallet from the ServerConfig since it's not used in any
meaningful way after this change.
Now that services can come and go, we need to protect all of the associated
fields and allow deregistering them.
It has a stub for SIGHUP, but doesn't have anything for USR1 and USR2:

Error: cli\server\server.go:520:31: undefined: syscall.SIGUSR1
Error: cli\server\server.go:521:31: undefined: syscall.SIGUSR2
Error: cli\server\server.go:565:17: undefined: syscall.SIGUSR1
Error: cli\server\server.go:608:17: undefined: syscall.SIGUSR2
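One common Go approach to errors like these is to move the signal list into build-constrained files. The block below sketches only the non-Windows half, with a hypothetical `reloadSignals` function. A sibling file tagged `//go:build windows` would return just `syscall.SIGHUP`, which the Windows `syscall` package does define.

```go
//go:build !windows

// Non-Windows half of a hypothetical two-file split. Go's Windows syscall
// package has no SIGUSR1/SIGUSR2, which is what the "undefined" errors
// above report, so the Windows file would list only syscall.SIGHUP.
package main

import (
	"fmt"
	"os"
	"syscall"
)

// reloadSignals returns the signals to pass to signal.Notify for config
// reloads and service restarts on Unix-like systems.
func reloadSignals() []os.Signal {
	return []os.Signal{syscall.SIGHUP, syscall.SIGUSR1, syscall.SIGUSR2}
}

func main() {
	for _, s := range reloadSignals() {
		fmt.Println(s)
	}
}
```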
Move the category definition from consensus to payload; the consensus service is
one of a kind (high priority, HP), so network.Server can be adjusted accordingly.
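To illustrate what "adjusted accordingly" could mean, here is a sketch in which a network server routes extensible payloads by category and sends only consensus ones to a high-priority queue. The `consensusCategory` value, the `extensible` struct and both queues are assumptions for illustration, not neo-go's definitions.

```go
package main

import "fmt"

// consensusCategory is an illustrative value for the consensus payload
// category (assumption; not the real constant's name or location).
const consensusCategory = "dBFT"

// extensible is a hypothetical, stripped-down extensible payload.
type extensible struct {
	Category string
	Data     []byte
}

// server is a hypothetical network server with two outgoing queues.
type server struct {
	hpQueue chan extensible // high priority: consensus only
	queue   chan extensible // everything else
}

// enqueue routes a payload by category; consensus is the only HP kind.
func (s *server) enqueue(p extensible) {
	if p.Category == consensusCategory {
		s.hpQueue <- p
		return
	}
	s.queue <- p
}

func main() {
	s := &server{hpQueue: make(chan extensible, 1), queue: make(chan extensible, 1)}
	s.enqueue(extensible{Category: consensusCategory})
	s.enqueue(extensible{Category: "other"})
	fmt.Println(len(s.hpQueue), len(s.queue)) // 1 1
}
```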
roman-khimov merged commit cfd2a35 into master on Aug 2, 2022
roman-khimov deleted the fancy-service-restart branch on August 2, 2022 11:11
Successfully merging this pull request may close these issues.

Reread config and restart all services on SIGHUP
3 participants