Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detailed consensus data correlated together. #5302

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

mtrippled
Copy link
Collaborator

@mtrippled mtrippled commented Feb 20, 2025

Combine multiple related debug log data points into a single
message. Allows quick correlation of events that
previously were either not logged or, if logged, strewn across multiple lines, making correlation difficult. The Heartbeat Timer and consensus ledger accept processing each have this capability.

Also guarantees that log entries will be written if the node is a validator, regardless of log severity level. Otherwise, the level of these messages is at INFO severity.

High Level Overview of Change

Enable logging of multiple data points so that they are emitted to the debug log at the same time. Validators will have these messages always emitted, regardless
of other configuration of debug log severity level.

Context of Change

The context is to facilitate data collection and troubleshooting. Currently, this much
of the data in the PR is not being logged. And for that which is being logged, it's
written all over the debug log on multiple lines. This makes it difficult to correlate
consensus-related events.

And because this change emits the logs automatically for validators, no special configuration of severity level needs to be made.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactor (non-breaking change that only restructures code)
  • Performance (increase or change in throughput and/or latency)
  • Tests (you added tests for code that already exists, or your new feature included in this PR)
  • Documentation update
  • Chore (no impact to binary, e.g. .gitignore, formatting, dropping support for older tooling)
  • Release

API Impact

  • Public API: New feature (new methods and/or new fields)
  • Public API: Breaking change (in general, breaking changes should only impact the next api_version)
  • libxrpl change (any change that may affect libxrpl or dependents of libxrpl)
  • Peer protocol change (must be backward compatible or bump the peer protocol version)

New, verbose log entries are created. They can be grepped with the string "ConsensusLogger". They are emitted approximately every second, with an additional entry for each consensus "accept" phase. It increases logging requirements. Estimated 5MB/hr.

To see the log messages, grep for:
"ConsensusLogger: Heartbeat Timer" and "ConsensusLogger: onAccept"

If configured as a validator, all debug log severity levels will cause these messages to be emitted (though I'm not sure if kDisabled is over-ridden). To test, configure at warning, error, or fatal, and the log messages should still be emitted. They'll still say they're NFO (info) level, but they bypass the filter. If not configured as a validator, verify that the messages only appear for "info" level or more verbose.

Before committing to this, the community should be informed of the new feature since it will require more disk space, at least for validator operators or people with very terse logging configured. However, the disk requirements are not very large: roughly 5MB per hour. This is < 1GB/week, which is quite small for the benefit.

@mtrippled
Copy link
Collaborator Author

Rebased against a more recent commit.

@mtrippled mtrippled reopened this Feb 20, 2025
Copy link

codecov bot commented Feb 20, 2025

Codecov Report

Attention: Patch coverage is 52.41935% with 118 lines in your changes missing coverage. Please review.

Project coverage is 78.0%. Comparing base (844646d) to head (caedb46).
Report is 1 commits behind head on develop.

Files with missing lines Patch % Lines
src/xrpld/consensus/Consensus.h 75.8% 31 Missing ⚠️
src/xrpld/app/consensus/RCLConsensus.cpp 3.4% 28 Missing ⚠️
src/xrpld/app/misc/NetworkOPs.cpp 6.7% 28 Missing ⚠️
src/xrpld/consensus/Consensus.cpp 54.7% 24 Missing ⚠️
include/xrpl/beast/utility/WrappedSink.h 0.0% 2 Missing ⚠️
src/libxrpl/basics/Log.cpp 0.0% 2 Missing ⚠️
src/xrpld/app/consensus/RCLConsensus.h 0.0% 2 Missing ⚠️
src/libxrpl/beast/utility/src/beast_Journal.cpp 0.0% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff            @@
##           develop   #5302     +/-   ##
=========================================
+ Coverage     77.9%   78.0%   +0.2%     
=========================================
  Files          791     790      -1     
  Lines        68006   67856    -150     
  Branches      8346    8237    -109     
=========================================
- Hits         52967   52957     -10     
+ Misses       15039   14899    -140     
Files with missing lines Coverage Δ
include/xrpl/basics/Log.h 71.4% <ø> (ø)
include/xrpl/beast/utility/Journal.h 96.5% <ø> (ø)
src/xrpld/app/main/Application.cpp 69.3% <100.0%> (ø)
src/xrpld/app/misc/NetworkOPs.h 100.0% <ø> (ø)
src/libxrpl/beast/utility/src/beast_Journal.cpp 58.0% <0.0%> (-1.2%) ⬇️
include/xrpl/beast/utility/WrappedSink.h 52.4% <0.0%> (-5.5%) ⬇️
src/libxrpl/basics/Log.cpp 18.4% <0.0%> (-0.2%) ⬇️
src/xrpld/app/consensus/RCLConsensus.h 84.0% <0.0%> (-7.3%) ⬇️
src/xrpld/consensus/Consensus.cpp 74.0% <54.7%> (-24.4%) ⬇️
src/xrpld/app/consensus/RCLConsensus.cpp 62.2% <3.4%> (-3.2%) ⬇️
... and 2 more

... and 7 files with indirect coverage changes

Impacted file tree graph

message. Allows quick correlation of events that
previously were either not logged or, if logged, strewn
across multiple lines, making correlation difficult.
The Heartbeat Timer and consensus ledger accept processing
each have this capability.

Also guarantees that log entries will be written if the
node is a validator, regardless of log severity level.
Otherwise, the level of these messages is at INFO severity.
}

CLOG(clog) << "beginConsensus no previous ledge. ";
Copy link
Collaborator

@vlntb vlntb Feb 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo 'ledger'

* debug log on destruction. Otherwise, INFO
* level.
*/
class RclConsensusLogger
Copy link
Collaborator

@vlntb vlntb Feb 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean by complex activity a consensus round?
I can see that it would be useful to collect events related to a single ledger. In this case it would make sense to add a ledger id / hash as a property to the instance of the logger. Then we can correlate those complex log entries between different nodes.

@@ -34,38 +34,55 @@ shouldCloseLedger(
std::chrono::milliseconds openTime, // Time waiting to close this ledger
std::chrono::milliseconds idleInterval,
ConsensusParms const& parms,
beast::Journal j)
beast::Journal j,
std::unique_ptr<std::stringstream> const& clog)
Copy link
Collaborator

@vlntb vlntb Feb 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of adding a new parameter to pass clog, should we extend beast::Journal to support streamed logging? Otherwise, we have to pass as a parameter two loggers now. It is our version of beast and we are free to modify it. We might have to switch to shared_ptr since beast::Journal is passed by value, but otherwise, it should be possible.

{
std::string header_;
beast::Journal j_;
std::unique_ptr<std::stringstream> ss_;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need std::unique_ptr if the clog is passed by reference everywhere?
I don't see a need for transfer of ownership.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants