Commit

Merge pull request #6 from kokonutwh1skey/add-session-expiry
Add session expiry
AlexeyAkhunov authored Apr 30, 2023
2 parents 8913e52 + 54e406a commit b9de59b
Showing 6 changed files with 138 additions and 120 deletions.
59 changes: 29 additions & 30 deletions README.md
@@ -119,7 +119,7 @@ Run the application. This may take a while. Expect to see a TLS Handshake error
./diagnostics --tls.cert demo-tls/diagnostics.crt --tls.key demo-tls/diagnostics-key.pem --tls.cacerts demo-tls/CA-cert.pem
```

To view the application in your browser, go to the URL `https://localhost:8080/ui`. You browser will likely ask to to accept the risks (due to self-signed certificate), do that.
To view the application in your browser, go to the URL `https://localhost:8080/ui`. Your browser will likely ask to accept the risks (due to self-signed certificate), do that.

[Link to more information on this step](#how-to-build-and-run)

@@ -143,7 +143,7 @@ To run with premade self-signed certificates for TLS (mandatory for HTTP/2), use

# How to access from the browser

In the browser, go to the URL `https://localhost:8080/ui`. You browser will likely ask to to accept the risks (due to self-signed certificate), do that.
In the browser, go to the URL `https://localhost:8080/ui`. Your browser will likely ask to accept the risks (due to self-signed certificate), do that.

# How to run an Erigon node that can be connected to the diagnostics system

@@ -153,7 +153,7 @@ For an Erigon node to be connected to the diagnostics system, it needs to expose
--metrics
```

By default the metrics are exposed on `localhost` and port `6060`. In order to expose them on a different networking interface and/or different port,
By default, the metrics are exposed on `localhost` and port `6060`. In order to expose them on a different networking interface and/or different port,
the following command line flags can be used:

```
@@ -179,27 +179,27 @@ If metrics are exposed, textual representation of metrics will be displayed in t

# How to connect Erigon node to the diagnostics system

First, in the browser window, create a new operator session. Choose an arbitraty name. In real operations, one would choose the name
First, in the browser window, create a new operator session. Choose an arbitrary name. In real operations, one would choose the name
that can be easily correlate to the node being supported, for example, name or pseudonym of the person or the company operating the node.

![create new operation session](/images/create_new_session.png)

After new session is created, it will be allocated a unique 8-digit PIN number. The pin is then displayed together with the session number on the screen.
Currently generation of PIN numbers is not secure and always follows the same sequence, which makes testing easier. For example, the first
Currently, generation of PIN numbers is not secure and always follows the same sequence, which makes testing easier. For example, the first
allocated session PIN is always `47779410`.
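In `cmd/root.go`, the server-side `--insecure` flag is described as enabling "insecure PIN generation for testing purposes". This matches the fixed PIN sequence mentioned above. The Go sketch below shows the two modes side by side. It rests on a few assumptions:

* PINs are `uint64` values rendered as 8 zero-padded digits.
* The test mode uses a fixed seed of `1`.

The repository's real range and seed are not shown here, so this will not reproduce `47779410`. `PinGenerator`, `NewPinGenerator` and `FormatPin` are hypothetical names.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
	mrand "math/rand"
)

// pinSpace is the number of distinct 8-digit PINs (00000000 to 99999999).
const pinSpace = 100_000_000

// PinGenerator hands out 8-digit session PINs. Hypothetical type, for illustration only.
type PinGenerator struct {
	deterministic *mrand.Rand // non-nil only in insecure (testing) mode
}

// NewPinGenerator returns a generator. With insecure=true it uses a fixed seed
// (an assumption), so every run yields the same PIN sequence: handy for tests,
// unacceptable for real sessions.
func NewPinGenerator(insecure bool) *PinGenerator {
	if insecure {
		return &PinGenerator{deterministic: mrand.New(mrand.NewSource(1))}
	}
	return &PinGenerator{}
}

// Next returns the next PIN as an integer in [0, pinSpace).
func (g *PinGenerator) Next() (uint64, error) {
	if g.deterministic != nil {
		return uint64(g.deterministic.Int63n(pinSpace)), nil
	}
	// crypto/rand gives an unpredictable, uniformly distributed PIN.
	n, err := rand.Int(rand.Reader, big.NewInt(pinSpace))
	if err != nil {
		return 0, err
	}
	return n.Uint64(), nil
}

// FormatPin renders a PIN with leading zeros so it is always 8 digits long.
func FormatPin(pin uint64) string {
	return fmt.Sprintf("%08d", pin)
}

func main() {
	g := NewPinGenerator(false)
	pin, err := g.Next()
	if err != nil {
		panic(err)
	}
	fmt.Println("session PIN:", FormatPin(pin))
}
```

Keeping both modes behind one type means the rest of the server does not need to know which one is active.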

Next, in a console window, run the following command, specfying the session PIN at the end of the `--diagnostics.url` command line flag.
Since the web site is using self-signed certificate without properly allocated CName, one needs to use `--insecure` flag to be able to connect.
Next, in a console window, run the following command, specifying the session PIN at the end of the `--diagnostics.url` command line flag.
Since the website is using self-signed certificate without properly allocated CName, one needs to use `--insecure` flag to be able to connect.

```
./build/bin/erigon support --metrics.urls http://metrics.addr:metrics.port/debug/metrics --diagnostics.url https://localhost:8080/support/47779410 --insecure
```

# Architecture of diagnostics system

Following diagram shows schematicaly how the process of diagnostics works. Erigon nodes that can be diagnosed, need to be running with `--metrics` flag.
Diagnostics system (HTTP/2 web site) needs to be running somewhere. For the public use, it can be a web site managed by erigon team, for example. For
personal and testing use, this can be locally run web site with self-signed certificates.
Following diagram shows schematically how the process of diagnostics works. Erigon nodes that can be diagnosed, need to be running with `--metrics` flag.
Diagnostics system (HTTP/2 website) needs to be running somewhere. For the public use, it can be a website managed by Erigon team, for example. For
personal and testing use, this can be locally run website with self-signed certificates.

In order to connect Erigon node to the Diagnostics system, user needs to start a process with a command `erigon support`, as described earlier.
The initiations of network connections are shown as solid single arrows. One can see that `erigon support` initiates connections to both Erigon node
@@ -208,7 +208,7 @@ to make HTTP requests to the Erigon node, and receive the information exposed on
diagnostics system, start with `/support/` prefix, followed by the PIN of the session. In the code inside `cmd/root.go`, this corresponds to the
`BridgeHandler` type.

Operators (those who are trying to assists the Erigon node users) also access Diagnosics system, but in the form of User Interface, built using HTML
Operators (those who are trying to assist the Erigon node users) also access Diagnostics system, but in the form of User Interface, built using HTML
and Javascript. The URLs used for such access, start with `ui/` prefix. In the code inside `cmd/root.go`, this corresponds to the `UiHandler` type.

![diagnostics system architecture](/images/diagnostics.drawio.png)
@@ -217,7 +217,7 @@ and Javascript. The URLs used for such access, start with `ui/` prefix. In the c

## Code version

Operator can look at the code version that Erigon node has been built with. The corresponding code in erigon is in the file `diagnostics/versions.go`.
Operator can look at the code version that Erigon node has been built with. The corresponding code in Erigon is in the file `diagnostics/versions.go`.
The code on the side of the diagnostics system is spread across files `cmd/ui_handler.go` (invocation of `processVersions` function),
`cmd/versions.go`, `assets/template/session.html` (template in the format of `html/template` package, the part where the button `Fetch Versions` is defined with
the javascript handler), `assets/script/session.js` (function `fetchContent`), `assets/template/versions.html` (html template
@@ -227,7 +227,7 @@ for the content fetched by the `fetchContent` javascript function and inserted i

## Command line arguments

Operator can look at the command line arguments that were used to launch erigon node. The corresponding code in erigon is in the file `diagnostics/cmd_line.go`.
Operator can look at the command line arguments that were used to launch Erigon node. The corresponding code in Erigon is in the file `diagnostics/cmd_line.go`.
The code on the side of the diagnostics system is spread across files `cmd/ui_handler.go` (invocation of `processCmdLineArgs` function),
`cmd/cmd_line.go`, `assets/template/session.html` (html template, the part where the button `Fetch Cmd Line` is defined with
the javascript handler), `assets/script/session.js` (function `fetchContent`), `assets/template/cmd_line.html` (html template
@@ -237,9 +237,9 @@ for the content fetched by the `fetchContent` javascript function and inserted i

## Logs

Since version 2.43.0, erigon nodes write logs by default with `INFO` level into `<datadir>/logs` directory, there is log roation. Using diagnosics system,
Since version 2.43.0, Erigon nodes write logs by default with `INFO` level into `<datadir>/logs` directory, there is log rotation. Using diagnostics system,
these logs can be looked at and downloaded to the operator's computer. Viewing the logs is one of the most frequent requests of the operator to the user,
and it makes sense to make this process much more convinient and efficient. The corresponding code in erigon is in the file `diagnostics/log_access.go`.
and it makes sense to make this process much more convenient and efficient. The corresponding code in Erigon is in the file `diagnostics/log_access.go`.
Note that the codes does not give access to any other files in the file system, only to the directory dedicated to the logs.
The code on the side of the diagnostics system is spread across files `cmd/ui_handler.go` (invocation of `processLogPart` and `transmitLogFile` functions),
`cmd/logs.go`, `assets/template/session.html` (html template, the part where the button `Fetch Logs` is defined with
@@ -258,12 +258,12 @@ the presence of multiple block headers with the same block height but different
One of the ideas for the further development of the diagnostics system is the addition of many more such useful "diagnostics scripts", that could be run against
Erigon's node's database, to check the state of the node, or certain inconsistencies etc.

The corresponding code in erigon is in the file `diagnostics/db_access.go`, and it relies on a feature recently added to the Erigon's code, which is
`mdbx.PathDbMap()`, the global function that retuns the mapping of all currently open MDBX environments (databases), keyed by the paths to their directories in the filesystem.
The corresponding code in Erigon is in the file `diagnostics/db_access.go`, and it relies on a feature recently added to the Erigon's code, which is
`mdbx.PathDbMap()`, the global function that returns the mapping of all currently open MDBX environments (databases), keyed by the paths to their directories in the filesystem.
This allows `db_access.go` to create a read-only transaction for any of these environments (databases) and provide remote reading by the diagnostics system.
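`mdbx.PathDbMap()` is described only as returning the currently open environments keyed by directory path. One plausible shape for such a registry is sketched below in Go. It assumes three behaviours:

* databases are registered on open;
* they are removed on close;
* readers receive a copied snapshot, so they can iterate without holding the lock.

The `db` type, the function names and the example paths are all hypothetical. None of them reflect Erigon's actual `mdbx` package.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// db is a hypothetical stand-in for an open database environment.
type db struct{ label string }

// registry is a hypothetical path-keyed map of open databases, guarded by a RWMutex.
var (
	registryMu sync.RWMutex
	registry   = map[string]*db{}
)

// register records a database as open under its directory path.
func register(path string, d *db) {
	registryMu.Lock()
	defer registryMu.Unlock()
	registry[path] = d
}

// unregister forgets a database once it has been closed.
func unregister(path string) {
	registryMu.Lock()
	defer registryMu.Unlock()
	delete(registry, path)
}

// pathDbMap returns a snapshot copy so callers can iterate without holding the lock.
func pathDbMap() map[string]*db {
	registryMu.RLock()
	defer registryMu.RUnlock()
	snapshot := make(map[string]*db, len(registry))
	for p, d := range registry {
		snapshot[p] = d
	}
	return snapshot
}

func main() {
	register("/data/chaindata", &db{"chaindata"})
	register("/data/txpool", &db{"txpool"})
	unregister("/data/txpool")
	paths := make([]string, 0)
	for p := range pathDbMap() {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	fmt.Println(paths) // [/data/chaindata]
}
```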

The code on the side of the diagnostics system is `cmd/reorgs.go`. The function `findReorgs` generates HTML piece by piece, executing two different html templates
(`assets/template/reorg_spacer.html` and `assets/template/reorg_block.html`). These continously generated HTML lines are picked up by javascript function `findReorgs`
(`assets/template/reorg_spacer.html` and `assets/template/reorg_block.html`). These continuously generated HTML lines are picked up by javascript function `findReorgs`
in file `assets/script/session.js`, which appends them to `innerHTML` field of the div element. This creates an effect of animation, notifying the operator of the
progress of the scanning for reorgs (with spacer html pieces, one for each 1000 blocks), and showing intermediate results of the scan (with block html pieces,
one for each reorged block found).
@@ -273,45 +273,44 @@ one for each reorged block found).
## Block Body Download

This is the first crude example of monitoring an algorithms involving many items (in that case block bodies) transitioning through the series of states.
On the erigon side, the code is spread across files `dataflow/stages.go`, where the states of each block body in the downloading algorithm are listed,
On the Erigon side, the code is spread across files `dataflow/stages.go`, where the states of each block body in the downloading algorithm are listed,
and the structure `States` is described. This structure allows the body downloader algorithm (in the files `eth/stagedsync/stage_bodies.go` and
`turbo/stages/bodydownload/body_algos.go`) to invoke `AddChange` to report the change of state for any block number. The structure `States` intends to
have a strict upper bound on memory usage and to be very allocation-light. On the other hand, the function `ChangesSince` is called by the code in
`diagnostics/block_body_download.go` to send the recent history of state changes to the diagnostics system (via logical tunner of `erigon support` of course).
`diagnostics/block_body_download.go` to send the recent history of state changes to the diagnostics system (via logical tunnel of `erigon support` of course).
On the side of the diagnostics system, in the file `cmd/bodies_download.go`, there are two functions. One, `bodies_download` is generating output
HTML representing the current view of the some limited number of block bodies being downloaded (1000). This function keeps querying the erigon node
roughly every second and re-generates the HTML (using temlate in `assets/template/body_download.hml`). The re-generated HTML is written to, and is
HTML representing the current view of some limited number of block bodies being downloaded (1000). This function keeps querying the Erigon node
roughly every second and re-generates the HTML (using template in `assets/template/body_download.hml`). The re-generated HTML is written to, and is
consumed by the javascript function `bodiesDownload` in the `assets/script/session.js`, which keeps replacing the `innerHTML` field in a div element
whenever the new HTML piece is available.
Each state is represented by a distinct colour, with the colour legend is also defined in the temlate file.
Each state is represented by a distinct colour, with the colour legend is also defined in the template file.

![body download](/images/body_download.png)
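The description of `States` calls for three properties:

* a strict memory bound;
* cheap writes;
* an incremental reader.

A common way to get all three is a ring buffer, and the sketch below assumes that design. `stateLog` and its method signatures are hypothetical. They only borrow the names `AddChange` and `ChangesSince` from the text and are not the API in Erigon's `dataflow/stages.go`.

```go
package main

import "fmt"

// change records that block Block entered state State; Seq orders all changes.
type change struct {
	Seq   uint64
	Block uint64
	State byte
}

// stateLog (hypothetical) keeps only the most recent len(buf) changes in a fixed
// ring buffer, so memory use is bounded and AddChange never allocates after
// construction. Not safe for concurrent use; a real version would add a mutex.
type stateLog struct {
	buf  []change
	next uint64 // sequence number that the next change will receive
}

func newStateLog(capacity int) *stateLog {
	if capacity < 1 {
		capacity = 1
	}
	return &stateLog{buf: make([]change, capacity)}
}

// AddChange records a state transition, overwriting the oldest entry when full.
func (l *stateLog) AddChange(block uint64, state byte) {
	l.buf[l.next%uint64(len(l.buf))] = change{Seq: l.next, Block: block, State: state}
	l.next++
}

// ChangesSince returns the changes with Seq >= since, oldest first, plus the cursor
// to pass on the next call. Changes that were already overwritten are skipped.
func (l *stateLog) ChangesSince(since uint64) ([]change, uint64) {
	capacity := uint64(len(l.buf))
	if l.next > capacity && since < l.next-capacity {
		since = l.next - capacity // the oldest change still retained
	}
	if since > l.next {
		since = l.next
	}
	out := make([]change, 0, l.next-since)
	for s := since; s < l.next; s++ {
		out = append(out, l.buf[s%capacity])
	}
	return out, l.next
}

func main() {
	l := newStateLog(3)
	for b := uint64(1); b <= 5; b++ {
		l.AddChange(b, 1)
	}
	changes, next := l.ChangesSince(0)
	fmt.Println(len(changes), changes[0].Block, next) // 3 3 5
}
```

The caller keeps the returned cursor and passes it on the next poll. A once-per-second refresh like the one described for `cmd/bodies_download.go` would then only transfer changes it has not yet seen.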

# Ideas for possible improvements

If you are looking at this because you would like to apply to be a part of Erigon development team, the best you can do is to try to first run the
diagnostics system locally as described above, then study the code in the repository and think of a way to improve it. This repository has been
intitially created by a person with very little experience in web server development, web design, javascript, and, more crucially, it has been
initially created by a person with very little experience in web server development, web design, javascript, and, more crucially, it has been
created in a bit of a rush.
Therefore, there should be a lot of things that can be improved in terms of best practices, more pleasant user interface, code simplicity, etc.

There are some functional improvements that could be quite useful, for example:

* Reorg scanner is very basic and it does not have a concept of a "deep" reorg (deeper than 1 block). For such situations, it will just show the consequitive
block numbers as all havign a reorg. It would be better to aggregate these into deep reorgs, and also perhaps show if there are more than 1 branch at each
* Reorg scanner is very basic and it does not have a concept of a "deep" reorg (deeper than 1 block). For such situations, it will just show the consecutive
block numbers as all having a reorg. It would be better to aggregate these into deep reorgs, and also perhaps show if there are more than 1 branch at each
reorg point.
* For the reorg scanner, add the ability to click on the block numbers and get more information about that particular reorg, for example, block producers
for each of the block participating in the reorg, or difference in terms of transactions.
* Any sessions created via User Interface, stay in the server forever and are never cleaned up, so theoretically eventually the server will run out of memory.
This needs to be addressed by introducing some kind of expiration mechanism and cleaning up expired sessions.
* Retrieving command line arguments is only useful if the erigon node is not launched using configuration file. If configutation file is used, then
* Retrieving command line arguments is only useful if the Erigon node is not launched using configuration file. If configuration file is used, then
most of the settings are still not visible to the operator. A possible improvement (which involves also changes in Erigon itself) is to either provide
access to the configutation file, or somehow give access to the "effective" launch settings (i.e. after the configuration file is parsed and applied).
access to the configuration file, or somehow give access to the "effective" launch settings (i.e. after the configuration file is parsed and applied).
* Adding more "diagnostics scripts" that remotely read DB to check for the current progress of stages in the staged sync.
* Adding a monitoring for header downloader as well as for body downloader.
* Perhaps embeeding some metrics visualisation (have no idea how to do it), since all "prometheus"-style metrics are also available to the diagnostics sytem?
* Perhaps embedding some metrics visualisation (have no idea how to do it), since all "prometheus"-style metrics are also available to the diagnostics system?
* Ability to extract and analyse go-routine stack traces from Erigon node. To start with, extract something like `debug/pprof/goroutine?debug=2`, but for Erigon
this would likely result in a lot of go-routines (thousands) with similar traces related to peer management. Some analysis should group them into cluster of similar
stack traces and show them as aggregates.
* Add log rotation system similar to what has recently been done for Erigon (using lumberjack library).
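The first idea above is to aggregate consecutive reorged heights into deep reorgs. This amounts to run-length grouping of sorted block numbers. The minimal sketch below assumes the scanner already produces ascending, de-duplicated heights. `reorgRange` and `aggregateReorgs` are hypothetical names.

```go
package main

import "fmt"

// reorgRange (hypothetical) is a run of consecutive reorged heights [From, To].
type reorgRange struct {
	From, To uint64
}

// Depth is the number of consecutive heights affected.
func (r reorgRange) Depth() uint64 { return r.To - r.From + 1 }

// aggregateReorgs (hypothetical) merges ascending, de-duplicated block numbers into
// ranges of consecutive heights, so a 3-block-deep reorg is reported once, not 3 times.
func aggregateReorgs(blocks []uint64) []reorgRange {
	var out []reorgRange
	for _, b := range blocks {
		if n := len(out); n > 0 && out[n-1].To+1 == b {
			out[n-1].To = b // extends the current run
			continue
		}
		out = append(out, reorgRange{From: b, To: b})
	}
	return out
}

func main() {
	for _, r := range aggregateReorgs([]uint64{100, 101, 102, 250, 400, 401}) {
		fmt.Printf("blocks %d-%d (depth %d)\n", r.From, r.To, r.Depth())
	}
}
```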

85 changes: 27 additions & 58 deletions cmd/root.go
@@ -11,24 +11,26 @@ import (
"net/http"
"os"
"os/signal"
"sync"
"syscall"

"github.com/google/btree"
"github.com/ledgerwatch/diagnostics/assets"
lru "github.com/hashicorp/golang-lru/v2"
"github.com/spf13/cobra"
"github.com/spf13/viper"

"github.com/ledgerwatch/diagnostics/assets"
)

var (
// Used for flags.
cfgFile string
listenAddr string
listenPort int
serverKeyFile string
serverCertFile string
caCertFiles []string
insecure bool
cfgFile string
listenAddr string
listenPort int
serverKeyFile string
serverCertFile string
caCertFiles []string
insecure bool
maxNodeSessions int
maxUISessions int

rootCmd = &cobra.Command{
Use: "diagnostics",
@@ -57,6 +59,8 @@ func init() {
_ = rootCmd.MarkFlagRequired("tls.cert")
rootCmd.Flags().StringSliceVar(&caCertFiles, "tls.cacerts", []string{}, "comma-separated list of paths to and CAs TLS certificates")
rootCmd.Flags().BoolVar(&insecure, "insecure", false, "whether to use insecure PIN generation for testing purposes (default is false)")
rootCmd.Flags().IntVar(&maxNodeSessions, "node.sessions", 5000, "maximum number of node sessions to allow")
rootCmd.Flags().IntVar(&maxUISessions, "ui.sessions", 5000, "maximum number of UI sessions to allow")
}

func initConfig() {
@@ -83,52 +87,6 @@ func initConfig() {

const successLine = "SUCCESS"

// NodeSession corresponds to one Erigon node connected via "erigon support" bridge to an operator
type NodeSession struct {
lock sync.Mutex
//sessionPin uint64
Connected bool
RemoteAddr string
SupportVersion uint64 // Version of the erigon support command
requestCh chan *NodeRequest // Channel for incoming metrics requests
}

func (ns *NodeSession) connect(remoteAddr string) {
ns.lock.Lock()
defer ns.lock.Unlock()
ns.Connected = true
ns.RemoteAddr = remoteAddr
}

func (ns *NodeSession) disconnect() {
ns.lock.Lock()
defer ns.lock.Unlock()
ns.Connected = false
}

type UiNodeSession struct {
SessionName string
SessionPin uint64
}

type UiSession struct {
lock sync.Mutex
Session bool
SessionPin uint64
SessionName string
Errors []string // Transient field - only filled for the time of template execution
//currentSessionName string
NodeS *NodeSession // Transient field - only filled for the time of template execution
uiNodeTree *btree.BTreeG[UiNodeSession]
UiNodes []UiNodeSession // Transient field - only filled forthe time of template execution
}

func (uiSession *UiSession) appendError(err string) {
uiSession.lock.Lock()
defer uiSession.lock.Unlock()
uiSession.Errors = append(uiSession.Errors, err)
}

func webServer() error {
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
@@ -137,9 +95,20 @@ func webServer() error {
if err != nil {
return fmt.Errorf("parsing session.html template: %v", err)
}

ns, err := lru.NewARC[uint64, *NodeSession](maxNodeSessions)
if err != nil {
return fmt.Errorf("failed to create nodeSessions: %v", err)
}

uis, err := lru.NewARC[string, *UiSession](maxUISessions)
if err != nil {
return fmt.Errorf("failed to create uiSessions: %v", err)
}

uih := &UiHandler{
nodeSessions: map[uint64]*NodeSession{},
uiSessions: map[string]*UiSession{},
nodeSessions: ns,
uiSessions: uis,
uiTemplate: uiTemplate,
}
mux.Handle("/script/", http.FileServer(http.FS(assets.Scripts)))
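This change replaces the unbounded `map[uint64]*NodeSession` and `map[string]*UiSession` with `lru.NewARC` caches. Their sizes come from `--node.sessions` and `--ui.sessions` (default 5000 each), so the session stores can no longer grow without limit.

ARC weighs recency against frequency when choosing what to evict. The stdlib-only sketch below uses a plain LRU instead, purely to illustrate capacity-bounded eviction. `boundedSessions` is a hypothetical type, not golang-lru's API.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry pairs a key with its session value inside the recency list.
type entry[K comparable, V any] struct {
	key   K
	value V
}

// boundedSessions (hypothetical) is a fixed-capacity LRU map: adding beyond capacity
// evicts the least recently used session. Not safe for concurrent use without a mutex.
type boundedSessions[K comparable, V any] struct {
	capacity int
	order    *list.List          // front = most recently used
	items    map[K]*list.Element // key -> element in order
}

func newBoundedSessions[K comparable, V any](capacity int) (*boundedSessions[K, V], error) {
	if capacity <= 0 {
		return nil, fmt.Errorf("capacity must be positive, got %d", capacity)
	}
	return &boundedSessions[K, V]{capacity: capacity, order: list.New(), items: map[K]*list.Element{}}, nil
}

// Add inserts or updates a session and reports whether an older one was evicted.
func (c *boundedSessions[K, V]) Add(key K, value V) bool {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry[K, V]).value = value
		c.order.MoveToFront(el)
		return false
	}
	c.items[key] = c.order.PushFront(&entry[K, V]{key: key, value: value})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry[K, V]).key)
		return true
	}
	return false
}

// Get returns a session and marks it as recently used.
func (c *boundedSessions[K, V]) Get(key K) (V, bool) {
	if el, ok := c.items[key]; ok {
		c.order.MoveToFront(el)
		return el.Value.(*entry[K, V]).value, true
	}
	var zero V
	return zero, false
}

func main() {
	sessions, err := newBoundedSessions[uint64, string](2)
	if err != nil {
		panic(err)
	}
	sessions.Add(1, "alice")
	sessions.Add(2, "bob")
	sessions.Get(1)          // touch session 1 so session 2 becomes the oldest
	sessions.Add(3, "carol") // exceeds capacity: evicts session 2
	_, ok := sessions.Get(2)
	fmt.Println("session 2 still present:", ok) // false
}
```

With any capacity-based scheme, once the cap is reached each new session pushes out an existing one, even if that session may still be in use. The cap therefore bounds memory rather than expiring sessions by age.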