Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add sync stages diagnostic script #13

Merged
merged 16 commits into from
May 15, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions .github/workflows/go-checks.yml
Original file line number Diff line number Diff line change
Expand Up @@ -23,5 +23,7 @@ jobs:
with:
version: latest
skip-pkg-cache: true
- name : Test
run: go test -v ./...
- name: Build
run: go build -v ./...
3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -15,3 +15,6 @@
# vendor/

diagnostics

# IDE configuration
.idea
15 changes: 14 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@
- [Command line arguments](#command-line-arguments)
- [Logs](#logs)
- [Reorg scanner](#reorg-scanner)
- [Sync stages](#sync-stages)
- [Block body download](#block-body-download)
- [Ideas for Possible improvements](#ideas-for-possible-improvements)

Expand Down Expand Up @@ -238,7 +239,7 @@ for the content fetched by the `fetchContent` javascript function and inserted i
## Flags
Operator can look at the flags that are set in cli context by the user to launch Erigon node. The corresponding code in Erigon is in the file `diagnostics/flags.go`. This is particularly useful when user launches Erigon using a config file with `--config` and [Command line arguments](#command-line-arguments) cannot fully capture the true state of the 'launch setting'. The returned flags are the result after parsing command line argument and config file by Erigon.

The code on the side of the diagnostics system is spread across files `cmd/ui_handler.go` (invocation of `processFlags` function), `cmd/flags.go`, `assests/template/session.html` (html template the part where the button `Fetch Flags` is defined with the javascript handler), `assests/script/session.js` (function `fetchContent`), `assets/template/flags.html` (html template for the content fetched by the `fetchContent` javascript function and inserted into the HTML div element).
The code on the side of the diagnostics system is spread across files `cmd/ui_handler.go` (invocation of `processFlags` function), `cmd/flags.go`, `assets/template/session.html` (html template the part where the button `Fetch Flags` is defined with the javascript handler), `assets/script/session.js` (function `fetchContent`), `assets/template/flags.html` (html template for the content fetched by the `fetchContent` javascript function and inserted into the HTML div element).

![flags](/images/flags.png)

Expand Down Expand Up @@ -278,6 +279,18 @@ one for each reorged block found).

![scan reorgs](/images/scan_reorgs.png)

## Sync stages

This is another example of how the diagnostics system can access the Erigon node's database remotely, via `erigon support` tunnel.
This feature adds an ability to see the node's sync stage, by returning the number of synced blocks per stage.


The code on the side of the diagnostics system is spread across files `cmd/ui_handler.go` (invocation of `findSyncStages` function), `cmd/sync_stages.go`, `cmd/remote_db.org` (using the same remote database access logic as [Reorg Scanner](#reorg-scanner)), `assets/template/session.html`
(HTML template the part where the button `Fetch Sync Stages` is defined with the javascript handler), `assets/script/session.js` (function `fetchContent`), `assets/template/sync_stages.html`
(HTML template for the content fetched by the `fetchContent` javascript function and inserted into the HTML table).

![sync_stage](/images/sync_stages.png)

## Block Body Download

This is the first crude example of monitoring an algorithms involving many items (in that case block bodies) transitioning through the series of states.
Expand Down
4 changes: 4 additions & 0 deletions assets/template/session.html
Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,10 @@ <h2>SESSION: {{.SessionName}} PIN: {{.SessionPin}}</h2>
<button type="button" onclick="fetchContent('{{.SessionName}}', 'log list', '/ui/log_list', 'loglist')">Fetch Log List</button>
<div id="loglist"></div>
</div>
<div>
<button type="button" onclick="fetchContent('{{.SessionName}}', 'sync stages', '/ui/sync_stages', 'syncstages')">Fetch Sync Stages</button>
<div id="syncstages"></div>
</div>
<div>
<button type="button" onclick="findReorgs('{{.SessionName}}')">Find Reorgs</button>
<div id="reorgs"></div>
Expand Down
12 changes: 12 additions & 0 deletions assets/template/sync_stages.html
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
<table>
<tr>
<th>Stage</th>
<th>Progress (blocks)</th>
</tr>
{{range $key, $value := .}}
<tr>
<td><code>{{$key}}</code></td>
<td><code>{{$value}}</code></td>
</tr>
{{end}}
</table>
11 changes: 6 additions & 5 deletions cmd/bodies_download.go
Original file line number Diff line number Diff line change
Expand Up @@ -49,17 +49,18 @@ func (uih *UiHandler) bodiesDownload(ctx context.Context, w http.ResponseWriter,
default:
}
// First, fetch list of DB paths
success, result := uih.fetch(fmt.Sprintf("/block_body_download?sincetick=%d\n", tick), requestChannel)
success, result := uih.remoteApi.fetch(fmt.Sprintf("/block_body_download?sincetick=%d\n", tick), requestChannel)
if !success {
fmt.Fprintf(w, "Fetching list of changes: %s", result)
return
}
lines := strings.Split(result, "\n")
if len(lines) == 0 || !strings.HasPrefix(lines[0], successLine) {
fmt.Fprintf(w, "incorrect response (first line needs to be SUCCESS)\n")

lines, resultExtractErr := uih.remoteApi.getResultLines(result)
if resultExtractErr != nil {
fmt.Fprintf(w, "incorrect response: %v\n", resultExtractErr)
return
}
lines = lines[1:]

var changesMode bool
var err error
changes := map[uint64]struct{}{}
Expand Down
6 changes: 3 additions & 3 deletions cmd/bridge_handler.go
Original file line number Diff line number Diff line change
Expand Up @@ -98,7 +98,7 @@ func (bh *BridgeHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
request.response = nil
request.err = fmt.Sprintf("writing metrics request: %v", err)
request.retries++
if request.retries < 16 {
if request.retries < MaxRequestRetries {
select {
case nodeSession.requestCh <- request:
default:
Expand All @@ -117,7 +117,7 @@ func (bh *BridgeHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
request.response = nil
request.err = fmt.Sprintf("reading size of metrics response: %v", err)
request.retries++
if request.retries < 16 {
if request.retries < MaxRequestRetries {
select {
case nodeSession.requestCh <- request:
default:
Expand All @@ -134,7 +134,7 @@ func (bh *BridgeHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
request.response = nil
request.err = fmt.Sprintf("reading metrics response: %v", err)
request.retries++
if request.retries < 16 {
if request.retries < MaxRequestRetries {
select {
case nodeSession.requestCh <- request:
default:
Expand Down
62 changes: 62 additions & 0 deletions cmd/remote_api.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,62 @@
package cmd

import (
"fmt"
"strings"
"time"
)

type RemoteApiReader interface {
fetch(url string, requestChannel chan *NodeRequest) (bool, string)
getResultLines(result string) ([]string, error)
}

type RemoteApi struct{}

func (ra *RemoteApi) fetch(url string, requestChannel chan *NodeRequest) (bool, string) {
if requestChannel == nil {
return false, "ERROR: Node is not allocated\n"
}
// Request command line arguments
nodeRequest := &NodeRequest{url: url}
requestChannel <- nodeRequest
var sb strings.Builder
var success bool
for nodeRequest != nil {
nodeRequest.lock.Lock()
clear := nodeRequest.served
if nodeRequest.served {
if nodeRequest.err == "" {
sb.Reset()
sb.Write(nodeRequest.response)
success = true
} else {
success = false
fmt.Fprintf(&sb, "ERROR: %s\n", nodeRequest.err)
if nodeRequest.retries < MaxRequestRetries {
clear = false
}
}
}
nodeRequest.lock.Unlock()
if clear {
nodeRequest = nil
} else {
time.Sleep(100 * time.Millisecond)
}
}
return success, sb.String()
}

func (ra *RemoteApi) getResultLines(result string) ([]string, error) {
lines := strings.Split(result, "\n")
if len(lines) == 0 || !strings.HasPrefix(lines[0], successLine) {
return nil, fmt.Errorf("incorrect response (first line needs to be SUCCESS): %s", result)
}

if len(lines) > 0 && len(lines[len(lines)-1]) == 0 {
lines = lines[:len(lines)-1]
}

return lines[1:], nil
}
56 changes: 56 additions & 0 deletions cmd/remote_api_test.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,56 @@
package cmd

import (
"fmt"
"github.com/stretchr/testify/assert"
"testing"
)

func TestGetResultLines(t *testing.T) {
tt := []struct {
name string
result string
assert func(lines []string)
wantErrMsg string
}{
{
name: "should successfully get data lines from multi-line result",
result: "SUCCESS\nfirst_line",
assert: func(lines []string) {
assert.Equal(t, []string{"first_line"}, lines)
},
},
{
name: "should remove the last empty line from the result",
result: "SUCCESS\nfirst_line\n",
assert: func(lines []string) {
assert.Equal(t, []string{"first_line"}, lines)
},
},
{
name: "should return first line needs to be SUCCESS error",
result: "FAILURE\nfirst_line",
wantErrMsg: fmt.Sprintf("incorrect response (first line needs to be SUCCESS): %s", "FAILURE\nfirst_line"),
},
{
name: "should return first line needs to be SUCCESS error when no lines are returned",
result: "",
wantErrMsg: fmt.Sprintf("incorrect response (first line needs to be SUCCESS): %s", ""),
},
}

for _, tc := range tt {
t.Run(tc.name, func(t *testing.T) {
remoteApi := &RemoteApi{}

syncStageProgress, err := remoteApi.getResultLines(tc.result)

if tc.wantErrMsg != "" {
assert.EqualErrorf(t, err, tc.wantErrMsg, "expected error %q, got %s", tc.wantErrMsg, err)
return
}

tc.assert(syncStageProgress)
})
}
}
73 changes: 60 additions & 13 deletions cmd/remote_db.go
Original file line number Diff line number Diff line change
Expand Up @@ -6,35 +6,77 @@ import (
"strings"
)

type RemoteDbReader interface {
Init(db string, table string, initialKey []byte) error
Next() ([]byte, []byte, error)
}

type RemoteCursor struct {
uih *UiHandler
remoteApi RemoteApiReader
requestChannel chan *NodeRequest
dbPath string
table string
lines []string // Parsed response
}

func NewRemoteCursor(dbPath string, table string, requestChannel chan *NodeRequest, initialKey []byte) (*RemoteCursor, error) {
rc := &RemoteCursor{dbPath: dbPath, table: table, requestChannel: requestChannel}
func NewRemoteCursor(remoteApi RemoteApiReader, requestChannel chan *NodeRequest) *RemoteCursor {
rc := &RemoteCursor{remoteApi: remoteApi, requestChannel: requestChannel}

return rc
}

func (rc *RemoteCursor) Init(db string, table string, initialKey []byte) error {
dbPath, dbPathErr := rc.findFullDbPath(db)

if dbPathErr != nil {
return dbPathErr
}

rc.dbPath = dbPath
rc.table = table

if err := rc.nextTableChunk(initialKey); err != nil {
return nil, err
return err
}

return nil
}

func (rc *RemoteCursor) findFullDbPath(db string) (string, error) {
success, dbListResponse := rc.remoteApi.fetch("/db/list\n", rc.requestChannel)
if !success {
return "", fmt.Errorf("unable to fetch database list: %s", dbListResponse)
}

lines, err := rc.remoteApi.getResultLines(dbListResponse)
if err != nil {
return "", err
}

var dbPath string
for _, line := range lines {
if strings.HasSuffix(line, fmt.Sprintf("/%s", db)) {
dbPath = line
}
}

if dbPath == "" {
return "", fmt.Errorf("database %s not found: %v", db, dbListResponse)
}
return rc, nil

return dbPath, nil
}

func (rc *RemoteCursor) nextTableChunk(startKey []byte) error {
success, result := rc.uih.fetch(fmt.Sprintf("/db/read?path=%s&table=%s&key=%x\n", rc.dbPath, rc.table, startKey), rc.requestChannel)
success, result := rc.remoteApi.fetch(fmt.Sprintf("/db/read?path=%s&table=%s&key=%x\n", rc.dbPath, rc.table, startKey), rc.requestChannel)
if !success {
return fmt.Errorf("reading %s table: %s", rc.table, result)
}
lines := strings.Split(result, "\n")
if len(lines) == 0 || !strings.HasPrefix(lines[0], successLine) {
return fmt.Errorf("incorrect response (first line needs to be SUCCESS): %v", lines)
}
lines = lines[1:]
if len(lines) > 0 && len(lines[len(lines)-1]) == 0 {
lines = lines[:len(lines)-1]
lines, err := rc.remoteApi.getResultLines(result)
if err != nil {
return err
}

rc.lines = lines
return nil
}
Expand All @@ -61,6 +103,10 @@ func advance(key []byte) []byte {
}

func (rc *RemoteCursor) Next() ([]byte, []byte, error) {
if rc.dbPath == "" || rc.table == "" {
return nil, nil, fmt.Errorf("cursor not initialized")
}

if len(rc.lines) == 0 {
return nil, nil, nil
}
Expand All @@ -78,6 +124,7 @@ func (rc *RemoteCursor) Next() ([]byte, []byte, error) {
return nil, nil, fmt.Errorf("could not parse the value [%s]: %v", line[sepIndex+3:], e)
}
rc.lines = rc.lines[1:]

if len(rc.lines) == 0 {
if e = rc.nextTableChunk(advance(k)); e != nil {
return k, v, e
Expand Down
Loading