refactor the indexer #726

Merged: 24 commits merged into development from development_proxy_indexer on Mar 20, 2024

Conversation

@Omarabdul3ziz (Contributor) commented on Jan 31, 2024

Description & Changes

  • combine the indexers into a single indexer with different watchers
  • use a single RpcRmbClient across the proxy
  • add the -no-indexer and no-modify flags to the commands & update the workflow to stop the indexer during integration tests
  • add watchers for dmi/speed
  • implement Scanner/Valuer for the model structs (see the sketch after this list)
  • add triggers on the new table to update the resources cache
  • use gorm in the crafter for easier mock data generation for the custom model types
  • add loaders/tests for the new fields
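A minimal sketch of what a Scanner/Valuer pair for a jsonb-backed model struct can look like (the field names are illustrative, not the PR's actual definitions):

```go
package types

import (
	"database/sql/driver"
	"encoding/json"
	"fmt"
)

// Speed is stored as a jsonb column; the field names are illustrative.
type Speed struct {
	Upload   float64 `json:"upload"`
	Download float64 `json:"download"`
}

// Value implements driver.Valuer so the struct is serialized to json on writes.
func (s Speed) Value() (driver.Value, error) {
	return json.Marshal(s)
}

// Scan implements sql.Scanner so a jsonb column is decoded back into the struct on reads.
func (s *Speed) Scan(value interface{}) error {
	b, ok := value.([]byte)
	if !ok {
		return fmt.Errorf("unexpected type %T for jsonb column", value)
	}
	return json.Unmarshal(b, s)
}
```

As the later commits note, these hand-written implementations are eventually dropped in favor of gorm's registered json serializer.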

Related issues

Checklist

  • Tests included
  • Build pass
  • Documentation
  • Code format and docstring

healthIndexerInterval uint
healthIndexerWorkers uint
healthIndexerInterval uint
dmiWatcherWorkers uint
Contributor:

why did we move from the word "indexer" to watcher?

@@ -53,6 +55,11 @@ type CapacityResult struct {
Used Capacity `json:"used_resources"`
}

type Speed struct {
Contributor:

I noticed there are multiple speeds for upload/download, and there are ipv4 and ipv6 versions? Which is it? and are

Contributor Author:

Yes, we have ipv4 and ipv6 for both tcp and udp, so that's four tests per iperf server.
For example, devnet has two iperf servers on the two public nodes, so we have a total of 8 tests.

A discussion was raised in the chat group, and I remember the decision was to show only the non-zero value. Should we do something different, or show them all?
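For illustration, a minimal sketch of the "show only the non-zero value" choice over the ipv4/ipv6 x tcp/udp measurements; the result shape here is hypothetical (the real type is imported from zos later in this PR):

```go
package indexer

// IperfResult is a hypothetical, trimmed-down shape of a single iperf
// measurement, for illustration only.
type IperfResult struct {
	UploadSpeed   float64
	DownloadSpeed float64
}

// pickSpeed keeps the first non-zero upload/download values out of the
// per-protocol results, matching the decision mentioned above.
func pickSpeed(results []IperfResult) (upload, download float64) {
	for _, r := range results {
		if upload == 0 && r.UploadSpeed != 0 {
			upload = r.UploadSpeed
		}
		if download == 0 && r.DownloadSpeed != 0 {
			download = r.DownloadSpeed
		}
	}
	return upload, download
}
```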

@@ -0,0 +1,34 @@
package indexer

Contributor:

rename the file to filter_up_nodes.go

Contributor Author:

it was planned to have more util methods here for filtering up/healthy nodes or calling rmb

// item2
// ...
// ]
type DMI struct {
Contributor:

Why didn't you bring the dmidecoder from zos?

)

const (
indexerCallTimeout = 10 * time.Second
Contributor:

bump that to 30


func NewIndexer(
ctx context.Context,
paused bool,
Contributor:

This shouldn't be here; if it's for internal state, it should default to the zero value, false.

"github.com/threefoldtech/tfgrid-sdk-go/grid-proxy/internal/explorer/db"
"github.com/threefoldtech/tfgrid-sdk-go/grid-proxy/pkg/types"
"github.com/threefoldtech/tfgrid-sdk-go/rmb-sdk-go/peer"
rmbTypes "github.com/threefoldtech/tfgrid-sdk-go/rmb-sdk-go/peer/types"
)

const (
Contributor:

prefix all of the consts with gpu (given there are multiple indexers in the same package)

db: db,
rpcClient: rpcClient,
Contributor:

Is that client a reconnecting one? In long-running processes you should be careful not to lose that single connection you have.

Contributor Author:

afaik yes, the rpcClient takes a substrate manager and handles opening/closing connections when needed

@@ -130,11 +108,25 @@ func (n *NodeGPUIndexer) runQueryGridNodes(ctx context.Context) {
}

func (n *NodeGPUIndexer) getNodeGPUInfo(ctx context.Context, nodeTwinID uint32) error {
ctx, cancel := context.WithTimeout(ctx, 10*time.Second)
subCtx, cancel := context.WithTimeout(ctx, 10*time.Second)
Contributor:

bump to 30

}
}

func parseDmiResponse(dmiResponse DMI) types.DmiInfo {
Contributor:

replace everything related to parsing with the zos dmidecode package; don't do it yourself

Contributor Author:

I checked the dmi pkg in zos, and the DMI response is already parsed by that pkg.
What I am doing here is just selecting what we want to store instead of storing the whole dmi response.

@xmonader (Contributor) left a comment:

I had to stop the review.

1- Please make sure all of the indexers have a similar interface
2- Remove everything related to any manual parsing of the dmidecode
3- Prefix all of the constants related to each indexer with the indexer name, e.g. healthIndexerNumWorkers
4- For every indexer there should be:

  • a Start method, can have its own interface Starter
  • a Stop method, can have its own interface Stopper
  • a FindHealthyNodes method
  • a ResultsBatcher method
  • a DBBatchesUpserter method

All of them can be converted to functions whenever needed.

Each indexer file should have its logic laid out top to bottom, e.g.:

  • consts
  • vars
  • types
  • factories
  • entry point Start function
  • then each of the methods asked for

Note: consts and vars should be prefixed with the indexer name

Use the same terms everywhere, e.g. status/state, numWorkers/workersCount, etc. All of these need to be unified.

In an upcoming improvement we can use singleflight to avoid expensive DB/network calls, e.g. when all indexers are asking for the healthy nodes within the same timeframe.

Aside from that, the PR already has too many concerns: supporting dmi, refactoring the indexers, introducing more dependencies. Please make sure future PRs are scoped.
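A rough sketch of the per-indexer shape asked for above; the method names follow the review, everything else (struct name, fields) is illustrative:

```go
package indexer

import "context"

// Starter and Stopper can be split into their own small interfaces, per the review.
type Starter interface {
	Start(ctx context.Context)
}

type Stopper interface {
	Stop(ctx context.Context)
}

// healthIndexer is an illustrative indexer laid out as requested:
// consts, vars, types, factories, the Start entry point, then the worker methods.
type healthIndexer struct {
	// db handle, rmb client, interval, worker count, channels, ...
}

func (h *healthIndexer) Start(ctx context.Context)             { /* launch the loops below */ }
func (h *healthIndexer) Stop(ctx context.Context)              { /* cancel and drain */ }
func (h *healthIndexer) FindHealthyNodes(ctx context.Context)  { /* query candidate nodes */ }
func (h *healthIndexer) ResultsBatcher(ctx context.Context)    { /* group results into batches */ }
func (h *healthIndexer) DBBatchesUpserter(ctx context.Context) { /* upsert batches into the DB */ }
```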

…exers:

- configure a flag to stop or start the indexer manager on startup. It is useful to use the `-no-indexer` flag while running integration tests, to keep the loaded data in the mock client consistent with the actual data in the DB.
- organize the database calls in the `postgres.go` file by separating the indexer calls into a new file. Also, keep the numeric `NodeTwinId` field consistently `uint32` in all methods
- remove the duplicated types across the different pkgs (db/tests/generator) and use only the ones in `pkg/types`
- add a new trigger on the `node_gpu` table; this trigger is fired on insert/delete to update the `num_gpu` field on the `resources_cache` table
- remove all implementations of the `Scanner/Valuer` interfaces for jsonb fields in the gorm models. Instead use the registered json serializer, which is easier and cleaner. Apply this to the dmi fields/node power object
- modify the generation of `healthy_reports` mock data to use `gorm` functionality instead of creating a plain SQL string
- update and organize the schema file used for the db mock generation so it is aligned with the newly defined struct models.

Indexer pkg changes:

- import types directly from zos when needed instead of re-implementing them; applied for IperfTestResult and DMI
- create util methods, generic methods that can be used by all indexers to avoid repeating code
- create a manager that registers/controls all the indexers
- define an `Indexer` interface that defines the needed methods for each indexer
- document the indexer package along with the registered indexers and their metadata
- implement the `Indexer` interface for each registered indexer; this requires some refactoring, especially for the GPU indexer
- a better decision on old GPU node deletion by adding a timestamp expiration and checking whether it has expired on each upsert; the allowed interval is the same as the check interval in the indexer, an hour for now. This is better than checking the lastAdded twin id because we have multiple callers/batchers that may conflict on the results.
- add a new trigger for the dmi indexer to start querying newly added nodes without waiting for the next rotation
- correctly handle the indexer manager start condition (!noIndexer) after debugging
- change the updated_at field on the gpu table in the schema to a bigint to align with the struct model
- use gorm instead of plain sql for scanning in the loaders for both nodes/dmi, to benefit from the registered json serializer instead of implementing the scanning logic for the jsonb fields
@Omarabdul3ziz (Contributor Author):

Update changes:

  • configure a flag to stop or start the indexer manager on startup. It is useful to use the -no-indexer flag while running integration tests, to keep the loaded data in the mock client consistent with the actual data in the DB.
  • organize the database calls in the postgres.go file by separating the indexer calls into a new file. Also, keep the numeric NodeTwinId field consistently uint32 in all methods
  • remove the duplicated types across the different pkgs (db/tests/generator) and use only the ones in pkg/types
  • add a new trigger on the node_gpu table; this trigger is fired on insert/delete to update the num_gpu field on the resources_cache table
  • remove all implementations of the Scanner/Valuer interfaces for jsonb fields in the gorm models. Instead use the registered json serializer, which is easier and cleaner. Apply this to the dmi fields/node power object (see the sketch after this comment)
  • modify the generation of healthy_reports mock data to use gorm functionality instead of creating a plain SQL string
  • update and organize the schema file used for the db mock generation so it is aligned with the newly defined struct models.

Indexer pkg changes:

  • import types directly from zos when needed instead of re-implementing them; applied for IperfTestResult and DMI
  • create util methods, generic methods that can be used by all indexers to avoid repeating code
  • create a manager that registers/controls all the indexers
  • define an Indexer interface that defines the needed methods for each indexer
  • document the indexer package along with the registered indexers and their metadata
  • implement the Indexer interface for each registered indexer; this requires some refactoring, especially for the GPU indexer
  • a better decision on old GPU node deletion by adding a timestamp expiration and checking whether it has expired on each upsert; the allowed interval is the same as the check interval in the indexer, an hour for now. This is better than checking the lastAdded twin id because we have multiple callers/batchers that may conflict on the results.
  • add a new trigger for the dmi indexer to start querying newly added nodes without waiting for the next rotation
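For reference, a minimal sketch of the registered-json-serializer approach that replaces the hand-written Scanner/Valuer pair; the model and field names are illustrative, not the PR's actual schema:

```go
package types

// Node is an illustrative gorm model: tagging a struct field with
// serializer:json makes gorm (de)serialize it as jsonb using its built-in
// json serializer, with no hand-written Scanner/Valuer needed.
type Node struct {
	NodeTwinID uint32  `gorm:"primaryKey"`
	Dmi        DmiInfo `gorm:"type:jsonb;serializer:json"`
}

// DmiInfo holds only the selected dmi fields (illustrative shape).
type DmiInfo struct {
	BIOS      string `json:"bios"`
	Baseboard string `json:"baseboard"`
}
```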

flag.UintVar(&f.dmiIndexerIntervalMins, "dmi-indexer-interval", 60*24, "node dmi check interval in min")
flag.UintVar(&f.dmiIndexerNumWorkers, "dmi-indexer-workers", 1, "number of workers checking on node dmi")
flag.UintVar(&f.speedIndexerIntervalMins, "speed-indexer-interval", 5, "node speed check interval in min")
flag.UintVar(&f.speedIndexerNumWorkers, "speed-indexer-workers", 100, "number of workers checking on node speed")
Member:

Just wondering why we need multiple sets of indexer workers to query things from nodes; can't we have a single set of workers that asks the node for everything?

I mean, I see here dmi-indexer-workers, gpu-indexer-workers, speed-indexer-workers (whatever that means), etc...

This is just a question, not a request for change.

Comment on lines 48 to 49
go n.StartNodeFinder(ctx)
go n.startNodeTableWatcher(ctx)
Member:

Do these methods need to be public? I believe they are intended for private use by the indexer object. I suggest they are renamed to be private instead.

Member:

This applies to all other indexers here

batchSize uint
}

func NewNodeHealthIndexer(
Member:

I can see that an indexer mainly does the same operations, so I was wondering whether the indexer logic can be abstracted more.
From what I see, an indexer has a set of options (say workers), it runs indexing every certain amount of time, then does some sort of batching to update the db.
All this logic can then be unified. Then each indexer needs to define 2 things:

  • The function that does the "fetch" of the data.
  • The logic of "batch" db update.

Those 2 functions can then be part of an IndexerWork interface, which can have multiple implementations (say gpu and dmi); an Indexer instance can then be created for each, something like NewIndexer(GPUIndexerWork()) for example. Then call indexer.Start().
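A compressed sketch of that shape, roughly matching the generic indexer described in the follow-up commits; the Work interface and the field names here are illustrative:

```go
package indexer

import (
	"context"
	"time"
)

// Work is the per-resource job: how to fetch data from a node and how to
// upsert a batch into the DB (illustrative, based on the suggestion above).
type Work[T any] interface {
	Get(ctx context.Context, twinID uint32) ([]T, error)
	Upsert(ctx context.Context, batch []T) error
}

// Indexer drives any Work on an interval with a worker pool and batching.
type Indexer[T any] struct {
	work      Work[T]
	interval  time.Duration
	batchSize int
	twinsChan chan uint32
	batchChan chan T
}

func NewIndexer[T any](work Work[T], interval time.Duration, batchSize int) *Indexer[T] {
	return &Indexer[T]{
		work:      work,
		interval:  interval,
		batchSize: batchSize,
		twinsChan: make(chan uint32),
		batchChan: make(chan T),
	}
}

// Start launches the finder, getter, and upserter loops (bodies elided).
func (i *Indexer[T]) Start(ctx context.Context) {
	go i.runFinders(ctx)
	go i.runGetters(ctx)
	go i.runUpserters(ctx)
}

func (i *Indexer[T]) runFinders(ctx context.Context)   { /* tick every interval, push twin ids */ }
func (i *Indexer[T]) runGetters(ctx context.Context)   { /* call work.Get, push results */ }
func (i *Indexer[T]) runUpserters(ctx context.Context) { /* batch results, call work.Upsert */ }
```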

Description:

- redefine the indexer structs into a single generic indexer without a manager; every indexer does the generic work of find/get/batch/upsert results, with a different job based on the type defined when creating the indexer.
- each indexer job should only handle its specific concerns:
    - defining its finders or triggers
    - preparing the rmb call to the node, parsing the response, and returning a ready db model that can be stored
    - upserting its results by updating, discarding, or inserting the passed arguments
- the indexer code is the one responsible for the shared logic, by:
    - executing the finder methods for each indexer, as defined in its job, based on the interval
    - starting the getter workers that invoke the get method on the work
    - starting a batcher for each indexer that has two channels as its ends
    - starting the upserter workers that invoke the upsert method on the work

Changes:

- remove the indexer manager in favor of easily starting a separate generic indexer per resource type.
- remove the redundant `NewIndexer` `Start` `Find` `Batch` methods from each indexer in favor of generic methods on the `Indexer[T any]` struct
- update the return of the get method on the work to be `[]T` instead of `T`, to cover all indexer response types, like the gpu call which returns a slice of gpus
- create a finders map for easy reusability of the finder methods for new/up/healthy nodes
- check the uniqueness of the nodes in a batch before inserting, to prevent duplicated data coming from two different triggers like healthy/up or up/new (see the sketch after this list)
- modify the default sql-log-level in the makefile
- add more debug logging statements
- update docs
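A small sketch of that uniqueness check on a batch before it is upserted; GetTwinID is a hypothetical accessor, not the PR's actual model API:

```go
package indexer

// dedupe keeps one entry per twin id in a batch before it is upserted, so
// results arriving from two triggers (e.g. healthy/up) don't collide.
func dedupe[T interface{ GetTwinID() uint32 }](batch []T) []T {
	seen := make(map[uint32]int, len(batch))
	out := make([]T, 0, len(batch))
	for _, item := range batch {
		if idx, ok := seen[item.GetTwinID()]; ok {
			out[idx] = item // keep the most recent result for this twin
			continue
		}
		seen[item.GetTwinID()] = len(out)
		out = append(out, item)
	}
	return out
}
```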
@Omarabdul3ziz force-pushed the development_proxy_indexer branch from b7916c1 to 2db2a21 on February 25, 2024 12:22
@Omarabdul3ziz (Contributor Author):

Converting to draft so it is not merged until 3.13 reaches mainnet.

@Omarabdul3ziz marked this pull request as draft on February 27, 2024 10:06
@Omarabdul3ziz force-pushed the development_proxy_indexer branch from c8f7886 to 119a62a on March 12, 2024 08:26
@xmonader (Contributor):

Let's go! ("yalla beena")

@Omarabdul3ziz merged commit 3f788cb into development on Mar 20, 2024
37 checks passed
@Omarabdul3ziz deleted the development_proxy_indexer branch on March 20, 2024 08:21