Scraper 2.0 improvements - Part I #1481

Merged

Conversation

jamescowens
Member

Part I of Scraper 2.0 implements a collection of improvements, including:

  • explorer mode operation
  • simplified explainmagnitude function
  • improved convergence reporting, including scraper information in the tooltip when fDebug3 is set
  • improved statistics and SB contract core caching based on a bClean flag in the cache global

Part II is anticipated to have

  • new SB format and packing
  • new SB contract hashing (native)
  • changes to accommodate new beacon approach

to support retaining team and host files for the explorer
while not including them in CScraperManifests. Also maintains backward
compatibility with ver 1 file manifests.
This adds support for the -explorer flag, which changes the
behavior of the scraper to hold files for a longer period of time
and also download team and host files. The publishing of manifests
is not affected.

This is the initial implementation of the explorer flag, team and host
file downloading and retention.
Also skip the hash check for files excluded from publishing, since
these files are very large and hashing them is expensive and unnecessary.
Also minor other cleanup.
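
The explorer-mode behaviour described in the commit messages above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: every type and function name here (StatsFileKind, StatsFileSketch, ShouldDownload, ShouldHash, RetentionSeconds) is hypothetical, and the retention periods are passed in because the actual values are not given in this conversation.

```cpp
#include <cstdint>

// All names below are hypothetical. Only the -explorer flag and the
// user/team/host file kinds come from the PR description.
enum class StatsFileKind { User, Team, Host };

struct StatsFileSketch
{
    StatsFileKind kind;
    bool fExcludedFromPublishing; // true for files retained only for the explorer
};

// Assumption: user files are always fetched, while team and host files
// are fetched only when running with -explorer.
bool ShouldDownload(StatsFileKind kind, bool fExplorer)
{
    return kind == StatsFileKind::User || fExplorer;
}

// Files that never go into a published manifest skip the hash check,
// because hashing very large files is expensive and unnecessary.
bool ShouldHash(const StatsFileSketch& file)
{
    return !file.fExcludedFromPublishing;
}

// Explorer mode holds files for longer. Both periods are caller-supplied
// because the real values are not stated in this conversation.
int64_t RetentionSeconds(bool fExplorer, int64_t nNormalSeconds, int64_t nExplorerSeconds)
{
    return fExplorer ? nExplorerSeconds : nNormalSeconds;
}
```

Keeping the three decisions in separate predicates mirrors the statement that manifest publishing is unaffected: only downloading, hashing, and retention depend on the flag.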

Some structures in ConvergedManifest and the cache added here may be eliminated
after testing/fine-tuning.
This implements a bClean boolean in the cache global that is marked false
in scraper_net when manifests are received from the network
or published locally. It is marked true when a new set of
statistics and SBContract core is computed.

The rule is that the cached contract will be used when the cache age
is younger than nScraperSleep in seconds OR the cache is clean (i.e.
no new manifests have been published (if a scraper node) or received
(if a normal node)). This will avoid the statistics
calculation pulse seen on mainnet every 300 seconds during times
the scrapers are not active and publishing new manifests.
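
The cache rule above can be expressed as a single predicate. The following is a minimal sketch, not the PR's implementation: the struct ConvergedScraperCacheSketch, its nTime field, and the function CanUseCachedContract are hypothetical names, and the sleep interval is assumed to have already been converted to seconds.

```cpp
#include <cstdint>

// Hypothetical stand-in for the cache global. Only bClean is named in
// the commit message; nTime is an assumption for illustration.
struct ConvergedScraperCacheSketch
{
    int64_t nTime; // time (in seconds) at which the cached contract was computed
    bool bClean;   // set false when a manifest is received or published,
                   // set true when statistics and contract are recomputed
};

// Returns true when the cached contract may be reused: either the cache
// is younger than the scraper sleep interval, or no new manifests have
// arrived since it was computed.
bool CanUseCachedContract(const ConvergedScraperCacheSketch& cache,
                          int64_t nNow,
                          int64_t nScraperSleepSeconds)
{
    const int64_t nCacheAge = nNow - cache.nTime;
    return nCacheAge < nScraperSleepSeconds || cache.bClean;
}
```

With a 300 second interval, an idle network leaves bClean set, so the predicate stays true and the statistics are not recomputed no matter how old the cache is.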
@jamescowens jamescowens added this to the Elizabeth milestone Jun 19, 2019
@jamescowens jamescowens self-assigned this Jun 19, 2019
Also remove unnecessary bByParts flag check in GUI tooltip
@jamescowens jamescowens force-pushed the integrated_scraper_2 branch 2 times, most recently from 17bce0f to ca85096 Compare June 20, 2019 20:23
Member

@cyrossignol cyrossignol left a comment

I can't comment on it inline: is this line removing the unprocessed user.gz file for a project? Do we want to check fExplorer before deleting it like with the team file?

Edit: that's line 2047 if the link doesn't work.

@jamescowens
Member Author

Just give me the line number. The link didn't work.

@cyrossignol
Member

Line 2047

@jamescowens
Member Author

jamescowens commented Jun 20, 2019

I didn't change the behavior for the user files, because I didn't think startail needed to process the full user file. He and I only talked about the team and host files. If he wants the user file too, I will have to make modifications.

@jamescowens
Member Author

I have pinged him to clarify.

@jamescowens jamescowens force-pushed the integrated_scraper_2 branch from ca85096 to 5ac717c Compare June 20, 2019 22:48
Use boost::algorithm::join to simplify joining vector elements
into strings for the tooltip.
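
As a rough illustration of what the join replaces, here is a std-only stand-in so the sketch builds without Boost. The helper name JoinStrings is hypothetical and the separator is an assumption. With Boost available, the equivalent one-liner is boost::algorithm::join(parts, sep) from <boost/algorithm/string/join.hpp>.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Std-only equivalent of boost::algorithm::join(parts, sep): concatenates
// the elements with the separator between consecutive entries only.
std::string JoinStrings(const std::vector<std::string>& parts, const std::string& sep)
{
    std::string out;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i > 0) out += sep; // no leading or trailing separator
        out += parts[i];
    }
    return out;
}
```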
@jamescowens jamescowens force-pushed the integrated_scraper_2 branch from 5ac717c to 7e4ce4d Compare June 20, 2019 22:49
@jamescowens
Member Author

Ok. I think we are good pending @startailcoon's clarification.

@startailcoon

I didn't change the behavior for the user files, because I didn't think startail needed to process the full user file. He and I only talked about the team and host files. If he wants the user file too, I will have to make modifications.

I'm interested in processing the full user files as well, sorry if this was overlooked in our previous talks @jamescowens

for explorer mode.

Normalized common code for aligning scraper file manifest entries
into separate function AlignScraperFileManifestEntries to eliminate
repeated code.
Both of those vectors must only include scrapers marked active
in the appcache.
@jamescowens
Member Author

Ok. I think we are ready to merge this. Please take a last look.

Member

@cyrossignol cyrossignol left a comment

Running with -explorer: looks like it's downloading and retaining the unprocessed export files as expected and the manifest looks correct.

Gotta keep an eye on disk space in explorer mode. 8.2 GB after one day. 🙂

Perhaps a future optimization keeps only the latest unprocessed stats files. I wonder if explorers will need the same etag versions to match the converged stats. The extra space is probably minor after all.

@jamescowens
Member Author

I am not sure what @startailcoon is going to need from these unprocessed files. He wanted a week's worth, so I have a feeling just keeping the latest is not going to work. We may want to save just one per day, since several of the projects update their files multiple times per day. I think for right now, we should stick to keeping the unprocessed files for each and every etag change...

@jamescowens
Member Author

It eats up a lot of disk space, but I think he is prepared for that. No telling what his explorer is already using disk-space wise. I imagine quite a bit.

@jamescowens jamescowens force-pushed the integrated_scraper_2 branch from 6e86f12 to b4dfde6 Compare June 22, 2019 18:28
@jamescowens jamescowens merged commit d267bf1 into gridcoin-community:development Jun 23, 2019
jamescowens added a commit that referenced this pull request Aug 20, 2019
Added:
 - Add freedesktop.org desktop file and icon set #1438 (@a123b)
 - Add warning in help for blockchain scan for importprivkey #1469 (@jamescowens)
 - Consolidateunspent rpc function #1472 (@jamescowens)
 - Scraper 2.0 improvements #1481, #1488, #1509, and #1514 (@jamescowens, @cyrossignol)
   - explorer mode operation
   - simplified explainmagnitude output
   - improved convergence reporting, including scraper information in the tooltip when fDebug3 is set
   - improved statistics and SB contract core caching based on a bClean flag in the cache global
   - new SB format and packing for bv11
   - new SB contract hashing (native) for bv11
   - changes to accommodate new beacon approach
   - Implement in memory versioning for team file ETags
 - Implement local dynamic team requirement removal and whitelist #1502 (@cyrossignol)

Changed:
 - Quiet logging for getmininginfo and scraper INFO logging level #1460 (@jamescowens)
 - Spelling corrections #1461, #1462 (@caraka)
 - Update crypto module #1453 (@denravonska)
 - Update .travis.yml for Bionic #1475 (@jamescowens)
 - Create CPID classes and clean up CPID code #1477 (@cyrossignol)
 - Refactor researcher context and CPID harvesting #1480 (@cyrossignol)
   - Remove boinckey export RPC method and import handler
 - Notify when wallet locked in advertisebeacon RPC method #1504 (@cyrossignol)
 - Notify when wallet locked in beaconstatus RPC method #1506 (@cyrossignol)
 - Change spacer minimum height hint #1511 (@jamescowens)

Removed:
 - Remove safe mode #1434 (@denravonska)
 - Remove bitcoin.moc in Makefile.qt.include #1444 (@RoboticMind)
 - Clean up legacy Proof-of-Work functions #1497 (@cyrossignol)

Fixed:
 - Constrain walletpassphrase to 10000000 seconds #1459 (@jamescowens)
 - Straighten out localization in the scraper. #1471 (@jamescowens)
 - Quick fix for rainbymagnitude #1473 (@jamescowens)
 - Correct negation error in scraper tooltip for vScrapersNotPublishing #1484 (@jamescowens)
 - Fix staked block rejection when active researcher #1485 (@cyrossignol)
 - Add back informational magnitude to generated blocks #1489 (@cyrossignol)
 - Add back in the in sync check in ScraperGetNeuralContract #1492 (@jamescowens)
 - Scraper correct team file processing. #1501 (@jamescowens)
 - Have importwallet file path default to datadir #1508 (@jamescowens)
 - Scraper add Beacon Map size check to ensure convergence #1515 (@jamescowens)

4 participants