Scraper 2.0 improvements - Part I #1481

jamescowens · 2019-06-19T20:20:47Z

Part I of Scraper 2.0 implements a collection of improvements, including:

explorer mode operation
simplified explainmagnitude function
improved convergence reporting, including scraper information in the tooltip when fDebug3 is set
improved statistics and SB contract core caching based on a bClean flag in the cache global

Part II is anticipated to have

new SB format and packing
new SB contract hashing (native)
changes to accommodate new beacon approach

Add field excludefromcsmanifest to ScraperFileManifestEntry to support retaining team and host files for the explorer without including them in CScraperManifests. This also maintains backward compatibility with version 1 file manifests.

This adds support for the -explorer flag, which changes the behavior of the scraper to hold files for a longer period of time and also download team and host files. The publishing of manifests is not affected. This is the initial implementation of the explorer flag, team and host file downloading and retention.

Also skip the hash check for files excluded from publishing: these files are very large, so hashing them is expensive and unnecessary.
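The exclusion behavior described above can be sketched roughly as follows. This is a minimal illustration, not the scraper's actual code: `FileEntry`, `NeedsHashCheck`, and `SelectForPublishing` are hypothetical names, and only the `excludefromcsmanifest` field name comes from the commit title in this PR.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a scraper file manifest entry.
// Only the excludefromcsmanifest field name is taken from this PR.
struct FileEntry {
    std::string filename;
    bool excludefromcsmanifest;
};

// Files retained only for the explorer are never published, so the
// expensive hash check is skipped for them.
bool NeedsHashCheck(const FileEntry& entry)
{
    return !entry.excludefromcsmanifest;
}

// Returns the entries that would go into a published manifest.
// Excluded entries stay on disk but are left out of the result.
std::vector<FileEntry> SelectForPublishing(const std::vector<FileEntry>& entries)
{
    std::vector<FileEntry> published;
    for (const auto& entry : entries) {
        if (entry.excludefromcsmanifest) continue;
        published.push_back(entry);
    }
    return published;
}
```

Under these assumptions, a node running with -explorer would keep the team and host entries locally while publishing the same manifest contents as any other scraper.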

Also minor other cleanup. Some structures in ConvergedManifest and the cache added here may be eliminated after testing/fine-tuning.

This implements a bClean boolean in the cache global that is marked false in scraper_net when manifests are received from the network or published locally. It is marked true when a new set of statistics and SBContract core is computed. The rule is that the cached contract will be used when the cache age is younger than nScraperSleep in seconds OR the cache is clean (i.e. no new manifests have been published (if a scraper node) or received (if a normal node)). This avoids the statistics calculation pulse seen on mainnet every 300 seconds during times when the scrapers are not active and publishing new manifests.
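The caching rule above can be sketched as below. This is an illustration under assumptions, not the code in this PR: `ConvergedCacheState`, `UseCachedContract`, `OnManifestReceivedOrPublished`, and `OnStatsRecomputed` are hypothetical names, and the sleep interval is passed in already converted to seconds. Only `bClean` and `nScraperSleep` are names from the description.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified model of the cache global described above.
struct ConvergedCacheState {
    std::int64_t nTime = 0; // time (seconds) the cached result was computed
    bool bClean = false;    // false once a manifest is received or published
};

// The cached contract is reused when the cache is younger than the
// scraper sleep interval OR no manifest has arrived since it was built.
bool UseCachedContract(const ConvergedCacheState& cache,
                       std::int64_t nNow,
                       std::int64_t nScraperSleepSeconds)
{
    const std::int64_t nAge = nNow - cache.nTime;
    return nAge < nScraperSleepSeconds || cache.bClean;
}

// Called when a manifest is received from the network or published locally.
void OnManifestReceivedOrPublished(ConvergedCacheState& cache)
{
    cache.bClean = false;
}

// Called after a new set of statistics and contract core is computed.
void OnStatsRecomputed(ConvergedCacheState& cache, std::int64_t nNow)
{
    cache.nTime = nNow;
    cache.bClean = true;
}
```

With this rule, an old cache is still reused as long as it stays clean, which is what removes the periodic recalculation while no scraper is publishing.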

Add cache invalidate to CScraperManifest::DeleteManifest(). Also remove the unnecessary bByParts flag check in the GUI tooltip.

cyrossignol

I can't comment on it inline: is this line removing the unprocessed user.gz file for a project? Do we want to check fExplorer before deleting it like with the team file?

Edit: that's line 2047 if the link doesn't work.

jamescowens · 2019-06-20T22:38:22Z

Just give me the line number. The link didn't work.

cyrossignol · 2019-06-20T22:38:57Z

Line 2047

jamescowens · 2019-06-20T22:40:20Z

I didn't change the behavior for the user files, because I didn't think startail needed to process the full user file. He and I only talked about the team and host files. If he wants the user file too, I will have to make modifications.

jamescowens · 2019-06-20T22:42:10Z

I have pinged him to clarify.

Simplify tooltip code: use boost::algorithm::join to condense the joining of vector elements into strings for the tooltip.

jamescowens · 2019-06-20T22:55:18Z

Ok. I think we are good pending @startailcoon's clarification.

startailcoon · 2019-06-21T09:07:52Z

> I didn't change the behavior for the user files, because I didn't think startail needed to process the full user file. He and I only talked about the team and host files. If he wants the user file too, I will have to make modifications.

I'm interested in processing the full user files as well. Sorry if this was overlooked in our previous talks, @jamescowens.

Add functionality to support retention of unprocessed user files for explorer mode. Normalized common code for aligning scraper file manifest entries into a separate function, AlignScraperFileManifestEntries, to eliminate repeated code.

Ensure the scraper excluded and included lists are properly scoped: both of those vectors must only include scrapers marked active in the appcache.

jamescowens · 2019-06-22T15:48:27Z

Ok. I think we are ready to merge this. Please take a last look.

cyrossignol

Running with -explorer: looks like it's downloading and retaining the unprocessed export files as expected and the manifest looks correct.

Gotta keep an eye on disk space in explorer mode. 8.2 GB after one day. 🙂

Perhaps a future optimization keeps only the latest unprocessed stats files. I wonder if explorers will need the same etag versions to match the converged stats. The extra space is probably minor after all.

jamescowens · 2019-06-22T18:17:27Z

I am not sure what @startailcoon is going to need from these unprocessed files. He wanted a week's worth, so I have a feeling just keeping the latest is not going to work. We may want to save just one per day, since several of the projects update their files multiple times per day. I think for right now, we should stick to keeping the unprocessed files for each and every etag change...

jamescowens · 2019-06-22T18:18:08Z

It eats up a lot of disk space, but I think he is prepared for that. No telling what his explorer is already using disk-space wise. I imagine quite a bit.

@a123b

Added:
- Add freedesktop.org desktop file and icon set #1438 (@a123b)
- Add warning in help for blockchain scan for importprivkey #1469 (@jamescowens)
- Consolidateunspent rpc function #1472 (@jamescowens)
- Scraper 2.0 improvements #1481, #1488, #1509, and #1514 (@jamescowens, @cyrossignol)
  - explorer mode operation
  - simplified explainmagnitude output
  - improved convergence reporting, including scraper information in the tooltip when fDebug3 is set
  - improved statistics and SB contract core caching based on a bClean flag in the cache global
  - new SB format and packing for bv11
  - new SB contract hashing (native) for bv11
  - changes to accommodate new beacon approach
  - Implement in memory versioning for team file ETags
- Implement local dynamic team requirement removal and whitelist #1502 (@cyrossignol)

Changed:
- Quiet logging for getmininginfo and scraper INFO logging level #1460 (@jamescowens)
- Spelling corrections #1461, #1462 (@caraka)
- Update crypto module #1453 (@denravonska)
- Update .travis.yml for Bionic #1475 (@jamescowens)
- Create CPID classes and clean up CPID code #1477 (@cyrossignol)
- Refactor researcher context and CPID harvesting #1480 (@cyrossignol)
- Remove boinckey export RPC method and import handler
- Notify when wallet locked in advertisebeacon RPC method #1504 (@cyrossignol)
- Notify when wallet locked in beaconstatus RPC method #1506 (@cyrossignol)
- Change spacer minimum height hint #1511 (@jamescowens)

Removed:
- Remove safe mode #1434 (@denravonska)
- Remove bitcoin.moc in Makefile.qt.include #1444 (@RoboticMind)
- Clean up legacy Proof-of-Work functions #1497 (@cyrossignol)

Fixed:
- Constrain walletpassphrase to 10000000 seconds #1459 (@jamescowens)
- Straighten out localization in the scraper. #1471 (@jamescowens)
- Quick fix for rainbymagnitude #1473 (@jamescowens)
- Correct negation error in scraper tooltip for vScrapersNotPublishing #1484 (@jamescowens)
- Fix staked block rejection when active researcher #1485 (@cyrossignol)
- Add back informational magnitude to generated blocks #1489 (@cyrossignol)
- Add back in the in sync check in ScraperGetNeuralContract #1492 (@jamescowens)
- Scraper correct team file processing. #1501 (@jamescowens)
- Have importwallet file path default to datadir #1508 (@jamescowens)
- Scraper add Beacon Map size check to ensure convergence #1515 (@jamescowens)

jamescowens added 9 commits June 4, 2019 20:23

Add field excludefromcsmanifest to ScraperFileManifestEntry

39f901c

to support retaining of team and host files for the explorer while not including in CScraperManifests. Also maintains backward compatibility with ver 1 file manifests.

Update .travis.yml to support Bionic

532659d

Merge branch 'development' into integrated_scraper_2

8343e98

Add more detailed lock descriptions for troubleshooting

2dc0721

Also do not do hash check of files excluded from publishing, since these are very large and it is very expensive and unnecessary.

Simplify explainmagnitude function

324b955

Merge branch 'development' into integrated_scraper_2

90535df

Improve convergence reporting

3ed2059

Also minor other cleanup. Some structures in ConvergedManifest and the cache added here may be eliminated after testing/fine-tuning.

jamescowens requested review from denravonska and cyrossignol June 19, 2019 20:21

jamescowens added this to the Elizabeth milestone Jun 19, 2019

jamescowens self-assigned this Jun 19, 2019

cyrossignol reviewed Jun 20, 2019

src/scraper_net.cpp

cyrossignol reviewed Jun 20, 2019

src/qt/bitcoingui.cpp (outdated)

cyrossignol reviewed Jun 20, 2019

src/scraper/fwd.h

cyrossignol reviewed Jun 20, 2019

src/scraper/scraper.cpp

cyrossignol reviewed Jun 20, 2019

src/scraper/scraper.cpp

cyrossignol reviewed Jun 20, 2019

src/scraper/scraper.cpp

Add cache invalidate to CScraperManifest::DeleteManifest()

1ce7762

Also remove unnecessary bByParts flag check in GUI tooltip

denravonska reviewed Jun 20, 2019

src/qt/bitcoingui.cpp (outdated)

denravonska reviewed Jun 20, 2019

src/qt/bitcoingui.cpp (outdated)

jamescowens force-pushed the integrated_scraper_2 branch 2 times, most recently from 17bce0f to ca85096 June 20, 2019 20:23

cyrossignol reviewed Jun 20, 2019

src/scraper/scraper.cpp

cyrossignol reviewed Jun 20, 2019

src/scraper/scraper.cpp (outdated)

cyrossignol reviewed Jun 20, 2019

jamescowens force-pushed the integrated_scraper_2 branch from ca85096 to 5ac717c June 20, 2019 22:48

Simplify tooltip code

7e4ce4d

Use boost::algorithm::join to compress joining of vector elements in strings for tooltip.
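For readers unfamiliar with the call named in this commit, `boost::algorithm::join(parts, separator)` concatenates the elements of a string container with a separator between them. The sketch below reproduces that behavior with the standard library only, so it compiles without Boost. `JoinStrings` and `BuildTooltipLine` are hypothetical helpers written for illustration, and the label text and the "none" placeholder are assumptions rather than the wallet's actual tooltip wording.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical standard-library stand-in for boost::algorithm::join:
// concatenates parts with sep between consecutive elements.
std::string JoinStrings(const std::vector<std::string>& parts, const std::string& sep)
{
    std::string out;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i > 0) out += sep;
        out += parts[i];
    }
    return out;
}

// Hypothetical tooltip line builder: one call replaces a hand-written
// loop that appends each scraper name and trims the trailing separator.
std::string BuildTooltipLine(const std::string& label, const std::vector<std::string>& scrapers)
{
    if (scrapers.empty()) return label + ": none";
    return label + ": " + JoinStrings(scrapers, ", ");
}
```

In the real code, the body of `JoinStrings` would presumably be replaced by the Boost call from `<boost/algorithm/string/join.hpp>`.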

jamescowens force-pushed the integrated_scraper_2 branch from 5ac717c to 7e4ce4d June 20, 2019 22:49

Correct _log typos in host and team file download functions

a0cd6c7

cyrossignol reviewed Jun 22, 2019

src/scraper/scraper.cpp (outdated)

jamescowens added 2 commits June 22, 2019 10:04

Add functionality to support retention of unprocessed user files

4eb3e70

for explorer mode. Normalized common code for aligning scraper file manifest entries into separate function AlignScraperFileManifestEntries to eliminate repeated code.

Ensure scraper excluded and included list is properly scoped

9425600

Both of those vectors must only include scrapers marked active in the appcache.
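The scoping rule in this commit can be illustrated with a small sketch. It assumes, hypothetically, that the appcache is represented as a map from scraper ID to an active flag and that the set of scrapers contributing to the convergence is already known. `ScraperLists` and `ScopeScraperLists` are illustrative names, not functions from the scraper source.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical container for the two lists shown in the tooltip.
struct ScraperLists {
    std::vector<std::string> vIncluded;
    std::vector<std::string> vExcluded;
};

// Splits scrapers into included and excluded lists, considering only
// scrapers flagged active. Inactive scrapers appear in neither list.
ScraperLists ScopeScraperLists(const std::map<std::string, bool>& mActiveFlags,
                               const std::set<std::string>& sContributing)
{
    ScraperLists lists;
    for (const auto& entry : mActiveFlags) {
        if (!entry.second) continue; // not active in the appcache
        if (sContributing.count(entry.first) > 0) {
            lists.vIncluded.push_back(entry.first);
        } else {
            lists.vExcluded.push_back(entry.first);
        }
    }
    return lists;
}
```

Iterating over the active entries rather than over the contributing set is the design choice that keeps a deauthorized scraper out of both lists, even if one of its manifests is still held locally.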

jamescowens requested review from denravonska and cyrossignol June 22, 2019 15:48

cyrossignol approved these changes Jun 22, 2019

cyrossignol reviewed Jun 22, 2019

src/scraper/scraper.cpp (outdated)

Correct missing hyphen on filenames for unprocessed user files.

b4dfde6

jamescowens force-pushed the integrated_scraper_2 branch from 6e86f12 to b4dfde6 June 22, 2019 18:28

jamescowens merged commit d267bf1 into gridcoin-community:development Jun 23, 2019

jamescowens mentioned this pull request Jun 23, 2019

Correct negation error in scraper tooltip for vScrapersNotPublishing #1484

Merged
