-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add include minority results in evaluate script output. Closes #458 #479
base: main
Are you sure you want to change the base?
Conversation
@@ -42,7 +42,7 @@ const EVALUATION_NDJSON_FILE = `${basename(measurementsPath, '.ndjson')}.evaluat | |||
const evaluationTxtWriter = fs.createWriteStream(EVALUATION_TXT_FILE) | |||
const evaluationNdjsonWriter = fs.createWriteStream(EVALUATION_NDJSON_FILE) | |||
|
|||
evaluationTxtWriter.write(formatHeader({ includeEvaluation: keepRejected }) + '\n') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"includeEvaluation" didn't make sense since we are dealing with tasking and consensus result individually. Is it ok though to call a measurement that passed tasking but not consensus "accepted"/"not rejected"? Or are accepted measurements only the ones that pass tasking and consensus, and we need a new name here: includeTasking
Thinking about this more, I might have misunderstood the goal. We were including non-majority results before (see PR description), and those are important, for the non-consensus based scores. What I'm not understanding then is what should actually be changed? |
It's great to see a fresh view on this 👏🏻
When I created #458, I understood that the script prints only measurements that passed the tasking evaluation and were in the majority - based on the following condition used to filter measurements to print: if (m.taskingEvaluation !== 'OK' && m.consensusEvaluation === 'MAJORITY_RESULT') continue Which made the script useless for troubleshooting why an SP had a RSR smaller than 100%. However, now that I am reading that line again, I think there is a bug. This is what I probably wanted to write in #442: if (!(m.taskingEvaluation === 'OK' && m.consensusEvaluation === 'MAJORITY_RESULT')) continue
When I wrote that issue, I was expecting to change the condition shown above to the following plus update the output to include information about the committee consensus: if (m.taskingEvaluation !== 'OK') continue See #396 for the very first iteration on that. Having written the above, I like your ideas on how to revamp the output of this script. Maybe we can implement them on top of the fix too? |
// See https://github.com/filecoin-station/spark-evaluate/pull/396 | ||
fields.push((m.taskingEvaluation === 'OK' && m.consensusEvaluation === 'MAJORITY_RESULT' ? '🫡 ' : '🙅 ')) | ||
if (keepRejected) { | ||
fields.push((m.taskingEvaluation === 'OK' ? '🫡' : '🙅').padEnd(7)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we include a --help command or similar which explains what emoji means what?
@bajtos got it! I took me a bit to realize that this is never what we want: if (m.taskingEvaluation !== 'OK' && m.consensusEvaluation === 'MAJORITY_RESULT') continue So, you're proposing to change the line to: if (m.taskingEvaluation !== 'OK') continue
What do you mean by this? Would this be the final loop: for (const m of round.measurements) {
if (m.taskingEvaluation !== 'OK') continue
resultCounts.total++
const status = m.consensusEvaluation !== 'MAJORITY_RESULT'
? m.consensusEvaluation
: m.retrievalResult
resultCounts[status] = (resultCounts[status] ?? 0) + 1
} |
Closes #458
Before
☝️ Notice how minority results are included in the output, although they aren't actionable for the SP.
☝️ Here there's no notion of whether something is a minority result or not.
☝️ Same problem as above
☝️ It's not clear what 🕵️ means.
After
☝️ See how the output has improved, and unactionable minority results are grouped in one row. Measurements that don't pass the tasking algorithm aren't included. Based on the above, the SP should have 100% RSR for the measurements tested.
@bajtos should we exclude
MINORITY_RESULT
here and just say OK 100%?Notice new "Consensus" field. Measurements that don't pass the tasking algorithm aren't included.
☝️ 👇 These now includes codes for everything: retrieval result, tasking failure and consensus failure.