Include minority results in evaluate script output. Closes #458 #479

Open
wants to merge 1 commit into
base: main

juliangruber (Member)

Closes #458

Before

```
$ node bin/evaluate-measurements.js measurements-f010479.ndjson
[...]
Found 484 accepted measurements.
  OK                                       480        (99.17%)
  TIMEOUT                                  1          (0.2%)
  HOSTNAME_DNS_ERROR                       2          (0.41%)
  IPNI_ERROR_FETCH                         1          (0.2%)
[...]
```

☝️ Notice how minority results are included in the output, although they aren't actionable for the SP.

```
$ head -n2 measurements-f010479.evaluation.txt
Timestamp                CID                                                                    Protocol   RetrievalResult
2025-02-16T20:22:15.307Z bafybeibhg66222vgfas5yc72hpnnydglezznfwjjwcgro6jgnj2st5ydi4            http       OK
```

☝️ Here there's no notion of whether something is a minority result or not.

```
$ KEEP_REJECTED=1 node bin/evaluate-measurements.js measurements-f010479.ndjson
[...]
Found 484 accepted measurements.
  OK                                       480        (99.17%)
  TIMEOUT                                  1          (0.2%)
  HOSTNAME_DNS_ERROR                       2          (0.41%)
  IPNI_ERROR_FETCH                         1          (0.2%)
[...]
```

☝️ Same problem as above: the output is identical to the default mode.

```
$ head -n2 measurements-f010479.evaluation.txt
Timestamp                CID                                                                    Protocol   🕵️   RetrievalResult
2025-02-16T20:22:15.307Z bafybeibhg66222vgfas5yc72hpnnydglezznfwjjwcgro6jgnj2st5ydi4            http       🫡   OK
```

☝️ It's not clear what 🕵️ means.

After

```
$ node bin/evaluate-measurements.js measurements-f010479.ndjson
Found 433 accepted measurements.
  OK                                       429        (99.07%)
  MINORITY_RESULT                          4          (0.92%)
```

☝️ The output is clearer now: unactionable minority results are grouped into a single row, and measurements that don't pass the tasking algorithm aren't included. Based on this, the SP should have 100% RSR for the measurements tested.

@bajtos should we exclude MINORITY_RESULT here and just say OK 100%?
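Both options can be checked with a little arithmetic. The sketch below takes the counts from the default-mode output above and computes the OK share with and without the minority row. The `pct` helper, which truncates to two decimals, is an assumption: it reproduces every percentage shown in this PR (for example 0.2% for 1 of 484, where rounding would give 0.21%), but it is not taken from the script's source.

```js
// Counts from the default-mode "After" output (tasking failures already excluded).
const counts = { OK: 429, MINORITY_RESULT: 4 }

// Percentage truncated (not rounded) to two decimals.
// Assumption: inferred from the numbers in this PR, not from the script itself.
const pct = (n, total) => Math.floor((n / total) * 10000) / 100

const accepted = counts.OK + counts.MINORITY_RESULT

// Option 1: keep MINORITY_RESULT as its own row.
const okShareWithMinority = pct(counts.OK, accepted)

// Option 2 (hypothetical): leave minority results out of the denominator,
// assuming RSR only considers results that agree with the majority.
const okShareMajorityOnly = pct(counts.OK, counts.OK)

console.log(`${accepted} accepted: OK ${okShareWithMinority}% with the minority row, ${okShareMajorityOnly}% without`)
// 433 accepted: OK 99.07% with the minority row, 100% without
```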

```
$ head -n2 measurements-f010479.evaluation.txt
Timestamp                CID                                                                    Protocol   Consensus RetrievalResult
2025-02-16T20:22:15.307Z bafybeibhg66222vgfas5yc72hpnnydglezznfwjjwcgro6jgnj2st5ydi4            http       🫡        OK
```

Notice the new "Consensus" column. Measurements that don't pass the tasking algorithm aren't included.

```
$ KEEP_REJECTED=1 node bin/evaluate-measurements.js measurements-f010479.ndjson
[...]
Found 484 accepted measurements.
  OK                                       429        (88.63%)
  TASK_WRONG_NODE                          23         (4.75%)
  MINORITY_RESULT                          4          (0.82%)
  DUP_INET_GROUP                           27         (5.57%)
  TASK_NOT_IN_ROUND                        1          (0.2%)
[...]
```

☝️ 👇 These now include codes for everything: retrieval results, tasking failures and consensus failures.

```
$ head -n2 measurements-f010479.evaluation.txt
Timestamp                CID                                                                    Protocol   Tasking Consensus RetrievalResult
2025-02-16T20:22:15.307Z bafybeibhg66222vgfas5yc72hpnnydglezznfwjjwcgro6jgnj2st5ydi4            http       🫡      🫡        OK
```
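One way to produce a single code per measurement, as the KEEP_REJECTED output does, is a fixed precedence. The ordering here (tasking failure first, then consensus failure, then the retrieval result) is an assumption inferred from the output above, and `pickStatus` is a hypothetical name:

```js
// Hypothetical helper: pick the single code reported for one measurement.
// Assumed precedence: tasking failure, then consensus failure, then retrieval result.
function pickStatus (m) {
  if (m.taskingEvaluation !== 'OK') return m.taskingEvaluation
  if (m.consensusEvaluation !== 'MAJORITY_RESULT') return m.consensusEvaluation
  return m.retrievalResult
}

// Made-up measurements covering the three cases.
const sample = [
  { taskingEvaluation: 'OK', consensusEvaluation: 'MAJORITY_RESULT', retrievalResult: 'OK' },
  { taskingEvaluation: 'OK', consensusEvaluation: 'MINORITY_RESULT', retrievalResult: 'TIMEOUT' },
  { taskingEvaluation: 'DUP_INET_GROUP', retrievalResult: 'OK' }
]

console.log(sample.map(pickStatus))
// [ 'OK', 'MINORITY_RESULT', 'DUP_INET_GROUP' ]
```

Checking tasking first means each measurement is counted under exactly one code, which is consistent with the output above: 429 + 23 + 4 + 27 + 1 = 484.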

```
@@ -42,7 +42,7 @@ const EVALUATION_NDJSON_FILE = `${basename(measurementsPath, '.ndjson')}.evaluat
const evaluationTxtWriter = fs.createWriteStream(EVALUATION_TXT_FILE)
const evaluationNdjsonWriter = fs.createWriteStream(EVALUATION_NDJSON_FILE)

evaluationTxtWriter.write(formatHeader({ includeEvaluation: keepRejected }) + '\n')
```
Member Author

"includeEvaluation" no longer made sense, since we are now dealing with the tasking and consensus results individually. Is it OK, though, to call a measurement that passed tasking but not consensus "accepted"/"not rejected"? Or are accepted measurements only those that pass both tasking and consensus, in which case we need a new name here, such as includeTasking?
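If the option only controls whether the Tasking column is printed, naming it after that column sidesteps the question of what "accepted" means. A sketch, where `includeTasking` is the hypothetical name proposed above and the column widths are illustrative rather than copied from the script:

```js
// Hypothetical: the option is named after the column it controls.
// Column widths are illustrative, not copied from the script.
function formatHeader ({ includeTasking }) {
  const fields = [
    'Timestamp'.padEnd(24),
    'CID'.padEnd(70),
    'Protocol'.padEnd(10)
  ]
  // Only meaningful when rejected measurements are kept (KEEP_REJECTED=1).
  if (includeTasking) fields.push('Tasking'.padEnd(7))
  // Always shown, because minority results are printed in both modes.
  fields.push('Consensus'.padEnd(9))
  fields.push('RetrievalResult')
  return fields.join(' ')
}

console.log(formatHeader({ includeTasking: false }))
console.log(formatHeader({ includeTasking: true }))
```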

@juliangruber (Member Author)

> Right now, the script evaluate-measurements.js outputs only the results in the majority. That makes it impossible to understand why the RSR score is not 100%.

Thinking about this more, I might have misunderstood the goal. We were including non-majority results before (see PR description), and those are important for the non-consensus based scores. What I don't understand, then, is what should actually be changed.

@bajtos (Member) commented Feb 17, 2025

It's great to see a fresh view on this 👏🏻

> Notice how minority results are included in the output, although they aren't actionable for the SP.

When I created #458, I understood that the script prints only measurements that passed the tasking evaluation and were in the majority, based on the following condition used to filter the measurements to print:

```js
if (m.taskingEvaluation !== 'OK' && m.consensusEvaluation === 'MAJORITY_RESULT') continue
```

This made the script useless for troubleshooting why an SP had an RSR below 100%.

However, now that I am reading that line again, I think there is a bug. This is what I probably wanted to write in #442:

```js
if (!(m.taskingEvaluation === 'OK' && m.consensusEvaluation === 'MAJORITY_RESULT')) continue
```

> We were including non-majority results before (see PR description), and those are important for the non-consensus based scores. What I don't understand, then, is what should actually be changed.

When I wrote that issue, I was expecting to change the condition shown above to the following plus update the output to include information about the committee consensus:

```js
if (m.taskingEvaluation !== 'OK') continue
```

See #396 for the very first iteration on that.
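Enumerating all four combinations makes the difference between the three conditions easy to see. In this sketch `TASK_WRONG_NODE` stands in for any tasking rejection, and the measurement objects carry only the two fields the conditions read. The table shows that the current condition skips only tasking-rejected majority results, so tasking-rejected minority results still get printed; the #442 condition prints only measurements that passed tasking and are in the majority; the proposed condition prints everything that passed tasking.

```js
// The three skip conditions discussed in this thread.
// Each returns true when the measurement is skipped (not printed).
const skipCurrent = m =>
  m.taskingEvaluation !== 'OK' && m.consensusEvaluation === 'MAJORITY_RESULT'
const skipIntendedIn442 = m =>
  !(m.taskingEvaluation === 'OK' && m.consensusEvaluation === 'MAJORITY_RESULT')
const skipProposed = m => m.taskingEvaluation !== 'OK'

// Every combination of tasking outcome and consensus outcome.
// TASK_WRONG_NODE stands in for any tasking rejection.
const combos = []
for (const taskingEvaluation of ['OK', 'TASK_WRONG_NODE']) {
  for (const consensusEvaluation of ['MAJORITY_RESULT', 'MINORITY_RESULT']) {
    combos.push({ taskingEvaluation, consensusEvaluation })
  }
}

// true = printed, false = skipped
const rows = combos.map(m => ({
  ...m,
  current: !skipCurrent(m),
  intendedIn442: !skipIntendedIn442(m),
  proposed: !skipProposed(m)
}))
console.table(rows)
```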

Having written the above, I like your ideas on how to revamp the output of this script. Maybe we can implement them on top of the fix too?

```js
// See https://github.com/filecoin-station/spark-evaluate/pull/396
fields.push((m.taskingEvaluation === 'OK' && m.consensusEvaluation === 'MAJORITY_RESULT' ? '🫡 ' : '🙅 '))
if (keepRejected) {
  fields.push((m.taskingEvaluation === 'OK' ? '🫡' : '🙅').padEnd(7))
```
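The first `fields.push` call above folds tasking and consensus into a single emoji. Splitting them into separate columns could look like the sketch below. The field names `timestamp`, `cid` and `protocol`, the `formatRow` name, and the column widths are assumptions. Note that `padEnd` counts UTF-16 code units: each of these emoji is two code units long, so the columns line up only where the terminal draws the emoji two cells wide.

```js
// Hypothetical row formatter for the proposed columns.
// Field names on `m` follow the snippets in this PR; timestamp/cid/protocol are assumed.
const YES = '🫡'
const NO = '🙅'

function formatRow (m, { includeTasking }) {
  const fields = [
    m.timestamp.padEnd(24),
    m.cid.padEnd(70),
    m.protocol.padEnd(10)
  ]
  // Tasking column, only when rejected measurements are kept.
  if (includeTasking) fields.push((m.taskingEvaluation === 'OK' ? YES : NO).padEnd(7))
  // Consensus column, judged on its own instead of combined with tasking.
  fields.push((m.consensusEvaluation === 'MAJORITY_RESULT' ? YES : NO).padEnd(9))
  fields.push(m.retrievalResult)
  return fields.join(' ')
}

const m = {
  timestamp: '2025-02-16T20:22:15.307Z',
  cid: 'bafybeibhg66222vgfas5yc72hpnnydglezznfwjjwcgro6jgnj2st5ydi4',
  protocol: 'http',
  taskingEvaluation: 'OK',
  consensusEvaluation: 'MAJORITY_RESULT',
  retrievalResult: 'OK'
}
console.log(formatRow(m, { includeTasking: true }))
```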
Contributor

Do we include a --help option or similar that explains what each emoji means?
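No such option appears in this PR. If one were added, a per-column legend could be as small as the sketch below; the `--help` check, the `formatLegend` name and the wording are all hypothetical.

```js
// Hypothetical legend; the script is not known to have a --help option.
const LEGEND = {
  Tasking: { '🫡': 'taskingEvaluation is OK', '🙅': 'rejected by the tasking algorithm' },
  Consensus: { '🫡': 'MAJORITY_RESULT', '🙅': 'not in the majority (e.g. MINORITY_RESULT)' }
}

function formatLegend () {
  const lines = []
  for (const [column, symbols] of Object.entries(LEGEND)) {
    lines.push(`${column}:`)
    for (const [emoji, meaning] of Object.entries(symbols)) lines.push(`  ${emoji}  ${meaning}`)
  }
  return lines.join('\n')
}

// Print the legend when asked for help.
if (process.argv.includes('--help')) {
  console.log(formatLegend())
}
```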

@juliangruber (Member Author) commented Feb 18, 2025

@bajtos got it! It took me a bit to realize that this is never what we want:

```js
if (m.taskingEvaluation !== 'OK' && m.consensusEvaluation === 'MAJORITY_RESULT') continue
```

So, you're proposing to change the line to:

```js
if (m.taskingEvaluation !== 'OK') continue
```

> plus update the output to include information about the committee consensus

What do you mean by this? Would this be the final loop:

```js
for (const m of round.measurements) {
    if (m.taskingEvaluation !== 'OK') continue
    resultCounts.total++
    const status = m.consensusEvaluation !== 'MAJORITY_RESULT'
        ? m.consensusEvaluation
        : m.retrievalResult
    resultCounts[status] = (resultCounts[status] ?? 0) + 1
}
```
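The proposed loop can be exercised on its own. In this sketch the `round` object holds four made-up measurements (not real data), and the summary printing after the loop is illustrative, with column widths chosen by eye and percentages truncated to two decimals, an assumption that matches the numbers in the PR description.

```js
// Sample round; the measurement shape is assumed from the snippets in this thread.
const round = {
  measurements: [
    { taskingEvaluation: 'OK', consensusEvaluation: 'MAJORITY_RESULT', retrievalResult: 'OK' },
    { taskingEvaluation: 'OK', consensusEvaluation: 'MAJORITY_RESULT', retrievalResult: 'OK' },
    { taskingEvaluation: 'OK', consensusEvaluation: 'MINORITY_RESULT', retrievalResult: 'TIMEOUT' },
    { taskingEvaluation: 'TASK_WRONG_NODE', consensusEvaluation: 'MAJORITY_RESULT', retrievalResult: 'OK' }
  ]
}

// `total` lives next to the per-status counts, as in the loop above.
const resultCounts = { total: 0 }
for (const m of round.measurements) {
  // Measurements rejected by tasking are not counted at all.
  if (m.taskingEvaluation !== 'OK') continue
  resultCounts.total++
  // Minority results are counted under their consensus code, whatever they retrieved.
  const status = m.consensusEvaluation !== 'MAJORITY_RESULT'
    ? m.consensusEvaluation
    : m.retrievalResult
  resultCounts[status] = (resultCounts[status] ?? 0) + 1
}

// Summary in the shape of the output in the PR description.
const { total, ...byStatus } = resultCounts
console.log(`Found ${total} accepted measurements.`)
for (const [status, count] of Object.entries(byStatus)) {
  const pct = Math.floor((count / total) * 10000) / 100
  console.log(`  ${status.padEnd(40)} ${String(count).padEnd(10)} (${pct}%)`)
}
```

Keeping `total` in the same object as the status counts is compact, but a status literally named `total` would collide with it; a separate counter avoids that.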

Labels
None yet
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

Include minority retrieval results in the output of evaluate-measurements.js script
3 participants