Include minority results in evaluate script output. Closes #458 #479

juliangruber · 2025-02-17T11:04:41Z

Closes #458

Before

$ node bin/evaluate-measurements.js measurements-f010479.ndjson
[...]
Found 484 accepted measurements.
  OK                                       480        (99.17%)
  TIMEOUT                                  1          (0.2%)
  HOSTNAME_DNS_ERROR                       2          (0.41%)
  IPNI_ERROR_FETCH                         1          (0.2%)
[...]

☝️ Notice how minority results are included in the output, although they aren't actionable for the SP.

$ head -n2 measurements-f010479.evaluation.txt
Timestamp                CID                                                                    Protocol   RetrievalResult
2025-02-16T20:22:15.307Z bafybeibhg66222vgfas5yc72hpnnydglezznfwjjwcgro6jgnj2st5ydi4            http       OK

☝️ Here there's no indication of whether a measurement is a minority result.

$ KEEP_REJECTED=1 node bin/evaluate-measurements.js measurements-f010479.ndjson
[...]
Found 484 accepted measurements.
  OK                                       480        (99.17%)
  TIMEOUT                                  1          (0.2%)
  HOSTNAME_DNS_ERROR                       2          (0.41%)
  IPNI_ERROR_FETCH                         1          (0.2%)
[...]

☝️ Same problem as above

$ head -n2 measurements-f010479.evaluation.txt
Timestamp                CID                                                                    Protocol   🕵️   RetrievalResult
2025-02-16T20:22:15.307Z bafybeibhg66222vgfas5yc72hpnnydglezznfwjjwcgro6jgnj2st5ydi4            http       🫡   OK

☝️ It's not clear what 🕵️ means.

After

$ node bin/evaluate-measurements.js measurements-f010479.ndjson
Found 433 accepted measurements.
  OK                                       429        (99.07%)
  MINORITY_RESULT                          4          (0.92%)

☝️ Notice how the output has improved: unactionable minority results are grouped into one row, and measurements that don't pass the tasking algorithm aren't included. Based on the above, the SP should have 100% RSR for the measurements tested.

@bajtos should we exclude MINORITY_RESULT here and just say OK 100%?
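The percentages in these summaries are consistent with truncating to two decimal places rather than rounding: 27/484 is 5.578...%, printed as 5.57%, and 429/433 is 99.076...%, printed as 99.07%. The sketch below assumes that behavior (the percent helper is hypothetical, not the script's actual code) and shows what the question above means numerically for the 433 accepted measurements: counting the 4 minority results gives 99.07%, leaving them out gives 100%.

```javascript
// Hypothetical helper: percentage truncated (not rounded) to two decimals,
// which matches the figures printed above (e.g. 27/484 -> 5.57, not 5.58).
function percent (count, total) {
  return Math.floor((count / total) * 10000) / 100
}

const ok = 429
const minority = 4

// Minority results counted in the denominator, as in the output above.
console.log(`${percent(ok, ok + minority)}%`) // 99.07%

// Minority results left out entirely, as proposed in the question above.
console.log(`${percent(ok, ok)}%`) // 100%
```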

$ head -n2 measurements-f010479.evaluation.txt
Timestamp                CID                                                                    Protocol   Consensus RetrievalResult
2025-02-16T20:22:15.307Z bafybeibhg66222vgfas5yc72hpnnydglezznfwjjwcgro6jgnj2st5ydi4            http       🫡        OK

Notice the new "Consensus" column. Measurements that don't pass the tasking algorithm aren't included.

$ KEEP_REJECTED=1 node bin/evaluate-measurements.js measurements-f010479.ndjson
[...]
Found 484 accepted measurements.
  OK                                       429        (88.63%)
  TASK_WRONG_NODE                          23         (4.75%)
  MINORITY_RESULT                          4          (0.82%)
  DUP_INET_GROUP                           27         (5.57%)
  TASK_NOT_IN_ROUND                        1          (0.2%)
[...]

☝️ 👇 These now include codes for everything: retrieval results, tasking failures and consensus failures.
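With KEEP_REJECTED=1, each measurement can be reported under the first stage it failed: tasking, then consensus, then the retrieval itself. A minimal sketch of that priority, assuming the taskingEvaluation, consensusEvaluation and retrievalResult fields used elsewhere in this PR; statusOf, countStatuses and the sample measurements are hypothetical and only illustrate the idea.

```javascript
// Hypothetical helper: one status code per measurement, reporting the first
// stage that failed (tasking, then consensus), else the retrieval result.
function statusOf (m) {
  if (m.taskingEvaluation !== 'OK') return m.taskingEvaluation
  if (m.consensusEvaluation !== 'MAJORITY_RESULT') return m.consensusEvaluation
  return m.retrievalResult
}

// Tally status codes. Without keepRejected, measurements that failed tasking
// are skipped, mirroring the non-KEEP_REJECTED output.
function countStatuses (measurements, { keepRejected }) {
  const counts = { total: 0 }
  for (const m of measurements) {
    if (!keepRejected && m.taskingEvaluation !== 'OK') continue
    const status = statusOf(m)
    counts.total++
    counts[status] = (counts[status] ?? 0) + 1
  }
  return counts
}

// Illustrative sample only, not taken from measurements-f010479.ndjson.
const sample = [
  { taskingEvaluation: 'OK', consensusEvaluation: 'MAJORITY_RESULT', retrievalResult: 'OK' },
  { taskingEvaluation: 'OK', consensusEvaluation: 'MINORITY_RESULT', retrievalResult: 'TIMEOUT' },
  { taskingEvaluation: 'TASK_WRONG_NODE', retrievalResult: 'OK' }
]

console.log(JSON.stringify(countStatuses(sample, { keepRejected: true })))
// {"total":3,"OK":1,"MINORITY_RESULT":1,"TASK_WRONG_NODE":1}
console.log(JSON.stringify(countStatuses(sample, { keepRejected: false })))
// {"total":2,"OK":1,"MINORITY_RESULT":1}
```

Checking tasking first means a rejected measurement is never counted under a consensus or retrieval code, so each measurement lands in exactly one row and the rows add up to the total.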

$ head -n2 measurements-f010479.evaluation.txt
Timestamp                CID                                                                    Protocol   Tasking Consensus RetrievalResult
2025-02-16T20:22:15.307Z bafybeibhg66222vgfas5yc72hpnnydglezznfwjjwcgro6jgnj2st5ydi4            http       🫡      🫡        OK

juliangruber · 2025-02-17T11:05:54Z

bin/evaluate-measurements.js

@@ -42,7 +42,7 @@ const EVALUATION_NDJSON_FILE = `${basename(measurementsPath, '.ndjson')}.evaluat
 const evaluationTxtWriter = fs.createWriteStream(EVALUATION_TXT_FILE)
 const evaluationNdjsonWriter = fs.createWriteStream(EVALUATION_NDJSON_FILE)

-evaluationTxtWriter.write(formatHeader({ includeEvaluation: keepRejected }) + '\n')

"includeEvaluation" didn't make sense, since we now handle the tasking and consensus results individually. Is it OK, though, to call a measurement that passed tasking but not consensus "accepted"/"not rejected"? Or are accepted measurements only the ones that pass both tasking and consensus, in which case we need a new name here, such as includeTasking?

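For illustration, here is a sketch that replaces includeEvaluation with a single includeTasking option, since in the sample output in the PR description the Consensus column is always printed and the Tasking column only appears with KEEP_REJECTED=1. formatHeaderSketch, formatRowSketch, the column widths and the timestamp, cid and protocol property names are assumptions, chosen so the output lines up with the sample rows in the PR description.

```javascript
// Hypothetical column widths, chosen to line up with the sample output.
const WIDTHS = { timestamp: 24, cid: 70, protocol: 10, tasking: 7, consensus: 9 }

const flag = (passed) => (passed ? '🫡' : '🙅')

// Consensus is always printed; Tasking only when rejected measurements are kept.
function formatHeaderSketch ({ includeTasking }) {
  const fields = [
    'Timestamp'.padEnd(WIDTHS.timestamp),
    'CID'.padEnd(WIDTHS.cid),
    'Protocol'.padEnd(WIDTHS.protocol)
  ]
  if (includeTasking) fields.push('Tasking'.padEnd(WIDTHS.tasking))
  fields.push('Consensus'.padEnd(WIDTHS.consensus), 'RetrievalResult')
  return fields.join(' ')
}

// The timestamp, cid and protocol property names are assumptions.
function formatRowSketch (m, { includeTasking }) {
  const fields = [
    m.timestamp.padEnd(WIDTHS.timestamp),
    m.cid.padEnd(WIDTHS.cid),
    m.protocol.padEnd(WIDTHS.protocol)
  ]
  if (includeTasking) fields.push(flag(m.taskingEvaluation === 'OK').padEnd(WIDTHS.tasking))
  fields.push(flag(m.consensusEvaluation === 'MAJORITY_RESULT').padEnd(WIDTHS.consensus))
  fields.push(m.retrievalResult)
  return fields.join(' ')
}

// Values taken from the sample row in the PR description.
const measurement = {
  timestamp: '2025-02-16T20:22:15.307Z',
  cid: 'bafybeibhg66222vgfas5yc72hpnnydglezznfwjjwcgro6jgnj2st5ydi4',
  protocol: 'http',
  taskingEvaluation: 'OK',
  consensusEvaluation: 'MAJORITY_RESULT',
  retrievalResult: 'OK'
}

const keepRejected = true
console.log(formatHeaderSketch({ includeTasking: keepRejected }))
console.log(formatRowSketch(measurement, { includeTasking: keepRejected }))
```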
juliangruber · 2025-02-17T11:50:12Z

Right now, the script evaluate-measurements.js outputs only the results in the majority. That makes it impossible to understand why the RSR score is not 100%.

Thinking about this more, I might have misunderstood the goal. We were including non-majority results before (see PR description), and those are important, for the non-consensus based scores. What I'm not understanding then is what should actually be changed?

bajtos · 2025-02-17T16:59:42Z

It's great to see a fresh view on this 👏🏻

Notice how minority results are included in the output, although they aren't actionable for the SP.

When I created #458, I understood that the script printed only measurements that passed the tasking evaluation and were in the majority, based on the following condition used to filter the measurements to print:

if (m.taskingEvaluation !== 'OK' && m.consensusEvaluation === 'MAJORITY_RESULT') continue

This made the script useless for troubleshooting why an SP had an RSR below 100%.

However, now that I am reading that line again, I think there is a bug. This is what I probably wanted to write in #442:

if (!(m.taskingEvaluation === 'OK' && m.consensusEvaluation === 'MAJORITY_RESULT')) continue
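Enumerating the four combinations shows why the original condition is a bug. It only skips measurements that failed tasking and were in the majority, so minority results are still printed whether or not they passed tasking. The intended condition skips everything except measurements that passed tasking and were in the majority. buggySkip and intendedSkip are hypothetical names for the two conditions quoted above.

```javascript
// The original condition: skips a measurement only when it failed tasking
// AND was in the majority.
const buggySkip = (m) =>
  m.taskingEvaluation !== 'OK' && m.consensusEvaluation === 'MAJORITY_RESULT'

// The intended condition: skips everything except measurements that passed
// tasking AND were in the majority.
const intendedSkip = (m) =>
  !(m.taskingEvaluation === 'OK' && m.consensusEvaluation === 'MAJORITY_RESULT')

for (const taskingEvaluation of ['OK', 'TASK_WRONG_NODE']) {
  for (const consensusEvaluation of ['MAJORITY_RESULT', 'MINORITY_RESULT']) {
    const m = { taskingEvaluation, consensusEvaluation }
    console.log(
      taskingEvaluation.padEnd(16),
      consensusEvaluation.padEnd(16),
      'buggy:', buggySkip(m) ? 'skip ' : 'print',
      'intended:', intendedSkip(m) ? 'skip' : 'print'
    )
  }
}
```

The buggy condition printing MINORITY_RESULT measurements is consistent with the "Before" output in the PR description, where minority results show up in the summary.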

We were including non-majority results before (see PR description), and those are important, for the non-consensus based scores. What I'm not understanding then is what should actually be changed?

When I wrote that issue, I was expecting to change the condition shown above to the following, plus update the output to include information about the committee consensus:

if (m.taskingEvaluation !== 'OK') continue

See #396 for the very first iteration on that.

Having written the above, I like your ideas on how to revamp the output of this script. Maybe we can implement them on top of the fix too?

NikolasHaimerl · 2025-02-18T08:10:39Z

bin/evaluate-measurements.js

-    // See https://github.com/filecoin-station/spark-evaluate/pull/396
-    fields.push((m.taskingEvaluation === 'OK' && m.consensusEvaluation === 'MAJORITY_RESULT' ? '🫡  ' : '🙅  '))
+  if (keepRejected) {
+    fields.push((m.taskingEvaluation === 'OK' ? '🫡' : '🙅').padEnd(7))


Do we include a --help option or similar that explains what each emoji means?
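If a help text is added, it could be as small as a legend printed once. The sketch assumes 🫡 means a check passed and 🙅 that it failed, as in the diff above; the LEGEND entries and formatLegend are hypothetical, and how the legend is triggered (a help flag, or a line above the header) is left open.

```javascript
// Hypothetical legend; assumes 🫡 = check passed, 🙅 = check failed,
// matching the emoji used in the diff above.
const LEGEND = [
  ['Tasking', '🫡', 'taskingEvaluation is OK'],
  ['Tasking', '🙅', 'the measurement failed the tasking evaluation'],
  ['Consensus', '🫡', 'consensusEvaluation is MAJORITY_RESULT'],
  ['Consensus', '🙅', 'the measurement was not in the majority (e.g. MINORITY_RESULT)']
]

// One line per entry: column name, emoji, meaning.
function formatLegend () {
  return LEGEND
    .map(([column, emoji, meaning]) => `  ${column.padEnd(10)} ${emoji}  ${meaning}`)
    .join('\n')
}

console.log('Legend:\n' + formatLegend())
```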

juliangruber · 2025-02-18T12:41:23Z

@bajtos got it! It took me a bit to realize that this is never what we want:

if (m.taskingEvaluation !== 'OK' && m.consensusEvaluation === 'MAJORITY_RESULT') continue

So, you're proposing to change the line to:

if (m.taskingEvaluation !== 'OK') continue

plus update the output to include information about the committee consensus

What do you mean by this? Would this be the final loop?

for (const m of round.measurements) {
    if (m.taskingEvaluation !== 'OK') continue
    resultCounts.total++
    const status = m.consensusEvaluation !== 'MAJORITY_RESULT'
        ? m.consensusEvaluation
        : m.retrievalResult
    resultCounts[status] = (resultCounts[status] ?? 0) + 1
}

bin: add include minority results in evaluate output. closes #458

e476c7e

juliangruber requested review from bajtos, pyropy and NikolasHaimerl as code owners February 17, 2025 11:04
