
Re-importing the same report leaves the duplicates in status mitigated #3958

Open

ptrovatelli opened this issue Feb 28, 2021 · 7 comments

@ptrovatelli (Contributor)

Bug description
When we import a report that itself contains duplicates and then re-import the same report, the duplicates get marked as mitigated:

  • first import:
    • 1 active, verified
    • 1 inactive, duplicate
  • after reimport:
    • 1 active, verified
    • 1 inactive, mitigated, duplicate

This doesn't seem correct to me. I'd expect:

  • 1 active, verified
  • 1 inactive, duplicate (the same status as before the reimport, since we have just re-imported the same report)

The problem is that when matching the new findings to the existing findings, we always match the new findings against the same single original finding (the one that is not a duplicate). Consequently, the duplicates from the original report are flagged as mitigated because no new finding was matched against them.

I think the issue is here (serializers.py and test/views.py):

                if findings:
                    # existing finding found
                    finding = findings[0]

Instead of working on only the first finding that matches, we should work on all of them (this might need a bit of tuning in order not to re-save the same findings multiple times...)
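The idea can be sketched outside of DefectDojo's actual models. The function and the (id, hash_code) tuple shape below are hypothetical illustrations, not the real serializers.py code: instead of always taking `findings[0]`, each incoming finding consumes a distinct existing finding with the same hash_code, so pre-existing duplicates stay matched and are not wrongly mitigated.

```python
# Hypothetical sketch: pair incoming findings with existing ones by
# hash_code, one-to-one, instead of always reusing the first match.
from collections import defaultdict

def match_findings(existing, incoming):
    """Return (matched_pairs, unmatched_existing).

    `existing` and `incoming` are lists of (id, hash_code) tuples; the real
    DefectDojo models carry far more state than this.
    """
    # Group existing findings by hash_code so each can only be matched once.
    pool = defaultdict(list)
    for finding in existing:
        pool[finding[1]].append(finding)

    matched = []
    for new in incoming:
        candidates = pool.get(new[1], [])
        if candidates:
            # Consume the candidate so the next incoming duplicate
            # matches a *different* existing finding.
            matched.append((new, candidates.pop(0)))

    # Anything left over here is what would currently get mitigated.
    leftover = [f for fs in pool.values() for f in fs]
    return matched, leftover
```

With three existing findings and three incoming findings sharing one hash_code, all three pairs match and nothing is left over to mitigate, which is the behavior the reimport should have.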

Sample data: a report for the Checkmarx parser is attached.

Steps to reproduce
Steps to reproduce the behavior:

  • Import the report once
  • Re-import the same report

Expected behavior
The status after the re-import is the same as after the initial import.

Deployment method (select with an X)

  • Kubernetes
  • [X] Docker
  • setup.bash / legacy-setup.bash

Environment information

  • DefectDojo Commit Message:
$ git show -s --format="[%ci] %h: %s [%d]"
[2021-02-26 17:14:50 +0100] 4f789bc7: fix flake8, fix Q queries, add  unit tests [ (HEAD -> reimport-configurable-dedupe, myOrigin/reimport-configurable-dedupe)]

Sample scan files (optional)
See attached
checkmarx_duplicate_in_same_report.zip

Screenshots (optional)

Console logs (optional)

Additional context (optional)
I've found this while working on #3753 and the provided report will produce the issue after this is merged. The problem was present before PR 3753 but the test data provided might not replicate it.

stale bot commented Jun 11, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

devGregA (Contributor) commented Jul 2, 2021

It looks like this was fixed via #3753

@devGregA devGregA closed this as completed Jul 2, 2021
devGregA (Contributor) commented Jul 2, 2021

I was under the impression this was fixed, but it may not be. @valentijnscholten, could you please confirm?

@devGregA devGregA reopened this Jul 2, 2021
valentijnscholten (Member) commented Jul 3, 2021

This still happens when an import contains duplicates, i.e. 3 findings get the same hash_code. On the next reimport of that report, there is no way to uniquely match the 3 incoming findings against the 3 existing ones. All 3 incoming findings will be matched against the first existing finding with that hash_code. That finding will remain open, while the other 2 will be marked as mitigated because they appear to be no longer present in the report.

Usually it means the deduplication algorithm is not 100% correct for the scanner, or the parser doesn't do a good job with dupe_key deduplication. Still, I think we should try to improve the situation, as it is very confusing currently. In the past we matched on title, cve, cwe, etc. Maybe we should fall back to those fields when there are multiple existing findings with the same hash_code.
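The fallback idea could look roughly like this. The field names and dict shape are illustrative assumptions, not the actual DefectDojo schema: when several existing findings share a hash_code, legacy fields narrow the candidate set before settling on one.

```python
# Hypothetical sketch: disambiguate multiple hash_code matches with
# legacy matching fields (names here are illustrative, not the real schema).
def pick_match(candidates, new):
    """Pick the best existing finding for `new` among same-hash candidates."""
    if len(candidates) == 1:
        return candidates[0]
    # Try progressively narrowing by the legacy matching fields.
    for field in ("title", "cwe", "line"):
        narrowed = [c for c in candidates if c.get(field) == new.get(field)]
        if len(narrowed) == 1:
            return narrowed[0]
        if narrowed:
            candidates = narrowed
    # Still ambiguous: fall back to the first candidate, as today.
    return candidates[0]
```

This only reduces, rather than eliminates, mismatches: if two existing findings are byte-for-byte identical on every field, no fallback can tell them apart, and a one-to-one pairing strategy is still needed.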

I think it's quite complicated for a 'good first issue', as one needs to fully understand all the details of import/reimport/hash_code. I hope you don't mind me removing that label again.

jcaillon added a commit to jcaillon/django-DefectDojo that referenced this issue Dec 1, 2021

This should fix DefectDojo#3958. The aggregation mechanism and the deduplication mechanism for Checkmarx now use the same fields. The Checkmarx query id is now included in the hash code to avoid creating multiple issues for each Checkmarx "result". We keep the aggregation, but we can no longer find duplicates inside a single report.
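As a rough illustration of the commit's idea (the function and field list below are hypothetical, not the actual Checkmarx parser code), including the query id among the hashed fields keeps findings from different Checkmarx queries from colliding into one hash_code:

```python
# Hypothetical sketch: hash_code built from a field list that includes
# the Checkmarx query id, so distinct queries never share a hash.
import hashlib

def compute_hash_code(title, cwe, file_path, query_id):
    # Field list is illustrative; in real DefectDojo the per-scanner
    # deduplication configuration decides which fields are hashed.
    raw = "|".join(str(x) for x in (title, cwe, file_path, query_id))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()
```

Two findings identical in every hashed field except `query_id` now get different hash codes, so deduplication no longer merges them.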
damiencarol pushed a commit that referenced this issue Dec 2, 2021
valentijnscholten added a commit to valentijnscholten/django-DefectDojo that referenced this issue Jan 2, 2022
StefanFl pushed a commit that referenced this issue Jan 5, 2022
* return stats for api (re)imports

* return stats for api (re)imports

* add total

* attempt model statistics

* remove model statistics

* finish + tests

* finish + tests

* cleanup

* remove migration

* fix UI import

* fix existings tests

* Revert "remove migration"

This reverts commit 0b7781e.

* make import history work around #3958

* fix mocking

* fix old tests

* rebase migration

* fix test after merging dev

* support TRACK_IMPORT_HISTORY=False
stale bot commented Apr 16, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Apr 16, 2022
@stale stale bot closed this as completed Apr 28, 2022
ptrovatelli (Contributor, Author) commented

I don't think it was fixed.

bmihaescu commented

Any updates on this issue? It's still annoying in 2024.

4 participants