
Re-importing the same report leaves the duplicates in status mitigated #3958

Open

ptrovatelli opened this issue Feb 28, 2021 · 7 comments

@ptrovatelli (Contributor)

Bug description
When we import a report that itself contains duplicates and then re-import the same report, the duplicates get marked as mitigated:

  • first import:
    • 1 active, verified
    • 1 inactive, duplicate
  • after reimport:
    • 1 active, verified
    • 1 inactive, mitigated, duplicate

This doesn't seem correct to me. I'd expect:

  • 1 active, verified
  • 1 inactive, duplicate (the same status as before the reimport, since we have just re-imported the same report)

The problem is that when matching the new findings to the existing findings, we always match the new findings against the same single original finding (the one that is not a duplicate). Consequently, the duplicates from the original report are flagged as mitigated because no new finding was matched against them.

I think the issue is here (serializers.py and test/views.py):

                if findings:
                    # existing finding found
                    finding = findings[0]

Instead of working on only the first finding that matches, we should work on all of them (this might need a bit of tuning in order not to re-save the same findings multiple times...)
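The idea can be sketched outside of DefectDojo's actual models. The function and the (id, hash_code) tuple shape below are hypothetical illustrations, not the real serializers.py code: instead of always taking `findings[0]`, each incoming finding consumes a distinct existing finding with the same hash_code, so pre-existing duplicates stay matched and are not wrongly mitigated.

```python
# Hypothetical sketch: pair incoming findings with existing ones by
# hash_code, one-to-one, instead of always reusing the first match.
from collections import defaultdict

def match_findings(existing, incoming):
    """Return (matched_pairs, unmatched_existing).

    `existing` and `incoming` are lists of (id, hash_code) tuples; the real
    DefectDojo models carry far more state than this.
    """
    # Group existing findings by hash_code so each can only be matched once.
    pool = defaultdict(list)
    for finding in existing:
        pool[finding[1]].append(finding)

    matched = []
    for new in incoming:
        candidates = pool.get(new[1], [])
        if candidates:
            # Consume the candidate so the next incoming duplicate
            # matches a *different* existing finding.
            matched.append((new, candidates.pop(0)))

    # Anything left over here is what would currently get mitigated.
    leftover = [f for fs in pool.values() for f in fs]
    return matched, leftover
```

With three existing findings and three incoming findings sharing one hash_code, all three pairs match and nothing is left over to mitigate, which is the behavior the reimport should have.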

Sample data: a report for the Checkmarx parser is attached.

Steps to reproduce
Steps to reproduce the behavior:

  • Import the report once
  • Re-import the same report

Expected behavior
The status after the re-import is the same as after the initial import.

Deployment method (select with an X)

  • Kubernetes
  • [X] Docker
  • setup.bash / legacy-setup.bash

Environment information

  • DefectDojo Commit Message:
$ git show -s --format="[%ci] %h: %s [%d]"
[2021-02-26 17:14:50 +0100] 4f789bc7: fix flake8, fix Q queries, add  unit tests [ (HEAD -> reimport-configurable-dedupe, myOrigin/reimport-configurable-dedupe)]

Sample scan files (optional)
See attached
checkmarx_duplicate_in_same_report.zip

Screenshots (optional)

Console logs (optional)

Additional context (optional)
I've found this while working on #3753 and the provided report will produce the issue after this is merged. The problem was present before PR 3753 but the test data provided might not replicate it.

stale bot commented Jun 11, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

devGregA (Contributor) commented Jul 2, 2021

It looks like this was fixed via #3753

@devGregA devGregA closed this as completed Jul 2, 2021
devGregA (Contributor) commented Jul 2, 2021

I was under the impression this was fixed, but it may not be. @valentijnscholten, could you please confirm?

@devGregA devGregA reopened this Jul 2, 2021
valentijnscholten (Member) commented Jul 3, 2021

This still happens when an import contains duplicates, i.e. 3 findings get the same hash_code. On the next reimport of that report, there is no way to uniquely match the 3 incoming findings against the 3 existing ones. All 3 incoming findings will be matched against the first existing finding with that hash_code. That finding will remain open, while the other 2 will be marked as mitigated because they appear to be no longer present in the report.

Usually it means the deduplication algorithm is not 100% correct for the scanner, or the parser doesn't do a good job with dupe_key deduplication. Still, I think we should try to improve the situation, as it is very confusing currently. In the past we matched on title, cve, cwe, etc. Maybe we should fall back to those fields when there are multiple existing findings with the same hash_code.
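The fallback idea could look roughly like this. The field names and dict shape are illustrative assumptions, not the actual DefectDojo schema: when several existing findings share a hash_code, legacy fields narrow the candidate set before settling on one.

```python
# Hypothetical sketch: disambiguate multiple hash_code matches with
# legacy matching fields (names here are illustrative, not the real schema).
def pick_match(candidates, new):
    """Pick the best existing finding for `new` among same-hash candidates."""
    if len(candidates) == 1:
        return candidates[0]
    # Try progressively narrowing by the legacy matching fields.
    for field in ("title", "cwe", "line"):
        narrowed = [c for c in candidates if c.get(field) == new.get(field)]
        if len(narrowed) == 1:
            return narrowed[0]
        if narrowed:
            candidates = narrowed
    # Still ambiguous: fall back to the first candidate, as today.
    return candidates[0]
```

This only reduces, rather than eliminates, mismatches: if two existing findings are byte-for-byte identical on every field, no fallback can tell them apart, and a one-to-one pairing strategy is still needed.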

I think it's quite complicated for a 'good first issue', as one needs to fully understand all the details of import/reimport/hash_code. I hope you don't mind me removing that label again.

jcaillon added a commit to jcaillon/django-DefectDojo that referenced this issue Dec 1, 2021

This should fix DefectDojo#3958. The aggregation mechanism and the deduplication mechanism for Checkmarx now use the same fields. The Checkmarx query id is now included in the hash code to avoid creating multiple issues for each Checkmarx "result". We keep the aggregation, but we can no longer find duplicates inside a single report.
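As a rough illustration of the commit's idea (the function and field list below are hypothetical, not the actual Checkmarx parser code), including the query id among the hashed fields keeps findings from different Checkmarx queries from colliding into one hash_code:

```python
# Hypothetical sketch: hash_code built from a field list that includes
# the Checkmarx query id, so distinct queries never share a hash.
import hashlib

def compute_hash_code(title, cwe, file_path, query_id):
    # Field list is illustrative; in real DefectDojo the per-scanner
    # deduplication configuration decides which fields are hashed.
    raw = "|".join(str(x) for x in (title, cwe, file_path, query_id))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()
```

Two findings identical in every hashed field except `query_id` now get different hash codes, so deduplication no longer merges them.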
damiencarol pushed a commit that referenced this issue Dec 2, 2021
valentijnscholten added a commit to valentijnscholten/django-DefectDojo that referenced this issue Jan 2, 2022
StefanFl pushed a commit that referenced this issue Jan 5, 2022
* return stats for api (re)imports

* return stats for api (re)imports

* add total

* attempt model statistics

* remove model statistics

* finish + tests

* finish + tests

* cleanup

* remove migration

* fix UI import

* fix existings tests

* Revert "remove migration"

This reverts commit 0b7781e.

* make import history work around #3958

* fix mocking

* fix old tests

* rebase migration

* fix test after merging dev

* support TRACK_IMPORT_HISTORY=False
stale bot commented Apr 16, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Apr 16, 2022
@stale stale bot closed this as completed Apr 28, 2022
ptrovatelli (Contributor, Author) commented

I don't think it was fixed.

bmihaescu commented

Any updates on this issue? It's still annoying in 2024.

4 participants