Do not store the hypothetically produced mime-types always #3583

Merged
merged 7 commits into from
Sep 27, 2024

Conversation

Donnype
Contributor

@Donnype Donnype commented Sep 26, 2024

Changes

This is a bug introduced during the create/copy boefje feature, where I needed to start searching for non-local plugins in the job handler. There I always joined the plugin.produces mime-types, although it is still unclear to me why I did this. That list is large only for the webpage-analysis boefje, so all of its saved files ended up with the same big list of mime-types, with Bytes deduplicating on the mime-type set as expected. I realised this at one point but perhaps didn't see the harm. The irony.
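
To illustrate the failure mode described above, here is a minimal toy model. It is not the Bytes or boefjes implementation: `ToyRawStore`, `store_outputs`, the `always_join` flag, and the `example/...` mime-types are all hypothetical names, and the store is simply assumed to keep one file per (job, mime-type set).

```python
from dataclasses import dataclass, field


@dataclass
class ToyRawStore:
    """Hypothetical store that keeps one raw file per (job, mime-type set)."""

    files: dict = field(default_factory=dict)

    def save(self, job_id: str, content: bytes, mime_types: set) -> None:
        # An identical job id and mime-type set overwrites the earlier file.
        self.files[(job_id, frozenset(mime_types))] = content


def store_outputs(store, job_id, outputs, produces, always_join):
    """Save each (content, own_mime_types) pair for a job (assumed handler logic)."""
    for content, own_types in outputs:
        mime_types = set(own_types)
        if always_join:
            # The behaviour this PR removes: every file gets the full "produces" list.
            mime_types |= produces
        store.save(job_id, content, mime_types)


# Hypothetical mime-types, standing in for a boefje with a long "produces" list.
produces = {"example/headers", "example/body", "example/full"}
outputs = [(b"headers", {"example/headers"}), (b"body", {"example/body"})]

buggy = ToyRawStore()
store_outputs(buggy, "job-1", outputs, produces, always_join=True)

fixed = ToyRawStore()
store_outputs(fixed, "job-1", outputs, produces, always_join=False)

print(len(buggy.files), len(fixed.files))
```

In this sketch both files collapse into one entry when the full list is always joined, because their mime-type sets become identical; without the join each file keeps its own set and both survive.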

Issue link

Closes #3570

Demo

QA notes

Should resolve the issue described in #3570


Code Checklist

  • All the commits in this PR are properly PGP-signed and verified.
  • This PR only contains functionality relevant to the issue.
  • I have written unit tests for the changes or fixes I made.
  • I have checked the documentation and made changes where necessary.
  • I have performed a self-review of my code and refactored it to the best of my abilities.

Checklist for code reviewers:

Copy-paste the checklist from the docs/source/templates folder into your comment.


Checklist for QA:

Copy-paste the checklist from the docs/source/templates folder into your comment.

@Donnype Donnype requested a review from a team as a code owner September 26, 2024 10:25
@Donnype Donnype changed the title from "Do not store the hypothetically produced mimetypes always" to "Do not store the hypothetically produced mime-types always" Sep 26, 2024
@Donnype Donnype self-assigned this Sep 26, 2024
@Donnype Donnype added the bug (Something isn't working) and boefjes (Issues related to boefjes) labels Sep 26, 2024
@noamblitz
Contributor

I'm missing the default mime-types now: "mime_types": [{"value": "openkat-http/response"}]. I would expect boefje/webpage-analysis to also be available as a mime-type.

@Donnype
Contributor Author

Donnype commented Sep 26, 2024

@noamblitz, strange, I do have the default mime-types:
image

@Donnype Donnype changed the title from "Do not store the hypothetically produced mime-types always" to "[DO NOT MERGE] Do not store the hypothetically produced mime-types always" Sep 26, 2024
@noamblitz
Contributor

Boefjes that do not return mime-types themselves now also do not seem to stop running, and they fail after a while.

@noamblitz
Contributor

@noamblitz, strange, I do have the default mime-types: image

Yeah, these are the ones that you get when boefjes fail, right?

@Donnype Donnype changed the title from "[DO NOT MERGE] Do not store the hypothetically produced mime-types always" to "Do not store the hypothetically produced mime-types always" Sep 26, 2024
@noamblitz
Contributor

Seems to work correctly again now!

Contributor

@noamblitz noamblitz left a comment

LGTM

Contributor

@ammar92 ammar92 left a comment

No remarks

@underdarknl
Contributor

Bytes deduplicating on the mime-type set as expected.

This is not what I'd expect, to be honest. If a boefje creates 4 raw files, all with the same mime-types (or none), I'd expect there to be 4 raw files attached to the boefje-job.
Granted, this does make it hard to differentiate those raw files in the UX, as they don't have a name, but that's a problem for another day.
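
A minimal sketch of that expectation, assuming a store that appends every raw file to its job instead of deduplicating. `ToyAppendOnlyStore` and its methods are hypothetical and do not reflect how Bytes stores files.

```python
from collections import defaultdict


class ToyAppendOnlyStore:
    """Hypothetical store that never deduplicates raw files within a job."""

    def __init__(self):
        self._by_job = defaultdict(list)

    def save(self, job_id, content, mime_types):
        # Every call adds a new entry, even if the mime-type set repeats.
        self._by_job[job_id].append((content, frozenset(mime_types)))
        # The position is the only handle here, since raw files have no name.
        return len(self._by_job[job_id]) - 1

    def files_for(self, job_id):
        return list(self._by_job[job_id])


append_store = ToyAppendOnlyStore()
indexes = [
    append_store.save("job-1", f"raw-{n}".encode(), {"example/raw"})
    for n in range(4)
]
print(len(append_store.files_for("job-1")), indexes)
```

The returned index is the only way to tell the files apart in this sketch, which mirrors the differentiation problem mentioned above.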

@TwistMeister
Contributor

QA: I didn't see anything weird, and found no errors in the normalizer container logs

@underdarknl underdarknl merged commit 3898e26 into main Sep 27, 2024
16 checks passed
@underdarknl underdarknl deleted the fix/do-not-add-produces-mime-types branch September 27, 2024 08:20
Labels
boefjes (Issues related to boefjes), bug (Something isn't working)
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Parsing webpage analysis normalizer json decoding error
5 participants