Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds support for removing formatting when copying and pasting from word into article abstract and other fields #3983

Merged
merged 26 commits into from
Mar 12, 2024

Conversation

mauromsl
Copy link
Member

@mauromsl mauromsl commented Feb 20, 2024

closes #1481
closes #3896
closes #3904
closes #3868
closes #3905

A new approach that should work seamlessly when formatted text is detected on a paste event.

  • Implements TinyMCE as WYSIWYG editor
  • TinyMCE presents a prompt to ask the user if they want to clean their input when an HTML paste is detected.
  • Django Bleach used by the backend to remove any harmful rich-text that could lead to XSS attacks
  • Limited the scope to text fields originally covered in Support copy-paste in rich-text fields and fine-tune editing window #3446
  • Configured TinyMCE to closely resemble features available in SummerNote

For most fields, using JanewayBleachField on the model declaration should be enough to get TinyMCE widget in the frontend + bleach on the backend. There are a few exceptions such as autogenerated forms (e.g peer review forms or GeneratedSettingForm)

@mauromsl mauromsl requested review from joemull and ajrbyers February 20, 2024 17:55
@mauromsl mauromsl linked an issue Feb 22, 2024 that may be closed by this pull request
Copy link
Member

@ajrbyers ajrbyers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spotted a few issues where fields that traditionally were just text areas have now been converted to rich-text and vice versa. I was also able to paste inputs without them being stripped:

image

Copy link
Member

@joemull joemull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Mauro. I think we just need a few changes here and I'll be happy to approve this. I did add some comments below so you know what my thought process is, since this ties into our previous discussions, and I don't want to ignore the context.

The best thing about this PR is it separates the security issue (injection) from the usability issue (clean copy-paste). Thanks for making that happen.

Moving away from Summernote also promises to be a huge step forward for the usability of text fields in Janeway. I can't be sure in reviewing this PR that we are not introducing further bugs. But good that we are moving.

Moving to TinyMCE is a solid first step, and I personally really like this editor. It may not be what we need long term, especially if we build features that allow authoring articles in Janeway. In that case we may want to consider other editors. I did a comparison when writing #3447 last year and CKEditor was one I thought might be promising. I'm sure you looked at it as well. But point being, in the future I'd like to make this decision more collaboratively and deliberately than we could do here, with the constraints we had.

In terms of what TinyMCE offers for clean pasting, the user needs we established in #1481 and #2282 (comment) are still not fully met. The average humanities scholar will want to keep a few basics like links, italics and bolding even if they do not want MS Word's full inline style properties. The latter are usually what they are trying to get rid of when they ask for clean paste. The former are usually a big pain to lose, because you have to re-copyedit the text and re-add italics for titles of works and emphasis, as well as any hyperlinks. So in the future we should consider more precise paste functionality in the modal, along with the other changes to the visual appearance of the modal that we've discussed. There are advanced paste options with TinyMCE but they cost money.

In terms of scope, you say this is limited to models and forms originally touched by the first bleach branch, but there are a lot of new ones added, like in the review app. Also there are a lot left out that still use JQTE, like in copyediting. Could you just say why you included the ones you did, and why you left others? That will help when cleaning this up and getting rid of JQTE and Summernote entirely, when we get to that.

Last general comment: it is becoming clearer and clearer to me that we need a better way of managing JS dependencies that does not add their full source code to our Git repositories. The third-party CSS and JS files in our static folders are quickly becoming unmanageable. This PR adds 10 MB and 100k+ lines of code we did not write that is available through commonly used JS package managers. I recognise this is a separate issue but wanted to flag it here because TinyMCE is a good example of something that really should be versioned and managed better. Let's discuss this soon, OK?

I left a few comments on changes inline but also noticed a few omissions that I'll list here:

  • The documentation needs updating to reflect the different paste behavior:

https://github.com/BirkbeckCTP/janeway/blob/abe68bdb174f23d4da8a599602104e5d836aed96/docs/source/manager/content/index.rst?plain=1#L259-L274

  • This form change can be reverted, because I think commit=True was just for the bleach mixin:

https://github.com/BirkbeckCTP/janeway/blame/1481-clean-rich-text/src/comms/forms.py#L15

But now that I look at that form closer, I think it might have some bugs around saving, because "commit" is not respected. Might need a separate bug report?

  • This can go:

https://github.com/BirkbeckCTP/janeway/blob/abe68bdb174f23d4da8a599602104e5d836aed96/src/templates/admin/cms/page_manage.html#L31

  • This can go:

https://github.com/BirkbeckCTP/janeway/blob/abe68bdb174f23d4da8a599602104e5d836aed96/src/repository/forms.py#L492

  • This can go:

https://github.com/BirkbeckCTP/janeway/blob/abe68bdb174f23d4da8a599602104e5d836aed96/src/templates/admin/core/manager/news/index.html#L61

@ajrbyers
Copy link
Member

ajrbyers commented Feb 27, 2024

Could tinymce pass the paste to something like sanitize-html to get some clean HTML back rather than pasting it without any formatting at all?

@mauromsl
Copy link
Member Author

mauromsl commented Feb 27, 2024

Hey @joemull , thanks for the detailed review. Here are some replies to your general comments

Last general comment: it is becoming clearer and clearer to me that we need a better way of managing JS dependencies that does not add their full source code to our Git repositories. The third-party CSS and JS files in our static folders are quickly becoming unmanageable. This PR adds 10 MB and 100k+ lines of code we did not write that is available through commonly used JS package managers. I recognise this is a separate issue but wanted to flag it here because TinyMCE is a good example of something that really should be versioned and managed better. Let's discuss this soon, OK?

Happy to raise an issue for this? I don't think we have one yet. It makes sense to me to find a way to make a smaller package for production pipelines.

Moving to TinyMCE is a solid first step, and I personally really like this editor. It may not be what we need long term, especially if we build features that allow authoring articles in Janeway. In that case we may want to consider other editors. I did a comparison when writing #3447 last year and CKEditor was one I thought might be promising. I'm sure you looked at it as well. But point being, in the future I'd like to make this decision more collaboratively and deliberately than we could do here, with the constraints we had.

I did look into CKEditor (amongst others). I chose TinyMCE because the documentation was far better than the others and made it possible to install the paste event callback in the right place between HTML sanitization and confirmation. The Django package is also well documented and supported. Ultimately, CKEditor did provide most of the same functionality and shouldn't be cause too much trouble to swap it in place of TinyMCE once we get to #3447

In terms of what TinyMCE offers for clean pasting, the user needs we established in #1481 and #2282 (comment) are still not fully met. The average humanities scholar will want to keep a few basics like links, italics and bolding even if they do not want MS Word's full inline style properties. The latter are usually what they are trying to get rid of when they ask for clean paste. The former are usually a big pain to lose, because you have to re-copyedit the text and re-add italics for titles of works and emphasis, as well as any hyperlinks. So in the future we should consider more precise paste functionality in the modal, along with the other changes to the visual appearance of the modal that we've discussed. There are advanced paste options with TinyMCE but they cost money.

Yes, using the powerpaste plugin would be ideal (and use and/or users of Janeway willing to pay for that plugin can make use of it using custom settings.py alone). I want to make sure the default use case in this PR solves the current problem, but it makes sense to have a better UI and smarter paste options in the future.

In terms of scope, you say this is limited to models and forms originally touched by the first bleach branch, but there are a lot of new ones added, like in the review app.

My bad, I also added BleachField to fields that did not have a rich text editor before, however they were being rendered as SafeStrings elsewhere in the codebase. These are still vectors for XSS attacks. We could have the same problem coming form the JQTE/Summernote fields you mentioned, so I think it is best to also change those in this PR.

I'll work on the remaining issues flagged inline and request another review

@mauromsl mauromsl force-pushed the 1481-clean-rich-text branch from abe68bd to 0bdab52 Compare March 12, 2024 10:31
@mauromsl mauromsl requested review from ajrbyers and joemull March 12, 2024 12:16
Copy link
Member

@ajrbyers ajrbyers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reviwed the additional commits. I have some concerns about how the migrations will handle especially for sites where we've already deployed an RC of 1.5.1, so this will require some testing.

Two comments inline there for you to take a look at.

@mauromsl mauromsl requested a review from ajrbyers March 12, 2024 12:39
@mauromsl
Copy link
Member Author

I've reviwed the additional commits. I have some concerns about how the migrations will handle especially for sites where we've already deployed an RC of 1.5.1, so this will require some testing.

Two comments inline there for you to take a look at.

All the RCs for 1.5.1 live within the same branch release branch b_1_5_1, the migration history should be linear and consistent.

Copy link
Member

@joemull joemull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The additional changes look good to me. Just one thing I noticed: I don't think #3188 is affected by this PR but it is marked as being closed by it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Auto-remove erroneous formatting for abstracts in metadata
3 participants