Copy text selection from PDF adds extra line breaks where they don't belong #7833
Comments
Changing the absolutely positioned divs to spans would help text selection. |
Just chiming in that this is not an issue with Chrome PDF viewer but is within Firefox. Proving difficult for at least a few users over here! |
Hello, I read scientific literature mainly as PDFs and copy-paste quotes from PDFs opened in Firefox, and I find my work greatly, and surprisingly, hampered by this issue. It's impossible to copy-paste anything without line breaks and dashes everywhere. This is not the exception: it affects virtually every scientific PDF. For example, a simple CTRL+C then CTRL+V into a plain text editor:
You can see the following 3 issues, which are intertwined but can be separated:
The only arguably sane behavior for most purposes is 3. Even so, 3 badly needs at least an option to not insert line breaks within paragraphs, so that text can be copied without artificial formatting. What it should do, or at least allow through a configurable option, is this: if you copy-paste 2 consecutive paragraphs from a PDF, the result should contain at most 3 line breaks in total: 1 or 2 after the first paragraph, and possibly another after the second, although I'm not sure about that one. Note that Chromium mostly suffers only from issue 3, but because of my setup and plugins I rely on Firefox, so this is a major setback that costs me time. All versions of Firefox I've tried recently (60-64) are affected, and I use Firefox exclusively under Linux. Thank you |
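Until the viewer itself behaves this way, the desired output can be approximated by post-processing the pasted text. The following is only a minimal sketch of such a workaround, not anything pdf.js does: the function name `unwrap_paragraphs` is hypothetical, and it assumes that paragraphs in the pasted text are separated by a blank line and that a trailing hyphen followed by a lowercase letter marks a word split across lines.

```python
import re


def unwrap_paragraphs(text: str) -> str:
    """Join hard-wrapped lines inside each paragraph.

    Assumptions (not guaranteed for every PDF):
    - paragraphs are separated by at least one blank line;
    - a line ending in "-" followed by a lowercase letter is a
      hyphenated word split across the line break.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    cleaned = []
    for para in paragraphs:
        lines = [ln.strip() for ln in para.splitlines() if ln.strip()]
        out = ""
        for ln in lines:
            if not out:
                out = ln
            elif out.endswith("-") and ln[:1].islower():
                # Drop the line-break hyphen and glue the word back together.
                out = out[:-1] + ln
            else:
                # Ordinary wrapped line: replace the break with one space.
                out += " " + ln
        cleaned.append(out)
    # One blank line between paragraphs, none inside them.
    return "\n\n".join(cleaned)


sample = (
    "Copy-pasting from a PDF in-\n"
    "serts line breaks\n"
    "and dashes.\n"
    "\n"
    "Second para-\n"
    "graph here."
)
result = unwrap_paragraphs(sample)
print(result)
```

The hyphen heuristic will wrongly merge genuine compounds that happen to break at the hyphen before a lowercase letter, which is one reason a fix inside the viewer, where layout information is available, would be preferable.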
Pull request #7834 is a first attempt to resolve this, however it was never completely finished. It can be used as a baseline for a follow-up patch. |
Would love to see this fixed. It's really impacting our users at https://getpolarized.io/, who edit a lot of scientific PDFs. |
Facing the same issue. Mozilla shows extra newlines, while Chrome removes even the ones that are visible. Is there any solution, please? |
Can confirm that this is an issue with the generation of the pdf and not pdf.js. We have the same problem regardless of the viewer used. |
Patch #13257 mostly fixes the issue but there is always a problem on the bottom of page 2 (note 1). |
Link to PDF file (or attach file here):
http://research.microsoft.com/en-us/um/people/hiballan/pubs/ccs08-staledns.pdf
Configuration:
Steps to reproduce the problem:
2. Try copying text from the article (I've just tried on the first page)
What is the expected behavior? (add screenshot)
Copy the text from the PDF with spaces between words. E.g., the first line of the paper should copy as:
This paper considers DoS attacks on DNS wherein attackers flood
What went wrong? (add screenshot)
This is what is copied
This
paper
consider
s DoS
attac
ks
on
DNS
wher
ein
attac
kers flood
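The fragments above show why this cannot be repaired by simply re-joining the copied lines. The sketch below uses only the expected and actual text from this report and checks that neither joining with spaces nor joining with nothing recovers the expected line, even though the characters themselves are all present. The variable names are just for illustration.

```python
# Fragments exactly as copied from the viewer, one per line.
fragments = [
    "This", "paper", "consider", "s DoS", "attac", "ks",
    "on", "DNS", "wher", "ein", "attac", "kers flood",
]
expected = "This paper considers DoS attacks on DNS wherein attackers flood"

# Strategy 1: treat every line break as a space (splits words apart).
joined_with_spaces = " ".join(fragments)

# Strategy 2: drop every line break (runs words together).
joined_without_spaces = "".join(fragments)

# Ignoring whitespace, the copied characters match the expected text,
# so only the placement of breaks and spaces is wrong.
same_characters = (
    joined_without_spaces.replace(" ", "") == expected.replace(" ", "")
)

print(joined_with_spaces)
print(joined_without_spaces)
print(same_characters)
```

Since the breaks fall both between words and in the middle of words, a text-only cleanup cannot tell them apart. That suggests the fix has to happen where the text layer is built, where glyph positions are known.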
Link to a viewer (if hosted on a site other than mozilla.github.io/pdf.js or as Firefox/Chrome extension): Reproduced this using the github hosted version.
I have a pull request to fix this problem in the works.