White lines through images on certain PDFs #3351
They seem related, yes. I can reproduce this using Windows 7 x64, Firefox Nightly 30.0a1 (HWA on, 64 bit) and the latest PDF.js development version. I heard from @yurydelendik that it looks better on OS X for some reason, but the main problem seems to be that the math for adding the image chunks together is a bit off.
This is one of the more common issues I experience with this viewer. I can probably add more example PDFs if you need them.
Thank you. We also have #3331, but adding more example PDFs here is always welcome.
Wow, thanks for the good test cases! I haven't checked all of them, but I could reproduce the ones I did check. Regarding the black background: this is the first of those issues that I can actually reproduce on Windows. It might be best to just add that file, with your configuration details (OS name and version, browser name and version, PDF.js version), to the existing issue.
Updated Dropbox links
Here's another example, but it includes a solution. To fix it, I took the original image, resized it, and pasted it into the document that I made the PDF with. I was using MS Word; just changing the size in Word didn't help, but resizing the original image and then pasting it again fixed it. Below is an example of two different images resized: http://www.pdf-archive.com/2015/03/07/test2200-2/test2200.pdf I made the PDF with MS Word and PDFCreator v2.0.2. This problem doesn't happen in Google Chrome, but it happens in Firefox and in the Evince app on Linux.
Has there been any progress on this issue? Could someone point me in the right direction regarding where PDF.js begins to attach each object of the PDF onto the canvas? I assume the issue is caused by incorrectly stitching multiple images together. In the first sample PDF provided, the main image is actually composed of 7 smaller/sliced images.
I think the easiest way to debug this is to use
I have looked into this and I think I have found the problem. I am not 100% sure how drawing works when a width is more precise than whole pixels can represent, but I have an idea.

It appears that when calling `.drawImage` on our canvas object, we do so with the exact height and width of the object that we wish to draw (image, rectangle, etc.). By doing this, without taking the pixel width into account, we leave some pixels only partially colored. When stitching multiple parallel images together, this leaves a visible gap between them. For example: given a pixel width of 0.5 and an image of width 1.02, we color 2 full pixels and leave the 3rd pixel partially transparent. The next image to be stitched then starts partway through that 3rd pixel, so it is never fully covered and shows up as a gap.

Using this sample file (CompWiz_Page1.pdf), which draws many rectangles on the canvas to create a gradient effect, I have come up with the following solution. In `Canvas.constructPath()`, we create the rectangles that we wish to attach to our canvas. To account for the pixel size, I made the following modifications (see comments).

BEFORE, `constructPath`:

```js
function CanvasGraphics_constructPath(ops, args) {
  var ctx = this.ctx;
  var current = this.current;
  var x = current.x, y = current.y;
  for (var i = 0, j = 0, ii = ops.length; i < ii; i++) {
    switch (ops[i] | 0) {
      case OPS.rectangle:
        x = args[j++];
        y = args[j++];
        var width = args[j++];
        var height = args[j++];
        if (width === 0) {
          width = this.getSinglePixelWidth();
        }
        if (height === 0) {
          height = this.getSinglePixelWidth();
        }
        var xw = x + width;
        var yh = y + height;
```

AFTER, `constructPath`:

```js
function CanvasGraphics_constructPath(ops, args) {
  var ctx = this.ctx;
  var current = this.current;
  var x = current.x, y = current.y;
  var origX, origY, pixelSize = this.getSinglePixelWidth();
  for (var i = 0, j = 0, ii = ops.length; i < ii; i++) {
    switch (ops[i] | 0) {
      case OPS.rectangle:
        origX = args[j];
        origY = args[j + 1];
        // Round down to the nearest multiple of our pixel width.
        x = Math.floor(args[j++] / pixelSize) * pixelSize;
        y = Math.floor(args[j++] / pixelSize) * pixelSize;
        var width = args[j++];
        var height = args[j++];
        if (width === 0) {
          width = this.getSinglePixelWidth();
        }
        if (height === 0) {
          height = this.getSinglePixelWidth();
        }
        // Use the non-rounded original x/y values to round up to the
        // nearest multiple of the pixel width.
        var xw = Math.ceil((origX + width) / pixelSize) * pixelSize;
        var yh = Math.ceil((origY + height) / pixelSize) * pixelSize;
```

Doing this ensures that each pixel is fully colored and there is no transparency between stitched images. I was able to fix the line issue that Doc2 (the first example file attached in the OP comment) has with its image using the same logic, by modifying `Canvas.paintJpegXObject()` and `Canvas.paintInlineImageXObject()`. The following lines were added before writing the images to the canvas:

```js
var pixelSize = this.getSinglePixelWidth();
h = Math.ceil(h / pixelSize) * pixelSize;
w = Math.ceil(w / pixelSize) * pixelSize;
```

If this is a viable solution, I think it would need to be implemented before each drawing interaction with our canvas. If that is the case, I'd be happy to work on its implementation.
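The snapping proposed in that comment can be sketched in isolation as a standalone helper (hypothetical name `snapRect`; `pixelSize` stands in for `getSinglePixelWidth()`, the width of one device pixel in user-space units):

```javascript
// Snap a rectangle to the device-pixel grid so it fully covers every
// pixel it touches: floor the origin to a pixel boundary and ceil the
// far edge, so no partially covered column is left between two
// adjacent, stitched rectangles or image tiles.
function snapRect(x, y, width, height, pixelSize) {
  var snappedX = Math.floor(x / pixelSize) * pixelSize;
  var snappedY = Math.floor(y / pixelSize) * pixelSize;
  var xw = Math.ceil((x + width) / pixelSize) * pixelSize;
  var yh = Math.ceil((y + height) / pixelSize) * pixelSize;
  return {
    x: snappedX,
    y: snappedY,
    width: xw - snappedX,
    height: yh - snappedY,
  };
}

// The example from the comment: pixel width 0.5, image width 1.02.
// Unsnapped, the third pixel is only 4% covered; snapped, the rectangle
// ends exactly on the pixel boundary at 1.5.
console.log(snapRect(0, 0, 1.02, 1.02, 0.5));
// → { x: 0, y: 0, width: 1.5, height: 1.5 }
```

Note that this always grows the rectangle, never shrinks it, which is why adjacent tiles end up sharing a boundary pixel instead of leaving it half-transparent.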
Please note that that particular method may be "expensive" (performance wise), so using it everywhere would probably require very careful benchmarking; please refer to e.g. the discussion starting in #4615 (comment) and the results in #4615 (comment).
If you're interested in benchmarking this, you can use our benchmarking tool as described on https://github.com/mozilla/pdf.js/wiki/Benchmarking-your-changes.
Interesting, I just got caught up on the topic; I was unaware that this solution had already been considered. Is it possible to see at what number of calls to .getSinglePixelWidth() we start to see a performance regression? I am not familiar with how #pdfbug=Stepper works, but if PDF.js has access to the load operations and can count the number of calls to Restore/Transform/ShowText (the methods resetting our cache), we can determine the number of times we would need to call .getSinglePixelWidth(). If that value falls under our performance-regression threshold, we can proceed to load the document with accurate pixel measurements. Something like this would allow higher-quality document display in standard cases while avoiding the performance regression seen in extreme cases.
I think the stepper should provide you with that. It shows you when and in which order it performs a drawing operation, for example restore/transform, since they are native PDF commands.
That sounds really difficult, since it seems like it'd be highly dependent on e.g. the complexity and the ordering of the rendering OPS of particular PDF files; not to mention dependent on general computer/device performance.

There would be overhead/complexity related to counting the operations as well, especially considering that only some ShowText commands reset the cache.

That also seems immediately problematic, since it would introduce inconsistent rendering behaviour for different kinds of PDF files. (Some files rendering correctly, and others not, doesn't seem like great UX.) It might happen that it's deemed OK to make use of it everywhere regardless. In the past, smaller regressions have been accepted in return for significant rendering improvements (in multiple files); but without actual benchmarking results[1], especially with larger and more complex PDF files, it's mostly speculation at this point :-)

I didn't mean to deter you from looking into this with #3351 (comment); sorry if it came off that way!

Finally, one piece of advice (based on experience): before you put a lot of time and effort into doing multiple benchmark runs, make sure that you run the entire test-suite first to check that there aren't any obvious regressions from your changes.[2]

[1] One small piece of advice regarding https://github.com/mozilla/pdf.js/wiki/Benchmarking-your-changes. The example contains

[2] There's nothing worse than having to re-do entire benchmark runs just because of some simple error in code you wrote yourself. Don't ask me how I know this ;-)
Thank you for the thorough reply. I have started to experiment with the benchmark testing. Are there specific files which are noteworthy? Or is it best to run the benchmark on the whole list of test PDFs?
I think the PDF file in #2813 may be a good one for your benchmark, since it's a complicated PDF with a lot of drawing operations. Other than the PDF files in this issue, I think taking a subset of, let's say, 20 random PDF files from the existing suite should be sufficient. Perhaps @Snuffleupagus has more ideas though.
Here are the results when testing 50 rounds on the above document:
What exactly is it that I am looking for from these results? Is there a rule of thumb as to what range of values from the baseline are acceptable?
Usually we look at the t-test column
I ran the benchmark 2 more times and the results are above. It seems strange to me that they differ so much from one another. Is this normal? Here is the resulting JSON from the last run (renamed to .txt because .json uploading is not supported).
Yes, benchmarking browsers can be very fickle. One issue with the current benchmark setup is that it basically just re-runs rendering a page many times in the same browser page, which is not a realistic real-world scenario (usually we'll just render a page once, so first-time paint is the most important). What we really should do is restart the browser on each round, or run each round under a different domain so the caches are clean. This is all beyond the scope of this bug, but if you're interested in working on it, I'd be happy to help. Just to be clear on your above results, did you run it so
That is a great point about the testing. After this is solved, I may dive into that to see what can be done. As for the testing: for each test, I regenerated both the baseline and the current results.
There are different issues here:
The underlying problems are slightly different. For images, we must draw them at integer coordinates and with integer dimensions; according to MDN, this should help to improve performance, and in the PDF context it should remove the thin lines we have between two tiles. For paths, the situation is slightly different: after applying the current transformation we must round and add 0.5. The extra transparency added around lines can lead to some wrong color rendering with very close thin lines; see bug 1657838.
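The two rounding rules described in that comment can be sketched as standalone helpers (hypothetical names; the actual patch applies the current transform to the corner points and dimensions inside canvas.js):

```javascript
// Images: after applying the current transform, force the corner and
// the dimensions to integers, so no transparent seam is left between
// adjacent tiles.
function snapImageRect(x, y, w, h) {
  var rx = Math.round(x);
  var ry = Math.round(y);
  return {
    x: rx,
    y: ry,
    w: Math.round(x + w) - rx,
    h: Math.round(y + h) - ry,
  };
}

// Paths: round to the pixel grid and add 0.5, so a one-pixel-wide
// stroke is centered on a pixel row/column and rendered crisply
// instead of being antialiased across two pixels.
function snapStrokeCoord(v) {
  return Math.round(v) + 0.5;
}

console.log(snapImageRect(10.3, 20.7, 99.4, 49.9));
// → { x: 10, y: 21, w: 100, h: 50 }
console.log(snapStrokeCoord(10.3));
// → 10.5
```

Rounding the far edge (rather than the width itself) keeps two abutting tiles sharing exactly the same integer boundary, which is the property that removes the white seam.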
…la#3351)
- it aims to fix https://bugzilla.mozilla.org/show_bug.cgi?id=1264608;
- it's only a partial fix for mozilla#3351;
- some tiled images have some spurious white lines between the tiles.

When the current transform is applied, the corners of an image can have non-integer coordinates, leading to some extra transparency being added to handle that. So with this patch the current transform is applied to the point and to the dimensions, in order to end up with only integer values.
Use integer coordinates when drawing images (bug 1264608, issue #3351)
Another way to reproduce this bug, but with rectangles again instead of images: the Word doc renders the background of the pasted text as a solid color, and the resulting PDF looks correct in Adobe and Chrome, but in Firefox some of the background color bleeds through between each line of the code.
On PDFs where the creation software splits the image into smaller chunks, PDF.js produces a white line between them, while Adobe Reader renders them correctly.
Example pdf:
https://www.dropbox.com/s/gt1b2y9bgf06pam/Doc2.pdf
Edit: this may be related to issue #3331, except with images instead of filled rectangles.