Add types annotations #12102

Merged (1 commit) on Aug 2, 2020

Contributor

@ineiti ineiti commented Jul 18, 2020

Follow-up of #10575:

  • merged oBusk's and tamuratak's PRs
  • fixed merge conflicts (hopefully correctly)
  • added an Array type for a minimum example

I build it using:

npm ci
gulp generic
npx tsc -p .

And then I tried to run the example from https://github.com/ineiti/pdf_example - I hope that's not too complex an example to start from...

Contributor

@timvandermeij timvandermeij left a comment

We do need a Gulp task here to generate the type annotations like in the other WIP PR because it should be automated and end up in each release. Another question I have is: given that we're not TypeScript users here, how can we be sure that the generated type annotations are valid (not broken by any changes we make)? Is there a validator for that that we can use?

@ineiti
Contributor Author

ineiti commented Jul 19, 2020

How can we be sure that the generated type annotations are valid (not broken by any changes we make)? Is there a validator for that that we can use?

Hmm - the only idea I have is to create some smoke tests that make sure all the types are still correct and can at least be compiled. Once tsc compiles that code, it means the types are OK.

@ineiti
Contributor Author

ineiti commented Jul 19, 2020

We do need a Gulp task here to generate the type annotations like in the other WIP PR because it should be automated and end up in each release.

I'm new to gulp, so I'd need some guidance here. Honestly, I was never able to understand where the different versions of the project come from.

Also, it's the first time I've seen the idea of converting JSDoc to TS types ;) So any help from @oBusk and @tamuratak is very welcome! I'm currently not even sure how to merge all the *.d.ts files into one pdf.d.ts file - is it enough to concatenate all of them?

Is there some documentation that describes how the pdfjs-dist build is done? I stopped at trying to understand where the PDFJSDev.eval( comes from...

@ineiti
Contributor Author

ineiti commented Jul 19, 2020

I'm not sure I'm doing the right thing here ;) But this nearly works:

$ npx tsc -p .
src/display/text_layer.js:21:1 - error TS9005: Declaration emit for this file requires using private name 'TextLayerRenderTask'. An explicit type annotation may unblock declaration emit.

21 import {
   ~~~~~~

It complains about the TextLayerRenderTask definition, which is kept private in text_layer.js, so it seems reasonable that this is a problem. But I don't know how to solve it... Any ideas?

If I remove

   * @returns {TextLayerRenderTask}

in text_layer.js, it compiles and creates a pdf.d.ts file which looks somewhat reasonable, apart from all the return types of Object, which are mostly useless; e.g., getDocument returns an Object.

So there's still a lot of work to do.
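
To illustrate why a bare Object return type is of little use to consumers, here is a minimal sketch contrasting it with a @typedef. The names LoadingTaskLike, openUntyped and openTyped are hypothetical and not part of PDF.js; the sketch assumes that the type generator derives its declarations from the JSDoc comments.

```javascript
/**
 * Hypothetical shape, for illustration only (not the PDF.js API).
 *
 * @typedef {Object} LoadingTaskLike
 * @property {string} docId - Identifier of the document being loaded.
 * @property {Promise<number>} promise - Resolves with the page count.
 */

/**
 * Documented with a bare Object: a type generator has nothing more specific
 * to emit, so callers learn nothing about the available properties.
 *
 * @param {string} url
 * @returns {Object}
 */
function openUntyped(url) {
  return { docId: url, promise: Promise.resolve(1) };
}

/**
 * Documented with the typedef: the JSDoc now names the properties and
 * their types, which a generated declaration can carry over.
 *
 * @param {string} url
 * @returns {LoadingTaskLike}
 */
function openTyped(url) {
  return { docId: url, promise: Promise.resolve(1) };
}
```

Both functions behave identically at runtime; only the information available to documentation and type tooling differs.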

@ineiti
Contributor Author

ineiti commented Jul 20, 2020

So this is starting to be nearly usable ;) After a

gulp dist
npx tsc -p .

and then importing the local pdf.js in my project via the paths setting in tsconfig.app.json, it works! So I think I 'just' need to do the following (any help is welcome):

  • clean up the JSDoc and export everything through pdf.js
  • add the typescript-definitions to the dist gulp-target - @timvandermeij is this the correct target?
  • decide if tsc or jsdoc is better. For the moment, tsc leads. I also tried gulp-typescript, but it didn't work because of ivogabe/gulp-typescript#190 ("Option to emit referenced files")
  • fix the private export of display/text_layer::TextLayerRenderTask and api::PDFDocumentLoadingTask
  • decide if pdf.js should be the entry-point, or if I should add an index.js that exports everything - only use pdf.js

@timvandermeij
Contributor

I think the Gulp task was already made in https://github.com/mozilla/pdf.js/pull/10575/files and should already take care of bundling it in the distribution correctly.

@ineiti
Contributor Author

ineiti commented Jul 20, 2020

OK - these latest commits add the dts gulp-target. Instead of using jsdoc as @oBusk did, it uses tsc to create the types.

I chose to focus on pdf.d.ts, because that allows you to do the following:

import {getDocument, GlobalWorkerOptions, PDFDocumentProxy, PDFPageProxy} from "pdfjs-dist";
import pdfjsWorker from "pdfjs-dist/build/pdf.worker.entry";
GlobalWorkerOptions.workerSrc = pdfjsWorker;

But this means that all types will need to be exported by pdf.js. If somebody thinks this is a bad idea, please speak up and tell me why ;)

@ineiti ineiti force-pushed the add_types_annotations branch from f75cbf3 to 5a8a6f0 Compare July 20, 2020 19:31
Collaborator

@Snuffleupagus Snuffleupagus left a comment

Once you're getting this closer to done, please remember https://github.com/mozilla/pdf.js/wiki/Squashing-Commits
(Also, note that it's possible to list multiple authors of a commit, see e.g. here.)

@ineiti
Contributor Author

ineiti commented Jul 21, 2020

One commit a day keeps the doctor away... I included all comments from @Snuffleupagus, except those where I'm not sure what to do. Currently you can change the

TYPES_INT = "tsc"; // or "jsdoc"

in gulpfile.js, and then do

gulp types

or

gulp dist

My TODO-list from above didn't get shorter, unfortunately. And while JSDoc seems to give better documentation, I cannot use it. TSC gives me better documentation, but currently some of it is missing :(

And yes, commits will be squashed and co-authors added. But I feel there's still some way to go...

@ineiti ineiti force-pushed the add_types_annotations branch from 0ece522 to 4388616 Compare July 21, 2020 16:44
@ineiti
Contributor Author

ineiti commented Jul 21, 2020

@Snuffleupagus - two questions from my TODO-list where I'd like to have your input:

  • currently tsc chokes on display/text_layer::TextLayerRenderTask and api::PDFDocumentLoadingTask as they are defined in a somewhat 'private' way (I never saw this construct before...) - you can remove the trailing _ to see how it fails. Do you have an easy idea how I can fix this?
  • is it OK if I add an index.js which exports everything needed? Then I'll reference that one in the package.json as the types-file. Or should I call it types.js?
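
The 'private' construct mentioned in the first question can be sketched as follows. This is only an illustration with hypothetical names (HiddenTask, VisibleTask and the two factory functions are not PDF.js code), and the alternative shown is one possible direction that has not been checked against tsc or this code base: a class declared inside a closure cannot be referred to by name from a JSDoc comment outside that closure, whereas a class declared at module scope can.

```javascript
// Pattern that hides the class: only the factory escapes the closure, so a
// JSDoc reference to `HiddenTask` from outside has nothing to resolve to.
const createHiddenTask = (function createHiddenTaskClosure() {
  class HiddenTask {
    cancel() {
      return "cancelled";
    }
  }
  return function () {
    return new HiddenTask();
  };
})();

// Possible alternative (an assumption, not what the patch necessarily does):
// declare the class at module scope so that its name is visible to JSDoc.
class VisibleTask {
  cancel() {
    return "cancelled";
  }
}

/**
 * @returns {VisibleTask}
 */
function createVisibleTask() {
  return new VisibleTask();
}
```

Whether this is acceptable depends on the concern raised elsewhere in this thread about not exposing unintended functionality through the public API.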

@ineiti ineiti force-pushed the add_types_annotations branch from d936979 to d232312 Compare July 21, 2020 17:58
@Snuffleupagus
Collaborator

two questions from my TODO-list where I'd like to have your input:

Unfortunately I don't really know anything about TypeScript, sorry!


My interest here is essentially limited to making sure that a solution, whatever it may end up looking like, is reasonable/maintainable from a general PDF.js-library perspective. And also, that we don't accidentally end up exposing unintended functionality through the public API (i.e. as defined in the src/pdf.js file).

@tamuratak
Contributor

@ineiti This PR, ineiti/pull/1, makes the return type of getDocument to be PDFDocumentLoadingTask with tsc.

@oBusk

oBusk commented Jul 21, 2020

I added ineiti#2 to compile the definitions using typescript via gulp-typescript rather than tsc. exec('tsc') is flimsy, didn't run on my machine

@ineiti
Contributor Author

ineiti commented Jul 22, 2020

@ineiti This PR, ineiti/pull/1, makes the return type of getDocument to be PDFDocumentLoadingTask with tsc.

thanks - merged.

@ineiti
Contributor Author

ineiti commented Jul 22, 2020

I added ineiti#2 to compile the definitions using typescript via gulp-typescript rather than tsc. exec('tsc') is flimsy, didn't run on my machine

thanks - merged...

@ineiti
Contributor Author

ineiti commented Jul 22, 2020

@oBusk - I think I merged too fast. The PR you gave only creates one file, and I cannot use it in my project afterwards... So now I'm playing with gulp-typescript, trying to create a valid index.d.ts file... If you want to look at it, I pushed the latest commits including your PR.

I also removed the tsconfig.json, as per @Snuffleupagus request, and inlined it in the gulpfile.js.

So this latest commit doesn't work :(

@ineiti ineiti force-pushed the add_types_annotations branch 2 times, most recently from 41c0e33 to 3cca747 Compare July 22, 2020 19:54
@ineiti
Contributor Author

ineiti commented Jul 22, 2020

OK, merged the commits, added the contributors, went again through all comments on the code.

The last task is how to test this thing - if I use typescript, I'll have to build it first before I can test it - is that a good idea?

@ineiti ineiti force-pushed the add_types_annotations branch from 3cca747 to 7af740d Compare July 22, 2020 19:57
@ineiti
Contributor Author

ineiti commented Jul 30, 2020

@timvandermeij Notice that npx gulp jsdoc fails too.

Hmm - that's strange. I was able to re-write the callbacks in api.js like this:
https://github.com/ineiti/pdf.js/blob/add_types_annotations/src/display/api.js#L435

and then it passes gulp jsdoc.

However, there are other places that happily accept callback definitions, like:
https://github.com/ineiti/pdf.js/blob/add_types_annotations/src/display/text_layer.js#L48

But, if I copy the line from text_layer.js to api.js, gulp jsdoc complains again...

@ineiti
Contributor Author

ineiti commented Jul 30, 2020

@timvandermeij - I closed the trivial comments that have been fixed, but left the other comments open where I think you need to make sure that my proposal is correct. I hope that matches your normal workflow.

@timvandermeij
Contributor

/botio-linux preview

@pdfjsbot

From: Bot.io (Linux m4)


Received

Command cmd_preview from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/4f1cef8ce16b21a/output.txt

@pdfjsbot

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/4f1cef8ce16b21a/output.txt

Total script time: 3.40 mins

Published

@timvandermeij
Contributor

/botio test

@pdfjsbot

From: Bot.io (Linux m4)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/574214bb517ba83/output.txt

@pdfjsbot

From: Bot.io (Windows)


Received

Command cmd_test from @timvandermeij received. Current queue size: 0

Live output at: http://54.215.176.217:8877/d5786cf652fe66b/output.txt

@pdfjsbot

From: Bot.io (Linux m4)


Failed

Full output at http://54.67.70.0:8877/574214bb517ba83/output.txt

Total script time: 26.62 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://54.67.70.0:8877/574214bb517ba83/reftest-analyzer.html#web=eq.log

@pdfjsbot

From: Bot.io (Windows)


Failed

Full output at http://54.215.176.217:8877/d5786cf652fe66b/output.txt

Total script time: 31.87 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: FAILED

Image differences available at: http://54.215.176.217:8877/d5786cf652fe66b/reftest-analyzer.html#web=eq.log

@ineiti
Contributor Author

ineiti commented Jul 31, 2020

I see some regression tests failed - but it looks like it's in the rendering part, which I think we didn't touch. Is there anything we can do to fix this? Or is this expected?

@timvandermeij
Contributor

No worries about those; those are known intermittent failures that have nothing to do with your patch and should be fixed once we upgrade Puppeteer in #12123.

@dqisme

dqisme commented Aug 1, 2020

So excited pdfjs is on the way to TypeScript! 🍺
It's been a long time...

Contributor

@timvandermeij timvandermeij left a comment

The most important acceptance criteria for TypeScript definitions were that the public API may not be changed (so no changes in src/pdf.js), that the solution is maintainable for PDF.js developers (by not having the types separated from the code but generated from the JSDoc comments) and that the solution is tested automatically (so that we know the types will work and get a failing test otherwise).

This pull request checks all those boxes. It improves the existing documentation quite a bit and gives an extra incentive to keep the documentation up-to-date because a test now fails if for example a new parameter is not added to the corresponding JSDoc comment.

All in all, I think this is nice work from everyone involved and we can now merge it! We are open to contributions from the community to improve the documentation even further, not only for better type annotations but also for better API documentation.

Thank you for your work on this!

@Snuffleupagus
Collaborator

Snuffleupagus commented Aug 2, 2020

A couple of questions:

  • Was the jsdoc command tested, to ensure that the current https://mozilla.github.io/pdf.js/api/draft/index.html page isn't breaking because of all these JSDoc changes?
  • I'm not sure that running this new test on Travis by default was such a great idea, since besides increasing the runtime quite a bit it also means that any TypeScript errors will now (likely) be seen as a blocker to landing patches.
    Basically, the PDF.js library is still a JavaScript project, and accepting this PR does not absolve TypeScript-users from helping to maintain the types. It does seem, in my opinion, to be somewhat unfair to now essentially require TypeScript-knowledge from every PDF.js contributor (which is what having these tests run by default amounts to).
  • It would have been good if the newly added typedefs followed the existing formatting a lot better, please see the expected format in

    pdf.js/src/display/api.js

    Lines 106 to 176 in 00a8b42

    /**
    * Document initialization / loading parameters object.
    *
    * @typedef {Object} DocumentInitParameters
    * @property {string} [url] - The URL of the PDF.
    * @property {TypedArray|Array<number>|string} [data] - Binary PDF data. Use
    * typed arrays (Uint8Array) to improve the memory usage. If PDF data is
    * BASE64-encoded, use atob() to convert it to a binary string first.
    * @property {Object} [httpHeaders] - Basic authentication headers.
    * @property {boolean} [withCredentials] - Indicates whether or not
    * cross-site Access-Control requests should be made using credentials such
    * as cookies or authorization headers. The default is false.
    * @property {string} [password] - For decrypting password-protected PDFs.
    * @property {TypedArray} [initialData] - A typed array with the first portion
    * or all of the pdf data. Used by the extension since some data is already
    * loaded before the switch to range requests.
    * @property {number} [length] - The PDF file length. It's used for
    * progress reports and range requests operations.
    * @property {PDFDataRangeTransport} [range]
    * @property {number} [rangeChunkSize] - Specify maximum number of bytes
    * fetched per range request. The default value is 2^16 = 65536.
    * @property {PDFWorker} [worker] - The worker that will be used for
    * the loading and parsing of the PDF data.
    * @property {number} [verbosity] - Controls the logging level; the
    * constants from {VerbosityLevel} should be used.
    * @property {string} [docBaseUrl] - The base URL of the document,
    * used when attempting to recover valid absolute URLs for annotations, and
    * outline items, that (incorrectly) only specify relative URLs.
    * @property {string} [cMapUrl] - The URL where the predefined
    * Adobe CMaps are located. Include trailing slash.
    * @property {boolean} [cMapPacked] - Specifies if the Adobe CMaps are
    * binary packed.
    * @property {Object} [CMapReaderFactory] - The factory that will be
    * used when reading built-in CMap files. Providing a custom factory is useful
    * for environments without `XMLHttpRequest` support, such as e.g. Node.js.
    * The default value is {DOMCMapReaderFactory}.
    * @property {boolean} [stopAtErrors] - Reject certain promises, e.g.
    * `getOperatorList`, `getTextContent`, and `RenderTask`, when the associated
    * PDF data cannot be successfully parsed, instead of attempting to recover
    * whatever possible of the data. The default value is `false`.
    * @property {number} [maxImageSize] - The maximum allowed image size
    * in total pixels, i.e. width * height. Images above this value will not be
    * rendered. Use -1 for no limit, which is also the default value.
    * @property {boolean} [isEvalSupported] - Determines if we can eval
    * strings as JS. Primarily used to improve performance of font rendering,
    * and when parsing PDF functions. The default value is `true`.
    * @property {boolean} [disableFontFace] - By default fonts are
    * converted to OpenType fonts and loaded via font face rules. If disabled,
    * fonts will be rendered using a built-in font renderer that constructs the
    * glyphs with primitive path commands. The default value is `false`.
    * @property {boolean} [fontExtraProperties] - Include additional properties,
    * which are unused during rendering of PDF documents, when exporting the
    * parsed font data from the worker-thread. This may be useful for debugging
    * purposes (and backwards compatibility), but note that it will lead to
    * increased memory usage. The default value is `false`.
    * @property {boolean} [disableRange] - Disable range request loading
    * of PDF files. When enabled, and if the server supports partial content
    * requests, then the PDF will be fetched in chunks.
    * The default value is `false`.
    * @property {boolean} [disableStream] - Disable streaming of PDF file
    * data. By default PDF.js attempts to load PDFs in chunks.
    * The default value is `false`.
    * @property {boolean} [disableAutoFetch] - Disable pre-fetching of PDF
    * file data. When range requests are enabled PDF.js will automatically keep
    * fetching more data even if it isn't needed to display the current page.
    * The default value is `false`.
    * NOTE: It is also necessary to disable streaming, see above,
    * in order for disabling of pre-fetching to work correctly.
    * @property {boolean} [pdfBug] - Enables special hooks for debugging
    * PDF.js (see `web/debugger.js`). The default value is `false`.
    */

I'd really like to get the above points addressed rather quickly, one way or another, since especially the second one may quickly become an issue. (Or this patch backed-out, pending the resolution of the above.)

@timvandermeij
Contributor

timvandermeij commented Aug 2, 2020

Was the jsdoc command tested

Yes, I looked at the bot output and also ran it locally after rebasing onto the current master before merging, also because this was commented as not working in an earlier version of the patch. I also ran all test commands locally after rebasing to make sure it works locally as well as after the most recently merged patches.

I'm not sure that running this new test on Travis by default was such a great idea, since besides increasing the runtime quite a bit it also means that any TypeScript errors will now (likely) be seen as a blocker to landing patches.

The runtime impact from generating the TypeScript definitions is minimal in my testing (at most a few seconds). Moreover, it only seems to fail if there is a mismatch between the actual code and the JSDoc comments. This showed up in an earlier version of the patch where the annotationStorage parameter was missing, and playing locally with this patch I found the same. It's basically a safeguard that forces us to keep the JSDoc comments up-to-date, which I only see as a good thing. No TypeScript specific knowledge should be required for this. If it ever turns out to be the case, we can always disable the tests if need be, but since we're distributing the types I'd really like to have some way of knowing that what we distribute is correct.
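
The kind of mismatch between code and JSDoc described in this comment can be sketched like this. RenderParamsLike and render are hypothetical names chosen for illustration, not the actual PDF.js definitions; the only detail taken from the discussion is that an annotationStorage parameter was used by the code but missing from the documentation.

```javascript
/**
 * Hypothetical parameters object, for illustration only.
 *
 * @typedef {Object} RenderParamsLike
 * @property {number} scale - The zoom factor.
 */

/**
 * The implementation reads `annotationStorage`, but the typedef above does
 * not list it. Plain JavaScript runs fine; a consumer relying on types
 * generated from the JSDoc would not know that the parameter exists.
 * Adding a line such as
 *   `@property {Object} [annotationStorage] - Optional storage object.`
 * to the typedef would bring documentation and code back in sync.
 *
 * @param {RenderParamsLike} params
 * @returns {string}
 */
function render(params) {
  const storage = params.annotationStorage ? "with storage" : "no storage";
  return `scale=${params.scale}, ${storage}`;
}
```

Nothing fails at runtime here, which is why a separate check against the generated types is needed to notice the gap.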

It would have been good if the newly added typedefs followed the existing formatting a lot better

Agreed, but I must note that consistency was already an issue in that file before this patch, and it can be dealt with in a follow-up. Even before, I had already seen JSDoc comments where the descriptions are separated with a dash versus without, with/without two spaces on the next line, aligned on the next line, et cetera.

@Snuffleupagus
Collaborator

No TypeScript specific knowledge should be required for this.

Based on some of the changes seen in this patch, I do have some doubts about that; hence why this worries me!

As long as we're still going to, on principle, accept patches that fail the typestest (and opt for disabling the test instead), I suppose that somewhat alleviates my primary concern here.

Comment on lines +725 to +726
* @returns {Promise<Object>} A promise that is resolved with an {Object}
* containing the viewer preferences.
Collaborator

@Snuffleupagus Snuffleupagus Aug 2, 2020

This removed part of the existing comment, why?

Contributor

I missed this in the review and will put that back as part of the upcoming patch. Thank you for noticing!

Comment on lines +699 to +702
* @returns {Promise<Array<string> | null>} A promise that is
* resolved with an {Array} containing the page labels that correspond to
* the page indexes, or `null` when no page labels are present in the PDF
* file.
Collaborator

@Snuffleupagus Snuffleupagus Aug 2, 2020

The first line is now weirdly short, compared to the rest of them, it looks like this was incorrectly re-formatted.

Contributor

I think so. I'll address that in the upcoming patch too.

@timvandermeij
Contributor

Definitely. My reasoning is that everything that we distribute should be tested, and preferably automatically and in the earliest stage possible so we quickly find out about potential problems. If this PR did not have such a test I would rather not accept it especially because this is not a TypeScript project, but a (vanilla) JavaScript one. That would mean that we would distribute TypeScript definitions without having a clue if they actually work, which is not great.

Having a test at least tells us that a patch is going to break the TypeScript definitions, allowing us to consider that and make a decision on what to do with that. Most if not all the time it's likely to turn out to be a missing JSDoc comment update, which is not only useful for the TypeScript definitions but more importantly also for the consistency of our own documentation (since we've seen that this tends to be forgotten sometimes). I therefore see typestest as one that kills two birds with one stone: it ensures internal documentation consistency to a certain degree and it ensures that externally exposed type definitions are validated. Not running a test like that would hide such problems.

I can't really imagine any failures from typestest to be difficult to solve. Of course, if I turn out to be wrong here and we get errors that we cannot work with, we can definitely disable the test and make a follow-up issue to get it working again, if necessary by asking the various TypeScript contributors for assistance.

Hopefully this helps a bit with the concerns here!

@timvandermeij
Contributor

timvandermeij commented Aug 2, 2020

Moreover, I'll make some time today to address the comments above and work on overall comment consistency within the src/display/api.js file since those are valid points and I'd like to take that concern away if possible.

Comment on lines +1362 to +1379
function () {
  var packageJsonSrc = packageBowerJson()[0];
  var TYPESTEST_DIR = BUILD_DIR + "typestest/";

  return merge([
    packageJsonSrc.pipe(gulp.dest(TYPESTEST_DIR)),
    gulp
      .src([
        GENERIC_DIR + "build/pdf.js",
        GENERIC_DIR + "build/pdf.worker.js",
        SRC_DIR + "pdf.worker.entry.js",
      ])
      .pipe(gulp.dest(TYPESTEST_DIR + "build/")),
    gulp
      .src(TYPES_BUILD_DIR + "**/**")
      .pipe(gulp.dest(TYPESTEST_DIR + "build/")),
  ]);
},
Collaborator

Also, defining this inline isn't really helping readability of the overall task, and it really ought to be moved into a helper function/task (e.g. typestest-pre) similar to what's done in other parts of the gulpfile.

},
function (done) {
  exec(`node_modules/.bin/tsc -p test/types`, function (err, stdout) {
    if (err !== null) {
Collaborator

This should probably just be if (err) {, right?
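
A minimal sketch of the suggested truthiness check, pulled out into a helper so that the callback is not defined inline. makeExecCallback is a hypothetical name, and what happens in each branch (logging stdout, the error message) is an assumption, since the quoted snippet is cut off right after the condition.

```javascript
/**
 * Builds a callback with the (err, stdout) shape used in the snippet above.
 * `makeExecCallback` is a hypothetical helper, not part of the gulpfile.
 *
 * @param {(err?: Error) => void} done - Task completion callback.
 * @returns {(err: Error | null | undefined, stdout: string) => void}
 */
function makeExecCallback(done) {
  return function (err, stdout) {
    if (stdout) {
      console.log(stdout);
    }
    // A truthiness check treats both `null` and `undefined` as "no error",
    // whereas `err !== null` would treat `undefined` as a failure.
    if (err) {
      done(new Error("The types check failed."));
      return;
    }
    done();
  };
}
```

It could then be passed as the second argument to the exec call, e.g. exec(`node_modules/.bin/tsc -p test/types`, makeExecCallback(done)).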

@Maxchii

Maxchii commented Aug 4, 2020

@timvandermeij Any ETA on when these changes will be reflected in the pdfjs-dist repository?

@timvandermeij
Contributor

No, there is no release date planned yet.

@SleeplessByte

If someone can build it, and post the types here, we can manually list the file in our projects until it's released.

@timvandermeij
Contributor

The types are available in the current pre-release (beta) version of PDF.js; see https://github.com/mozilla/pdf.js/releases and https://www.npmjs.com/package/pdfjs-dist/v/next.
