Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimizing synctex edit by using an array instead of a linked list for tag lookup #69

Open
user202729 opened this issue Feb 9, 2024 · 3 comments · May be fixed by #90
Open

Optimizing synctex edit by using an array instead of a linked list for tag lookup #69

user202729 opened this issue Feb 9, 2024 · 3 comments · May be fixed by #90

Comments

@user202729
Copy link

This is a profiling result on a relatively large PDF file (1000 pages, with around 500 tags):

image

A synctex edit command takes 1.5 seconds on the file.

According to the benchmark result, most of the time is spent on searching for the input node corresponding to a tag, which is done by a linked list traversal.

https://github.com/jlaurens/synctex/blob/2020/synctex_parser.c#L6334-L6342

Because the output tags are numbered sequentially by the engines, this can be changed to use an array instead.

By a rough estimation, this improvement would speed up the relevant part by an order of at least 50, which results in approximately 25-30% overall reduction in runtime.

@jlaurens
Copy link
Owner

jlaurens commented Feb 9, 2024

I agree. The question is how to implement that in POC? Initially no smart C library was available in TeXLive, but nowadays we have at least lua.

@user202729
Copy link
Author

I don't think it would be too difficult to implement it (but it would to require quite a bit of work), basically all we need is a resizable array, which can be implemented with just malloc.

@user202729
Copy link
Author

user202729 commented Feb 25, 2024

I implemented a proof of concept: user202729/luatex@0831bb6

Caveat: I don't really understand why the source code and ownership system need to be that complicated (there's a signaling system to free the nodes?), so instead of removing the linked list entirely, I just put the array in addition to the linked list.

In theory, it should be possible to remove the double-indirection entirely and make a contiguous array of synctex_node_s, which should improve memory locality and performance.

For me, this indeed shows a ≈ 33% improvement in performance. (from 1.5s to 1s)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants