Better search results #560

visr · 2017-09-05T07:28:03Z

These are a few commits that should result in better search results for Julia code.
Lunr.js has changed quite a bit in the meantime, allowing for better performance and customization.

visr · 2017-09-05T07:48:02Z

I think this also fixes #520, though I didn't check.

visr · 2017-09-11T09:20:49Z

Modified the PR title to make clear this is good to go, not WIP :)

mortenpi · 2017-09-23T13:13:30Z

This looks great, thanks!

I would also have a few search-related feature requests. If they could be added into this PR, that would be awesome (not exactly sure how much work they'd be), or we could just merge this as is and track them in a separate issue.

- Partial matches don't always show up, e.g. searching for "autodocs" does not find @autodocs related stuff in Documenter's manual.
- Should the language-reduced search words not find function names etc.? I.e. right now, "makes" finds everything that has "make" in it, e.g. Generator.make. Perhaps it shouldn't (i.e. only match "make" in docstrings).
- I think the page of search results could be significantly improved.
  - Snippets showing what was matched would be great.
  - Somehow prioritizing the results (e.g. an exact function name match would be preferred over docstring matches).

mortenpi · 2017-09-23T16:04:55Z

Regarding #520: searching for "get" now definitely finds stuff, but not the Base.get function. "get!" finds Base.get! without problems.

visr · 2017-09-24T16:26:43Z

Thanks for having a look.

> Partial matches don't always show up, e.g. searching for "autodocs" does not find @autodocs related stuff in Documenter's manual.

Probably the best fix for this would be to add fuzzy matching by default, perhaps just with an edit distance of 1, so as not to hurt performance too much. You can already try searching `autodocs~1`. Additionally, we can try enabling automatic trailing wildcards, so that you already get search results as you type `depl`, without having to fully type `deploydocs`.

> Should the language-reduced search words not find function names etc.? I.e. right now, "makes" finds everything that has "make" in it, e.g. Generator.make. Perhaps it shouldn't (i.e. only match "make" in docstrings).

Sounds like an improvement, but I don't know how it could be implemented. If we enable the wildcards, stemming won't be performed anymore, though `makes*` still matches Generator.make through fuzzy matching.

> Snippets showing what was matched would be great.

There is support for highlighting (see the demo), but as you know we currently don't show any of the body in the search page. Seems like more work, perhaps better for a separate issue. Not 100% convinced it's worth it though, it's also nice to see the search results directly beneath each other as it is now. Plus we would have to check that it won't hurt performance too much.

> Somehow prioritizing the results (e.g. exact function name matched would be preferred over docstring matches)

Do you have examples where this is not the case? See also the last paragraph of the commit message in aa1bb53. Shorter matches get higher scores, and search results are sorted by score.

So if you want I can add the fuzzy matching and trailing wildcard, I think that's not too much work.

visr · 2017-09-25T01:37:57Z

Ha, this is the reason `get` doesn't find Base.get: the stop word filter. We could disable it altogether, or just remove the stop words that are also function names in Base.

For lunr.js changes see https://github.com/olivernn/lunr.js/blob/master/CHANGELOG.mdown
For upgrade guide see https://lunrjs.com/guides/upgrading.html

The index is now immutable, hence the documents have to be added before the end of the configuration function. Build time field boosting is no longer supported, and replaced by query time boosting. In this commit it is excluded. Before adding it again, evaluate if it is still required, because with default settings it should score the short titles higher anyways.

visr · 2017-09-25T15:38:29Z

Ok @mortenpi could you have a look at the last 4 commits? They should improve the results a bit more. Essentially I'm now searching each word twice, once in the title and once in the text. This way I can boost the title matches.

I didn't build the Base docs, and no time now, could you perhaps try to roughly compare the performance?

mortenpi · 2017-09-25T16:59:19Z

So my primitive benchmarking system (i.e. counting seconds) showed that it's roughly twice as slow. But the results are definitely better I think!

Here's a build of Base docs so that you could check it out and compare the performance as well: http://mortenpi.eu/julia/search-v1/search/?q=get

My feeling is that working search is more important than performant search, so even if this is a bit slow, we can live with it (and optimize later) -- the fixes are worth it. But perhaps @ararslan or someone could give their opinion as well whether this performance is acceptable?

Here's an example of a case where I'd expect "Documenter.Travis.genkeys(Function)" to be higher up in the list (I am assuming that like 90% of the searches are for specific functions, so those matches should probably be preferred):

But again, this can be a future improvement.

visr · 2017-09-26T02:44:56Z

Thanks. Building the index takes very long. We can prebuild and serialize it to improve on this in another PR (mentioned in #212 as well).
I have the habit of hitting enter after completing my query, but I see that this rebuilds the index, taking much longer. Any way of avoiding this?

mortenpi · 2017-09-26T07:20:21Z

> improve on this in another PR

Yup.

> I have the habit of hitting enter after completing my query, but I see that this rebuilds the index, taking much longer. Any way of avoiding this?

I think we could disable the submit event on the search page. Something like this should work:

// Cancel the default submit so that pressing Enter does not reload the page.
$("form.search").submit(function() {
    return false;
});

Might be best to give the form an id though (in HTMLWriter.jl) and use that to refer to it.

visr · 2017-09-26T09:14:31Z

Thanks for the suggestion. I added another commit. Downside is that the URL is no longer updated on enter, but I don't see an easy way around that.

mortenpi · 2017-09-26T09:45:52Z

In principle we could put the query into the fragment (after #) and then it should be possible to update that in javascript. Actually, since search is javascript-based anyway, having the query in the fragment seems like the more correct solution. But this could be in a separate PR.
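A rough sketch of what the fragment idea could look like, assuming the query is stored as `#q=...`. The helper names `queryFromHash` and `hashFromQuery` are hypothetical and not part of Documenter; only the standard `URLSearchParams` API is used.

```javascript
// Hypothetical helper: read the search text out of a fragment such as "#q=get%21".
function queryFromHash(hash) {
  return new URLSearchParams(hash.replace(/^#/, "")).get("q") || "";
}

// Hypothetical helper: build the fragment for a given search text.
function hashFromQuery(query) {
  return "#" + new URLSearchParams({ q: query }).toString();
}

console.log(queryFromHash(hashFromQuery("get! Base")));
```

In a browser, the result of `hashFromQuery` could be assigned to `window.location.hash` on each search, which updates the URL without reloading the page or rebuilding the index.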

olivernn · 2017-09-28T18:14:10Z

I hope you don't mind some drive by comments...

The search looks good and the results seem promising, though I'm not familiar enough with Julia to really judge. I think there are a couple of things that might improve the performance though...

I think the quickest win would be to debounce the keyup event, it would mean less queries made and hopefully reduce the contention on the UI thread between the UI and search itself. For the best results you could experiment with doing the search in a web worker, freeing up the UI thread. The implementation is likely to be a little more involved though.

Finally, you might perhaps tweak the query you are doing. Using edit distance is the most expensive way to query, and a value of 2, especially for short queries, might be too expensive. One approach I've suggested to other implementors (with good results) is to combine exact search, with trailing wildcards and optionally fuzzy search using edit distance when doing typeahead style searches. Something like this:

idx.query(function (q) {
  q.term(term, { boost: 100 })
  q.term(term, { boost: 10, usePipeline: false, wildcard: lunr.Query.wildcard.TRAILING })
  q.term(term, { boost: 1, usePipeline: false, editDistance: 1 })
})

In your implementation I see you are boosting the two fields differently, you might experiment with only doing the expensive fuzzy search in the title. As I'm sure you've discovered, getting the right balance between decent results and performance is down to a bit of experimentation.
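To make the layered idea concrete without depending on lunr itself, here is a toy scorer in plain JavaScript. It only illustrates how exact, prefix and fuzzy clauses could stack with boosts of 100, 10 and 1. It is not how lunr scores, since lunr also weighs term frequency and field length, and `withinOneEdit` and `scoreTerm` are hypothetical names.

```javascript
// True when a and b differ by at most one insertion, deletion or substitution.
function withinOneEdit(a, b) {
  if (Math.abs(a.length - b.length) > 1) return false;
  let i = 0;
  while (i < a.length && i < b.length && a[i] === b[i]) i++;
  // Skip the first mismatch in the longer string (or in both if equal length).
  const restA = a.length >= b.length ? a.slice(i + 1) : a.slice(i);
  const restB = b.length >= a.length ? b.slice(i + 1) : b.slice(i);
  return restA === restB;
}

// Sum the boosts of every clause that matches, like stacked query clauses.
function scoreTerm(query, indexed) {
  let score = 0;
  if (indexed === query) score += 100;            // exact match
  if (indexed.startsWith(query)) score += 10;     // trailing wildcard
  if (withinOneEdit(query, indexed)) score += 1;  // fuzzy, edit distance 1
  return score;
}

// Rank a few made-up index terms against the partial query "depl".
const terms = ["deploydocs", "deps", "depl", "dept"];
const ranked = terms
  .map((t) => ({ term: t, score: scoreTerm("depl", t) }))
  .filter((r) => r.score > 0)
  .sort((x, y) => y.score - x.score);

console.log(ranked);
```

The exact term comes first, the prefix match on `deploydocs` second, and the near misses last, which is the ordering a typeahead search would want.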

Anyway, just a few suggestions. I'm excited to see Julia using Lunr, and I'm more than willing to offer any help or advice where I can.

visr · 2017-09-29T02:05:27Z

Just when I was wondering if it was ok to ping the library author for advice 😃.
Much appreciated. I'll try out some of your suggestions this weekend.

mortenpi · 2017-10-06T15:56:47Z

This is good to go I guess?

mortenpi · 2017-10-07T11:29:36Z

We can always make additional improvements later. Thanks again @visr, @olivernn!

ararslan · 2017-10-07T19:33:24Z

> But perhaps @ararslan or someone could give their opinion as well whether this performance is acceptable?

Sorry, just noticed this now. Yeah the performance leaves a bit to be desired but it's really not all that much worse than it currently is, so ¯\_(ツ)_/¯

visr · 2017-10-08T20:15:12Z

I was a bit stretched on time to try out the other suggestions by @olivernn. But I figured that to improve performance whilst still getting good search results, testing and benchmarking is the way to go.

Documenter-jl-search-testing is a start to test this in Node, on the Base docs, with npm test and npm run perf.

ihnorton · 2017-12-27T17:21:33Z

Related discussion of performance (with links to some server-side index experiments): mkdocs/mkdocs#859

visr changed the title ~~Work on search~~ Better search results Sep 11, 2017

mortenpi added the Type: Enhancement label Sep 23, 2017

mortenpi added this to the 0.12 milestone Sep 23, 2017

mortenpi approved these changes Sep 23, 2017

visr added 6 commits September 25, 2017 14:25

search.js: use same jquery as in documenter.js

8d6ac8d

decode URI to search query before filling search box with it

b35c49a

Spaces are encoded as + in the URI and have to be treated separately first. fixes JuliaDocs#471
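A minimal sketch of the decoding this commit message describes, assuming the input is the raw value of the `q` parameter. The helper name `decodeQuery` is hypothetical and not taken from search.js.

```javascript
// Hypothetical helper: turn the raw value of ?q=... into the text the user typed.
// decodeURIComponent leaves "+" untouched, so convert it to an encoded space first.
function decodeQuery(raw) {
  return decodeURIComponent(raw.replace(/\+/g, "%20"));
}

console.log(decodeQuery("get%21+Base"));
```

A literal plus sign arrives as `%2B`, so replacing `+` before decoding does not clobber it.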

reduce the size of the store

ebcc55e

only put in what we use

use custom trimmer to preserve @ and !

4bff0eb

fixes JuliaDocs#494
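For illustration, a trimmer that strips every non-word character from both ends of a token also removes the `@` of macros and the `!` of mutating functions. The sketch below contrasts that with a Julia-aware variant. The regular expressions and the function names are assumptions, not the ones used in this commit.

```javascript
// Generic trimming: drop every non-word character at either end of the token.
function defaultTrim(s) {
  return s.replace(/^\W+/, "").replace(/\W+$/, "");
}

// Hypothetical Julia-aware variant: keep a leading "@" and a trailing "!".
function juliaTrim(s) {
  return s.replace(/^[^\w@]+/, "").replace(/[^\w!]+$/, "");
}

console.log(defaultTrim("@autodocs"), juliaTrim("@autodocs"));
console.log(defaultTrim("get!"), juliaTrim("(get!)"));
```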

add . as a search token separator

9dda17d

To be able to match fully qualified title fields more easily.
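A small sketch of the effect, assuming tokens are split on whitespace and hyphens, with `.` added as a further separator. The `tokenize` function is a stand-in written for this example, not lunr's tokenizer.

```javascript
// Assumed separators: whitespace and hyphens, plus "." for qualified names.
const separator = /[\s\-.]+/;

// Stand-in tokenizer: lowercase, split on the separator, drop empty pieces.
function tokenize(text) {
  return text.toLowerCase().split(separator).filter((t) => t.length > 0);
}

console.log(tokenize("Documenter.Travis.genkeys"));
```

With the qualified name split into its parts, a query for `genkeys` can match the last token directly instead of needing the full `Documenter.Travis.genkeys` string.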

visr force-pushed the search branch from e42a9ce to ed4ae55 Compare September 25, 2017 15:23

visr added 4 commits September 25, 2017 22:25

use query instead of search for defining queries

8fda73c

Query provides a programmatic way of defining (customisation of) queries.

enable fuzzy search by default

18a6d58

With an edit distance of 2, such that `htmlwrita` matches `htmlwriter`.
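As a check on the example in this commit message, here is a plain Levenshtein distance in JavaScript. This is the textbook algorithm, given as a sketch. lunr's own fuzzy matcher may count some edits, such as transpositions, differently.

```javascript
// Classic dynamic-programming Levenshtein distance, keeping one row at a time.
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      // Cheapest of: deletion, insertion, substitution (or match).
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

console.log(editDistance("htmlwrita", "htmlwriter"));
```

Turning `htmlwrita` into `htmlwriter` takes one substitution and one insertion, so the term falls within an edit distance of 2 but not 1.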

handle queries differently between title and text

2533ae5

A match on `title` gets a boost of 10, just like the original implementation. Furthermore the pipeline is not used for `title`. Trailing wildcards did not seem to work well with the other options, so they are off.

add custom stopWordFilter

fae43fb

Stops filtering out Base Julia names and syntax that also happen to be common stopwords. fixes JuliaDocs#520
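The idea can be sketched as a stop word filter built from a word list with the Julia-relevant entries taken out. Both lists below are short illustrative samples, not the lists from this commit, and `makeStopWordFilter` is a hypothetical name.

```javascript
// A few English stop words for illustration; a real list is much longer.
const englishStopWords = ["a", "all", "do", "for", "get", "in", "the", "while"];

// Assumed sample of words that are also Julia names or keywords.
const juliaNames = new Set(["all", "do", "for", "get", "in", "while"]);

// Build a filter that drops stop words unless they are meaningful in Julia.
// Returning undefined signals that the token should be discarded.
function makeStopWordFilter(stopWords, keep) {
  const blocked = new Set(stopWords.filter((w) => !keep.has(w)));
  return (token) => (blocked.has(token) ? undefined : token);
}

const stopWordFilter = makeStopWordFilter(englishStopWords, juliaNames);
console.log(stopWordFilter("get"), stopWordFilter("the"));
```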

visr force-pushed the search branch from ed4ae55 to fae43fb Compare September 25, 2017 15:26

mortenpi added a commit to mortenpi/julia that referenced this pull request Sep 25, 2017

Search from JuliaDocs/Documenter.jl#560

15707df

don't submit search form event

7d6f41b

This causes a reload, which rebuilds the index, which can take long. Downside is that the URL does not reflect the query, `/search/?q=get`, although pasting links with this syntax will still work.

mortenpi approved these changes Sep 26, 2017

debounce the keyup event

74354c7

Quoting olivernn: > I think the quickest win would be to debounce the keyup event, it would mean less queries made and hopefully reduce the contention on the UI thread between the UI and search itself.
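A generic debounce helper, given as a sketch of the technique rather than the code in this commit. The 20 ms wait and the counter standing in for the search update are assumptions chosen to keep the example short.

```javascript
// Return a wrapper that runs fn only after `wait` ms have passed without a new call.
function debounce(fn, wait) {
  let timer = null;
  return function (...args) {
    clearTimeout(timer);
    timer = setTimeout(() => fn.apply(this, args), wait);
  };
}

// Stand-in for the search update; records how often it runs and with what query.
let searches = 0;
let lastQuery = null;
const update = debounce((query) => {
  searches += 1;
  lastQuery = query;
}, 20);

// Three rapid "keystrokes" collapse into a single search for the last query.
update("g");
update("ge");
update("get");
```

In the search page, the wrapped function would be attached to the keyup handler, so that a query only runs once typing pauses.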

Merge branch 'master' into search

f13386e

mortenpi merged commit 2e152e1 into JuliaDocs:master Oct 7, 2017

visr deleted the search branch October 8, 2017 20:04

This was referenced Oct 23, 2017

Can't search for a function with a bang in the HTML docs JuliaLang/julia#20363

Closed

Still unable to search for macros in the doc JuliaLang/julia#20828

Closed

mortenpi mentioned this pull request Jun 22, 2018

Reduce editDistance to reduce search results? #746

Closed
