
Search results return a lot of irrelevant content #1024

Closed
newswangerd opened this issue Aug 3, 2018 · 3 comments

Comments

@newswangerd
Member

newswangerd commented Aug 3, 2018

Try clicking on any of these links. The top 4 or 5 search results should be directly related to the query, but in most cases they are littered with irrelevant ones.

The problem seems to be that we rank results based on keywords in the README, and the number of keyword occurrences in a README isn't always representative of what a role actually does.
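As a minimal sketch of the failure mode, assuming a ranker that scores each role only by how many times the query term appears in its README, the Python below shows a role that merely mentions `gunicorn` in an example playbook outranking a role that is actually about it. The role names, README text, and helper functions are hypothetical; this is not Galaxy's actual search code.

```python
import re


def readme_keyword_count(readme: str, keyword: str) -> int:
    """Count case-insensitive, whole-word occurrences of keyword in a README."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return len(re.findall(pattern, readme, flags=re.IGNORECASE))


def rank_by_readme_count(readmes: dict, keyword: str) -> list:
    """Rank role names by README keyword count, highest first (hypothetical ranker)."""
    scored = [(readme_keyword_count(text, keyword), name) for name, text in readmes.items()]
    # Keep only roles that mention the keyword; break ties by name so output is stable.
    hits = [pair for pair in scored if pair[0] > 0]
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in hits]


# Hypothetical data: the first role is about gunicorn, the second only mentions it.
readmes = {
    "example.gunicorn": "Installs and configures gunicorn.",
    "example.webapp": "Example playbook: run the app under gunicorn with 4 gunicorn workers.",
}

print(rank_by_readme_count(readmes, "gunicorn"))
# → ['example.webapp', 'example.gunicorn']
```

Two passing mentions beat one genuine one, which is exactly the kind of ranking the links above show.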

@chouseknecht
Contributor

chouseknecht commented Aug 6, 2018

@newswangerd

Thanks for pointing this out. We definitely have more work to do here. We're ranking search results based on a combination of keyword matches and download count. Unfortunately, we're not weighting matches by where they occur. We should give higher priority to name and description matches than to README matches.
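A minimal sketch of giving name and description matches higher priority than README matches. The weights (10, 5, 1) and the `Role` fields are assumptions chosen for illustration, not values from Galaxy. With a role that only mentions gunicorn in its README and one that is actually about gunicorn, the latter now wins.

```python
import re
from dataclasses import dataclass

# Hypothetical per-field weights: a hit in the name outweighs many README hits.
FIELD_WEIGHTS = {"name": 10.0, "description": 5.0, "readme": 1.0}


@dataclass
class Role:
    name: str
    description: str
    readme: str


def count_matches(text: str, keyword: str) -> int:
    """Case-insensitive, whole-word occurrence count."""
    return len(re.findall(r"\b" + re.escape(keyword) + r"\b", text, flags=re.IGNORECASE))


def field_weighted_score(role: Role, keyword: str) -> float:
    """Sum of per-field match counts, each multiplied by that field's weight."""
    return sum(
        weight * count_matches(getattr(role, field), keyword)
        for field, weight in FIELD_WEIGHTS.items()
    )


roles = [
    Role("example.webapp", "Deploy a Python web application",
         "Example playbook: run the app under gunicorn with 4 gunicorn workers."),
    Role("example.gunicorn", "Install and configure Gunicorn",
         "Installs and configures gunicorn."),
]

ranked = sorted(roles, key=lambda r: field_weighted_score(r, "gunicorn"), reverse=True)
print([(r.name, field_weighted_score(r, "gunicorn")) for r in ranked])
# → [('example.gunicorn', 16.0), ('example.webapp', 2.0)]
```

If search is backed by PostgreSQL full-text search, `setweight()` (labels `A` through `D`) combined with `ts_rank` is one existing way to express the same field priority inside the database rather than in application code.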

I haven't gone through all of your examples, but I noticed that the README for the first item returned in the last example is not displayed by the UI. That seems like a bug. Could you please take a peek at it?

@cutwater

For that particular example, the README in the first result item contains 2 occurrences of the keyword gunicorn used in example playbooks. Those 2 occurrences explain why it gets returned; however, the role doesn't actually have anything to do with gunicorn.

Since the term gunicorn is not found in the content name, nor in the content description, it would seem that we should either ignore the fact that it does occur in the README, and thus eliminate the item from the search results, or downgrade the item's search ranking.
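Both options can be sketched as a post-processing step, assuming each result already has a relevance score: if the term appears in neither the name nor the description, either drop the result or multiply its score by a penalty. The function name and the `mode` and `penalty` parameters are hypothetical knobs for experimentation.

```python
import re
from typing import Optional


def mentions(text: str, keyword: str) -> bool:
    """True if keyword occurs in text as a whole word (case-insensitive)."""
    return re.search(r"\b" + re.escape(keyword) + r"\b", text, flags=re.IGNORECASE) is not None


def adjust_readme_only_match(score: float, name: str, description: str, keyword: str,
                             mode: str = "downgrade", penalty: float = 0.25) -> Optional[float]:
    """Return the adjusted score, or None if the result should be removed.

    Results whose name or description contains the keyword keep their score.
    Otherwise the match came only from the README, and we either drop it
    (mode="drop") or scale it down (mode="downgrade").
    """
    if mentions(name, keyword) or mentions(description, keyword):
        return score
    if mode == "drop":
        return None
    if mode == "downgrade":
        return score * penalty
    raise ValueError(f"unknown mode: {mode!r}")


desc = "Deploy a Python web application"
print(adjust_readme_only_match(2.0, "example.webapp", desc, "gunicorn"))
# → 0.5
print(adjust_readme_only_match(2.0, "example.webapp", desc, "gunicorn", mode="drop"))
# → None
```

Downgrading is the gentler option: a role whose README legitimately covers the term but whose metadata is sparse still appears, just lower.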

Can we come up with a search ranking that gives a higher priority to keywords found in the title and description, and a lower priority to README matches?

Since we're infusing the search ranking with download count, the bad result may still float to the top. It's hard to know whether we can actually make this better, but it seems worth experimenting.
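To see why a popular but weakly matching role can still float to the top, compare two hypothetical ways of folding download count into the score. Neither formula is Galaxy's, and the weights and download numbers are made up. Adding raw downloads lets popularity swamp relevance, while scaling relevance by a log-dampened download factor only boosts items that already match.

```python
import math


def additive_score(relevance: float, downloads: int, download_weight: float = 0.001) -> float:
    """Naive blend: raw download count is added on top of relevance."""
    return relevance + download_weight * downloads


def dampened_score(relevance: float, downloads: int, alpha: float = 0.1) -> float:
    """Relevance scaled by 1 + alpha * log(1 + downloads).

    Because popularity multiplies relevance, a zero-relevance item stays at zero,
    and the log keeps very large download counts from dominating.
    """
    return relevance * (1.0 + alpha * math.log1p(downloads))


popular_weak_match = (0.5, 100_000)   # README-only match, many downloads
niche_strong_match = (16.0, 50)       # name/description match, few downloads

for label, blend in (("additive", additive_score), ("dampened", dampened_score)):
    weak = blend(*popular_weak_match)
    strong = blend(*niche_strong_match)
    winner = "popular_weak_match" if weak > strong else "niche_strong_match"
    print(f"{label}: weak={weak:.2f} strong={strong:.2f} -> top result: {winner}")
```

The multiplicative form is one design choice among several; the point is that download count should break ties among relevant results rather than rescue irrelevant ones.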

@chouseknecht
Contributor

Upping priority. We need to discuss this as a team, and potentially backport it to 3.0.

@publicarray

publicarray commented Sep 5, 2018

Same here: https://galaxy.ansible.com/search?vendor=false&keywords=unbound&order_by=-relevance&page_size=10

  • Some are duplicates. debops/unbound appears twice.
  • Irrelevant items (or simply ones that should be ranked lower): zimbra, postfix, and maybe dnscrypt-proxy.
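The thread doesn't say why debops/unbound appears twice, so the root cause may lie elsewhere, but a defensive de-duplication pass over an already-ranked result list is simple. This sketch assumes each result is a dict with `namespace` and `name` keys (hypothetical field names) and keeps the first, highest-ranked occurrence.

```python
def dedupe_ranked_results(results: list) -> list:
    """Drop repeated namespace/name pairs, keeping the first (best-ranked) occurrence.

    Assumes `results` is already sorted by rank, best first.
    """
    seen = set()
    unique = []
    for item in results:
        key = (item["namespace"].lower(), item["name"].lower())
        if key in seen:
            continue
        seen.add(key)
        unique.append(item)
    return unique


ranked = [
    {"namespace": "debops", "name": "unbound"},
    {"namespace": "example", "name": "unbound"},
    {"namespace": "debops", "name": "unbound"},  # duplicate entry
]

print([f"{r['namespace']}/{r['name']}" for r in dedupe_ranked_results(ranked)])
# → ['debops/unbound', 'example/unbound']
```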

@newswangerd newswangerd added the status/fix-committed Merged to develop \ release branch label Sep 17, 2018
@chouseknecht chouseknecht added the status/fix-released Fixed in the latest release label Sep 20, 2018
@cutwater cutwater removed the status/fix-committed Merged to develop \ release branch label Nov 16, 2018

4 participants