Small clarification for jaccard and lift similarity measures #1668

Merged
merged 2 commits into from
Mar 10, 2022
10 changes: 9 additions & 1 deletion recommenders/utils/python_utils.py
@@ -27,6 +27,11 @@ def exponential_decay(value, max_val, half_life):

 def jaccard(cooccurrence):
     """Helper method to calculate the Jaccard similarity of a matrix of co-occurrences.
+    When comparing Jaccard with count co-occurrence and lift similarity, count favours
+    predictability, meaning that the most popular items will be recommended most of
+    the time. Lift, by contrast, favours discoverability/serendipity, meaning that an
+    item that is less popular overall but highly favoured by a small subset of users
+    is more likely to be recommended. Jaccard is a compromise between the two.

     Args:
         cooccurrence (numpy.ndarray): the symmetric matrix of co-occurrences of items.
@@ -46,7 +51,10 @@ def jaccard(cooccurrence):


 def lift(cooccurrence):
-    """Helper method to calculate the Lift of a matrix of co-occurrences.
+    """Helper method to calculate the Lift of a matrix of co-occurrences. In comparison
+    with basic co-occurrence and Jaccard similarity, lift favours discoverability and
+    serendipity, as opposed to co-occurrence that favours the most popular items, and
+    Jaccard that is a compromise between the two.

     Args:
         cooccurrence (numpy.ndarray): The symmetric matrix of co-occurrences of items.
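The trade-off described in the two docstrings can be illustrated on a toy matrix. The following is a minimal sketch, not the code from `python_utils.py`: it assumes the usual definitions, Jaccard as c_ij / (c_ii + c_jj - c_ij) and lift as c_ij / (c_ii * c_jj), where the diagonal holds each item's own occurrence count. The names `jaccard_sketch` and `lift_sketch` and the sample counts are hypothetical and chosen only for illustration.

```python
import numpy as np


def jaccard_sketch(cooccurrence):
    """Jaccard under the assumed definition: c_ij / (c_ii + c_jj - c_ij)."""
    diag = cooccurrence.diagonal().astype(float)
    # Broadcasting builds the matrix of pairwise sums c_ii + c_jj.
    union = diag[:, None] + diag[None, :] - cooccurrence
    with np.errstate(invalid="ignore", divide="ignore"):
        return cooccurrence / union


def lift_sketch(cooccurrence):
    """Lift under the assumed definition: c_ij / (c_ii * c_jj)."""
    diag = cooccurrence.diagonal().astype(float)
    with np.errstate(invalid="ignore", divide="ignore"):
        return cooccurrence / (diag[:, None] * diag[None, :])


# Hypothetical counts: item 0 and item 1 are popular, item 2 is niche,
# but most users of item 2 also interacted with item 0.
counts = np.array(
    [
        [100, 40, 8],
        [40, 80, 2],
        [8, 2, 10],
    ]
)

jac = jaccard_sketch(counts)
lft = lift_sketch(counts)

# Rank the other items as neighbours of item 0 under each measure.
for name, sim in [("count", counts), ("jaccard", jac), ("lift", lft)]:
    best = max((1, 2), key=lambda j: sim[0, j])
    print(f"{name}: closest neighbour of item 0 is item {best}")
```

With these counts, raw co-occurrence ranks the popular item 1 above the niche item 2 as a neighbour of item 0, while lift reverses that order because it divides by both items' popularity. Jaccard normalises by the size of the union instead of the product, so it penalises popularity less aggressively than lift.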