Phenotypic similarity
Orion Buske edited this page Feb 21, 2016 · 2 revisions
Currently, the Elasticsearch score is used directly as the phenotypic similarity score, normalized to the range [0, 1]. Cases are indexed together with their phenotypes and the ancestors of those phenotypes, so a single query against the Elasticsearch index quickly fetches the most similar cases. This should scale efficiently to a very large number of cases.
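To illustrate what indexing a case with "the phenotype ancestors" involves, here is a minimal sketch that expands a list of terms to include every ancestor. The `PARENTS` map and the `with_ancestors` helper are hypothetical names, and the hierarchy is a made-up toy fragment rather than real HPO data.

```python
# Toy hierarchy (hypothetical, not real HPO data): child term -> parent terms
PARENTS = {
    "cataract": ["eye_abnormality"],
    "eye_abnormality": ["phenotypic_abnormality"],
    "seizure": ["nervous_system_abnormality"],
    "nervous_system_abnormality": ["phenotypic_abnormality"],
    "phenotypic_abnormality": [],
}


def with_ancestors(terms, parents=PARENTS):
    """Return the given terms plus all of their ancestors as a set."""
    seen = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            # Walk upwards; unknown terms are treated as having no parents
            stack.extend(parents.get(term, []))
    return seen


indexed_terms = with_ancestors(["cataract"])
print(sorted(indexed_terms))
```

Because ancestors are stored alongside the observed terms, two cases with different but related phenotypes still share terms higher up in the hierarchy.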
From datastore.py:
    result = self._db.search(index=self._index, body=query)
    scored_patients = []
    for hit in result['hits']['hits'][:n]:
        # Just use the ElasticSearch TF/IDF score, normalized to [0, 1]
        score = 1 - 1 / (1 + hit['_score'])
        scored_patients.append((score, Patient(hit['_source']['doc'])))
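The normalization in the excerpt above can be examined on its own. The following is a small sketch, assuming raw scores are non-negative; `normalize_score` is a hypothetical helper name, not a function from datastore.py.

```python
def normalize_score(raw_score):
    """Squash a non-negative raw score using 1 - 1 / (1 + s).

    A raw score of 0 maps to 0, and the result approaches (but never
    reaches) 1 as the raw score grows.
    """
    return 1.0 - 1.0 / (1.0 + raw_score)


for raw in (0.0, 1.0, 9.0, 99.0):
    print(raw, normalize_score(raw))
```

The mapping is monotonic, so it preserves the ranking returned by the search and only changes the scale of the scores.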
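A simpler approach would be to iterate over all cases in the database and compute a phenotypic similarity score directly for each one (e.g. the UI score: the number of phenotype terms the two cases share, divided by the number of terms present in either case).
Here is some pseudocode for how this might work: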
    query_patient = Patient(...)
    # Get the set of all the patient's phenotypes and their ancestors (the induced HPO subgraph)
    query_phenotypes = query_patient._get_implied_present_phenotypes()
    scored_patients = []
    for match_patient in database:
        match_phenotypes = match_patient._get_implied_present_phenotypes()
        # The UI score: size of the intersection divided by size of the union
        union = query_phenotypes.union(match_phenotypes)
        shared = query_phenotypes.intersection(match_phenotypes)
        # Use float division, and avoid dividing by zero when neither patient has phenotypes
        score = float(len(shared)) / len(union) if union else 0.0
        scored_patients.append((score, match_patient))
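The pseudocode above can be turned into a small runnable sketch by working on plain sets of term names instead of `Patient` objects. Everything here is an assumption for illustration: the `ui_score` helper, the case identifiers, and the toy term names (which are not real HPO terms). Each set is assumed to already contain the ancestors of the observed terms.

```python
def ui_score(a, b):
    """Union-intersection score: |a & b| / |a | b|, or 0.0 if both are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / float(len(union))


# Hypothetical query and database; each set already includes ancestor terms
query = {"cataract", "eye_abnormality", "phenotypic_abnormality"}
database = {
    "case_1": {"cataract", "eye_abnormality", "phenotypic_abnormality"},
    "case_2": {"seizure", "nervous_system_abnormality", "phenotypic_abnormality"},
    "case_3": {"eye_abnormality", "phenotypic_abnormality"},
}

# Score every case against the query, then list the best matches first
ranked = sorted(
    ((ui_score(query, terms), case_id) for case_id, terms in database.items()),
    reverse=True,
)
for score, case_id in ranked:
    print(case_id, round(score, 3))
```

This scan visits every case for each query, so its cost grows linearly with the size of the database, which is the trade-off against the index-based approach.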