Skip to content

Commit

Permalink
Implement __contains__ for vocab.
Browse files Browse the repository at this point in the history
Python performs linear search over sequences if __contains__ is
not explicitly implemented. This is painfully slow over large
vocabularies, therefore implement __contains__ explicitly.
  • Loading branch information
sebpuetz committed Oct 3, 2019
1 parent 178ca4f commit bef350d
Showing 1 changed file with 12 additions and 0 deletions.
12 changes: 12 additions & 0 deletions src/vocab.rs
Original file line number Diff line number Diff line change
Expand Up @@ -75,4 +75,16 @@ impl PySequenceProtocol for PyVocab {
Ok(words[idx as usize].clone())
}
}

fn __contains__(&self, word: String) -> PyResult<bool> {
let embeds = self.embeddings.borrow();
Ok(embeds
.vocab()
.idx(&word)
.map(|word_idx| match word_idx {
WordIndex::Word(_) => true,
WordIndex::Subword(_) => false,
})
.unwrap_or(false))
}
}

0 comments on commit bef350d

Please sign in to comment.