Normalize storage in Embeddings constructor. #94
Comments
Probably a bad idea; we'd have to treat quantized and array storages differently. What do you think, @danieldk?
Not only quantized storages, but also mmap'ed storages. finalfusion embeddings should already be l2-normalized according to the spec. We have some historical embeddings which are normalized, but do not have a norms chunk, and it would be senseless to renormalize them and then store all-1 norms. It should be part of the interface that callers only provide l2-normalized embeddings. Ideally, we would check this as an invariant, but the invariant is too expensive (and also goes against mmap'ing embeddings). I guess we could consider validating the initial n embeddings as a sanity check.
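A rough sketch of what such a sanity check on the first n rows could look like. The helper name `check_first_n_normalized`, the default `n`, and the tolerance are all assumptions for illustration, not part of finalfusion:

```python
import numpy as np


def check_first_n_normalized(storage, n=100, atol=1e-4):
    """Hypothetical helper: test whether the first n rows have unit l2 norm.

    Only the first n rows are read, so for a memory-mapped storage this
    touches a small prefix instead of the whole matrix.
    """
    head = np.asarray(storage[:n], dtype=np.float32)
    norms = np.linalg.norm(head, axis=1)
    return bool(np.allclose(norms, 1.0, atol=atol))


# Illustrative data only: random rows scaled to unit length.
rng = np.random.default_rng(0)
matrix = rng.normal(size=(10, 5)).astype(np.float32)
unit = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)

print(check_first_n_normalized(unit))
print(check_first_n_normalized(2.0 * unit))
```

This only catches obviously unnormalized input; rows after the first n are never inspected, so it stays a sanity check rather than an invariant.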
For mmap'ed storages it'd at least raise an exception, for quantized storages it just wouldn't do anything to the storage but still store the norms. It'd be reconstructed by virtue of
Might add something to the docs, but I don't think going further makes much sense. The storage array is writable and so is the norms array, so we can't really enforce normalization unless the arrays are made immutable (I think there's also a flag for in-memory arrays to prevent modification).
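For in-memory NumPy arrays, the flag in question is presumably `writeable`. A minimal sketch, assuming plain `numpy.ndarray` storage and norms (the `freeze` helper is hypothetical):

```python
import numpy as np


def freeze(array):
    """Hypothetical helper: mark an array read-only and return it."""
    array.flags.writeable = False
    return array


# Illustrative data only: a small normalized matrix and its norms.
storage = np.random.default_rng(0).normal(size=(4, 3)).astype(np.float32)
norms = np.linalg.norm(storage, axis=1)
storage /= norms[:, None]

freeze(storage)
freeze(norms)

try:
    storage[0, 0] = 42.0
except ValueError as err:
    # NumPy refuses writes to arrays whose writeable flag is False.
    print("write rejected:", err)
```

This is a guard against accidental modification, not real enforcement: a caller can set the flag back to `True` on an array that owns its memory.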
If no norms are passed to `Embeddings`, normalize embeddings and add norms to the embeddings.
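As a sketch of the proposed behaviour, assuming a plain array storage (the function `normalize_if_needed` is hypothetical and is not the actual `Embeddings` constructor logic):

```python
import numpy as np


def normalize_if_needed(storage, norms=None):
    """Hypothetical helper: l2-normalize rows only when no norms are given.

    Returns (storage, norms). If norms are passed, the input is assumed to
    be normalized already and is returned unchanged.
    """
    if norms is not None:
        return storage, norms
    norms = np.linalg.norm(storage, axis=1)
    # Leave all-zero rows untouched instead of dividing by zero.
    safe = np.where(norms == 0, 1.0, norms)
    # Build a new array rather than writing in place, so a read-only
    # storage would not be modified.
    return storage / safe[:, None], norms


matrix = np.array([[3.0, 4.0], [0.0, 0.0]])
normalized, norms = normalize_if_needed(matrix)
print(normalized)
print(norms)
```

The sketch covers array storage only; it does not address the quantized and mmap'ed cases raised in the comments.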