Hi, I'm currently testing HNSW with the SQ8 scalar quantizer. My dataset is quite large and can't be processed by the original number of reducers (and we don't want to increase the number of reducers because that would also affect search performance), so I tried splitting the embeddings into several batches and encoding and training each batch separately. However, recall dropped a lot with batching. I just want to check: is batch encoding a feasible approach, or should we encode all of the embeddings together instead of splitting them into batches? Thank you.
Replies: 1 comment 1 reply
Hi @Luciferre , I think splitting the embeddings into batches and training them separately is probably creating its own local graph for each batch, which leads to reduced recall. It's usually better to encode the entire dataset together without splitting it into batches; batching can only work if the batches are themselves a good representation of the full dataset or represent distinct clusters.

One solution could be sampling the data. For example, if the dataset has 1B rows and we can only train on 10M, select 10M of the 1B rows with a good sampling technique so that the sample still closely represents the full dataset.
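A minimal sketch of that sampling idea, assuming a FAISS-style HNSW+SQ8 index (the thread doesn't name the library, so `IndexHNSWSQ`, the dimensions, and the sample size here are all illustrative assumptions): train the quantizer once on a random sample that represents the whole dataset, then add every vector into a single shared graph.

```python
import numpy as np
import faiss  # assumed backend; adapt to whatever engine you actually use

d = 128              # vector dimensionality (illustrative)
M = 32               # HNSW graph degree (illustrative)
n = 200_000          # full dataset size (illustrative)
train_size = 20_000  # how many vectors we can afford to train on

# xb stands in for the full embedding matrix, shape (n, d), float32.
xb = np.random.rand(n, d).astype("float32")

# Uniform random sample used only for training the scalar quantizer,
# so the learned value ranges reflect the whole dataset rather than one batch.
rng = np.random.default_rng(42)
sample_ids = rng.choice(n, size=train_size, replace=False)
train_vecs = xb[sample_ids]

index = faiss.IndexHNSWSQ(d, faiss.ScalarQuantizer.QT_8bit, M)
index.hnsw.efConstruction = 200

index.train(train_vecs)  # train SQ8 on the representative sample
index.add(xb)            # add all vectors; they are encoded into one shared graph
```

The key point is that only the training step is downsampled; all vectors still go into one index, so you avoid the per-batch local graphs that hurt recall.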