Help! How can I load 400 GB of vectors into faiss and do a dense vector search?
I have some query data and some index data (the index data is the database I want to query against), and I want to use faiss for the search.
My Python code looks like this:
import pickle

import faiss
import numpy as np

# index_file is my search database, a pickle of roughly 400 GB; query_file holds the query vectors
index_data = pickle.load(open(index_file, "rb"))
query_data = pickle.load(open(query_file, "rb"))

# flatten the database dict {id: [vec, ...]} into parallel id / vector lists
ids = []
indexs = []
for idx, vecs in index_data.items():  # this loop might run out of memory
    for vec in vecs:
        ids.append(idx)
        indexs.append(vec)

# collect the query vectors and their ids
queries = []
idxq = []
for idx, vec in query_data.items():
    queries.append(vec[0])
    idxq.append(idx)

ids = np.array(ids, dtype="int64")          # faiss expects 64-bit integer ids
indexs = np.array(indexs, dtype="float32")  # faiss expects float32 vectors
queries = np.array(queries, dtype="float32")

# build the faiss index
d = 768
k = 100
index = faiss.IndexFlatIP(d)
assert index.is_trained
index_id = faiss.IndexIDMap(index)
index_id.add_with_ids(indexs, ids)  # this might also run out of memory; how can faiss handle this?

# move the index to GPU 0 and search
gpu_res = faiss.StandardGpuResources()
gpu_index = faiss.index_cpu_to_gpu(gpu_res, 0, index_id)

# collect the top-k neighbours per query, keyed by query id then neighbour id
results = {}
D, I = gpu_index.search(queries, k)
for i, (sd, si) in enumerate(zip(D, I)):
    results[str(idxq[i])] = {}
    for pd, pi in zip(sd, si):
        if str(pi) not in results[str(idxq[i])]:
            results[str(idxq[i])][str(pi)] = pd
        if len(results[str(idxq[i])]) > 100:
            break
The question is: to build the index in faiss it seems I have to read all the data into memory at once and then call add_with_ids, but my data is too large, so that runs out of memory.
How can I avoid this? Is there a way to process the data in small batches, for example something like the sketch below? Or can faiss search the vectors directly from a file on my hard disk rather than holding them in RAM (see the second sketch further down)?
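Here is a rough sketch of what I mean by processing in small batches. load_chunks() is a made-up helper (I do not yet know how to stream my pickle file chunk by chunk); I just want to know whether feeding add_with_ids one chunk at a time like this is the right idea:

import faiss
import numpy as np

d = 768
index = faiss.IndexFlatIP(d)
index_id = faiss.IndexIDMap(index)

# load_chunks() is hypothetical: it would yield (ids_chunk, vecs_chunk) pairs
# instead of unpickling the whole 400 GB dict at once
for ids_chunk, vecs_chunk in load_chunks(index_file, chunk_size=100_000):
    vecs_chunk = np.ascontiguousarray(vecs_chunk, dtype="float32")
    ids_chunk = np.ascontiguousarray(ids_chunk, dtype="int64")
    index_id.add_with_ids(vecs_chunk, ids_chunk)  # add one chunk at a time

Though I suspect a flat index would still end up holding all 400 GB in RAM even if I add the vectors in chunks, so maybe this alone is not enough.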
Can anybody help me? I would be grateful if you could give me some sample code and corresponding explanations!
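For the hard-disk part of my question: I saw faiss.IO_FLAG_MMAP mentioned in the faiss on-disk demos. Assuming I had already trained and filled an IVF index and written it out with faiss.write_index, would something like this let the search run against the file instead of RAM? ("populated.index" and nprobe = 16 are just placeholders I made up):

import faiss

# assumption: an IVF index was built and saved earlier with faiss.write_index
index = faiss.read_index("populated.index", faiss.IO_FLAG_MMAP)  # memory-map instead of loading
index.nprobe = 16                # number of inverted lists to visit per query (placeholder)
D, I = index.search(queries, k)  # queries is the same float32 array as above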
This discussion was converted from issue #2662 on June 24, 2024 15:31.