IndexFlatL2 performance degradation #1762
Thanks for the extensive analysis.
Forgot to mention that I commented out the line with:
I have to figure out why the results of your benchmark differ from mine.
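For reference, a minimal sketch of the knob involved: only the variable name is taken from the diff shown further down in the thread; the threshold semantics described in the comments are my assumption, not something confirmed here.

import faiss

# Assumption: faiss switches IndexFlat search to the reservoir-based result
# handler only when k >= distance_compute_min_k_reservoir, so lowering the
# threshold forces the reservoir path even for small k.
print(faiss.cvar.distance_compute_min_k_reservoir)  # inspect the current threshold
# faiss.cvar.distance_compute_min_k_reservoir = 5   # the line that gets commented out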
I made additional measurements on a bigger range of data shapes and found some interesting things:
The SW and HW used are the same as above.
Full benchmark results (csv format):
Code of the benchmark:
import faiss
import numpy as np
import pandas as pd
from scipy import stats
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from timeit import default_timer as timer
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)
NLIST = 100
NPROBE = 10
N_CLASSES = 5
N_CLUSTERS_PER_CLASS = 4
N_RUNS = 10
def box_filter(timing, left=0.25, right=0.75):
    # Discard timing outliers outside 1.5 IQR of the [left, right] quantiles,
    # then average the remaining measurements.
    timing.sort()
    size = len(timing)
    if size == 1:
        return timing[0]
    q1, q2 = timing[int(size * left)], timing[int(size * right)]
    iq = q2 - q1
    lower, upper = q1 - 1.5 * iq, q2 + 1.5 * iq
    result = np.array([item for item in timing if lower < item < upper])
    return np.mean(result)
def index_flatl2_search(train_data, test_data, k):
    # Exact brute-force k-NN search with IndexFlatL2.
    d = train_data.shape[1]
    index = faiss.IndexFlatL2(d)
    index.add(train_data)
    return index.search(test_data, k)
def index_ivfflat_search(train_data, test_data, k):
    # Approximate k-NN search with IndexIVFFlat (NLIST clusters, NPROBE probed).
    d = train_data.shape[1]
    quantizer = faiss.IndexFlatL2(d)
    index = faiss.IndexIVFFlat(quantizer, d, NLIST)
    index.nprobe = NPROBE
    index.train(train_data)
    index.add(train_data)
    return index.search(test_data, k)
def compute_accuracy_from_indices(indices, train_y, test_y):
    # k-NN classification: majority vote over the labels of the retrieved neighbors.
    pred_y, _ = stats.mode(train_y[indices], axis=1)
    pred_y = pred_y.ravel()
    return accuracy_score(test_y, pred_y)
rows_arr = [5000, 10000, 20000, 50000, 100000]
dims_arr = [8, 32, 128, 512]
k_arr = [8, 32, 128]
n_cases = len(rows_arr) ** 2 * len(dims_arr) * len(k_arr)
bench_data = pd.DataFrame({
    "n_train": np.zeros(shape=(n_cases,), dtype=np.int32),
    "n_test": np.zeros(shape=(n_cases,), dtype=np.int32),
    "dims": np.zeros(shape=(n_cases,), dtype=np.int32),
    "k": np.zeros(shape=(n_cases,), dtype=np.int32),
    "IndexIVFFlat_time[s]": np.zeros(shape=(n_cases,), dtype=np.float32),
    "IndexFlatL2_time[s]": np.zeros(shape=(n_cases,), dtype=np.float32),
    "IndexIVFFlat_accuracy": np.zeros(shape=(n_cases,), dtype=np.float32),
    "IndexFlatL2_accuracy": np.zeros(shape=(n_cases,), dtype=np.float32),
    "IndexIVFFlat_speedup": np.zeros(shape=(n_cases,), dtype=np.float32),
    "IndexIVFFlat_rel_accuracy": np.zeros(shape=(n_cases,), dtype=np.float32)
})
j = 0
for n_train_rows in rows_arr:
    for n_test_rows in rows_arr:
        for n_dims in dims_arr:
            for k in k_arr:
                x, y = make_classification(n_samples=n_train_rows + n_test_rows, n_features=n_dims,
                                           n_informative=n_dims, n_repeated=0, n_redundant=0,
                                           n_classes=N_CLASSES, n_clusters_per_class=N_CLUSTERS_PER_CLASS,
                                           random_state=42)
                x = x.astype(np.float32)
                train_data, test_data = x[:n_train_rows], x[n_train_rows:]
                train_y, test_y = y[:n_train_rows], y[n_train_rows:]
                ivff_times = []
                f_times = []
                for i in range(N_RUNS):
                    t0 = timer()
                    _, ivff_idx = index_ivfflat_search(train_data, test_data, k)
                    t1 = timer()
                    _, f_idx = index_flatl2_search(train_data, test_data, k)
                    t2 = timer()
                    ivff_times.append(t1 - t0)
                    f_times.append(t2 - t1)
                ivff_acc = compute_accuracy_from_indices(ivff_idx, train_y, test_y)
                f_acc = compute_accuracy_from_indices(f_idx, train_y, test_y)
                ivff_time = box_filter(ivff_times)
                f_time = box_filter(f_times)
                speedup = f_time / ivff_time
                rel_acc = ivff_acc / f_acc
                bench_data.at[j, 'n_train'], bench_data.at[j, 'n_test'] = n_train_rows, n_test_rows
                bench_data.at[j, 'dims'], bench_data.at[j, 'k'] = n_dims, k
                bench_data.at[j, 'IndexIVFFlat_time[s]'], bench_data.at[j, 'IndexFlatL2_time[s]'] = ivff_time, f_time
                bench_data.at[j, 'IndexIVFFlat_accuracy'], bench_data.at[j, 'IndexFlatL2_accuracy'] = ivff_acc, f_acc
                bench_data.at[j, 'IndexIVFFlat_speedup'], bench_data.at[j, 'IndexIVFFlat_rel_accuracy'] = speedup, rel_acc
                j += 1
print(bench_data)
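As a possible follow-up (not part of the script above), the bench_data frame can be aggregated to see where IndexIVFFlat pays off, for example the median speedup and relative accuracy per (dims, k):

summary = bench_data.groupby(["dims", "k"])[
    ["IndexIVFFlat_speedup", "IndexIVFFlat_rel_accuracy"]
].median()
print(summary)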
I have a performance degradation too after updating from v1.6.3 to v1.7.0.
I found this function. It was parallel before: faiss/faiss/utils/distances.cpp, lines 310 to 320 at ef28350.
It is now: faiss/faiss/utils/distances.cpp, lines 280 to 295 at e1adde0.
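For anyone who wants to double-check this outside bench_index_flat.py, here is a rough, self-contained sketch (my own, not from this thread) that times the same IndexFlatL2 search with 1 thread and with all available threads; if the query loop no longer runs in parallel, the two timings come out close:

import time
import faiss
import numpy as np

# Synthetic data sized roughly like the reports in this thread.
d, nb, nq, k = 128, 100000, 10000, 10
xb = np.random.rand(nb, d).astype(np.float32)
xq = np.random.rand(nq, d).astype(np.float32)
index = faiss.IndexFlatL2(d)
index.add(xb)

for nt in (1, faiss.omp_get_max_threads()):
    faiss.omp_set_num_threads(nt)
    t0 = time.time()
    index.search(xq, k)
    print(nt, "thread(s):", round(time.time() - t0, 3), "s")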
Right, thanks for narrowing down the regression.
Summary: This diff is related to facebookresearch#1762. The ResultHandler introduced for FlatL2 and FlatIP was not multithreaded. This diff attempts to fix that. To be verified if it is indeed faster.
Differential Revision: D27939173
fbshipit-source-id: 61e09355df65c899a89f70d2bd5ab47076871289
OK, the indexing is a bit tricky, so I prepared a PR here: #1840. Would you mind running the benchmark to check whether we recover the previous performance?
@ava57r sorry, I don't understand. Do you suggest that -msse4 is faster than -mavx2?
No.
Summary: Pull Request resolved: #1840. This diff is related to #1762. The ResultHandler introduced for FlatL2 and FlatIP was not multithreaded. This diff attempts to fix that. To be verified if it is indeed faster.
Reviewed By: wickedfoo
Differential Revision: D27939173
fbshipit-source-id: c85f01a97d4249fe0c6bfb04396b68a7a9fe643d
I tried to run this bench for q=100 and q=10000 on commit 698a459, commit e1adde0, and master (b4c320a).
The change applied (commenting out the reservoir threshold line):
-faiss.cvar.distance_compute_min_k_reservoir = 5
+#faiss.cvar.distance_compute_min_k_reservoir = 5
The script used:
#!/bin/bash
export PYTHON=python
export PY_VER=38
export CPU_COUNT=32
export PREFIX=$CONDA_PREFIX
export LD_LIBRARY_PATH=$CONDA_PREFIX/lib
cp -r benchs benchs_copy # the first commit doesn't have the needed benchmark
rm -rf logs
mkdir -p logs
run_bench () {
    mkdir -p logs/$1
    rm -rf _*
    pip uninstall -y faiss faiss-cpu
    git checkout $1
    bash conda/faiss/build-lib.sh
    bash conda/faiss/build-pkg.sh
    python benchs_copy/bench_index_flat.py > logs/$1/bench_index_flat.log
}
run_bench 698a4592e87920f036aa7a2d8a3a56e12387a8f0
run_bench e1adde0d84ece584f3b4d86db0b1532329f8cdb8
run_bench master
Summary
IndexFlatL2 has a performance degradation with a large number of threads since commit e1adde0.
This degradation can be observed with benchs/bench_index_flat.py.
bench_index_flat benchmark aggregated results:
q=100 (56 threads, 100 query size)
q=10000 (56 threads, 10000 query size)
bench_index_flat benchmark full logs:
before degradation (698a459):
after degradation (e1adde0):
master (6977d72):
Platform
CPU: Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz, 2 sockets, 28 cores per socket
OS: Linux, CentOS 7.6.1810, kernel 5.4.69
Faiss version before perf. degradation: 698a459
Faiss version after perf. degradation: e1adde0
Installed from: compiled with GCC 10.2.0
Faiss compilation options: default from conda build scripts (see reproducer)
SW: Python 3.8.8, MKL 2020.4, llvm-openmp 11.0.1
Running on: CPU
Interface: Python
Reproduction instructions
Clone the Faiss repository, modify benchs/bench_index_flat.py to use the needed number of threads, and run the bash script shown above.
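As an illustration of the thread-count modification (the exact form is an assumption; any equivalent way of pinning OpenMP threads works), one can add near the top of benchs/bench_index_flat.py:

import faiss

faiss.omp_set_num_threads(56)  # match the 56-thread configuration reported above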