
Idea: Use array type for embedding speed-up #2059

Closed
1 task done
pamelafox opened this issue Jan 28, 2025 · 6 comments

Comments

@pamelafox
Contributor

Confirm this is a feature request for the Python library and not the underlying OpenAI API.

  • This is a feature request for the Python library

Describe the feature or improvement you're requesting

The SDK currently uses numpy to speed up decoding of embeddings:

            embedding.embedding = np.frombuffer(  # type: ignore[no-untyped-call]
                base64.b64decode(data), dtype="float32"
            ).tolist()

It does seem to improve performance, based on our tests, but we were wondering if similar gains could be made without numpy, using the built-in array type? Have you tried that already?

https://docs.python.org/3/library/array.html

We're having some pains with the numpy dependency for our Azure samples and are looking for ways to move off it without affecting performance.

Additional context

No response

@RobertCraigie
Collaborator

I have not tested with the array type, but I'm also curious whether it could provide similar improvements. Would you be able to share some examples of what the required changes would look like?

@tonybaloney
Contributor

tonybaloney commented Jan 28, 2025

Looking at what numpy is used for in the embeddings type: it takes the base64-decoded bytes object as a buffer (a reference, not a copy), interprets it as a compact one-dimensional float32 array in numpy, and then converts that back out to a list of native floats. You can do the same with the built-in array type:

embedding.embedding = array.array("f", base64.b64decode(data)).tolist()

I have submitted this in a draft PR.
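
For readers who want to try the one-liner above in isolation, here is a minimal sketch. The helper name `decode_embedding` and the sample payload are hypothetical, and the byte-order handling assumes the base64 payload holds little-endian float32 values; `array.array` reads bytes in the machine's native order, so the swap only matters on big-endian hosts.

```python
import array
import base64
import struct
import sys


def decode_embedding(data: str) -> list[float]:
    """Decode a base64 string of packed float32 values into Python floats."""
    floats = array.array("f", base64.b64decode(data))
    # array.array uses the machine's native byte order, so swap on
    # big-endian hosts (assumption: the payload is little-endian).
    if sys.byteorder == "big":
        floats.byteswap()
    return floats.tolist()


# Hypothetical sample payload: three values that float32 represents exactly.
sample = base64.b64encode(struct.pack("<3f", 0.5, -1.25, 2.0)).decode("ascii")
print(decode_embedding(sample))  # [0.5, -1.25, 2.0]
```

Passing a bytes object whose length is not a multiple of the item size raises `ValueError`, so a truncated payload fails loudly rather than producing a short list.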

@tonybaloney
Contributor

tonybaloney commented Jan 28, 2025

Benchmark:

import array
import base64
import numpy as np
import json

# Sample data
data = ''

as_json = json.dumps(array.array("f", base64.b64decode(data)).tolist())

def bench_standard():
    for _ in range(1000):
        json.loads(as_json)

def bench_array():
    for _ in range(1000):
        array.array("f", base64.b64decode(data)).tolist()

def bench_numpy():
    for _ in range(1000):
        np.frombuffer(base64.b64decode(data), dtype="float32").tolist()

__benchmarks__ = [
    (bench_standard, bench_array, "Standard vs array"),
    (bench_standard, bench_numpy, "Standard vs numpy"),
    (bench_numpy, bench_array, "Array vs numpy"),
]

Replace json with a more efficient encoder (orjson)

Results show this array approach is equivalent to the numpy one (10-20% faster) and is significantly faster than the standard approach (10x):

| Benchmark | Min | Max | Mean | Min (+) | Max (+) | Mean (+) |
|---|---|---|---|---|---|---|
| Standard vs array | 3.115 | 3.388 | 3.223 | 0.225 (13.8x) | 0.256 (13.2x) | 0.242 (13.3x) |
| Standard vs numpy | 2.941 | 3.330 | 3.135 | 0.263 (11.2x) | 0.306 (10.9x) | 0.280 (11.2x) |
| Array vs numpy | 0.256 | 0.273 | 0.264 | 0.218 (1.2x) | 0.227 (1.2x) | 0.222 (1.2x) |
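
Since the sample data in the benchmark is elided, the following is a rough standalone sketch of the same JSON-versus-array comparison using only the standard library. The synthetic vector, its length of 1536, and the function names are assumptions made for illustration; it uses `timeit` rather than the benchmark harness above, and its timings will not match the table.

```python
import array
import base64
import json
import random
import struct
import timeit

# Synthetic stand-in for the elided sample data: 1536 random float32 values
# (the dimension is an assumption, chosen only for illustration).
values = [random.random() for _ in range(1536)]
data = base64.b64encode(struct.pack(f"<{len(values)}f", *values)).decode("ascii")
as_json = json.dumps(array.array("f", base64.b64decode(data)).tolist())


def decode_json() -> list[float]:
    """Baseline: parse the embedding from a JSON list of numbers."""
    return json.loads(as_json)


def decode_array() -> list[float]:
    """Candidate: decode base64 bytes through the built-in array type."""
    return array.array("f", base64.b64decode(data)).tolist()


# Both paths should produce the same list of floats.
assert decode_json() == decode_array()

for fn in (decode_json, decode_array):
    seconds = timeit.timeit(fn, number=1000)
    print(f"{fn.__name__}: {seconds:.3f}s per 1000 decodes")
```

The equality check before timing confirms that the two decoders return identical values, so any difference in the printed numbers reflects decoding cost only.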

@tonybaloney
Contributor

Sorry, I was reading my own benchmark data wrong. The array approach is faster than numpy.

@tonybaloney
Contributor

Benchmark comparing the pydantic parser, which openai uses, to the array and numpy approaches:

import array
import base64
import numpy as np
import json
import pydantic

# Sample data
data = ''

class Float32Array(pydantic.BaseModel):
    data: list[float]

as_json = json.dumps({"data": array.array("f", base64.b64decode(data)).tolist()})

def bench_standard():
    for _ in range(1000):
        Float32Array.model_validate_json(as_json)

def bench_array():
    for _ in range(1000):
        array.array("f", base64.b64decode(data)).tolist()

def bench_numpy():
    for _ in range(1000):
        np.frombuffer(base64.b64decode(data), dtype="float32").tolist()

__benchmarks__ = [
    (bench_standard, bench_array, "Standard vs array"),
    (bench_standard, bench_numpy, "Standard vs numpy"),
    (bench_numpy, bench_array, "numpy vs array"),
]

| Benchmark | Min | Max | Mean | Min (+) | Max (+) | Mean (+) |
|---|---|---|---|---|---|---|
| Standard vs array | 0.988 | 1.523 | 1.233 | 0.292 (3.4x) | 0.313 (4.9x) | 0.297 (4.1x) |
| Standard vs numpy | 0.928 | 1.046 | 0.991 | 0.341 (2.7x) | 0.370 (2.8x) | 0.361 (2.7x) |
| numpy vs array | 0.333 | 0.383 | 0.347 | 0.284 (1.2x) | 0.316 (1.2x) | 0.298 (1.2x) |

If this approach were the new default, it would be about 4x faster than the current pydantic parser and about 20% faster than the numpy decoder.

@RobertCraigie
Collaborator

Closing this as the PR was merged, thanks!
