Different databases require different configuration parameters, of different types:
Mandatory parameters - e.g. username/password or api_key for authentication, hostname or index name to identify the index.
Optional / variable parameters - e.g. pgvector's HNSW index takes M and ef_construction to control the index speed vs. recall trade-off.
We need to pass these options through when that database is selected, in a scalable manner (we don't really want to show the union of all options across all databases). We also want to set sensible default values for them - e.g. pgvector should default to settings to connect to the local Docker container.
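One way to picture this is a per-database config class that carries its own defaults, plus a small registry so only the selected database's options are considered. The sketch below is only illustrative: the class names, the `hosted` database, the `build_config` helper and the connection defaults are hypothetical and not taken from VSB. Only `M` and `ef_construction` come from the description above.

```python
from dataclasses import MISSING, dataclass, fields


@dataclass
class PgvectorConfig:
    # Assumed defaults for a local Docker container (hypothetical values).
    host: str = "localhost"
    port: int = 5432
    username: str = "postgres"
    password: str = "postgres"
    # HNSW index parameters (speed vs. recall trade-off).
    m: int = 16
    ef_construction: int = 64


@dataclass
class HostedConfig:
    # Hypothetical hosted database: both fields are mandatory (no default).
    api_key: str
    index_name: str


# Registry: each database only knows about its own options.
CONFIGS = {"pgvector": PgvectorConfig, "hosted": HostedConfig}


def build_config(database: str, **overrides):
    """Build the config for one database, checking its mandatory fields."""
    cls = CONFIGS[database]
    required = {
        f.name
        for f in fields(cls)
        if f.default is MISSING and f.default_factory is MISSING
    }
    missing = sorted(required - set(overrides))
    if missing:
        raise ValueError(f"{database} requires: {', '.join(missing)}")
    return cls(**overrides)
```

With this shape, `build_config("pgvector")` works with no arguments, while `build_config("hosted")` fails until both `api_key` and `index_name` are supplied.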
This also has some impact on how results are collected and reported - if we want to perform benchmark runs with a range of values for ef_construction, for example, how should those results be reported?
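One possible answer, sketched here as an assumption rather than a decided design: expand the swept values into one run per combination, and record the exact parameters used alongside each run's metrics so rows stay distinguishable. The helper names `expand_sweep` and `report_row` and the `param.` prefix are hypothetical.

```python
from itertools import product


def expand_sweep(base: dict, sweep: dict) -> list[dict]:
    """Return one parameter dict per combination of the swept values."""
    keys = list(sweep)
    return [
        {**base, **dict(zip(keys, combo))}
        for combo in product(*(sweep[k] for k in keys))
    ]


def report_row(database: str, params: dict, metrics: dict) -> dict:
    """Flatten one run into a single row, prefixing parameters with 'param.'."""
    row = {"database": database}
    row.update({f"param.{k}": v for k, v in params.items()})
    row.update(metrics)
    return row


# Three runs, one per ef_construction value, all sharing m=16.
runs = expand_sweep({"m": 16}, {"ef_construction": [64, 128, 256]})
```

Each row then carries its own `param.ef_construction` column, so results from a sweep can be grouped or plotted by parameter value instead of being merged into one figure per database.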
Pydantic has a related package - https://github.com/pydantic/pydantic-settings - which handles much of this and integrates with Pydantic's models (which we already use in VSB). However it doesn't have support for overriding on the command line, which is a significant omission IMO.
There is an open issue and linked PR to add command-line support - see pydantic/pydantic-settings#209 - but as of writing it has not yet been merged.
This has mostly been implemented via #82 - each database has its own arguments in an option group, and when a given database is specified, the arguments required for it are checked.
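For readers who have not seen the pattern, here is a minimal sketch of per-database option groups with a post-parse check, using only the standard library's `argparse`. It is not the code from #82: the flag names, the `hosted` database and the `REQUIRED` table are hypothetical.

```python
import argparse

# Hypothetical mapping: database name -> argument names it requires.
REQUIRED = {
    "pgvector": [],
    "hosted": ["hosted_api_key", "hosted_index_name"],
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="bench")
    parser.add_argument("--database", required=True, choices=sorted(REQUIRED))

    # Each database gets its own group, so --help lists options per database.
    pg = parser.add_argument_group("Options for pgvector")
    pg.add_argument("--pgvector_host", default="localhost")
    pg.add_argument("--pgvector_ef_construction", type=int, default=64)

    hosted = parser.add_argument_group("Options for hosted")
    hosted.add_argument("--hosted_api_key")
    hosted.add_argument("--hosted_index_name")
    return parser


def parse(argv: list[str]) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Only the selected database's required arguments are checked.
    missing = [n for n in REQUIRED[args.database] if getattr(args, n) is None]
    if missing:
        flags = ", ".join(f"--{n}" for n in missing)
        parser.error(f"database '{args.database}' requires: {flags}")
    return args
```

Calling `parse(["--database", "pgvector"])` succeeds with defaults, while `parse(["--database", "hosted"])` exits with a usage error naming the missing flags.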