Any Documentation on Sizing Suggestions #4569
Hi @alex-gadea
Yes, see https://docs.yugabyte.com/latest/deploy/checklist/. For Kubernetes we also have best practices at https://docs.yugabyte.com/latest/deploy/kubernetes/best-practices/. In Kubernetes it's also best to set fixed hardware requirements. I'm also working on a pull request covering general best practices in #4158 (will finish soon).
Master memory overhead should not be high, except while the database(s) are actually being created. Afterwards, they’re mostly a bunch of UUIDs in some maps.
Each new YSQL connection starts a new process, just like PostgreSQL, so the overhead will be roughly the same. We have a default limit of 300 max connections per yb-tserver (https://docs.yugabyte.com/latest/reference/configuration/yb-tserver/#ysql-max-connections). New connections also add overhead from fetching metadata from the master.
Usually CPU becomes a bottleneck before memory. See https://docs.yugabyte.com/latest/deploy/checklist/.
It's best to have at least the recommended node size mentioned in https://docs.yugabyte.com/latest/deploy/checklist/. A bigger machine can support a bigger colocated database before it needs to be distributed. We're also working on pulling big tables out of colocation and distributing them across the cluster.
Usually you also want to expand horizontally because very big machines may have more impact on the cluster when they go down.
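To make the point about node failures concrete, here is a minimal sketch that compares how much raw capacity disappears when one node goes down, for the same total vCPU budget split across fewer large nodes or more small ones. The function name and the example numbers are illustrative assumptions, not YugabyteDB guidance, and the model ignores replication factor, rebalancing, and tablet leader placement.

```python
def capacity_lost_on_node_failure(total_vcpus: int, vcpus_per_node: int) -> float:
    """Fraction of total cluster vCPUs lost when a single node fails.

    Hypothetical helper: assumes identical nodes and an evenly spread load.
    """
    if vcpus_per_node <= 0 or total_vcpus % vcpus_per_node != 0:
        raise ValueError("total_vcpus must be a positive multiple of vcpus_per_node")
    node_count = total_vcpus // vcpus_per_node
    return 1 / node_count


# Same 24-vCPU budget, two layouts (illustrative numbers only).
few_large = capacity_lost_on_node_failure(24, 8)   # 3 nodes of 8 vCPUs
many_small = capacity_lost_on_node_failure(24, 4)  # 6 nodes of 4 vCPUs
print(f"3 x 8 vCPU: {few_large:.0%} of capacity lost per node failure")
print(f"6 x 4 vCPU: {many_small:.0%} of capacity lost per node failure")
```

Under these assumptions, the layout with fewer, larger nodes loses twice as large a share of the cluster on a single failure, which is the trade-off described above.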
Yes on your data size.
Yes. But we need to keep in mind the disk/network/CPU/RAM requirements, hot data distribution (Zipfian, random), etc.
This will depend on the number of rows that the query groups. We haven't optimized aggregate queries much yet, but based on the data size you mentioned it should work.
250 read queries/second + 10 aggregation queries/second will be OK, since that amounts to about 260 concurrent connections, which is below the default limit of 300 per yb-tserver.
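The arithmetic behind that estimate can be sketched as follows. This is a rough back-of-the-envelope check, not an official sizing formula: it uses Little's law (concurrency = arrival rate × average time in system) and assumes, as a worst case, that each query holds its connection for one second. The function names, the one-second latency, and the idea that connections spread evenly across yb-tservers are assumptions for illustration; only the 300-connection default comes from this thread.

```python
DEFAULT_MAX_CONNECTIONS_PER_TSERVER = 300  # default limit mentioned in this thread


def estimated_concurrent_connections(workloads):
    """Little's law: sum of (queries per second * average latency in seconds).

    `workloads` is a list of (rate, latency) pairs; both values are assumptions
    you supply from your own measurements.
    """
    return sum(rate * latency for rate, latency in workloads)


def fits_connection_limit(workloads, tservers=1,
                          limit_per_tserver=DEFAULT_MAX_CONNECTIONS_PER_TSERVER):
    """True if the estimated concurrency fits under the combined limit.

    Assumes connections are balanced evenly across yb-tservers.
    """
    return estimated_concurrent_connections(workloads) <= tservers * limit_per_tserver


# 250 reads/s and 10 aggregations/s, each assumed to hold a connection for 1 s.
baseline = [(250, 1.0), (10, 1.0)]
print(estimated_concurrent_connections(baseline))  # 260.0
print(fits_connection_limit(baseline))             # True
```

If the real queries finish faster than one second, the estimated concurrency drops proportionally; slow aggregations push it up, so it is worth re-running the check with measured latencies.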
Kafka implementation isn't ready for production.
The same path will help with YugabyteDB too. Check the architecture section https://docs.yugabyte.com/latest/architecture/ to get a baseline. In this case it will be a combination of PostgreSQL knowledge (like the connections part) and YugabyteDB knowledge.
We can continue on this thread, the forums, or Slack with each step in your application development process (schema, sharding, queries, load testing, production debugging help, etc.).
@alex-gadea please reopen the issue if you have further questions.
I am looking to gain an understanding of what has the biggest impact on resource requirements for a YB cluster. Is there any documentation available that might provide rough guidance on what to take into consideration when calculating node, memory and CPU requirements in a Kubernetes cluster?
I know this is an almost impossible question to answer without real-world data, so any type of guidance would be super helpful. Does the number of databases have any type of impact, or can I pack in as many as memory allows? What is the overhead per database? What is the expected # of concurrent connections per node? What is the memory requirement per connection? Do connections fork à la Postgres? How important is CPU to a YB cluster, or is it really memory we have to push? Is it better to have lots of single-vCPU nodes or fewer 8-vCPU nodes (is this largely impacted by whether we go colocated or not, due to inter-node traffic)? Is there a point where you expect to reach diminishing returns for each vCPU and it's better to expand horizontally instead?
Our expected baseline is:
We have worked with Postgres for a long time and have a good, intuitive understanding of how to architect that environment to meet our needs, along with a rough idea of the horsepower we have to bring into play. YB is completely new to us, and so far we have not been able to find the type of guidelines that would help us understand what our starting point should be before moving on to benchmarking and load testing, so even the simplest guidance would help us at this stage.