Determine NFS Disk strategy #2815
Comments
Particularly for EECS and genomics hubs, which are using up a lot of space now - we can try setting xfs_quota on them and see how it goes? I think it's already mounted with project quota enabled, which is what we will need to use.
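For reference, a minimal sketch of what imposing an XFS project quota on a hub's home directory tree could look like, assuming the filesystem is mounted with prjquota enabled. The mount point, hub directory, project ID, and limit below are all hypothetical, not our actual layout:

```python
import subprocess

def xfs_quota(command: str, mountpoint: str) -> None:
    """Run a single xfs_quota expert-mode command against a mountpoint."""
    subprocess.run(["xfs_quota", "-x", "-c", command, mountpoint], check=True)

# Hypothetical values - the real paths, project IDs, and limits would come
# from our hub config.
MOUNTPOINT = "/export/homedirs"    # filesystem mounted with prjquota
HUB_DIR = "/export/homedirs/eecs"  # per-hub home directory tree
PROJECT_ID = "42"                  # arbitrary numeric project ID
HARD_LIMIT = "2t"                  # block hard limit for this hub

# Tag the hub's directory tree as an XFS "project"...
xfs_quota(f"project -s -p {HUB_DIR} {PROJECT_ID}", MOUNTPOINT)
# ...then cap how much space that project may use.
xfs_quota(f"limit -p bhard={HARD_LIMIT} {PROJECT_ID}", MOUNTPOINT)
```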
@yuvipanda Thanks for this detailed write-up. Super useful!
@balajialg for (1), EECS hub started out pretty small :) Whether they should get new disks now, or if we should consolidate everything into one, should be determined by looking at IOPS metrics as mentioned in the issue. We can definitely have alerts for this once we have alerting mechanisms in place...
One thing to note here is that Google persistent disks have a max capacity of 64TB. Given that I think it's a bad idea to RAID the persistent disks together, and that we cannot shrink disks, a disk-per-hub approach with quotas imposed as necessary makes the most sense to me.
We are moving everything to Google Filestore for Spring '23!
We currently have three NFS disks attached to the NFS server at nfs-server-01.
We had a partial outage earlier because homedir-others ran out of storage space,
and needed to be sized up. This is a bit frustrating, because there was space
in the other disks that we are paying for and not using! So if we figure out a
good strategy for deciding which disks to use for which hubs, we can save money
and reduce such outages.
What are the advantages of datahub and data100 having their own disk?
1. The partial outage was probably caused by the eecs hub using up a lot of disk, but it
affected all other hubs - except the datahub and data100 hubs!
2. Their disks get consistent performance regardless of how much disk I/O other hubs
are doing. This might allow us to get away with using much cheaper standard
or balanced disks than more expensive SSD disks.
The primary disadvantage is the lack of pooling - we pay for unused space in the
datahub and data100 disks that can't really be used by the other hubs.
Finding a solution to this will really help us provide more stable service while
also minimizing cost.
Approaches to try
1. Consolidate hubs onto one big disk and impose per-hub quotas (e.g. XFS project quotas),
so hubs can't starve each other. This lets us overprovision as well, so we can utilize space
more effectively.
2. ZFS, which would give us similar per-hub quotas out of one shared pool (see the sketch below).
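As a rough illustration of the ZFS option: per-hub datasets could all draw from one shared pool while each gets its own quota, which is what would let us overprovision. The pool name, dataset layout, and quota values below are hypothetical:

```python
import subprocess

def zfs(*args: str) -> None:
    """Run a zfs command and fail loudly if it errors."""
    subprocess.run(["zfs", *args], check=True)

# Hypothetical pool/dataset layout and quotas - the sum of the quotas can
# exceed the pool size, which is the overprovisioning we want.
POOL = "homes"
HUB_QUOTAS = {
    "datahub": "10T",
    "data100": "5T",
    "eecs": "3T",
    "others": "5T",
}

for hub, quota in HUB_QUOTAS.items():
    dataset = f"{POOL}/{hub}"
    # One dataset per hub, all sharing the same pool of space...
    zfs("create", "-p", dataset)
    # ...with a per-hub ceiling so no single hub can starve the rest.
    zfs("set", f"quota={quota}", dataset)
```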
For both of these, we need to evaluate how real an advantage (2) from above is.
We capture IOPS metrics on our NFS server, and need to evaluate how much performance
we really need. IOPS metrics are a little hard for me to understand, so I don't have
a clear idea of what we'd lose if we just moved to one big maxed-out disk.
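To make that more concrete, a query against node_exporter's disk metrics would show read+write operations per second for each block device on the NFS server. This is only a sketch; the Prometheus URL and the instance label pattern are assumptions about our monitoring setup:

```python
import json
import urllib.parse
import urllib.request

# Assumed Prometheus endpoint and instance label for the NFS server.
PROMETHEUS = "http://prometheus.example.org"
INSTANCE = "nfs-server-01"

# Read + write operations per second, per block device, averaged over 5m.
query = (
    f'rate(node_disk_reads_completed_total{{instance=~"{INSTANCE}.*"}}[5m]) + '
    f'rate(node_disk_writes_completed_total{{instance=~"{INSTANCE}.*"}}[5m])'
)

url = f"{PROMETHEUS}/api/v1/query?" + urllib.parse.urlencode({"query": query})
with urllib.request.urlopen(url) as resp:
    result = json.load(resp)

# Print current IOPS per device so we can compare against disk type limits.
for series in result["data"]["result"]:
    device = series["metric"].get("device", "?")
    iops = float(series["value"][1])
    print(f"{device}: {iops:.1f} IOPS")
```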
We also need to understand whether a single individual user can
hammer our NFS so hard that it disrupts everyone - NFS is our true
single point of failure.