[EPIC] Automate Data Retrieval Requests #2898
Comments
@yuvipanda my understanding is that signed URLs have a max duration of 7 days: https://cloud.google.com/storage/docs/gsutil/commands/signurl We'd really need something that could produce these on the fly, based on user auth, in order to automate this. |
Ah damn. Yeah, in that case I agree it means we have to write some code here. I don't think we need to introduce another layer of auth though; we can just implement signed URLs ourselves with GCP KMS.
|
@yuvipanda It will be fantastic to automate such requests considering the FERPA requirements and the additional throttle on the bandwidth of @felder! How do you see the complexity of writing this service that does this automation? I am conscious of our backlogs and want to avoid adding more requests at your end currently. Let me know! |
@yuvipanda my concern here is that unless the URL is obfuscated (not a big fan of security by obscurity though), other students may be able to figure out how to gain access to data they should not have access to. That's the reasoning behind me saying we may want to tie it to auth of some sort. |
@felder absolutely agree it shouldn't be security by obscurity, it should be fairly strong crypto. I think a simple signature where we keep the key private would be good enough. If people could guess those signed URLs, most of the crypto we rely on would be considered broken. Good question on complexity, @balajialg. I'll try to investigate that. |
@yuvipanda yeah I wouldn't expect people to guess the signedurls themselves! I'm referring more to the query string parameters that would be used to generate them |
yeah, i think the signing means it doesn't matter what the user can guess. However, I think given my current workload, I won't be able to build this anytime soon. So please don't block on it if other privacy preserving workflow changes need to happen. |
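To illustrate the point about guessable query string parameters, here is a minimal sketch of a signed URL scheme. It is not what was built for this issue: it uses HMAC-SHA256 from the Python standard library as a stand-in for signing with GCP KMS, and the key, the URL, and the `user`/`expires` parameter names are all hypothetical. The idea is only that the signature covers every parameter, so changing any of them invalidates the URL.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical secret. In practice it would be kept private (for example in a
# KMS) and would never appear in the URL or in the user's environment.
SECRET_KEY = b"replace-with-a-private-key"


def _signature(path: str, params: dict, key: bytes) -> str:
    # Sort the parameters so signer and verifier build the same canonical string.
    canonical = urlencode(sorted(params.items()))
    message = f"{path}?{canonical}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def sign_url(base_url: str, params: dict, expires_at: int,
             key: bytes = SECRET_KEY) -> str:
    """Append an expiry and an HMAC signature covering all parameters."""
    signed_params = dict(params, expires=str(expires_at))
    canonical = urlencode(sorted(signed_params.items()))
    signature = _signature(urlsplit(base_url).path, signed_params, key)
    return f"{base_url}?{canonical}&signature={signature}"


def verify_url(url: str, now: Optional[int] = None,
               key: bytes = SECRET_KEY) -> bool:
    """Return True only if the signature matches and the URL has not expired."""
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query))
    signature = params.pop("signature", "")
    expected = _signature(parts.path, params, key)
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False
    try:
        expires = int(params.get("expires", "0"))
    except ValueError:
        return False
    current = int(time.time()) if now is None else now
    return current < expires
```

With this scheme a student can see and guess the parameters, but editing `user=alice` to `user=bob` makes `verify_url` return `False` because the signature no longer matches.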
@yuvipanda @felder We have a couple of options in the short term,
I am leaning towards 2 for such requests alone. What do you both think? |
@balajialg I'm inclined toward 2 as well. However, I do not believe these requests should be considered in a vacuum. We may opt to move these requests first, but ultimately we should consider it as a trial run for a general support process for Berkeley specific operational issues. |
@felder When you say support process, do you mean the regular requests we get? Package requests, admin access, RAM elevation, etc. Or are you also considering bugs being reported? If it is a bug, I wonder how issues such as this would be fixed, as they have an upstream dependency and would require interaction with other developers/admins! Let's discuss more during the sprint planning meeting (let's see whether we will be able to wrap up this discussion in time). |
I think it might be helpful to have something else that contains possible private information - but I'd love for most things to stay as public as possible here. |
@balajialg @yuvipanda Anything reported by a student or regarding a specific student where FERPA would apply. Basically I'd like to start thinking about datahub the UCB specific service vs datahub the opensource software project (not to be confused with datahub the proposed building 😃), with service related requests having a private ticketing system. Note that requests that require development resources to resolve can have github issues created for them. I understand that transparency is important, but I do think there are plenty of support requests that don't really require any development resources to fix and probably would not be of that much interest to anyone else. Individual issues regarding say rstudio not launching would fall into this as well, as opposed to generalized solutions such as terminating rstudio gracefully on logout which would remain here in github. |
@felder Got it! I wonder whether reporting bugs through different systems (based on the nature of the bugs) will be a cumbersome support experience for users, as most users would not care to know whether their issue should be raised via GitHub or a ticketing system. For example, the RStudio use case you highlighted. I am personally aligned with moving chores to a support system (if that is something you feel strongly about) but keeping feature enhancements and reported issues (since many issues are correlated with package requests) on GitHub, considering that they may have upstream dependencies. Thoughts? @felder @yuvipanda Did some analysis on the distribution of requests that we get every month. This is what it looks like for the past three months: [figures: request distribution for August, September, and October]. Based on frequency and volume, the routine support requests that really matter are the "package installation/upgrade" and the "retrieval of the file" requests. |
It might be useful to create a service that is proxied by the user's server which can generate these URLs or invoke various APIs. For example it could be a tornado/flask app that runs on a random port in the user's pod and is proxied by jupyter-server-proxy. It would be behind the hub's authentication. There could be a launcher in retro's I'm not sure about the full details of the signed URL and retrieval process, so this idea might require iteration. |
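A rough sketch of the kind of per-user service described above, under several assumptions. It uses the standard library's `http.server` in place of tornado/flask to stay dependency-free, it assumes the `JUPYTERHUB_USER` environment variable is available in the user's server, and `make_signed_url` is a hypothetical placeholder for whatever signing step is eventually chosen. How the port is handed to jupyter-server-proxy is left out.

```python
import os
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def make_signed_url(username: str) -> str:
    """Hypothetical placeholder: a real service would sign the URL here."""
    return f"https://example.invalid/archive/{username}?signature=PLACEHOLDER"


class SignedURLHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The username comes from the server's own environment, not from the
        # request, so a user cannot ask for someone else's URL.
        # Assumption: JUPYTERHUB_USER is set in the user's pod.
        username = os.environ.get("JUPYTERHUB_USER", "unknown-user")
        body = make_signed_url(username).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Keep the sketch quiet; a real service would log requests.
        pass


def start_server() -> ThreadingHTTPServer:
    # Port 0 asks the OS for any free port, matching the "random port" idea.
    # Binding to 127.0.0.1 keeps the service reachable only from inside the pod.
    server = ThreadingHTTPServer(("127.0.0.1", 0), SignedURLHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling `start_server()` returns the server object; `server.server_address[1]` gives the port the OS picked, which is the value a proxy would need to be told about.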
Currently, students need to raise a request on GitHub to get a copy of their archived files (#2866). This also creates manual work for @felder, and there are also way more requests than I had thought.
In the text file telling students what to do, we currently ask them to open an issue here. Instead, we can provide the signed URL automatically in that file, so they can retrieve their files themselves without having to contact us. This eliminates an entire class of service requests we need to handle.