Set a default for "Zarr dir" #1934

lorenzocerrone · 2024-10-16T11:36:10Z

Since the user groups have a project directory assigned, setting a reasonable default when people create a dataset would be great.
Something like this:

This would be great because many users found this field very cumbersome to fill in.

Some ideas for default values:

/shares/{project_name}/{user_name}/{dataset_name}_{dataset_id}
/shares/{project_name}/{user_name}/datasets/{dataset_name}_{dataset_id}
/shares/{project_name}/datasets/{user_name}/{dataset_name}_{dataset_id}

I don't have any strong opinions on them, but it would be good if we picked something unlikely to change.
@tcompa @jluethi

tcompa · 2024-10-16T12:13:33Z

I can definitely see the use for this feature (which probably would live frontend-side rather than here - I'm now transferring the issue). We'd still need to define it a bit better:

A user can be associated to 1 or N user groups - which one takes priority?
Each user group can be associated to 0, 1 or N paths - which one takes priority?
There is no (required) username associated to users (at least for the moment). What do we do if username=null?
For the moment the dataset name can be edited by the user.

Once we have these or similar questions defined, we can come up with a "hint". Additional details:

The hint could be either a pre-filled text field or it could be part of an info message like "E.g. you can use /something/example/...".
And the feature should fail gracefully if the hint is not found: e.g. if my user groups are not associated to any path, I simply won't see any hint.

I don't have any strong opinions on them, but it would be good if we picked something unlikely to change.

Several attributes are stable only in practice but not immutable by design: user/usergroup associations, usergroup paths, dataset names. If we aim at guaranteeing that these folders won't change, then we'll need to introduce an appropriate attribute like user.project_folder (and I guess this will be per-user, rather than per-group).

tcompa · 2024-10-16T12:14:10Z

(keeping the issue backend-side in case we want to introduce a new database column in user settings, otherwise I think it will be handled client-side)

jluethi · 2024-10-16T12:14:34Z

I'm a big fan of adding something like this when we have a good default we know for users. While direct access to pick a directory is harder and will take a while to get to, this we can do earlier on.

Since the user groups have a project directory assigned

That would be their Vizarr allowed base directory. Maybe we should consider renaming it to remove the Vizarr part, because we may use it as a default base path, and we may use it to stream data to napari eventually :) What is it called atm @tcompa ? It's shown as "Viewer paths" in the group management

Re what our default should be:
For the science cloud beta deployment, I'd suggest:
$Viewer_paths[0]/{user_name}/fractal/{dataset_id}_{dataset_name}

Things to consider:

Users could have multiple viewer_paths. Let's use the first path from the first group they are assigned to as their default
Handling whether viewer paths end in / or not
Only doing this when a viewer_paths variable is set
Let's add a fractal folder in between => if users also manually add files to their folder on the cluster, those would show up somewhere else
Order-wise, I'd go dataset_id first, because then it may just order by time? I think users rarely delete their datasets. Do we actually reuse lower IDs when they become available? If yes, there is a small potential for a conflict with an existing folder.
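
The considerations above could be sketched roughly as follows. This is only an illustration, not the fractal-web or fractal-server implementation: the function name `suggest_zarr_dir` and the list-of-lists representation of the groups' viewer paths are hypothetical, and the sketch assumes the dataset ID is already known when the default is computed.

```python
from typing import Optional


def suggest_zarr_dir(
    groups_viewer_paths: list[list[str]],
    user_name: Optional[str],
    dataset_id: int,
    dataset_name: str,
) -> Optional[str]:
    """Return a default zarr dir, or None when no sensible default exists.

    `groups_viewer_paths` is a hypothetical structure: one list of viewer
    paths per user group, in the order the user was assigned to the groups.
    """
    # Only the first path of the first group is considered.
    if not groups_viewer_paths or not groups_viewer_paths[0]:
        return None
    # Without a user name there is nothing to build the path from.
    if not user_name:
        return None
    # Normalize the base path so it works with or without a trailing slash.
    base = groups_viewer_paths[0][0].rstrip("/")
    return f"{base}/{user_name}/fractal/{dataset_id}_{dataset_name}"
```

Returning `None` instead of raising keeps the feature optional: when no viewer path or user name is available, no default is shown.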

We may only do that for the Science Cluster or come up with a way that would also work for other deployments.

For UZH Pelkmans lab:
I'd want to achieve something like
/data/active/{slurm_user}/fractal/{dataset_id}_{dataset_name}

FMI Liberali lab:
I'd want to achieve something like /base_path/{slurm_user}/fractal/{dataset_id}_{dataset_name}

Where the base path is the same for all users

jluethi · 2024-10-16T12:18:32Z

Very good questions Tommaso!

A user can be associated to 1 or N user groups - which one takes priority?

I'd say their first group

Each user group can be associated to 0, 1 or N paths - which one takes priority?

0 paths => no default shown (or maybe next group checked??)

There is no (required) username associated to users (at least for the moment). What do we do if username=null?

Great question! Maybe this would be a case where the username starts to become relevant

For the moment the dataset name can be edited by the user.

That seems fine to me. We're not forcing data to go there, we just provide a default. They can put it anywhere or change the dataset name later without changing the paths.

Re hint vs default: If we can make it work well enough, I'd strongly prefer a default

Several attributes are stable only in practice but not immutable by design: user/usergroup associations, usergroup paths, dataset names. If we aim at guaranteeing that these folders won't change, then we'll need to introduce an appropriate attribute like user.project_folder (and I guess this will be per-user, rather than per-group).

That's an interesting idea as well to avoid a lot of heuristics above and make it work for different deployments.
In general, I'm not worried about whether the default proposal changes over time or whether a user would get the same base path a month later.
What matters more is that we suggest a reasonable default to the user.

tcompa · 2024-10-18T08:49:53Z

An additional detail here: We cannot naively rely on the dataset ID, since this is only known after the dataset is created.

This would be no problem if we decided (unlikely) to defer the folder choice to the backend. In that case, I'd suggest we define the proper database attributes (user.base_zarr_folder, required) and rules (e.g. anything like base_folder + "dataset_id" + "dataset_name") and proceed with this.
It's technically easy to also expose the two options in the API: trust the server (they will pick a default zarr-dir for you, but you can't know it in advance) or give your own path.
It's not clear at the moment how we could use this rule as part of a "hint", namely something that is known frontend-side but can still be modified by the user.

I don't have a clear solution in mind right now - TBD

jluethi · 2024-10-18T12:02:11Z

An additional detail here: We cannot naively rely on the dataset ID, since this is only known after the dataset is created.

Ah, good catch! True, that makes it trickier. Let's discuss at the Monday call whether we can find a good direction for this

jluethi · 2024-10-18T12:03:32Z

It's technically easy to also expose the two options in the API: trust the server (they will pick a default zarr-dir for you, but you can't know it in advance) or give your own path.

This is something we can consider from my perspective. The result is: Setting a zarr_dir becomes optional, because the server will have a way to create one for you if you did not do it

jluethi · 2024-10-21T09:12:08Z

Discussion results

Reusing IDs: Not intended to happen => @tcompa will double check

Goal: Users do not need to specify a zarr_dir by default. If they don't set one, Fractal creates it.

It would get put into this folder:
Favored: user.project_folder/fractal/{project_id}_{project_name}/{dataset_id}_{dataset_name}
user.project_folder/fractal/{dataset_id}_{dataset_name}

user.project_folder sounds like a very useful thing to have:

It should be user owned
It could be used in all deployments
It will likely be optional. If unset => can't create standard dataset folders etc.
If it is set, we could create a unique subfolder in user.project_folder

What would user.project_folder get set to:

Science Cluster

/shares/project_share/username

Pelkmans lab

/data/active/slurm_user

Liberali lab

/path/to/gliberal/Users/slurm_user

Open questions:

Project sharing? Users may want to specify their own path in those cases, deployment dependent. Write access to a shared folder will get tricky => let's use documentation for this for the time being
How does it match with the user_group.viewer_paths? These are per user group. Do we add the user.project_folder to the things a user has access to? Some potential complexities in whether users would have access to all their zarr_dirs
Can the user change the user.project_folder? Gets tricky on service-user setups on what we allow => let's not expose this at least in the beginning. Admins set them up
How does it interact with the cache directory? => use project_folder as the cache directory as well

tcompa · 2024-10-30T11:20:58Z

Reusing IDs: Not intended to happen => to check/test
zarr_dir becomes optional in request body
If the endpoint receives a request with zarr_dir, there should be a single db.commit.
If the endpoint receives a request without zarr_dir and user_settings.project_dir is unset -> 422
If the endpoint receives a request without zarr_dir and user_settings.project_dir is set -> create the zarr dir attribute server-side. This requires two db.commits: the first one sets zarr_dir="PLACEHOLDER", and the second one sets it to

{user.project_folder}/fractal/{project_id}_{project_name}/{dataset_id}_{dataset_name}
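
The three cases above could look roughly like this. It is a minimal sketch under stated assumptions, not the actual fractal-server endpoint: `FakeDB`, `Dataset`, `UnprocessableEntity` and `create_dataset` are hypothetical stand-ins, the in-memory "database" simply assigns an ID on the first commit, and stripping a trailing slash from `project_dir` is an added assumption.

```python
from dataclasses import dataclass
from typing import Optional


class UnprocessableEntity(Exception):
    """Stand-in for an HTTP 422 response."""


@dataclass
class Dataset:
    name: str
    project_id: int
    zarr_dir: str
    id: Optional[int] = None


class FakeDB:
    """In-memory stand-in: assigns an ID on first commit and counts commits."""

    def __init__(self) -> None:
        self.commits = 0
        self._next_id = 1

    def commit(self, dataset: Dataset) -> None:
        if dataset.id is None:
            dataset.id = self._next_id
            self._next_id += 1
        self.commits += 1


def create_dataset(
    db: FakeDB,
    name: str,
    project_id: int,
    project_name: str,
    project_dir: Optional[str],
    zarr_dir: Optional[str] = None,
) -> Dataset:
    # Case 1: the client provided zarr_dir, so a single commit is enough.
    if zarr_dir is not None:
        dataset = Dataset(name=name, project_id=project_id, zarr_dir=zarr_dir)
        db.commit(dataset)
        return dataset
    # Case 2: no zarr_dir and no project_dir, so no default can be built.
    if project_dir is None:
        raise UnprocessableEntity("zarr_dir is required if project_dir is unset")
    # Case 3: commit with a placeholder first, because the dataset ID
    # is only known after the row has been created.
    dataset = Dataset(name=name, project_id=project_id, zarr_dir="PLACEHOLDER")
    db.commit(dataset)
    dataset.zarr_dir = (
        f"{project_dir.rstrip('/')}/fractal/"
        f"{project_id}_{project_name}/{dataset.id}_{name}"
    )
    db.commit(dataset)  # second commit stores the final path
    return dataset
```

For example, `create_dataset(FakeDB(), "plate", 7, "screen", "/data/active/alice")` would produce a dataset whose `zarr_dir` is `/data/active/alice/fractal/7_screen/1_plate`.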

Open fractal-vizarr-viewer issue about

How does it match with the user_group.viewer_paths? These are per user group. Do we add the user.project_folder to the things a user has access to? Some potential complexities in whether users would have access to all their zarr_dirs

fractal-analytics-platform/fractal-vizarr-viewer#44

Open fractal-server issue about

How does it interact with the cache directory? => use project_folder as the cache directory as well

tcompa · 2024-10-30T14:23:10Z

cc @jluethi

Reusing IDs: Not intended to happen => @tcompa will double check

@ychiucco checked this, and it's only true with postgres. With sqlite, IDs of deleted objects can be re-used (see #1991), which could lead to duplicate folders. We could not rapidly find a way to change the sqlite configuration.
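
For context, the general SQLite behavior can be reproduced with Python's standard `sqlite3` module. This is a standalone illustration with a hypothetical `dataset` table, not the fractal-server schema, and whether it matches the exact configuration discussed in #1991 is an assumption. A plain `INTEGER PRIMARY KEY` column may reuse the ID of a deleted row, while adding `AUTOINCREMENT` prevents that.

```python
import sqlite3


def id_after_delete_and_reinsert(create_table_sql: str) -> tuple[int, int]:
    """Insert a row, delete it, insert another; return both assigned IDs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(create_table_sql)
    cur = conn.execute("INSERT INTO dataset (name) VALUES ('first')")
    deleted_id = cur.lastrowid
    conn.execute("DELETE FROM dataset WHERE id = ?", (deleted_id,))
    cur = conn.execute("INSERT INTO dataset (name) VALUES ('second')")
    new_id = cur.lastrowid
    conn.close()
    return deleted_id, new_id


# Without AUTOINCREMENT, the ID of the deleted row is handed out again.
plain = id_after_delete_and_reinsert(
    "CREATE TABLE dataset (id INTEGER PRIMARY KEY, name TEXT)"
)
# With AUTOINCREMENT, SQLite never reuses an ID from the same table.
autoinc = id_after_delete_and_reinsert(
    "CREATE TABLE dataset (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)"
)
print(plain, autoinc)  # (1, 1) (1, 2)
```

With a folder name built from `{dataset_id}_{dataset_name}`, the first case means a new dataset with the same name as a deleted one could end up pointing at the old folder.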

The way we are proceeding for the moment is to just disable this option on sqlite instances. When using sqlite, you would only be able to create a dataset if you specify zarr_dir.

For the record, this change comes together with writing tests that depend on the specific database that is being used. Neither of the two changes adds any value to our active (postgres-based) instances.

jluethi · 2024-11-01T14:19:30Z

Thanks for checking. Another strong reason to drop sqlite then.

If this is too much effort to write for sqlite, we can accept the edge-case of mistakenly reusing folders in sqlite then until we deprecate sqlite.

lorenzocerrone added the flexibility (Support more workflow-execution use cases) and low-hanging-fruit labels Oct 16, 2024

jluethi added this to Fractal Project Management Oct 16, 2024

github-project-automation bot moved this to TODO in Fractal Project Management Oct 16, 2024

tcompa mentioned this issue Oct 21, 2024

Review use cases for user's username field #1940

Closed

ychiucco self-assigned this Oct 28, 2024

This was referenced Oct 30, 2024

Introduce user-setting column project_dir #1986

Closed

To review: do we assume that the parent directory of dataset.zarr_dir exists? #1987

Closed

ychiucco linked a pull request Oct 30, 2024 that will close this issue

Set a default for zarr_dir #1990

Merged

4 tasks

ychiucco mentioned this issue Oct 30, 2024

Set a default for zarr_dir #1990

Merged

4 tasks

tcompa mentioned this issue Oct 30, 2024

Review missing ID autoincrement for sqlite #1991

Closed

This was referenced Oct 30, 2024

Include user_settings.project_dir in list of allowed paths, on top of usergroup.viewer_paths fractal-analytics-platform/fractal-vizarr-viewer#44

Closed

Remove cache_dir and use project_dir/.fractal_cache #1992

Closed

tcompa closed this as completed in #1990 Nov 4, 2024

github-project-automation bot moved this from TODO to Done in Fractal Project Management Nov 4, 2024

tcompa added the sqlite label Nov 14, 2024

tcompa removed the flexibility (Support more workflow-execution use cases) label Dec 2, 2024
