Apply compression to common data formats #10
Very interesting ideas! I wonder, though, how this fits with our current plan: iRODS would mostly handle data while it is stored on SweStore, and for running analyses one would check out the data as normal files (mainly because handling data directly via iRODS is quite cumbersome). I think these are the kinds of things an IRL meeting could help with, to decide what we should aim for here.
Yes, I agree. This is much of what I thought iRODS could do! I feel that we really need a design. A direct access vault could be a good way to get around the iget/iput problem: we would then just iput the results from an analysis and use metadata to associate each result with its input files in iRODS, and so on.
Can you help sketch out the rules? I can then write a periodic rule that checks for files, bundles them and applies compression, or just applies compression and then archives them.
We will irsync files that match those globs to UPPMAX. A first approach would then be to look for uncompressed fastq files within iRODS (they will be under the fastq/ directory), compress them with gzip, and md5sum each resulting file. We want easy access to metadata, so the fastq folder (by far the biggest) should be bundled separately from the lightweight metadata files (*.xml, etc.).
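A minimal sketch of that first approach, assuming it runs on a plain filesystem copy of a run (for example the irsync'd tree) rather than as an iRODS rule. The run layout, the `.fastq`/`.fq` suffixes, the bundle file names and the function names are all hypothetical; an actual rule would have to be written in the iRODS rule language or call out to a script like this.

```python
from __future__ import annotations

import gzip
import hashlib
import shutil
import tarfile
from pathlib import Path

# Hypothetical layout: run_dir/fastq/ holds the large read files, while
# lightweight metadata (*.xml, ...) sits elsewhere under run_dir.
FASTQ_SUFFIXES = (".fastq", ".fq")


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compress_fastq(path: Path) -> tuple[Path, str]:
    """Gzip one uncompressed fastq, delete the original, return (gz path, md5 of gz)."""
    gz_path = path.with_name(path.name + ".gz")
    with path.open("rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    path.unlink()  # only reached if compression did not raise
    return gz_path, md5sum(gz_path)


def compress_run(run_dir: Path) -> dict[str, str]:
    """Compress every uncompressed fastq under run_dir/fastq; return {gz name: md5}."""
    checksums = {}
    # Materialise the file list first so newly written .gz files are not revisited.
    for path in sorted((run_dir / "fastq").rglob("*")):
        if path.is_file() and path.suffix in FASTQ_SUFFIXES:
            gz_path, checksum = compress_fastq(path)
            checksums[gz_path.name] = checksum
    return checksums


def bundle_run(run_dir: Path, out_dir: Path) -> tuple[Path, Path]:
    """Write two tar bundles: fastq/ on its own, and all other (metadata) files.

    out_dir should lie outside run_dir, otherwise the metadata bundle
    would pick up the bundles themselves.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    fastq_tar = out_dir / f"{run_dir.name}_fastq.tar"
    meta_tar = out_dir / f"{run_dir.name}_metadata.tar"
    with tarfile.open(fastq_tar, "w") as tar:
        tar.add(run_dir / "fastq", arcname="fastq")  # recursive by default
    with tarfile.open(meta_tar, "w") as tar:
        for path in sorted(run_dir.iterdir()):
            if path.name != "fastq":
                tar.add(path, arcname=path.name)
    return fastq_tar, meta_tar
```

The fastq bundle is written as an uncompressed tar because its members are already gzipped; keeping metadata in its own small bundle means it can be fetched without touching the large read data.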
.sam files should not stay on the filesystem for more than X months; they can be converted to .bam automatically.
Likewise, .fastq files that have not been used for a reasonable amount of time should be compressed to .gz, which many bioinformatics tools read natively.
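The two age-based policies above could be expressed as a planning step that only decides what to do, leaving the actual conversion (for example with `samtools view -b`) and compression to a separate step. This is a hypothetical sketch: the thresholds stand in for the unspecified "X months" and "reasonable amount of time", the function and action names are made up, and "unused" is approximated by the later of access and modification time, since access times are unreliable on filesystems mounted with `noatime` or `relatime`.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from pathlib import Path

SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class Action:
    path: Path
    kind: str  # hypothetical action names: "sam_to_bam" or "gzip_fastq"


def plan_actions(root: Path, sam_max_age_days: float, fastq_idle_days: float,
                 now: float | None = None) -> list[Action]:
    """List files under root that the retention policies say should be processed."""
    now = time.time() if now is None else now
    actions = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        st = path.stat()
        if path.suffix == ".sam":
            # Policy 1: .sam files older than the limit get converted to .bam.
            if now - st.st_mtime > sam_max_age_days * SECONDS_PER_DAY:
                actions.append(Action(path, "sam_to_bam"))
        elif path.suffix in (".fastq", ".fq"):
            # Policy 2: fastq files idle for too long get gzipped.
            last_used = max(st.st_atime, st.st_mtime)
            if now - last_used > fastq_idle_days * SECONDS_PER_DAY:
                actions.append(Action(path, "gzip_fastq"))
    return actions
```

Separating planning from execution makes it easy to run the policy as a dry run (just print the plan) before letting a periodic job act on it.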