-
Notifications
You must be signed in to change notification settings - Fork 659
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix extended glob #2182
Fix extended glob #2182
Conversation
Signed-off-by: Lehmann-Fabian <[email protected]>
…proposed in PR nextflow-io#2035. Signed-off-by: Lehmann-Fabian <[email protected]>
Signed-off-by: Lehmann-Fabian <[email protected]>
…extflow-io#2035 Signed-off-by: Lehmann-Fabian <[email protected]>
…io#2035 Signed-off-by: Lehmann-Fabian <[email protected]>
Signed-off-by: Lehmann-Fabian <[email protected]>
Signed-off-by: Lehmann-Fabian <[email protected]>
I'm not sure why it is failing. |
pathes=`ls -1d ${escape.join(' ')} | sort | uniq` | ||
shopt -u globstar extglob || true | ||
set -f | ||
for name in \$pathes; do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is this prevents the wildcards expansion. Why is needed? they should have already been resolved by the ls -1d
command
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, set -f
prevents it. This is needed, if file names contain a special character like a question mark, which would otherwise be resolved again. There is one test case which fails otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, not fully understanding this. Can you provide a minimal example NF script showing how this works?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, the following process should only find one file - the one with a question mark in the file name.
process a {
echo true
output:
file("ab\\?.txt") into found
"""
touch abc.txt
touch ab\\?.txt
ls
"""
}
found.view()
Don't disable wildcard expansion would lead to a new expansion in the for loop and once again in the copy/move method. Accordingly, both files will be moved/copied.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @pditommaso, I’m not sure whether this helped you understand my contribution or you need an additional example. I look forward to your feedback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a problem can come from the fact that NF expects the use of the glob:
option to switch off the matching of the ?
special character. My tests:
without glon option
process a {
output:
path "ab?.txt" into found
"""
touch abc.txt
touch ab\\?.txt
ls
"""
}
found.flatten().view()
shell:
» nextflow run test.nf
N E X T F L O W ~ version 21.09.0-SNAPSHOT
Launching `test.nf` [elated_nobel] - revision: 198b348c2b
executor > local (1)
[23/6a07e6] process > a [100%] 1 of 1 ✔
/Users/pditommaso/Projects/nextflow/work/23/6a07e6c49fe17b96cab4c476a2b83f/ab?.txt
/Users/pditommaso/Projects/nextflow/work/23/6a07e6c49fe17b96cab4c476a2b83f/abc.txt
OK
using glob option
process a {
output:
path "ab?.txt", glob: false into found
"""
touch abc.txt
touch ab\\?.txt
ls
"""
}
found.flatten().view()
Shell:
» nextflow run test.nf
N E X T F L O W ~ version 21.09.0-SNAPSHOT
Launching `test.nf` [curious_brown] - revision: 4cae485aee
executor > local (1)
[56/bda665] process > a [100%] 1 of 1 ✔
/Users/pditommaso/Projects/nextflow/work/56/bda665ba766227bbecf15e8611cbf6/ab?.txt
OK
glob option + enable scratch option
process a {
output:
path "ab?.txt", glob: false into found
"""
touch abc.txt
touch ab\\?.txt
ls
"""
}
found.flatten().view()
Shell command:
» nextflow run test.nf -process.scratch true
N E X T F L O W ~ version 21.09.0-SNAPSHOT
Launching `test.nf` [high_minsky] - revision: 4cae485aee
executor > local (1)
[82/ef2225] process > a [100%] 1 of 1 ✔
/Users/pditommaso/Projects/nextflow/work/82/ef22257c5eb503b4f57e993c8aca51/ab?.txt
OK, though
Work dir:
» ls -lh work/82/ef22257c5eb503b4f57e993c8aca51
total 0
-rw-r--r-- 1 pditommaso staff 0B Oct 6 16:28 ab?.txt
-rw-r--r-- 1 pditommaso staff 0B Oct 6 16:28 abc.txt
I think your concern is the above, that since the ab?.txt
was requested, both files were copied from the task scratch copied. Is that right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. My concern is that the file is copied/moved anyways. That is why I prevent the second wildcard expansion since the absolute file paths are known then.
a) If you do not see a problem, we can delete the set -f
. But this will generate more I/O.
Anyway, I see my commit does not consider the option glob: false
. In that case, we should also say set -f
instead of shopt -s globstar extglob || true
to activate the glob.
b) How do you think about that?
If you agree to both, I will change a and b as proposed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see two problems: 1) handling correctly glob:false
2) handle **
wildcard
For 1, I think it could be addressed replacing Escape.path
with Escape.wildcard
when glob==false
here
nextflow/modules/nextflow/src/main/groovy/nextflow/executor/SimpleFileCopyStrategy.groovy
Line 171 in c405f9e
escape.add( Escape.path(it) ) |
For 2, I think it would be preferable to keep eval subshell, I mean having
for name in \$(eval "ls -1d ${escape.join(' ')}" | sort | uniq); do
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me start with 2).
I found the problem you meant.
There was a problem if the **
was within paths like a/dir**file.txt
, but I fixed it in the last commit.
I prepared 4 test cases and tested them three times each.
- with the current Nextflow version
- with the current Nextflow version and
-process.scratch true
- with my fixed Nextflow version and
-process.scratch true
For portability, the second and third test case should find the same files as the first one. However, the second and third should only copy the files found.
Testcase 1
process a {
output:
path "{a,b,c}/*" into found
"""
mkdir a b c d
touch file.txt
touch a/file.txt
touch c/file.txt
touch d/file.txt
ls
"""
}
found.flatten().view()
1.
$ ./nextflow run test1.nf
N E X T F L O W ~ version 21.04.3
Launching `test1.nf` [lonely_sanger] - revision: 06c25e47e3
executor > local (1)
[21/781741] process > a [100%] 1 of 1 ✔
/home/fabian/extendedGlob/work/21/781741acd0b0369ababfe7c881ebc1/a/file.txt
/home/fabian/extendedGlob/work/21/781741acd0b0369ababfe7c881ebc1/c/file.txt
Files in workdir:
work/21/781741acd0b0369ababfe7c881ebc1/
├── a
│ └── file.txt
├── b
├── c
│ └── file.txt
├── d
│ └── file.txt
└── file.txt
4 directories, 4 files
2.
$ ./nextflow run test1.nf -process.scratch true
N E X T F L O W ~ version 21.04.3
Launching `test1.nf` [intergalactic_dijkstra] - revision: 06c25e47e3
executor > local (1)
[5b/b09417] process > a [100%] 1 of 1 ✔
/home/fabian/extendedGlob/work/5b/b09417152861aa02b76b6b756f92ae/c/a
/home/fabian/extendedGlob/work/5b/b09417152861aa02b76b6b756f92ae/c/b
/home/fabian/extendedGlob/work/5b/b09417152861aa02b76b6b756f92ae/c/file.txt
Files in workdir:
work/5b/b09417152861aa02b76b6b756f92ae/
├── a
├── b
└── c
├── a
├── b
└── file.txt
5 directories, 1 file
For the current Nextflow version, the files are copied into the wrong hierarchy. Accordingly, the wrong and not all files are found.
3.
$ ./nextflow-fix run test1.nf -process.scratch true
N E X T F L O W ~ version 21.09.0-SNAPSHOT
Launching `test1.nf` [gigantic_stone] - revision: 06c25e47e3
executor > local (1)
[89/51d3b7] process > a [100%] 1 of 1 ✔
/home/fabian/extendedGlob/work/89/51d3b76e56ae498416cc307209a3aa/a/file.txt
/home/fabian/extendedGlob/work/89/51d3b76e56ae498416cc307209a3aa/c/file.txt
Files in workdir:
work/89/51d3b76e56ae498416cc307209a3aa/
├── a
│ └── file.txt
└── c
└── file.txt
2 directories, 2 files
Testcase 2
process a {
output:
path "a/*/c/*" into found
"""
mkdir -p a/b/c/ a/b/d/c/
touch a/b/c/file.txt
touch a/b/c/file2.txt
touch a/b/d/c/file2.txt
touch a/b/file.txt
touch a/file.txt
touch file.txt
ls
"""
}
found.flatten().view()
1.
$ ./nextflow run test2.nf
N E X T F L O W ~ version 21.04.3
Launching `test2.nf` [modest_venter] - revision: 53280a4646
executor > local (1)
[3f/eba9de] process > a [100%] 1 of 1 ✔
/home/fabian/extendedGlob/work/3f/eba9de27063e039e2d4e5351f15817/a/b/c/file.txt
/home/fabian/extendedGlob/work/3f/eba9de27063e039e2d4e5351f15817/a/b/c/file2.txt
Files in workdir:
work/3f/eba9de27063e039e2d4e5351f15817/
├── a
│ ├── b
│ │ ├── c
│ │ │ ├── file2.txt
│ │ │ └── file.txt
│ │ ├── d
│ │ │ └── c
│ │ │ └── file2.txt
│ │ └── file.txt
│ └── file.txt
└── file.txt
5 directories, 6 files
2.
$ ./nextflow run test2.nf -process.scratch true
N E X T F L O W ~ version 21.04.3
Launching `test2.nf` [romantic_dijkstra] - revision: 53280a4646
executor > local (1)
[23/f55e4c] process > a [100%] 1 of 1 ✔
/home/fabian/extendedGlob/work/23/f55e4c82bf437bc047c8b670c26457/a/*/c/file.txt
/home/fabian/extendedGlob/work/23/f55e4c82bf437bc047c8b670c26457/a/*/c/file2.txt
Files in workdir:
work/23/f55e4c82bf437bc047c8b670c26457/
└── a
└── *
└── c
├── file2.txt
└── file.txt
3 directories, 2 files
This time the *
is somehow used as the output file name. Accordingly, some files are overwritten, and not all files are found in the end.
3.
$ ./nextflow-fix run test2.nf -process.scratch true
N E X T F L O W ~ version 21.09.0-SNAPSHOT
Launching `test2.nf` [festering_jang] - revision: 53280a4646
executor > local (1)
[8c/cfc9ff] process > a [100%] 1 of 1 ✔
/home/fabian/extendedGlob/work/8c/cfc9ffcc0949937b0982ea120e5303/a/b/c/file.txt
/home/fabian/extendedGlob/work/8c/cfc9ffcc0949937b0982ea120e5303/a/b/c/file2.txt
Files in workdir:
work/8c/cfc9ffcc0949937b0982ea120e5303/
└── a
└── b
└── c
├── file2.txt
└── file.txt
3 directories, 2 files
Testcase 3
process a {
output:
path "a/dir*/file.txt" into found
"""
mkdir -p a/dir1/a a/dir2
touch a/dir1/file.txt
touch a/dir2/file.txt
touch a/dir1/a/file.txt
ls
"""
}
found.flatten().view()
1.
$ ./nextflow run test3.nf
N E X T F L O W ~ version 21.04.3
Launching `test3.nf` [gigantic_hodgkin] - revision: 3c4fcfe408
executor > local (1)
[3b/88992b] process > a [100%] 1 of 1 ✔
/home/fabian/extendedGlob/work/3b/88992b05421a86ecae9cdee5d415d8/a/dir1/file.txt
/home/fabian/extendedGlob/work/3b/88992b05421a86ecae9cdee5d415d8/a/dir2/file.txt
Files in workdir:
work/3b/88992b05421a86ecae9cdee5d415d8/
└── a
├── dir1
│ ├── a
│ │ └── file.txt
│ └── file.txt
└── dir2
└── file.txt
4 directories, 3 files
2.
$ ./nextflow run test3.nf -process.scratch true
N E X T F L O W ~ version 21.04.3
Launching `test3.nf` [magical_rubens] - revision: 3c4fcfe408
executor > local (1)
[f2/adbebd] process > a [100%] 1 of 1 ✔
/home/fabian/extendedGlob/work/f2/adbebd320ec29728872033ebb09bf4/a/dir*/file.txt
Files in workdir:
work/f2/adbebd320ec29728872033ebb09bf4/
└── a
└── dir*
└── file.txt
2 directories, 1 file
More or less the same behavior as for test case 2.
3.
$ ./nextflow-fix run test3.nf -process.scratch true
N E X T F L O W ~ version 21.09.0-SNAPSHOT
Launching `test3.nf` [admiring_blackwell] - revision: 3c4fcfe408
executor > local (1)
[14/06cc88] process > a [100%] 1 of 1 ✔
/home/fabian/extendedGlob/work/14/06cc88cf25a6e76ec18ef9789c3478/a/dir1/file.txt
/home/fabian/extendedGlob/work/14/06cc88cf25a6e76ec18ef9789c3478/a/dir2/file.txt
Files in workdir:
work/14/06cc88cf25a6e76ec18ef9789c3478/
└── a
├── dir1
│ └── file.txt
└── dir2
└── file.txt
3 directories, 2 files
Testcase 4
process a {
output:
path "a/dir**/file.txt" into found
"""
mkdir -p a/dir1/a/b/ a/dir2 a/dir
touch a/dir/file.txt
touch a/dir1/file.txt
touch a/dir2/file.txt
touch a/dir1/a/file.txt
touch a/dir1/a/b/file.txt
ls
"""
}
found.flatten().view()
1.
$ ./nextflow run test4.nf
N E X T F L O W ~ version 21.04.3
Launching `test4.nf` [lonely_descartes] - revision: 623cbb1c8b
executor > local (1)
[f5/36c1a7] process > a [100%] 1 of 1 ✔
/home/fabian/extendedGlob/work/f5/36c1a762b733e32d9e1bf20cb66780/a/dir/file.txt
/home/fabian/extendedGlob/work/f5/36c1a762b733e32d9e1bf20cb66780/a/dir1/a/b/file.txt
/home/fabian/extendedGlob/work/f5/36c1a762b733e32d9e1bf20cb66780/a/dir1/a/file.txt
/home/fabian/extendedGlob/work/f5/36c1a762b733e32d9e1bf20cb66780/a/dir1/file.txt
/home/fabian/extendedGlob/work/f5/36c1a762b733e32d9e1bf20cb66780/a/dir2/file.txt
Files in workdir:
work/f5/36c1a762b733e32d9e1bf20cb66780/
└── a
├── dir
│ └── file.txt
├── dir1
│ ├── a
│ │ ├── b
│ │ │ └── file.txt
│ │ └── file.txt
│ └── file.txt
└── dir2
└── file.txt
6 directories, 5 files
2.
$ ./nextflow run test4.nf -process.scratch true
N E X T F L O W ~ version 21.04.3
Launching `test4.nf` [insane_rosalind] - revision: 623cbb1c8b
executor > local (1)
[2f/b19602] process > a [100%] 1 of 1 ✔
/home/fabian/extendedGlob/work/2f/b19602d56e03194db570eb45e697bd/a/dir/file.txt
/home/fabian/extendedGlob/work/2f/b19602d56e03194db570eb45e697bd/a/dir1/a/b/file.txt
/home/fabian/extendedGlob/work/2f/b19602d56e03194db570eb45e697bd/a/dir1/a/file.txt
/home/fabian/extendedGlob/work/2f/b19602d56e03194db570eb45e697bd/a/dir1/file.txt
/home/fabian/extendedGlob/work/2f/b19602d56e03194db570eb45e697bd/a/dir2/file.txt
Files in workdir:
work/2f/b19602d56e03194db570eb45e697bd/
└── a
├── dir
│ └── file.txt
├── dir1
│ ├── a
│ │ ├── b
│ │ │ └── file.txt
│ │ └── file.txt
│ └── file.txt
└── dir2
└── file.txt
6 directories, 5 files
3.
$ ./nextflow-fix run test4.nf -process.scratch true
N E X T F L O W ~ version 21.09.0-SNAPSHOT
Launching `test4.nf` [happy_golick] - revision: 623cbb1c8b
executor > local (1)
[ba/6f94f6] process > a [100%] 1 of 1 ✔
/home/fabian/extendedGlob/work/ba/6f94f6ff172a2cd39f4deeea0d42fd/a/dir/file.txt
/home/fabian/extendedGlob/work/ba/6f94f6ff172a2cd39f4deeea0d42fd/a/dir1/a/b/file.txt
/home/fabian/extendedGlob/work/ba/6f94f6ff172a2cd39f4deeea0d42fd/a/dir1/a/file.txt
/home/fabian/extendedGlob/work/ba/6f94f6ff172a2cd39f4deeea0d42fd/a/dir1/file.txt
/home/fabian/extendedGlob/work/ba/6f94f6ff172a2cd39f4deeea0d42fd/a/dir2/file.txt
Files in workdir:
work/ba/6f94f6ff172a2cd39f4deeea0d42fd/
└── a
├── dir
│ └── file.txt
├── dir1
│ ├── a
│ │ ├── b
│ │ │ └── file.txt
│ │ └── file.txt
│ └── file.txt
└── dir2
└── file.txt
6 directories, 5 files
For ** it also works in my fixed version.
From my tests, you can see that my implementation is consistent with the behavior running without a scratch dir.
I implemented many more test cases here
nextflow/modules/nextflow/src/test/groovy/nextflow/executor/SimpleFileCopyStrategyTest.groovy
Lines 232 to 271 in b9edc92
source | inputffiles | outputfiles | |
'**.txt' | ['file.txt', 'file2.txt', 'file3.txta', 'a/file.txt', 'a/file3.txta', 'b/file.txt'] | ['file.txt', 'file2.txt', 'a/file.txt', 'b/file.txt'] | |
'**.foo' | ['file.txt', 'file2.txt', 'file3.txta', 'a/file.txt', 'a/file3.txta', 'b/file.txt'] | [] | |
'*.txt' | ['file.txt', 'file2.txt', 'file3.txta'] | ['file.txt', 'file2.txt'] | |
'*' | ['file.txt', 'file2.txt', 'file3.txta', 'a/b.txt'] | ['file.txt', 'file2.txt', 'file3.txta', 'a/b.txt'] | |
'**' | ['file.txt', 'file2.txt', 'file3.txta', 'a/b.txt'] | ['file.txt', 'file2.txt', 'file3.txta', 'a/b.txt'] | |
'a/*' | ['a/file.txt', 'a/file2.txt', 'a/b/a.txt', 'a/d/', 'file.txt'] | ['a/file.txt', 'a/file2.txt', 'a/b/a.txt', 'a/d/'] | |
'a/' | ['a/file.txt', 'a/file2.txt', 'a/b/a.txt', 'a/d/', 'file.txt'] | ['a/file.txt', 'a/file2.txt', 'a/b/a.txt', 'a/d/'] | |
'a' | ['a/file.txt', 'a/file2.txt', 'a/b/a.txt', 'a/d/', 'file.txt'] | ['a/file.txt', 'a/file2.txt', 'a/b/a.txt', 'a/d/'] | |
'a b/*' | ['a b/file.txt', 'a b/file2.txt', 'a b/b/a.txt', 'a b/d/', 'file.txt'] | ['a b/file.txt', 'a b/file2.txt', 'a b/b/a.txt', 'a b/d/'] | |
'a b/' | ['a b/file.txt', 'a b/file2.txt', 'a b/b/a.txt', 'a b/d/', 'file.txt'] | ['a b/file.txt', 'a b/file2.txt', 'a b/b/a.txt', 'a b/d/'] | |
'a b' | ['a b/file.txt', 'a b/file2.txt', 'a b/b/a.txt', 'a b/d/', 'file.txt'] | ['a b/file.txt', 'a b/file2.txt', 'a b/b/a.txt', 'a b/d/'] | |
'a|b/*' | ['a|b/file.txt', 'a|b/file2.txt', 'a|b/b/a.txt', 'a|b/d/', 'file.txt'] | ['a|b/file.txt', 'a|b/file2.txt', 'a|b/b/a.txt', 'a|b/d/'] | |
'a/*/c/*' | ['a/b/c/file.txt', 'a/b/c/file2.txt', 'a/b/d/c/file2.txt', 'a/b/file.txt', 'a/file.txt', 'file.txt'] | ['a/b/c/file.txt', 'a/b/c/file2.txt'] | |
'a/**/c/*' | ['a/b/c/file.txt', 'a/b/c/file2.txt', 'a/b/d/c/file2.txt', 'a/b/file.txt', 'a/file.txt', 'file.txt'] | ['a/b/c/file.txt', 'a/b/c/file2.txt', 'a/b/d/c/file2.txt'] | |
'a/*/c/' | ['a/b/c/file.txt', 'a/b/c/file2.txt', 'a/b/d/c/file2.txt', 'a/b/file.txt', 'a/file.txt', 'file.txt'] | ['a/b/c/file.txt', 'a/b/c/file2.txt'] | |
'a/**/c/' | ['a/b/c/file.txt', 'a/b/c/file2.txt', 'a/b/d/c/file2.txt', 'a/b/file.txt', 'a/file.txt', 'file.txt'] | ['a/b/c/file.txt', 'a/b/c/file2.txt', 'a/b/d/c/file2.txt'] | |
'a/*/c' | ['a/b/c/file.txt', 'a/b/c/file2.txt', 'a/b/d/c/file2.txt', 'a/b/file.txt', 'a/file.txt', 'file.txt'] | ['a/b/c/file.txt', 'a/b/c/file2.txt'] | |
'a/**/c' | ['a/b/c/file.txt', 'a/b/c/file2.txt', 'a/b/d/c/file2.txt', 'a/b/file.txt', 'a/file.txt', 'file.txt'] | ['a/b/c/file.txt', 'a/b/c/file2.txt', 'a/b/d/c/file2.txt'] | |
'a/dir*/file.txt' | ['a/dir/file.txt', 'a/dir1/file.txt', 'a/dir2/afile.txt', 'a/dir1/a/file.txt'] | ['a/dir/file.txt', 'a/dir1/file.txt'] | |
'a/dir**/file.txt' | ['a/dir/file.txt', 'a/dir1/file.txt', 'a/dir2/afile.txt', 'a/dir1/a/file.txt', 'a/dir1/a/b/file.txt'] | ['a/dir/file.txt', 'a/dir1/file.txt', 'a/dir1/a/file.txt', 'a/dir1/a/b/file.txt'] | |
'a/dir**file.txt' | ['a/dirafile.txt', 'a/dir/file.txt', 'a/dir1/file.txt', 'a/dir2/afile.txt', 'a/dir1/a/file.txt', 'a/dir1/a/b/file.txt'] | ['a/dirafile.txt', 'a/dir/file.txt', 'a/dir1/file.txt', 'a/dir2/afile.txt', 'a/dir1/a/file.txt', 'a/dir1/a/b/file.txt'] | |
'a/?/c/*' | ['a/b/c/file.txt', 'a/b/c/file2.txt', 'a/b/d/c/file2.txt', 'a/b/file.txt', 'a/file.txt', 'file.txt'] | ['a/b/c/file.txt', 'a/b/c/file2.txt'] | |
'a/?/c/' | ['a/b/c/file.txt', 'a/b/c/file2.txt', 'a/b/d/c/file2.txt', 'a/b/file.txt', 'a/file.txt', 'file.txt'] | ['a/b/c/file.txt', 'a/b/c/file2.txt'] | |
'a/?/c' | ['a/b/c/file.txt', 'a/b/c/file2.txt', 'a/b/d/c/file2.txt', 'a/b/file.txt', 'a/file.txt', 'file.txt'] | ['a/b/c/file.txt', 'a/b/c/file2.txt'] | |
'{a,b,c}/*' | ['a/file.txt', 'a/file2.txt', 'b/file.txt', 'b/file2.txt', 'd/file.txt', 'file.txt'] | ['a/file.txt', 'a/file2.txt', 'b/file.txt', 'b/file2.txt'] | |
'{a,b,c}/' | ['a/file.txt', 'a/file2.txt', 'b/file.txt', 'b/file2.txt', 'd/file.txt', 'file.txt'] | ['a/file.txt', 'a/file2.txt', 'b/file.txt', 'b/file2.txt'] | |
'{a,b,c}' | ['a/file.txt', 'a/file2.txt', 'b/file.txt', 'b/file2.txt', 'd/file.txt', 'file.txt'] | ['a/file.txt', 'a/file2.txt', 'b/file.txt', 'b/file2.txt'] | |
'{ab*,b*/*,c}/*' | ['a/file.txt', 'abcd/file2.txt', 'b/file.txt', 'b/file2.txt', 'd/file.txt', 'file.txt'] | ['abcd/file2.txt'] | |
'{a[ab]c,b*/*}/*' | ['acc/file.txt', 'abc/file2.txt', 'b/file.txt', 'b/file2.txt', 'd/file.txt', 'file.txt'] | ['abc/file2.txt'] | |
'{a|c*,b*/*}/*' | ['a|c/file.txt', 'abc/file2.txt', 'b/file.txt', 'b/file2.txt', 'd/file.txt', 'file.txt'] | ['a|c/file.txt'] | |
'[A-Z]/*' | ['A/file.txt', 'a/file2.txt', 'b/file.txt', 'b/file2.txt', 'd/file.txt', 'file.txt'] | ['A/file.txt'] | |
'[A-Z]/' | ['A/file.txt', 'a/file2.txt', 'b/file.txt', 'b/file2.txt', 'd/file.txt', 'file.txt'] | ['A/file.txt'] | |
'[A-Z]' | ['A/file.txt', 'a/file2.txt', 'b/file.txt', 'b/file2.txt', 'd/file.txt', 'file.txt'] | ['A/file.txt'] | |
'a' | ['A/file.txt', 'a/', 'b/file.txt', 'b/file2.txt', 'd/file.txt', 'file.txt'] | ['a/'] | |
'a/b/c' | ['A/file.txt', 'a/b/c/', 'a/b/file.txt', 'b/file2.txt', 'd/file.txt', 'file.txt'] | ['a/b/c/'] | |
'abc?\\?.txt' | ['abcde.txt', 'abcd.txt', 'abcd?.txt'] | ['abcd?.txt'] | |
'a"b.txt' | ['a\\"b.txt', 'a"b.txt', 'abc.txt'] | ['a"b.txt'] | |
'a/**/b/*/d.txt' | ['a/c/d/b/x/d.txt', 'a/c/d/b/x/Y/d.txt'] | ['a/c/d/b/x/d.txt'] | |
'a/bc*h/ij*mn/*' | ['a/bcdefgh/ijklmn/a.txt', 'a/bcdfgh/ijklmn/a.txt', 'a/bcdefghi/ijklmn/a.txt'] | ['a/bcdefgh/ijklmn/a.txt', 'a/bcdfgh/ijklmn/a.txt'] |
The tests check if all files are found, but not more.
Let us talk about 1): How to deal with glob:false
.
I have not yet implemented it.
Have you a good idea how to get the information whether an output pattern has to be treated as glob:false
here:
nextflow/modules/nextflow/src/main/groovy/nextflow/executor/SimpleFileCopyStrategy.groovy
Line 161 in b9edc92
String getUnstageOutputFilesScript(List<String> outputFiles, Path targetDir) { |
My idea would be to change this:
nextflow/modules/nextflow/src/main/groovy/nextflow/processor/TaskRun.groovy
Lines 426 to 434 in b9edc92
List<String> getOutputFilesNames() { | |
def result = [] | |
for( FileOutParam param : getOutputsByType(FileOutParam).keySet() ) { | |
result.addAll( param.getFilePatterns(context, workDir) ) | |
} | |
return result.unique() | |
} |
to:
List<Tuple2<String, Boolean>> getOutputFilesNames() {
def result = []
for( FileOutParam param : getOutputsByType(FileOutParam).keySet() ) {
result.addAll( new Tuple2(param.getFilePatterns(context, workDir), param.glob) )
}
return result.unique()
}
and adjust all calls accordingly.
But I think there is a better way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
commenting in the main thread to have more space
30b2ef4
to
cc77472
Compare
Signed-off-by: Lehmann-Fabian <[email protected]>
I think most (all) of the problems you are reporting using the current master version (think 21.08.0-edge) is also ok. I've repeated the tests for your convenience Test case 11. plain
OK 2. with scratch
OK Test case 21. plain
2. with scratch
OK Testcase 31. plain
OK 2. with scracth
Testcase 41. plain
2. with scratch
Also when using |
Regarding |
Ahh, somehow, Nextflow did not execute the latest version, even when I downloaded it. Testcase 1
1.
Files in workdir:
2.
Files in workdir:
Here to much data is copied. This could be problematic, as it could be dozens of gigabytes of temporary data. My solution only copies the required files: 3.
Files in workdir:
Testcase 2
1.
Files in workdir:
2.
Files in workdir:
In this case the wrong file is copied, while my solution finds it: 3.
Files in workdir:
Testcase 3
1.
Files in workdir:
2.
Files in workdir:
Again, the file is not copied and accordingly the pipeline crashed. My solution worked well: 3.
Files in workdir:
|
6564123
to
c78965b
Compare
regarding the first, should it traverse dirs? I'm getting this
|
Yes, the first run for the first test case is without nextflow/modules/nextflow/src/main/groovy/nextflow/executor/SimpleFileCopyStrategy.groovy Line 171 in b9edc92
With that change globstar always traverses through all levels. |
acb53ee
to
2c6665e
Compare
Hi @pditommaso. As the PR is still open, I wanted to ask you how to proceed. Otherwise, we can close the PR. |
I think there are at least two issues here: 1) the correct handling of
|
|
Good point. Yes, I think the support for extended glob should be done as PR forked from #2379. Regarding the second point ideally, it should be the default behaviour i.e. case a) in your comment. Though, I don't have a clean idea of the impact of this. |
Signed-off-by: Lehmann-Fabian <[email protected]>
Ok, thanks |
This PR solves the problems with the extended glob, which can be seen in #2181. Now it passes all tests.
I promised this fix in #2035.