Retrieving module information of input and output and EDAM ontology from bio.tools #3418

Merged
merged 27 commits into from
Feb 13, 2025
f14d145
first implementation of retrieving module information of input and ou…
mirpedrol Jan 21, 2025
8f3d2dc
first implementation of retrieving module information of input and ou…
mirpedrol Jan 21, 2025
56f37f8
run pre-commit
mirpedrol Jan 28, 2025
91c67f1
run prettier after creating the module
mirpedrol Jan 28, 2025
38e7784
fix pytests
mirpedrol Jan 28, 2025
3edcecc
fix typing
mirpedrol Jan 29, 2025
135f5da
more typing fixes
mirpedrol Jan 29, 2025
284bfa6
add pytests for components utils
mirpedrol Jan 29, 2025
fbe857e
restructure components tests
mirpedrol Jan 29, 2025
27d1996
Apply suggestions from code review
mirpedrol Jan 30, 2025
bde7a3b
use bio.tools info to create main.nf for modules template
mirpedrol Feb 4, 2025
d2631b1
add EDAM comments and suggestions from review
mirpedrol Feb 7, 2025
7bae22e
add missing ontologies to meta.yml when using nf-core lint --fix
mirpedrol Feb 10, 2025
1735c9e
use pattern for output channel element in meta.yml
mirpedrol Feb 10, 2025
d730b08
update pytest
mirpedrol Feb 10, 2025
5cef9f6
remove 'type' assignment for backwards python compatibility
mirpedrol Feb 10, 2025
f4a42b5
Merge branch 'dev' into get-edam-ontology
mirpedrol Feb 10, 2025
76e56c5
try ignoring errors when deleting a modules work directory created wi…
mirpedrol Feb 10, 2025
6f442c2
more tryes to fix pytest
mirpedrol Feb 10, 2025
5301e1b
run test_unstable_snapshot first and don't remove_readonly
mirpedrol Feb 10, 2025
2ed1eb9
use EDAM tsv instead of python library
mirpedrol Feb 10, 2025
8a6d45f
add edam comments to hardocded template example and when linting
mirpedrol Feb 10, 2025
ca93755
don't add empty comment and comma to meta.yml
mirpedrol Feb 10, 2025
2795d58
add log.info messages
mirpedrol Feb 10, 2025
3dead1e
fix nf-test components test
mirpedrol Feb 10, 2025
47036ff
fix more pytests
mirpedrol Feb 11, 2025
1e574fe
fix test path
mirpedrol Feb 12, 2025
4 changes: 2 additions & 2 deletions .github/workflows/pytest.yml
@@ -75,8 +75,8 @@ jobs:
name: Run ${{matrix.test}} with Python ${{ needs.setup.outputs.python-version }} on ${{ needs.setup.outputs.runner }}
needs: [setup, list_tests]
if: ${{ needs.setup.outputs.run-tests }}
# run on self-hosted runners for test_components.py (because of the gitlab branch), based on the input if it is dispatched manually, on github if it is a rerun or on self-hosted by default
runs-on: ${{ matrix.test == 'test_components.py' && 'self-hosted' || (github.event.inputs.runners || github.run_number > 1 && 'ubuntu-latest' || 'self-hosted') }}
# run on self-hosted runners for test_components_generate_snapshot.py (because of the gitlab branch), based on the input if it is dispatched manually, on github if it is a rerun or on self-hosted by default
runs-on: ${{ matrix.test == 'components/test_components_generate_snapshot.py' && 'self-hosted' || (github.event.inputs.runners || github.run_number > 1 && 'ubuntu-latest' || 'self-hosted') }}
strategy:
matrix: ${{ fromJson(needs.list_tests.outputs.tests) }}
fail-fast: false # run all tests even if one fails
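
The `runs-on` expression above chains `&&` and `||` as a ternary substitute. A minimal sketch of how it reads, assuming `&&` binds tighter than `||` as in most expression languages; the function and parameter names are illustrative and not part of the workflow:

```python
from typing import Optional

SNAPSHOT_TEST = "components/test_components_generate_snapshot.py"


def select_runner(test: str, requested_runner: Optional[str], run_number: int) -> str:
    """Illustrative reading of the runs-on expression, one branch per operand."""
    if test == SNAPSHOT_TEST:
        # The snapshot test is always sent to a self-hosted runner.
        return "self-hosted"
    if requested_runner:
        # A manually dispatched run can name its runner explicitly.
        return requested_runner
    if run_number > 1:
        # Later runs fall back to a GitHub-hosted runner.
        return "ubuntu-latest"
    # Default when none of the conditions above apply.
    return "self-hosted"
```

Under this reading, the snapshot test ignores both the manual input and the run number, which matches the comment in the workflow.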
84 changes: 71 additions & 13 deletions nf_core/components/components_utils.py
@@ -1,7 +1,7 @@
import logging
import re
from pathlib import Path
from typing import TYPE_CHECKING, List, Optional, Tuple, Union
from typing import TYPE_CHECKING, Dict, List, Optional, Tuple, Union

import questionary
import requests
@@ -165,9 +165,9 @@ def get_components_to_install(subworkflow_dir: Union[str, Path]) -> Tuple[List[s
return modules, subworkflows


def get_biotools_id(tool_name) -> str:
def get_biotools_response(tool_name: str) -> Optional[Dict]:
"""
Try to find a bio.tools ID for 'tool'
Try to get bio.tools information for 'tool'
"""
url = f"https://bio.tools/api/t/?q={tool_name}&format=json"
try:
@@ -176,16 +176,74 @@ def get_biotools_id(tool_name) -> str:
response.raise_for_status() # Raise an error for bad status codes
# Parse the JSON response
data = response.json()
log.info(f"Found bio.tools information for '{tool_name}'")
return data

# Iterate through the tools in the response to find the tool name
for tool in data["list"]:
if tool["name"].lower() == tool_name:
return tool["biotoolsCURIE"]
except requests.exceptions.RequestException as e:
log.warning(f"Could not find bio.tools information for '{tool_name}': {e}")
return None

# If the tool name was not found in the response
log.warning(f"Could not find a bio.tools ID for '{tool_name}'")
return ""

except requests.exceptions.RequestException as e:
log.warning(f"Could not find a bio.tools ID for '{tool_name}': {e}")
return ""
def get_biotools_id(data: dict, tool_name: str) -> str:
    """
    Try to find a bio.tools ID for 'tool'
    """
    # Iterate through the tools in the response to find the tool name
    for tool in data["list"]:
        if tool["name"].lower() == tool_name:
            log.info(f"Found bio.tools ID: '{tool['biotoolsCURIE']}'")
            return tool["biotoolsCURIE"]

    # If the tool name was not found in the response
    log.warning(f"Could not find a bio.tools ID for '{tool_name}'")
    return ""
Contributor

not sure if it isn't better to always return None if we don't get a correct entry, like we do in the function above

Member Author

I had another look and since this is the value we use for the meta.yml it is easier to return an empty string that we can directly add to the file. That way, we avoid having to check if it's None.
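
The trade-off discussed in this thread can be sketched as follows. This is a simplified stand-in, not the merged function: `find_biotools_id`, the sample payload, and the `identifier:` line are illustrative assumptions, and unlike the diff the sketch lowercases both sides of the comparison.

```python
from typing import Any, Dict


def find_biotools_id(data: Dict[str, Any], tool_name: str) -> str:
    """Return the CURIE of the entry whose name matches, or "" if none does."""
    for tool in data.get("list", []):
        if tool.get("name", "").lower() == tool_name.lower():
            return tool.get("biotoolsCURIE", "")
    return ""


# Hand-written stand-in for a bio.tools response (not real API output).
sample = {"list": [{"name": "ExampleTool", "biotoolsCURIE": "biotools:exampletool"}]}

# An empty string can be interpolated directly, with no None check.
found_line = f'identifier: "{find_biotools_id(sample, "exampletool")}"'
missing_line = f'identifier: "{find_biotools_id(sample, "othertool")}"'
```

With `None` as the sentinel, the same interpolation would write the literal text `None` into the file unless every caller checked for it first.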



DictWithStrAndTuple = Dict[str, Tuple[List[str], List[str], List[str]]]


def get_channel_info_from_biotools(
    data: dict, tool_name: str
) -> Optional[Tuple[DictWithStrAndTuple, DictWithStrAndTuple]]:
    """
    Try to find input and output channels and the respective EDAM ontology terms

    Args:
        data (dict): The bio.tools API response
        tool_name (str): The name of the tool
    """
    inputs = {}
    outputs = {}

    def _iterate_input_output(type) -> DictWithStrAndTuple:
        type_info = {}
        if type in funct:
            for element in funct[type]:
                if "data" in element:
                    element_name = "_".join(element["data"]["term"].lower().split(" "))
                    uris = [element["data"]["uri"]]
                    terms = [element["data"]["term"]]
                    patterns = []
                    if "format" in element:
                        for format in element["format"]:
                            # Append the EDAM URI
                            uris.append(format["uri"])
                            # Append the EDAM term, getting the first word in case of complicated strings. i.e. "FASTA format"
                            patterns.append(format["term"].lower().split(" ")[0])
                            terms.append(format["term"])
                    type_info[element_name] = (uris, terms, patterns)
        return type_info

    # Iterate through the tools in the response to find the tool name
    for tool in data["list"]:
        if tool["name"].lower() == tool_name:
            if "function" in tool:
                # Parse all tool functions
                for funct in tool["function"]:
                    inputs.update(_iterate_input_output("input"))
                    outputs.update(_iterate_input_output("output"))
            return inputs, outputs

    # If the tool name was not found in the response
    log.warning(f"Could not find an EDAM ontology term for '{tool_name}'")
    return None
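
To make the shape of `DictWithStrAndTuple` concrete, here is a minimal sketch of the per-function parsing step run on a hand-written payload. The helper name `channel_info` and the sample dictionary are assumptions for illustration; the payload only mimics the keys the function above reads and is not real bio.tools output.

```python
from typing import Any, Dict, List, Tuple

ChannelInfo = Dict[str, Tuple[List[str], List[str], List[str]]]


def channel_info(function: Dict[str, Any], kind: str) -> ChannelInfo:
    """Collect (uris, terms, patterns) per data element for "input" or "output"."""
    info: ChannelInfo = {}
    for element in function.get(kind, []):
        if "data" not in element:
            continue
        # "Sequence alignment" becomes the channel name "sequence_alignment".
        name = "_".join(element["data"]["term"].lower().split(" "))
        uris = [element["data"]["uri"]]
        terms = [element["data"]["term"]]
        patterns: List[str] = []
        for fmt in element.get("format", []):
            uris.append(fmt["uri"])
            terms.append(fmt["term"])
            # Keep only the first word, so "FASTA format" would become "fasta".
            patterns.append(fmt["term"].lower().split(" ")[0])
        info[name] = (uris, terms, patterns)
    return info


# Hand-written example shaped like one entry of tool["function"].
function = {
    "input": [
        {
            "data": {"uri": "http://edamontology.org/data_0863", "term": "Sequence alignment"},
            "format": [{"uri": "http://edamontology.org/format_2572", "term": "BAM"}],
        }
    ]
}

result = channel_info(function, "input")
```

Each value holds three parallel lists: the URIs and terms start with the data entry and then list every format, while the patterns list contains one lowercase extension per format only.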
13 changes: 11 additions & 2 deletions nf_core/components/create.py
@@ -21,7 +21,7 @@
import nf_core
import nf_core.utils
from nf_core.components.components_command import ComponentCommand
from nf_core.components.components_utils import get_biotools_id
from nf_core.components.components_utils import get_biotools_id, get_biotools_response, get_channel_info_from_biotools
from nf_core.pipelines.lint_utils import run_prettier_on_file

log = logging.getLogger(__name__)
@@ -151,8 +151,15 @@ def create(self) -> bool:
if self.component_type == "modules":
# Try to find a bioconda package for 'component'
self._get_bioconda_tool()
name = self.tool_conda_name if self.tool_conda_name else self.component
# Try to find a biotools entry for 'component'
self.tool_identifier = get_biotools_id(self.component)
biotools_data = get_biotools_response(name)
if biotools_data:
self.tool_identifier = get_biotools_id(biotools_data, name)
# Obtain EDAM ontologies for inputs and outputs
channel_info = get_channel_info_from_biotools(biotools_data, name)
if channel_info:
self.inputs, self.outputs = channel_info

# Prompt for GitHub username
self._get_username()
@@ -176,6 +183,8 @@ def create(self) -> bool:

new_files = [str(path) for path in self.file_paths.values()]

run_prettier_on_file(new_files)

log.info("Created following files:\n " + "\n ".join(new_files))
return True
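
The lookup order in `create()` can be summarised in a short sketch: prefer the conda package name, and leave every field at an empty default when bio.tools returns nothing. `gather_metadata` and the injected `fetch` callable are hypothetical names used only for illustration, and the empty defaults are an assumption about how the template falls back to its placeholders.

```python
from typing import Any, Callable, Dict, Optional


def gather_metadata(
    component: str,
    conda_name: Optional[str],
    fetch: Callable[[str], Optional[Dict[str, Any]]],
) -> Dict[str, Any]:
    """Illustrative lookup order: conda name first, empty defaults on failure."""
    name = conda_name if conda_name else component
    # Empty defaults keep a template on its placeholder branch.
    meta: Dict[str, Any] = {"name": name, "identifier": "", "inputs": {}, "outputs": {}}
    data = fetch(name)
    if not data:
        return meta
    for tool in data.get("list", []):
        if tool.get("name", "").lower() == name.lower():
            meta["identifier"] = tool.get("biotoolsCURIE", "")
    return meta
```

Passing the fetcher in as an argument keeps the sketch testable without network access; the merged code calls `get_biotools_response` directly.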

3 changes: 3 additions & 0 deletions nf_core/module-template/environment.yml
@@ -4,4 +4,7 @@ channels:
- conda-forge
- bioconda
dependencies:
# TODO nf-core: List required Conda package(s).
# Software MUST be pinned to channel (i.e. "bioconda"), version (i.e. "1.10").
# For Conda, the build (i.e. "h9402c20_2") must be EXCLUDED to support installation on different operating systems.
- "{{ bioconda if bioconda else 'YOUR-TOOL-HERE' }}"
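
The pinning rule in the TODO above (channel and version required, build string excluded) can be expressed as a small check. This is a sketch under the assumption that dependencies use the `channel::package=version` form; the regular expression and the `is_pinned` helper are illustrative and not part of nf-core/tools.

```python
import re

# channel::package=version, with no third "=build" field.
PINNED_SPEC = re.compile(r"(?P<channel>[\w.-]+)::(?P<package>[\w.-]+)=(?P<version>[\w.]+)")


def is_pinned(spec: str) -> bool:
    """True when the spec names a channel and a version but no build string."""
    return PINNED_SPEC.fullmatch(spec) is not None


pinned = is_pinned("bioconda::samtools=1.10")
with_build = is_pinned("bioconda::samtools=1.10=h9402c20_2")
placeholder = is_pinned("YOUR-TOOL-HERE")
```

Here `pinned` is `True`, while the spec with a build string and the untouched placeholder are both rejected.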
51 changes: 41 additions & 10 deletions nf_core/module-template/main.nf
@@ -22,9 +22,6 @@ process {{ component_name_underscore|upper }} {
label '{{ process_label }}'

{% if not_empty_template -%}
// TODO nf-core: List required Conda package(s).
// Software MUST be pinned to channel (i.e. "bioconda"), version (i.e. "1.10").
// For Conda, the build (i.e. "h9402c20_2") must be EXCLUDED to support installation on different operating systems.
// TODO nf-core: See section in main README for further information regarding finding and adding container addresses to the section below.
{% endif -%}
conda "${moduleDir}/environment.yml"
@@ -33,6 +30,12 @@ process {{ component_name_underscore|upper }} {
'{{ docker_container if docker_container else 'biocontainers/YOUR-TOOL-HERE' }}' }"

input:
{%- if inputs %}
// TODO nf-core: Update the information obtained from bio.tools and make sure that it is correct
{%- for input_name, ontologies in inputs.items() %}
{{ 'tuple val(meta), path(' + input_name + ')' if has_meta else 'path ' + input_name }}
{%- endfor %}
{%- else -%}
{% if not_empty_template -%}
// TODO nf-core: Where applicable all sample-specific information e.g. "id", "single_end", "read_group"
// MUST be provided as an input via a Groovy Map called "meta".
@@ -44,16 +47,22 @@ process {{ component_name_underscore|upper }} {
{%- else -%}
{{ 'tuple val(meta), path(input)' if has_meta else 'path input' }}
{%- endif %}
{%- endif %}

output:
{%- if outputs %}
// TODO nf-core: Update the information obtained from bio.tools and make sure that it is correct
{%- for output_name, ontologies in outputs.items() %}
{{ 'tuple val(meta), path("*.{' + ontologies[2]|join(',') + '}")' if has_meta else 'path ' + output_name }}, emit: {{ output_name }}
{%- endfor %}
{%- else %}
{% if not_empty_template -%}
// TODO nf-core: Named file extensions MUST be emitted for ALL output channels
{{ 'tuple val(meta), path("*.bam")' if has_meta else 'path "*.bam"' }}, emit: bam
// TODO nf-core: List additional required output channels/values here
{%- else -%}
{{ 'tuple val(meta), path("*")' if has_meta else 'path "*"' }}, emit: output
{%- endif %}
{% if not_empty_template -%}
// TODO nf-core: List additional required output channels/values here
{%- endif %}
path "versions.yml" , emit: versions

@@ -78,20 +87,33 @@ process {{ component_name_underscore|upper }} {
{%- endif %}
"""
{% if not_empty_template -%}
samtools \\
sort \\
{{ component }} \\
$args \\
-@ $task.cpus \\
{%- if has_meta %}
{%- if inputs %}
{%- for input_name, ontologies in inputs.items() %}
{%- set extensions = ontologies[2] %}
{%- for ext in extensions %}
-o ${prefix}.{{ ext }} \\
{%- endfor %}
{%- endfor %}
{%- else %}
-o ${prefix}.bam \\
-T $prefix \\
{%- endif %}
{%- endif %}
{%- if inputs %}
{%- for input_name, ontologies in inputs.items() %}
${{ input_name }} \\
{%- endfor %}
{%- else %}
$bam
{%- endif %}
{%- endif %}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
{{ component }}: \$(samtools --version |& sed '1!d ; s/samtools //')
{{ component }}: \$({{ component }} --version)
END_VERSIONS
"""

@@ -108,12 +130,21 @@ process {{ component_name_underscore|upper }} {
{%- endif %}
"""
{% if not_empty_template -%}
{%- if inputs %}
{%- for input_name, ontologies in inputs.items() %}
{%- set extensions = ontologies[2] %}
{%- for ext in extensions %}
touch ${prefix}.{{ ext }}
{%- endfor %}
{%- endfor %}
{%- else %}
touch ${prefix}.bam
{%- endif %}
{%- endif %}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
{{ component }}: \$(samtools --version |& sed '1!d ; s/samtools //')
{{ component }}: \$({{ component }} --version)
END_VERSIONS
"""
}
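
The `output:` line of the template builds a glob from the extensions collected for each channel (`ontologies[2]`). A plain-Python sketch of the same string construction, mirroring both branches of the Jinja expression; `output_declaration` is a hypothetical helper, not code from the PR:

```python
from typing import List


def output_declaration(name: str, extensions: List[str], has_meta: bool) -> str:
    """Build one output line the way the template expression does."""
    if has_meta:
        # Equivalent of: '*.{' + ontologies[2]|join(',') + '}'
        pattern = "*.{" + ",".join(extensions) + "}"
        return f'tuple val(meta), path("{pattern}"), emit: {name}'
    # Without meta, the template emits the channel name rather than a glob.
    return f"path {name}, emit: {name}"


with_meta = output_declaration("sequence_alignment", ["bam", "sam"], True)
without_meta = output_declaration("sequence_alignment", ["bam", "sam"], False)
```

In the Jinja expression, the `join` filter applies to `ontologies[2]` before the surrounding `+` concatenation, which is why the sketch joins the extensions first.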
61 changes: 52 additions & 9 deletions nf_core/module-template/meta.yml
@@ -26,12 +26,32 @@ tools:
## TODO nf-core: Add a description of all of the variables used as input
{% endif -%}
input:
{% if inputs -%}
{% for input_name, ontologies in inputs.items() -%}
{% if has_meta %}
- - meta:
type: map
description: |
Groovy Map containing sample information
e.g. `[ id:'sample1' ]`
{% endif %}
- {{ input_name }}:
# TODO nf-core: Update the information obtained from bio.tools and make sure that it is correct
type: file
description: {{ input_name }} file
pattern: {{ "\"*.{" + ontologies[2]|join(",") + "}\"" }}
ontologies:
{% for ontology in ontologies[0] -%}
- edam: "{{ ontology }}" # {{ ontologies[1][loop.index0] }}
{% endfor -%}
{% endfor -%}
{% else -%}
#{% if has_meta %} Only when we have meta
- - meta:
type: map
description: |
Groovy Map containing sample information
e.g. `[ id:'sample1', single_end:false ]`
e.g. `[ id:'sample1' ]`
{% endif %}
{% if not_empty_template -%}
## TODO nf-core: Delete / customise this example input
@@ -42,24 +62,46 @@ input:
pattern: {{ '"*.{bam,cram,sam}"' if not_empty_template else "" }}
ontologies:
{% if not_empty_template -%}
- edam: "http://edamontology.org/format_25722"
- edam: "http://edamontology.org/format_2573"
- edam: "http://edamontology.org/format_3462"
{% else %}
- edam: "http://edamontology.org/format_2572" # BAM
- edam: "http://edamontology.org/format_2573" # SAM
- edam: "http://edamontology.org/format_3462" # CRAM
{% else -%}
- edam: ""
{%- endif %}
{%- endif %}

{% if not_empty_template -%}
## TODO nf-core: Add a description of all of the variables used as output
{% endif -%}
output:
{% if outputs -%}
{% for output_name, ontologies in outputs.items() -%}
- {{ output_name }}:
{% if has_meta -%}
- meta:
type: map
description: |
Groovy Map containing sample information
e.g. `[ id:'sample1' ]`
{%- endif %}
- {{ "\"*.{" + ontologies[2]|join(",") + "}\"" }}:
# TODO nf-core: Update the information obtained from bio.tools and make sure that it is correct
type: file
description: {{ output_name }} file
pattern: {{ "\"*.{" + ontologies[2]|join(",") + "}\"" }}
ontologies:
{%- for ontology in ontologies[0] %}
- edam: "{{ ontology }}" # {{ ontologies[1][loop.index0] }}
{%- endfor %}
{% endfor -%}
{% else -%}
- {{ 'bam:' if not_empty_template else "output:" }}
#{% if has_meta -%} Only when we have meta
- meta:
type: map
description: |
Groovy Map containing sample information
e.g. `[ id:'sample1', single_end:false ]`
e.g. `[ id:'sample1' ]`
{%- endif %}
{% if not_empty_template -%}
## TODO nf-core: Delete / customise this example output
@@ -70,12 +112,13 @@ output:
pattern: {{ '"*.{bam,cram,sam}"' if not_empty_template else "" }}
ontologies:
{% if not_empty_template -%}
- edam: "http://edamontology.org/format_25722"
- edam: "http://edamontology.org/format_2573"
- edam: "http://edamontology.org/format_3462"
- edam: "http://edamontology.org/format_2572" # BAM
- edam: "http://edamontology.org/format_2573" # SAM
- edam: "http://edamontology.org/format_3462" # CRAM
{% else -%}
- edam: ""
{%- endif %}
{%- endif %}
- versions:
- "versions.yml":
type: file
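
In the `ontologies:` loops of this template, each URI from `ontologies[0]` is paired with the term at the same position in `ontologies[1]` through `loop.index0`. A minimal Python sketch of that pairing; `ontology_lines` is an illustrative helper and the sample values are hand-written:

```python
from typing import List


def ontology_lines(uris: List[str], terms: List[str]) -> List[str]:
    """Pair each EDAM URI with its term, as the template does via loop.index0."""
    return [f'- edam: "{uri}" # {term}' for uri, term in zip(uris, terms)]


lines = ontology_lines(
    ["http://edamontology.org/data_0863", "http://edamontology.org/format_2572"],
    ["Sequence alignment", "BAM"],
)
```

The pairing relies on the two lists having the same length and order, which holds when both are filled in the same loop, as in `get_channel_info_from_biotools`.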