Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

snakefmt complains on its own formatting on multiple if statements #157

Closed
weber8thomas opened this issue Nov 8, 2022 · 13 comments
Closed

Comments

@weber8thomas
Copy link

I'm experiencing an issue on a multiple conditions if statement using snakefmt

3 if statements on a single line

Screenshot 2022-11-08 at 10 37 41

snakefmt works fine

Screenshot 2022-11-08 at 10 38 23

New formatted code looks like this

Screenshot 2022-11-08 at 10 38 54

snakefmt now complains on its own formatting

Screenshot 2022-11-08 at 10 39 14

@mbhall88
Copy link
Member

mbhall88 commented Nov 8, 2022

Could you paste the original code in a code fence here so I can reproduce and debug this?

@corneliusroemer
Copy link

corneliusroemer commented Nov 9, 2022

I can reproduce, this is how it cycles in 0.7.0 and also in 0.6.1:

Start:

if config.get("s3_dst"):

    include: "workflow/snakemake_rules/upload.smk"
    include: "workflow/snakemake_rules/trigger.smk"
    # Only include rules for trigger if uploading files since the trigger
    # rules depend on the outputs from upload.
    include: "workflow/snakemake_rules/trigger.smk"

After formatting:

if config.get("s3_dst"):

    include: "workflow/snakemake_rules/upload.smk"
    include: "workflow/snakemake_rules/trigger.smk"


# Only include rules for trigger if uploading files since the trigger
# rules depend on the outputs from upload.
include: "workflow/snakemake_rules/trigger.smk"

But this is still not ok per Snakefmt, if I save again:

if config.get("s3_dst"):

    include: "workflow/snakemake_rules/upload.smk"
    include: "workflow/snakemake_rules/trigger.smk"
# Only include rules for trigger if uploading files since the trigger
# rules depend on the outputs from upload.
include: "workflow/snakemake_rules/trigger.smk"

@weber8thomas which version are you on?

@weber8thomas
Copy link
Author

weber8thomas commented Nov 9, 2022

Could you paste the original code in a code fence here so I can reproduce and debug this?

@mbhall88 Sure, here is the code:

Before formatting:

if config["window"] in [50000, 100000, 200000] and config["reference"] == "hg38" and config["normalized_counts"] == True:

    rule merge_blacklist_bins:
        input:
            norm="workflow/data/normalization/HGSVC.{window}.txt",
            whitelist="workflow/data/normalization/inversion-whitelist.tsv",
        output:
            merged="{folder}/{sample}/normalizations/HGSVC.{window}.merged.tsv",
        log:
            "{folder}/log/normalizations/{sample}/HGSVC.{window}.merged.tsv",
        conda:
            "../envs/mc_base.yaml"
        shell:
            """
            workflow/scripts/normalization/merge-blacklist.py --merge_distance 500000 {input.norm} --whitelist {input.whitelist} --min_whitelist_interval_size 100000 > {output.merged} 2>> {log}
            """

After formatting

if (
    config["window"] in [50000, 100000, 200000]
    and config["reference"] == "hg38"
    and config["normalized_counts"] == True
):

    rule merge_blacklist_bins:
        input:
            norm="workflow/data/normalization/HGSVC.{window}.txt",
            whitelist="workflow/data/normalization/inversion-whitelist.tsv",
        output:
            merged="{folder}/{sample}/normalizations/HGSVC.{window}.merged.tsv",
        log:
            "{folder}/log/normalizations/{sample}/HGSVC.{window}.merged.tsv",
        conda:
            "../envs/mc_base.yaml"
        shell:
            """
            workflow/scripts/normalization/merge-blacklist.py --merge_distance 500000 {input.norm} --whitelist {input.whitelist} --min_whitelist_interval_size 100000 > {output.merged} 2>> {log}
            """

@corneliusroemer I also experienced the same issue regarding comments block (""") or comments lines (#)

snakefmt version used: 0.6.1

@mbhall88
Copy link
Member

mbhall88 commented Nov 9, 2022

As mentioned in #161, we (Brice and I) are just discussing who can address what when. I am on holidays next week and he has his PhD defence, so it will likely be after next week we get around to these sorry. #159 seems to be the first priority, but some of these other issues might be wrapped up in the solution for that bug.

Thank you very much for the bug reports!

@weber8thomas
Copy link
Author

As mentioned in #161, we (Brice and I) are just discussing who can address what when. I am on holidays next week and he has his PhD defence, so it will likely be after next week we get around to these sorry. #159 seems to be the first priority, but some of these other issues might be wrapped up in the solution for that bug.

Thank you very much for the bug reports!

No urgence on my side, just wanted to report it

Good luck to @bricoletc for his PhD defence and enjoy you holidays @mbhall88 !

@johanneskoester
Copy link
Contributor

I think #159 is quite critical indeed. I have no experience with the snakefmt codebase myself, but if you give me a pointer where to start, I can try to help here.

@bricoletc
Copy link
Collaborator

bricoletc commented Dec 5, 2022

Defence went well thanks @weber8thomas !

@johanneskoester this week is very busy for me. Happy to give you a pointer in about one week - and I will also have a look at this issue and see if I can fix, though more in next two weeks.

@mbhall88 @johanneskoester I'd like to start a discussion about a possible overhaul of snakefmt. Currently the parsing code relies on a lot of heuristics - especially when trying to figure out if we are inside a block of python code or not. Perhaps trying to implement an AST-based parser/formatter would be much easier for maintanibility.
This issue is not the place for such a discussion btw, not sure where would be best ;)

@corneliusroemer
Copy link

corneliusroemer commented Dec 5, 2022

I had a go at trying to debug this stuff myself over the weekend but gave up after half a dozen hours. I managed to get a simple new test case to pass, but then when I used two comment lines rather than one things failed again.

The tests should be a useful guide along the way helping with a reimplementation.

Here are test cases for #159:

    def test_if_statement_with_snakecode_comment_snakecode_inside(self):
        """https://github.com/snakemake/snakefmt/issues/159"""
        snakecode = (
            "if True:\n\n"
            f"{TAB * 1}ruleorder: a > b\n"
            "\n"
            f"# comment\n"
            f"{TAB * 1}ruleorder: c > d\n"
        )
        formatter = setup_formatter(snakecode)
        assert formatter.get_formatted() == snakecode

    def test_if_statement_with_snakecode_2comments_snakecode_inside(self):
        """https://github.com/snakemake/snakefmt/issues/159"""
        snakecode = (
            "if True:\n\n"
            f"{TAB * 1}ruleorder: a > b\n"
            "\n"
            f"# comment\n"
            f"# comment\n"
            f"{TAB * 1}ruleorder: c > d\n"
        )
        formatter = setup_formatter(snakecode)
        assert formatter.get_formatted() == snakecode

I got the first to pass by tweaking line 66 in parser.py - but the second still fails:

            if self.vocab.recognises(keyword):
                if status.indent > self.target_indent:
-                   if self.syntax.from_python or status.pythonable:
+                   if self.from_python or status.pythonable:
                        self.from_python = True
                    else:  # Over-indented context gets reset
                        self.syntax.cur_indent = max(self.target_indent - 1, 0)
                elif self.from_python:

@mbhall88
Copy link
Member

mbhall88 commented Dec 6, 2022

Perhaps trying to implement an AST-based parser/formatter would be much easier for maintanibility.
This issue is not the place for such a discussion btw, not sure where would be best ;)

Agreed. This is what I wanted to do in the first place 😛

But that would require a significant amount of work that I don't have and I doubt @bricoletc does either. Maybe @johanneskoester has someone in his team that could do this?

Once I get #163 and #164 fixed, I will look at this. It is quite possible #164 could impact this issue

@corneliusroemer
Copy link

The challenge is one would need a CST instead of an AST, as the AST throws away things like comments.

So if IUC, one would need to:
a) write a Snakemake grammar extension of https://docs.python.org/3/library/ast.html
b) Get an AST based on that grammar
c) Then implement rewrites based on that, maybe with something like https://github.com/asottile/tokenize-rt

Does that sound right @mbhall88?

@bricoletc
Copy link
Collaborator

bricoletc commented Dec 6, 2022

I had a go at trying to debug this stuff myself over the weekend but gave up after half a dozen hours. I managed to get a simple new test case to pass, but then when I used two comment lines rather than one things failed again.

A valiant effort @corneliusroemer !

I'd be up to having a go at a CST/AST-based rewrite, if others, like @corneliusroemer /someone from @johanneskoester 's team can join in.

(And @mbhall88 as ever if you want to - I know you were talking about ASTs when we first started 😅)

@johanneskoester
Copy link
Contributor

The rewrite plans sound good, but also as if that will need a lot of work. So, if we can fix this before, I think that would be good.
I might be able to dedicate some person hours to something like that later in 2023, but as there are many other construction sites, it would be highly appreciated if you would want to start before.

bricoletc added a commit that referenced this issue Dec 15, 2022
bricoletc added a commit that referenced this issue Dec 15, 2022
@mbhall88
Copy link
Member

The examples in this issue are resolved in v0.8.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants