snakefmt complains on its own formatting on multiple if statements #157

weber8thomas · 2022-11-08T09:43:47Z

I'm experiencing an issue on a multiple conditions if statement using snakefmt

3 if statements on a single line

snakefmt works fine

New formatted code looks like this

snakefmt now complains on its own formatting

mbhall88 · 2022-11-08T22:08:05Z

Could you paste the original code in a code fence here so I can reproduce and debug this?

corneliusroemer · 2022-11-09T01:07:03Z

I can reproduce, this is how it cycles in 0.7.0 and also in 0.6.1:

Start:

if config.get("s3_dst"):

    include: "workflow/snakemake_rules/upload.smk"
    include: "workflow/snakemake_rules/trigger.smk"
    # Only include rules for trigger if uploading files since the trigger
    # rules depend on the outputs from upload.
    include: "workflow/snakemake_rules/trigger.smk"

After formatting:

if config.get("s3_dst"):

    include: "workflow/snakemake_rules/upload.smk"
    include: "workflow/snakemake_rules/trigger.smk"


# Only include rules for trigger if uploading files since the trigger
# rules depend on the outputs from upload.
include: "workflow/snakemake_rules/trigger.smk"

But this is still not ok per Snakefmt, if I save again:

if config.get("s3_dst"):

    include: "workflow/snakemake_rules/upload.smk"
    include: "workflow/snakemake_rules/trigger.smk"
# Only include rules for trigger if uploading files since the trigger
# rules depend on the outputs from upload.
include: "workflow/snakemake_rules/trigger.smk"

@weber8thomas which version are you on?

weber8thomas · 2022-11-09T09:05:54Z

Could you paste the original code in a code fence here so I can reproduce and debug this?

@mbhall88 Sure, here is the code:

Before formatting:

if config["window"] in [50000, 100000, 200000] and config["reference"] == "hg38" and config["normalized_counts"] == True:

    rule merge_blacklist_bins:
        input:
            norm="workflow/data/normalization/HGSVC.{window}.txt",
            whitelist="workflow/data/normalization/inversion-whitelist.tsv",
        output:
            merged="{folder}/{sample}/normalizations/HGSVC.{window}.merged.tsv",
        log:
            "{folder}/log/normalizations/{sample}/HGSVC.{window}.merged.tsv",
        conda:
            "../envs/mc_base.yaml"
        shell:
            """
            workflow/scripts/normalization/merge-blacklist.py --merge_distance 500000 {input.norm} --whitelist {input.whitelist} --min_whitelist_interval_size 100000 > {output.merged} 2>> {log}
            """

After formatting

if (
    config["window"] in [50000, 100000, 200000]
    and config["reference"] == "hg38"
    and config["normalized_counts"] == True
):

    rule merge_blacklist_bins:
        input:
            norm="workflow/data/normalization/HGSVC.{window}.txt",
            whitelist="workflow/data/normalization/inversion-whitelist.tsv",
        output:
            merged="{folder}/{sample}/normalizations/HGSVC.{window}.merged.tsv",
        log:
            "{folder}/log/normalizations/{sample}/HGSVC.{window}.merged.tsv",
        conda:
            "../envs/mc_base.yaml"
        shell:
            """
            workflow/scripts/normalization/merge-blacklist.py --merge_distance 500000 {input.norm} --whitelist {input.whitelist} --min_whitelist_interval_size 100000 > {output.merged} 2>> {log}
            """

@corneliusroemer I also experienced the same issue regarding comments block (""") or comments lines (#)

snakefmt version used: 0.6.1

mbhall88 · 2022-11-09T22:00:25Z

As mentioned in #161, we (Brice and I) are just discussing who can address what when. I am on holidays next week and he has his PhD defence, so it will likely be after next week we get around to these sorry. #159 seems to be the first priority, but some of these other issues might be wrapped up in the solution for that bug.

Thank you very much for the bug reports!

weber8thomas · 2022-11-10T07:53:57Z

As mentioned in #161, we (Brice and I) are just discussing who can address what when. I am on holidays next week and he has his PhD defence, so it will likely be after next week we get around to these sorry. #159 seems to be the first priority, but some of these other issues might be wrapped up in the solution for that bug.

Thank you very much for the bug reports!

No urgence on my side, just wanted to report it

Good luck to @bricoletc for his PhD defence and enjoy you holidays @mbhall88 !

johanneskoester · 2022-12-05T07:53:07Z

I think #159 is quite critical indeed. I have no experience with the snakefmt codebase myself, but if you give me a pointer where to start, I can try to help here.

bricoletc · 2022-12-05T10:09:50Z

Defence went well thanks @weber8thomas !

@johanneskoester this week is very busy for me. Happy to give you a pointer in about one week - and I will also have a look at this issue and see if I can fix, though more in next two weeks.

@mbhall88 @johanneskoester I'd like to start a discussion about a possible overhaul of snakefmt. Currently the parsing code relies on a lot of heuristics - especially when trying to figure out if we are inside a block of python code or not. Perhaps trying to implement an AST-based parser/formatter would be much easier for maintanibility.
This issue is not the place for such a discussion btw, not sure where would be best ;)

corneliusroemer · 2022-12-05T22:19:53Z

I had a go at trying to debug this stuff myself over the weekend but gave up after half a dozen hours. I managed to get a simple new test case to pass, but then when I used two comment lines rather than one things failed again.

The tests should be a useful guide along the way helping with a reimplementation.

Here are test cases for #159:

    def test_if_statement_with_snakecode_comment_snakecode_inside(self):
        """https://github.com/snakemake/snakefmt/issues/159"""
        snakecode = (
            "if True:\n\n"
            f"{TAB * 1}ruleorder: a > b\n"
            "\n"
            f"# comment\n"
            f"{TAB * 1}ruleorder: c > d\n"
        )
        formatter = setup_formatter(snakecode)
        assert formatter.get_formatted() == snakecode

    def test_if_statement_with_snakecode_2comments_snakecode_inside(self):
        """https://github.com/snakemake/snakefmt/issues/159"""
        snakecode = (
            "if True:\n\n"
            f"{TAB * 1}ruleorder: a > b\n"
            "\n"
            f"# comment\n"
            f"# comment\n"
            f"{TAB * 1}ruleorder: c > d\n"
        )
        formatter = setup_formatter(snakecode)
        assert formatter.get_formatted() == snakecode

I got the first to pass by tweaking line 66 in parser.py - but the second still fails:

            if self.vocab.recognises(keyword):
                if status.indent > self.target_indent:
-                   if self.syntax.from_python or status.pythonable:
+                   if self.from_python or status.pythonable:
                        self.from_python = True
                    else:  # Over-indented context gets reset
                        self.syntax.cur_indent = max(self.target_indent - 1, 0)
                elif self.from_python:

mbhall88 · 2022-12-06T03:58:12Z

Perhaps trying to implement an AST-based parser/formatter would be much easier for maintanibility.
This issue is not the place for such a discussion btw, not sure where would be best ;)

Agreed. This is what I wanted to do in the first place 😛

But that would require a significant amount of work that I don't have and I doubt @bricoletc does either. Maybe @johanneskoester has someone in his team that could do this?

Once I get #163 and #164 fixed, I will look at this. It is quite possible #164 could impact this issue

corneliusroemer · 2022-12-06T04:21:13Z

The challenge is one would need a CST instead of an AST, as the AST throws away things like comments.

So if IUC, one would need to:
a) write a Snakemake grammar extension of https://docs.python.org/3/library/ast.html
b) Get an AST based on that grammar
c) Then implement rewrites based on that, maybe with something like https://github.com/asottile/tokenize-rt

Does that sound right @mbhall88?

bricoletc · 2022-12-06T07:32:42Z

I had a go at trying to debug this stuff myself over the weekend but gave up after half a dozen hours. I managed to get a simple new test case to pass, but then when I used two comment lines rather than one things failed again.

A valiant effort @corneliusroemer !

I'd be up to having a go at a CST/AST-based rewrite, if others, like @corneliusroemer /someone from @johanneskoester 's team can join in.

(And @mbhall88 as ever if you want to - I know you were talking about ASTs when we first started 😅)

johanneskoester · 2022-12-06T13:40:09Z

The rewrite plans sound good, but also as if that will need a lot of work. So, if we can fix this before, I think that would be good.
I might be able to dedicate some person hours to something like that later in 2023, but as there are many other construction sites, it would be highly appreciated if you would want to start before.

mbhall88 · 2022-12-19T23:22:12Z

The examples in this issue are resolved in v0.8.0

corneliusroemer mentioned this issue Nov 9, 2022

Serious bug: Code after comment in indented if-block is unindented, changing execution #159

Closed

bricoletc mentioned this issue Dec 13, 2022

fix: Indent overhaul #168

Merged

bricoletc added a commit that referenced this issue Dec 15, 2022

Add failing test from #157

2599e28

bricoletc added a commit that referenced this issue Dec 15, 2022

Fix test from #157

b0bed15

mbhall88 closed this as completed Dec 19, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

snakefmt complains on its own formatting on multiple if statements #157

snakefmt complains on its own formatting on multiple if statements #157

weber8thomas commented Nov 8, 2022

mbhall88 commented Nov 8, 2022

corneliusroemer commented Nov 9, 2022 •

edited

Loading

weber8thomas commented Nov 9, 2022 •

edited

Loading

mbhall88 commented Nov 9, 2022 •

edited

Loading

weber8thomas commented Nov 10, 2022

johanneskoester commented Dec 5, 2022

bricoletc commented Dec 5, 2022 •

edited

Loading

corneliusroemer commented Dec 5, 2022 •

edited

Loading

mbhall88 commented Dec 6, 2022

corneliusroemer commented Dec 6, 2022

bricoletc commented Dec 6, 2022 •

edited

Loading

johanneskoester commented Dec 6, 2022

mbhall88 commented Dec 19, 2022

snakefmt complains on its own formatting on multiple if statements #157

snakefmt complains on its own formatting on multiple if statements #157

Comments

weber8thomas commented Nov 8, 2022

3 if statements on a single line

snakefmt works fine

New formatted code looks like this

snakefmt now complains on its own formatting

mbhall88 commented Nov 8, 2022

corneliusroemer commented Nov 9, 2022 • edited Loading

weber8thomas commented Nov 9, 2022 • edited Loading

mbhall88 commented Nov 9, 2022 • edited Loading

weber8thomas commented Nov 10, 2022

johanneskoester commented Dec 5, 2022

bricoletc commented Dec 5, 2022 • edited Loading

corneliusroemer commented Dec 5, 2022 • edited Loading

mbhall88 commented Dec 6, 2022

corneliusroemer commented Dec 6, 2022

bricoletc commented Dec 6, 2022 • edited Loading

johanneskoester commented Dec 6, 2022

mbhall88 commented Dec 19, 2022

corneliusroemer commented Nov 9, 2022 •

edited

Loading

weber8thomas commented Nov 9, 2022 •

edited

Loading

mbhall88 commented Nov 9, 2022 •

edited

Loading

bricoletc commented Dec 5, 2022 •

edited

Loading

corneliusroemer commented Dec 5, 2022 •

edited

Loading

bricoletc commented Dec 6, 2022 •

edited

Loading