Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect alignment #258

Open
1 of 4 tasks
krlmlr opened this issue Oct 25, 2017 · 6 comments
Open
1 of 4 tasks

Detect alignment #258

krlmlr opened this issue Oct 25, 2017 · 6 comments

Comments

@krlmlr
Copy link
Member

krlmlr commented Oct 25, 2017

UPDATE: Alignment detection was implemented for function calls: https://styler.r-lib.org/articles/detect-alignment.html

Rough outline:

  • Look at the horizontal position of the corresponding items in the original code
  • If the majority is aligned and the alignment involves >1 space somewhere, we store this somewhere in the nested parse data
  • When rendering, the alignment is restored
@lorenzwalthert
Copy link
Collaborator

lorenzwalthert commented Jan 6, 2018

Related: #317

@lorenzwalthert
Copy link
Collaborator

May want to check overlap with https://github.com/seasmith/AlignAssign and learn from it.

@lorenzwalthert
Copy link
Collaborator

lorenzwalthert commented May 11, 2019

I think we should take a different path. I see two important cases now:

Multi-line function calls with named arguments

They should be classified as aligned if argument values are left-aligned, argument values are right-aligned and commas match position.

call(
  x   = 1, kdd  =  2,
  xy  = 2, n    = 33,
)

We first detect if the arguments are aligned. This is the case when they match the above layout. We can implement this as follows:

  • argument name and = have the same line1 (i.e. the start of a token) for multiple lines for the first column (justified below).
  • spacing around comma is correct (none before, at least one after).
  • and at least one space around =.

These checks, however, would classify the following as aligned

call(
  x   = 1, kdd  =  2,
  xy  = 22, n    = 33,
)

Because the argument values are not in the same nest as the argument name and =, checking their value is expensive, as it is potentially again a complicated expression, not just a terminal token, which would require constructing text recursively to know its length. Hence, for a first pass, we can assume that a call is aligned if the argument name and = are aligned plus the other criteria. Later, we could check for that and if it holds (which will not be the case for 90% of the cases we look at now), we can do the expensive checks and hence not imposing any slowdown for 90% of the cases.

Multi-line function calls with unnamed arguments

They should be classified as aligned if the first column is left-aligned (to respect indention rules), the remaining columns are right-aligned and all commas but those in the first column match position.

call(
  1111, 2,
  2,   33,
)

This would involve a similar procedure, namely:

  • first column has same number of lag spaces.
  • spacing around comma is correct (none before, > 0 after).
  • commas from all columns but the first one are aligned (needs construction of text because of nested expressions).

Mixed function calls

call(
  x = 2, 4, c  =      f(a, b),
  1,    44, de = ffk(a, b, c)
)

Hence, in the most general form:

  • lag spaces of column 1 must agree.
  • spacing around comma (0 before, > 1 after) and spacing around = (at least one around).
  • all positions of commas of col > 2 must agree (needs recursive creation of `text?).

Checks must be performed for any multi-line function call (with or without =), which is quite expensive, in particular, the recursive creation of text. I think as we already group by line, we should greedily check column by column and within that, we should check one line at the time and fail fast. E.g. we don't need to check line 3, column 1 if we know that line 1 and 2 are not aligned for column 1.

call(1,
  another = 3,
  text = f(a, b = 33, x == 3)
)

The mixed case is the general case and hence, the pure other cases were only developed here because we can resolve them in a unified manner (a one-stop solution).

It is not clear what happens with multi-line calls that have multi-line calls within them, but arguably the rules should not depend on that:

call(
  first, second___ = call(
    x = 3, a = f(g), ggh
  ), third_row_of_first_column, 
  col1., col2_l    = 9
  )

Assignment

a  <- 3
ab <- 3

This is a case I have not looked at because it was not so common. We can have a look later. What do you think? cc: @pat-s

Comments

Not looked at

ab # here
a  # here

@pat-s
Copy link
Contributor

pat-s commented May 12, 2019

Sounds good. Would fit good into the "strict" idea of styler then. There might be even more problems to tackle for R6 notation. Supporting custom alignment of dictionaries is an important one and a good one to start with :) (I guess it applies more often to R6 than to S3)

@lorenzwalthert
Copy link
Collaborator

@lorenzwalthert
Copy link
Collaborator

lorenzwalthert commented Oct 13, 2019

@pat-s @krlmlr are you interested in reviewing the vignette on alignment detection? https://styler.r-lib.org/articles/detect-alignment.html? It was hard for me to write it such that it's easy to understand, yet most of it should be pretty intuitive. But I feel the text is complicated. It would be nice if we could get that right before styler v1.2.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants