Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Intelligent whitespace interleaving (WIP) #1876

Merged
merged 18 commits into from
May 2, 2018

Conversation

IwanKaramazow
Copy link
Contributor

@IwanKaramazow IwanKaramazow commented Mar 17, 2018

Intelligent Whitespace interleaving

What does this mean?

Whitespace inserted by the user will be tracked and preserved by refmt in certain places.
Amount of newlines interleaved (or lack thereof) is now deducted from whitespace
interleaved in the source instead of hard coded defaults in the printer.
More than one newline will be collapsed into one newline.

Example:

open Multicore;
open React; /* no newline above */

/* newline above */
let add = (a, b) => {
  let a = 1;

  let b = 1; /* the newline above this let binding is preserved */
  a + b;
};

/* another comment */
/* attach this comment to the above comment */

/* but here we want a newline above */

Where is whitespace inserted by the user preserved?

  • structure items (both top-level in a .re file & inside module X = { ... })
  • signature items (both top-level in a .rei file & inside module type Y = { ... })
  • inside let-bindings

How does it work?

Preserving whitespace can be split into two parts:

  1. Computation of inserted whitespace (whitespace isn't part of the AST)
  2. Formatting of the computed whitespace into the layout tree (without disrupting comment insertion)

Computation of whitespace

The Ocaml-AST doesn't contain whitespace. This isn't a problem, since whitespace
between two items can easily be computed based on the location diff between two items.

Imagine the following source code:

1| let a = 1;
2|
3| let b = 2;

We can easily determine if there's whitespace between let a and let b,
by substracting the linenumber of the start of let b by the linenumber of the end
of let a. If the diff is equal to or greater than 1, we know that there's whitespace.

let whitespace = loc2.loc_start.pos_lnum - loc1.loc_end.pos_lnum >= 1

In practise the computation is a little bit more complex because of comments.

1| let a = 1;
2| /* comment on line 2 */
3| let b = 1;

Here the location diff indicates whitespace, but the comment fills up the space between
the two items. The correct diff is:

let whitespace = loc2.loc_start.pos_lnum - loc1.loc_end.pos_lnum - height_of_comments_between >= 1

Formatting computed whitespace (without disrupting comment insertion)

Once we know we need to interleave whitespace between two items, we can't just
print an atom "\n" node. This would clash with the comment interleaving
algorithm. Without inspecting every atom for possible \n text, we would get
the following scenario

before refmt        after refmt
------------       ------------
let a = 1;      |  let a = 1;
                |  /* comment */ --> attached above `let b = 2;`
/* comment */   |               --> atom "\n", was formatted before 
let b = 2;      |  let b = 2;

To circumvent this problem, whitespace needs to be interleaved after all
comments have been inserted into the layout.
In order to postpone the insertion of whitespace to the last moment,
the following layout made its way into existence:

  type layout =
      ...
    | Whitespace of WhitespaceRegion.t * t

Whitespace is a specific type of layout. It conveys an
"intent to interleave whitespace". It decorates a layout with information
for whitespace insertion above that node.
WhitespaceRegion.t contains specific info
about the region where whitespace should be interleaved.
The info is used for correct comment interleaving inside of the region
designated for whitespace.

Perfomance implications

To compute if a range in the source code contains whitespace,
we need the comments for that range. Every time the whitespace computation
happens, we need to traverse all comments to find the comments that overlap.
The traversal of comments can be improved by storing the comments inside a
interval-(Red/Black or AVL)tree for cheaper lookup.

As a benchmark reference: there's a slowdown of around 50ms with this diff,
on a 10000 line file. Reason_pprint_ast.ml (formatted in Reason) was taken
as reference point on a MacBook Pro 2015 i5.

The comment interleaving algorithm in general is very slow, for every comment
we create a create/travers the/a whole new tree.
There is definitely room for improvement here. The performance will be adressed
in a separate PR.

@Schmavery
Copy link
Contributor

This is awesome! Are you considering doing the break-if-over-multiple-lines thing you mentioned here in the future as well? I remember thinking that was a great idea.

@IwanKaramazow IwanKaramazow force-pushed the IntelligentWhitespace branch 6 times, most recently from e919681 to 72b1a83 Compare March 31, 2018 17:26
@IwanKaramazow
Copy link
Contributor Author

IwanKaramazow commented Mar 31, 2018

@Schmavery Hey, sorry for the late response. Managed to implement the "intelligent" whitespace interleaving inside modules, i.e. more than one newline between "items" will collapse into one newline that will be preserved. Since whitespace is a tool for us programmers to organise our code, I don't think we should let an algorithm/heuristic interfere with our human newlines atm. That might change in the future, but I think it's the best way forward atm. I like to error on the side of not producing ugly formatted code at this stage 😄

Currently finished interleaving in modules:
captura de pantalla 2018-03-17 a las 21 41 04

@IwanKaramazow IwanKaramazow force-pushed the IntelligentWhitespace branch from 8f2456f to 24d5f2d Compare April 14, 2018 04:34
@IwanKaramazow IwanKaramazow force-pushed the IntelligentWhitespace branch from 24d5f2d to 70c2a28 Compare April 14, 2018 07:49
@IwanKaramazow IwanKaramazow force-pushed the IntelligentWhitespace branch from 70c2a28 to be2050a Compare April 14, 2018 08:15
@IwanKaramazow IwanKaramazow force-pushed the IntelligentWhitespace branch from 99fdd2b to 6e6a207 Compare April 20, 2018 19:43
@IwanKaramazow IwanKaramazow force-pushed the IntelligentWhitespace branch from 8489b46 to 31340f1 Compare April 29, 2018 13:39
@IwanKaramazow
Copy link
Contributor Author

@chenglou @jordwalke ready for review

@chenglou chenglou merged commit ee49467 into reasonml:master May 2, 2018
@chenglou
Copy link
Member

chenglou commented May 2, 2018

Tada!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants