Blog post: leaving the Sea of Nodes #797
Conversation
Super cool, all nits, so up to you if you want to address them or just keep as is. :)
Why do we need a PDF version of this given that the png seems to be the same?
In this example, without control edges, nothing would prevent the `return`s from being executed before the `branch`, which would obviously be wrong.
The crucial thing here is that the control edges only impose an order on the operations that have such incoming or outgoing edges, but not on other operations such as the arithmetic operations. This is the main difference between Sea of Nodes and control flow graphs.
Let’s now add effectful operations (eg, loads and stores from and to memory) in the mix. Similarly to control nodes, effectful operations often have no value dependencies, but still cannot run in a random order. For instance, `a[0] += 42; x = a[0]` and `x = a[0]; a[0] += 42` are not equivalent. So, we need a way to impose an order (= a schedule) on effectful operations. We could reuse the control chain for this purpose, but this would be stricter than required. For instance, consider this small snippet:
Oh no, two whitespaces!
Let’s now add effectful operations (eg, loads and stores from and to memory) in the mix. Similarly to control nodes, effectful operations often have no value dependencies, but still cannot run in a random order. For instance, `a[0] += 42; x = a[0]` and `x = a[0]; a[0] += 42` are not equivalent. So, we need a way to impose an order (= a schedule) on effectful operations. We could reuse the control chain for this purpose, but this would be stricter than required. For instance, consider this small snippet:
Let’s now add effectful operations (eg, loads and stores from and to memory) in the mix. Similarly to control nodes, effectful operations often have no value dependencies, but still cannot run in a random order. For instance, `a[0] += 42; x = a[0]` and `x = a[0]; a[0] += 42` are not equivalent. So, we need a way to impose an order (= a schedule) on effectful operations. We could reuse the control chain for this purpose, but this would be stricter than required. For instance, consider this small snippet:
(Which I guess, won't even be visible in the resulting HTML... :))
By putting `a[2]` (which reads memory) on the control chain, we would force it to happen before the branch on `c`, even though, in practice, this load could easily happen after the branch if its result is only used inside the body of the then-branch. Having lots of nodes in the program on the control chain would defeat the goal of Sea of Nodes, since we would basically end up with a CFG-like IR where only pure operations float around.
this load could easily happen after the branch if its result is only used inside the body of the then-branch.
And I guess, assuming that the operation does not fail, meaning that `a` is not null or undefined or any other value that could trigger an exception? Not sure if that level of correctness is relevant here; it could be, however, as we are talking about effect and control flow edges as well.
Fair point, but I prefer to keep things as simple as possible here :)
In this example, `arr[0] = 42` and `let x = arr[a]` have no value dependency (ie, the former is not an input of the latter, and vice versa) . However, because `a` could be `0`, `arr[0] = 42` should be executed before `x = arr[a]`, in order for the latter to always load the correct value from the array.
In this example, `arr[0] = 42` and `let x = arr[a]` have no value dependency (ie, the former is not an input of the latter, and vice versa) . However, because `a` could be `0`, `arr[0] = 42` should be executed before `x = arr[a]`, in order for the latter to always load the correct value from the array.
In this example, `arr[0] = 42` and `let x = arr[a]` have no value dependency (ie, the former is not an input of the latter, and vice versa) . However, because `a` could be `0`, `arr[0] = 42` should be executed before `x = arr[a]` in order for the latter to always load the correct value from the array.
## Manually/visually inspecting and understanding a Sea of Nodes graph is hard
We’ve already seen that on small programs, CFG is easier to read, as it is closer to the original source code, which is what developers (including Compiler Engineers!) are used to write. For the unconvinced readers, let me offer a slightly larger example, so that you understand the issue better. Consider the following JavaScript function, which concatenates an array of strings:
We’ve already seen that on small programs, CFG is easier to read, as it is closer to the original source code, which is what developers (including Compiler Engineers!) are used to write. For the unconvinced readers, let me offer a slightly larger example, so that you understand the issue better. Consider the following JavaScript function, which concatenates an array of strings:
We’ve already seen that on small programs the CFG is easier to read, as it is closer to the original source code, which is what developers (including Compiler Engineers!) are used to write. For the unconvinced readers, let me offer a slightly larger example, so that you understand the issue better. Consider the following JavaScript function, which concatenates an array of strings:
In general there seem to be so many commas in the whole post but I'm also not sure how many I'd want to remove, so I'll just report the ones that I think are misplaced and make it harder to parse the sentence. :)
You’ll notice that while the source JavaScript program has two identical divisions, the Sea of Nodes graph only has one. In reality, Sea of Nodes would start with two divisions, but since this is a pure operation (assuming double inputs), redundancy elimination would easily deduplicate them into one.
Then when reaching the scheduling phase, we would have to find a place to schedule this division. Clearly, it cannot go after `case 1` or `case 2`, since it’s used in the other one. Instead, it would have to be scheduled before the `switch`. The downside is that, now, `a / b` will be computed even when `c` is `3`, where it doesn’t really need to be computed. This is a real issue that can lead to many deduplicated instructions floating to the common dominator of their users, slowing down many paths that don’t need them.
There is a fix though: Turbofan’s scheduler will try to identify these cases, and duplicate the instructions so that they are only computed on the paths that need them. The downside is that this makes the scheduler more complex, requiring additional logic to figure out which nodes could and should be duplicated, and how to duplicate them.
There is a fix though: Turbofan’s scheduler will try to identify these cases, and duplicate the instructions so that they are only computed on the paths that need them. The downside is that this makes the scheduler more complex, requiring additional logic to figure out which nodes could and should be duplicated, and how to duplicate them.
There is a fix though: Turbofan’s scheduler will try to identify these cases and duplicate the instructions so that they are only computed on the paths that need them. The downside is that this makes the scheduler more complex, requiring additional logic to figure out which nodes could and should be duplicated, and how to duplicate them.
## Finding a good order to visit the graph is difficult
All passes of a compiler need to visit the graph, be it to lower nodes, to apply local optimizations, or to run analysis over the whole graph. In a CFG, the order in which to visit nodes is usually straightforward: start from the first block (assuming a single-entry function), and iterate through each node of the block, and then move on to the successors and so on. In a [peephole optimization](https://en.wikipedia.org/wiki/Peephole_optimization) phase (such as [strength reduction](https://en.wikipedia.org/wiki/Strength_reduction)), a nice property of processing the graph in this order is that inputs are always optimized before a node is processed, and visiting each node exactly once is thus enough to apply most peephole optimizations. Consider for instance the following sequence of reductions
All passes of a compiler need to visit the graph, be it to lower nodes, to apply local optimizations, or to run analysis over the whole graph. In a CFG, the order in which to visit nodes is usually straightforward: start from the first block (assuming a single-entry function), and iterate through each node of the block, and then move on to the successors and so on. In a [peephole optimization](https://en.wikipedia.org/wiki/Peephole_optimization) phase (such as [strength reduction](https://en.wikipedia.org/wiki/Strength_reduction)), a nice property of processing the graph in this order is that inputs are always optimized before a node is processed, and visiting each node exactly once is thus enough to apply most peephole optimizations. Consider for instance the following sequence of reductions
All passes of a compiler need to visit the graph, be it to lower nodes, to apply local optimizations, or to run analysis over the whole graph. In a CFG, the order in which to visit nodes is usually straightforward: start from the first block (assuming a single-entry function), and iterate through each node of the block, and then move on to the successors and so on. In a [peephole optimization](https://en.wikipedia.org/wiki/Peephole_optimization) phase (such as [strength reduction](https://en.wikipedia.org/wiki/Strength_reduction)), a nice property of processing the graph in this order is that inputs are always optimized before a node is processed, and visiting each node exactly once is thus enough to apply most peephole optimizations. Consider for instance the following sequence of reductions:
In total, it took three steps to optimize the whole sequence, and each step did useful work. After which, dead code elimination would remove `v1` and `v2`, resulting in one less instruction than in the initial sequence.
With Sea of Nodes, it’s not possible to process pure instructions from start to end, since they aren’t on any control or effect chain, and thus there is no pointer to pure roots or anything like that. Instead, the usual way to process a Sea of Nodes graph for peephole optimizations is to start from the end (e.g., `return` instructions), and go up the value, effect and control inputs. This has the nice property that we won’t visit any unused instruction, but the upsides stop about there, because for peephole optimization, this is about the worst visitation order you could get. On the example above, here are the steps we would take:
With Sea of Nodes, it’s not possible to process pure instructions from start to end, since they aren’t on any control or effect chain, and thus there is no pointer to pure roots or anything like that. Instead, the usual way to process a Sea of Nodes graph for peephole optimizations is to start from the end (e.g., `return` instructions), and go up the value, effect and control inputs. This has the nice property that we won’t visit any unused instruction, but the upsides stop about there, because for peephole optimization, this is about the worst visitation order you could get. On the example above, here are the steps we would take:
With Sea of Nodes it’s not possible to process pure instructions from start to end since they aren’t on any control or effect chain and thus there is no pointer to pure roots or anything like that. Instead, the usual way to process a Sea of Nodes graph for peephole optimizations is to start from the end (e.g., `return` instructions) and go up the value, effect and control inputs. This has the nice property that we won’t visit any unused instruction, but the upsides stop about there, because for peephole optimization this is about the worst visitation order you could get. On the example above, here are the steps we would take:
## Cache unfriendliness
Almost all phases in Turbofan mutate the graph in-place. Given that nodes are fairly large in memory (mostly because each node has pointers to both its inputs and its uses), we try to reuse nodes as much as possible. However, inevitably, when we lower nodes to sequences of multiple nodes, we have to introduce new nodes, which will necessarily not be allocated close to the original node in memory. As a result, the deeper we go through the Turbofan pipeline and the more phases we run, the less cache friendly the graph is. Here is an illustration of this phenomenon:
Nit:
Almost all phases in Turbofan mutate the graph in-place. Given that nodes are fairly large in memory (mostly because each node has pointers to both its inputs and its uses), we try to reuse nodes as much as possible. However, inevitably, when we lower nodes to sequences of multiple nodes, we have to introduce new nodes, which will necessarily not be allocated close to the original node in memory. As a result, the deeper we go through the Turbofan pipeline and the more phases we run, the less cache friendly the graph is. Here is an illustration of this phenomenon:
Almost all phases in Turbofan mutate the graph in-place. Given that nodes are fairly large in memory (mostly because each node has pointers to both its inputs and its uses), we try to reuse nodes as much as possible. However, inevitably, when we lower nodes to sequences of multiple nodes, we have to introduce new nodes, which will necessarily not be allocated close to the original node in memory. As a result, the deeper we go through the Turbofan pipeline and the more phases we run, the less cache friendly the graph becomes. Here is an illustration of this phenomenon:
**It’s hard to figure out what is inside of a loop.** Because lots of nodes are floating outside of the control chain, it’s hard to figure out what is inside each loop. As a result, basic optimizations such as loop peeling and loop unrolling are hard to implement.
**Compiling is slow.** This is a direct consequence of multiple issues that I’ve already mentioned: it’s hard to find a good visitation order for nodes, which leads to many useless revisitations, state tracking is expensive, memory usage is bad, cache locality is bad… This might not be a big deal for an ahead-of-time compiler, but in a JIT compiler, compiling slowly means that we keep executing slow unoptimized code until the optimized code is ready, while taking away resources from other tasks (eg, other compilation jobs, or the Garbage Collector). One consequence of this is that we are forced to think very carefully about the compile time vs. speedup tradeoff of new optimizations, often erring towards the side of optimizing less to keep optimizing fast.
**Compiling is slow.** This is a direct consequence of multiple issues that I’ve already mentioned: it’s hard to find a good visitation order for nodes, which leads to many useless revisitation, state tracking is expensive, memory usage is bad, cache locality is bad… This might not be a big deal for an ahead of time compiler, but in a JIT compiler, compiling slowly means that we keep executing slow unoptimized code until the optimized code is ready, while taking away resources from other tasks (eg, other compilation jobs, or the Garbage Collector). One consequence of this is that we are forced to think very carefully about the compile time - speedup tradeoff of new optimizations, often erring towards the size of optimizing less to keep optimizing fast.
**Compiling is slow.** This is a direct consequence of multiple issues that I’ve already mentioned: it’s hard to find a good visitation order for nodes, which leads to many useless revisitation, state tracking is expensive, memory usage is bad, cache locality is bad… This might not be a big deal for an ahead of time compiler, but in a JIT compiler, compiling slowly means that we keep executing slow unoptimized code until the optimized code is ready, while taking away resources from other tasks (e.g. other compilation jobs or the Garbage Collector). One consequence of this is that we are forced to think very carefully about the compile time - speedup tradeoff of new optimizations, often erring towards the size of optimizing less to keep optimizing fast.