Linking Extension #197

Pat-Lafon · 2022-05-07T20:25:33Z

This wip pr is an attempt at an extension to allow linking/importing Bril programs. My understanding is that this has been previously proposed in sampsyo/cs6120#42 / https://www.cs.cornell.edu/courses/cs6120/2019fa/blog/making-function-calls-work/ but there hasn't been an attempt to make it somewhat official(except accidentally in #7). Inspired by this work, I am proposing a modified version of the import syntax/loadbril idea.

Motivation:

This extension is motivated in two parts. First, as the number of Bril programs grow, it becomes more useful to reuse code from previous programs, especially programs that implement utility functions like were seen in #192. Second, programs built from multiple Bril files could be an interesting use case for the source position extension #161, #167 in reporting error messages #191. Currently, source positions assume a single file of origin which is not necessarily true for an arbitrary front-end compiler.

Goals:

Allow Bril programs to include other Bril programs in a c-like preprocessing step.
Be able to link in Bril programs (like those in the current repository) without modification.
Have some kind of psuedo-module system at the file level by mangling function names to ensure uniqueness.
- Avoid function name clashes across imported files.
Allow nested and circular imports
Have the output program of loadbril be run-able by brili/brilirs/other Bril-based tools without modification.

Syntax:

Text form:

Grammar:

'from' PATH 'import' (FUNC ('as' FUNC)? ',')* ';'

Example:

from ../../benchmarks/bitwise-ops.bril import @AND, @OR as @LIB_OR, @loop_subroutine;

JSON form

{
  "functions": [
    ...
  ],
  "imports": [
    {
      "names": [
        {
          "name": "AND"
        },
        {
          "alias": "LIB_OR",
          "name": "OR"
        },
        {
          "name": "loop_subroutine"
        }
      ],
      "path": "../../benchmarks/bitwise-ops.bril"
    }
  ],

Semantics

Path

Programmers can provide an absolute file path or relative file path from the current file to the Bril program that they want to include. The file must contain an extension of either *.bril or *.json to specify how the program should be parsed. Ideally this would be any valid file path agnostic of platform but I haven't looked that into parsing paths so it's just r"([A-Za-z]|_|\.|[0-9]|\\|/|-)*", // path at the moment.

Names

From a file, names from that file can be "imported". Currently this is just valid function names but could be extended to support any extension with module level named things(like structs #138). To avoid clashes with other imported names or local names during usage, the programmer can provide an optional alias. Aliases are the way of getting around function name uniqueness. Attempts to use the same alias name as another imported name, imported alias, or local name will return an error. This allows the programmer to use write code like:

from ../../benchmarks/bitwise-ops.bril import @AND, @OR as @LIB_OR, @loop_subroutine;

# Does OR and throws out the result
@OR(a: int, b: int): int{
    v1: int = call @LIB_OR a b;
    zero: int = const 0;
    ret zero;
}

@test(){
    a: int = const 1;
    b: int = const 0;
    ans: int = call @AND a b;
    print ans;
    ans: int = call @LIB_OR a b;
    print ans;
    ans: int = call @OR a b;
    print ans;
}

@main(){
    call @test;
}

Mangling

After all imported paths have been resolved into programs and all names/aliases collected, the contained/used names in each program are mangled to ensure that names declared in a program(function declarations) are unique. The exact mangling is left undefined except that it produces valid names that brili/brilirs would expect but in practice I'm just adding a bunch of underscores to the absolute file name. The only names left unmangled are those explicitly expected by an interpreter which currently is only the top level main. Imported programs have their main functions mangled, so they can be used if they are aliased.

For example:

@Users___patricklafontaine___Desktop___Github___bril___bril-rs___loadbril___add.bril___OR(a: int, b: int): int {
  v1: int = call @Users___patricklafontaine___Desktop___Github___bril___benchmarks___bitwise-ops.bril___OR a b;
  zero: int = const 0;
  ret zero;
}

loadbril

loadbril takes a filename as input and writes the final program in json form to stdout. Thus, one can loadbril add.bril | brili or add any arbitrary program in between in a composable fashion. loadbril does not take the file itself in stdin like other tools such as bril2txt do because we need to know the file location of the input file. This tool uses the AbstractProgram representation so it should be as flexible as bril2txt are in terms of new extensions(besides any additional mangling that an extension like structs might need). Note, all functions are imported from a program, not just the ones that are used. The elimination of this dead code is left to other tools.

WIP

I'm seeking feedback on this extension especially because I've both deviated from the original proposal and that some parts were left unresolved as in sampsyo/cs6120#42. It might also be that this isn't the right way to go about linking for an IR. I think this approach works well in keeping everything modular but one could also add this as part of the interpreter or go about this another way.
Documentation, testing, and addressing the clippy lints are still to be done. I just wanted to hack something together to see what it would look like.

sampsyo · 2022-05-08T14:49:01Z

Awesome; thanks for looking into this! Seems like a super fun extension to explore. I like that this can work as a language extension in the sense that it can be "compiled away" by the static linker.

Abstract syntax nits:

Let's maybe make the key function instead of name. That way, it may be possible to import other stuff (global variables? struct definitions?) in the future.

Text syntax nits:

To avoid needing to put any constraints on what characters are allowed to appear in paths, how about we just make this a string literal? That is, require it to be in double quotes. (Like JavaScript import statements.)

Conceptual comments:

I'm somewhat wary of allowing the ability to import text-format source code. Part of the ethos is that the text format is extremely second-class: it is a nice way to look at and author Bril files, but the canonical representation is always the JSON AST---so even fully-fledged Bril tools never need to consider that any particular text format exists. So despite the downsides, I'd recommend that the language itself only support importing from JSON files. Translating dependencies from text to JSON would be the responsibility of a generalized parser/linker.
Maybe this goes without saying, but when we define the semantics of import, it shouldn't have to do with linking. That is, the semantics should just describe providing access to the imported functions. Combining all the definitions, mangling, etc. would all be implementation details in a linker. An interpreter could implement the same semantics "directly" by loading up multiple modules and keeping the namespaces separate.
A surprisingly tricky issue is the interaction with the filesystem and the notion of "search paths" (i.e., what is the base directory for relative import paths?). Starting with the current file's location is certainly convenient, but it's also a little weird in light of how often we process files as raw streams. We could also consider generalizing the semantics a bit so we have a notion of a "search path" that defaults to the file's location but could also be specified separately. Or maybe it should have no default at all, and would always need to be specified manually to a linker/loader? The latter would change the feel from "import this specific file from my filesystem" to "import a module with this name from my designated directory full of libraries." The latter is more like how typical programming language import systems work (Rust, Python, JavaScript) and even somewhat like how DLLs (.so files on Unix) work, what with LD_LIBRARY_PATH and such.

Unimportant thoughts:

loadbril is great to have… but perhaps it should have a name that indicates even more clearly that it's a linker (because, these days, "loading" means something slightly different)? Like brild as a silly portmanteau with ld. 😃

Anyway, that's just a bunch of disorganized thoughts! Others are free to weigh in as well. 😃

Pat-Lafon · 2022-05-09T03:01:46Z

Let's maybe make the key function instead of name. That way, it may be possible to import other stuff (global variables? struct definitions?) in the future.

Sure, I wasn't sure whether importing should be abstracted over all possible things to import as just globally "named" things but this probably makes more sense.

To avoid needing to put any constraints on what characters are allowed to appear in paths, how about we just make this a string literal? That is, require it to be in double quotes.

Yeah, this is a lot easier. I've made it r#""[^"]*""# which is the simplest form of string literal handling.

loadbril is great to have… but perhaps it should have a name that indicates even more clearly that it's a linker (because, these days, "loading" means something slightly different)? Like brild as a silly portmanteau with ld. 😃

Works for me

Maybe this goes without saying, but when we define the semantics of import, it shouldn't have to do with linking. That is, the semantics should just describe providing access to the imported functions. Combining all the definitions, mangling, etc. would all be implementation details in a linker. An interpreter could implement the same semantics "directly" by loading up multiple modules and keeping the namespaces separate.

As you can see, I was making the first pass with only brild in mind. I'll try to be more careful in dividing up what I want brild to do compared to what the language extension says.

I'm somewhat wary of allowing the ability to import text-format source code. Part of the ethos is that the text format is extremely second-class: it is a nice way to look at and author Bril files, but the canonical representation is always the JSON AST---so even fully-fledged Bril tools never need to consider that any particular text format exists. So despite the downsides, I'd recommend that the language itself only support importing from JSON files. Translating dependencies from text to JSON would be the responsibility of a generalized parser/linker.

Would it then be sufficient to just chop off the extension?

from ../../benchmarks/bitwise-ops import @AND, @OR as @LIB_OR, @loop_subroutine;

Then when creating the documentation for this extension, say that "a fully compliant implementation of this extension is only required to look for a JSON version of the module". I would like to leave the door open for brild or other tools to link not just JSON AST but also possibly the text format(which looking at the repository is the defacto file format for Bril programs), compliant typescript (as in test/ts) or other languages with a parser to Bril JSON.

A surprisingly tricky issue is the interaction with the filesystem and the notion of "search paths" (i.e., what is the base directory for relative import paths?). Starting with the current file's location is certainly convenient, but it's also a little weird in light of how often we process files as raw streams.

I was aware of this when I started(if your referring to how the initial file is passed by filename instead of as a stream). This was done to be consistent for nesting imports, as in when I import a file containing imports, it has a starting location of it's file location which a stream doesn't have.

Or maybe it should have no default at all, and would always need to be specified manually to a linker/loader? The latter would change the feel from "import this specific file from my filesystem" to "import a module with this name from my designated directory full of libraries."

Like this?

// either space or comma separated
brild ../../benchmarks/ /absolute/path/to/some/directory $BRIL_LIBRARY_PATH < add.bril

Then have

from bitwise-ops import @AND, @OR as @LIB_OR, @loop_subroutine;

Where brild tries each of the paths until it finds a path that contains the bitwise-ops.* file? (Since I'm viewing files as being modules.) (I personally find this passing of paths to be rather ugly for tooling to use but I can see that this is probably nicer language-wise than what I was going to do with something like https://docs.rs/shellexpand/latest/shellexpand/ to expand Bril import paths for environment variables.)

I've been concerned that there would be issues if two directories contain a file named bitwise-ops.json. I guess we could allow the module system to have a hierarchy? As in:

from benchmarks/bitwise-ops import @AND, @OR as @LIB_OR, @loop_subroutine;

like how in rust you might write

use benchmarks::bitwise_ops::{AND, OR as LIB_OR, loop_subroutine};

except with / separators to make it easier to resolve file paths.

sampsyo · 2022-05-09T12:13:16Z

Yes, something more or less like this is what I was imagining! Some food for thought on top of the above:

A simple policy to address the "two different files with the same name in different search-path directories" is to just take the first one. Not exactly elegant, but it is at least simple.
Allowing paths within the search paths also seems just fine. Like, if your search path list contains /foo and /bar and you do from "baz/qux.json" import stuff, then we would look for files /foo/baz/qux.json and /bar/baz/qux.json.
I think I'd slightly lean toward including actual (relative) filenames in the import statement over not-quite-filename strings, such as with leaving off the filename suffix. One nice consequence of using real filenames is that it naturally lends itself to extensibility. That is, we could specify that "proper" Bril programs could only support importing JSON files. But various preprocessor extensions could also check if files seem to look like Bril text, some source code format, bytecode files, and so on. And this could happen without any complicated rules about how we append strings to the import specifiers to obtain filenames. (On the other hand, JavaScript does exactly what you suggest, i.e., filenames without the .js, and that seems to work just fine for them? Even in the presence of preprocessors like TypeScript, which extend the set of suffixes that are searched.)

Pat-Lafon · 2022-05-19T04:24:24Z

I believe all of the outstanding WIP parts have been resolved. There is now some documentation which describes the extension. There are some tests, which may not be all encompassing but I think hit the common cases of importing normally, having nested imports, having recursive imports(in this case, just importing yourself), and having two programs import the same program (inspired by diamond inheritance). brild has been added to the Rust workflows and thus it passes all of the static checks like fmt, clippy, and doc. There aren't any tests related to errors and brild may panic if it is given a program that isn't well-formed but that can be revised later if the tool finds use(which I guess means if I end up using it).

Something to note, brild by default takes in the Bril program via stdin like all Bril tools. This means that the original file does not have an associated filename. As such, you will have duplication of code as in recursive.bril since it doesn't know that the imported file is the same as the stdin program(two copies of the program, one with mangled names and the other without). Semantically this is fine and it is left to other tools to merge functions with different names but the same semantics. As is done with brilirs, I've provided a -f/--file flag as an alternative way to provide the source file which would not have this issue(since then the input file has a corresponding filename).

c794b92 sneaks in a bug fix for source positions.

sampsyo

Looks great to me!! I may have some suggestions about how to make the docs even clearer, but we could do those in a separate PR. If this is good to go in your opinion, let's merge it!

Pat-Lafon · 2022-05-19T17:32:08Z

Great, besides any changes you may have I believe this is good to go. It defines the language extension that I was looking to add and then supports it by adding it to the Rust parsing for Bril programs plus a static linker brild.

I may have some suggestions about how to make the docs even clearer, but we could do those in a separate PR.

Sure, whatever would be easiest for you.

sampsyo · 2022-05-19T20:36:00Z

Great! Thanks again for getting this creative idea started!

sampsyo · 2022-05-19T20:40:13Z

Hmm, looks like the Clippy action failed… any chance you could take a look to see why this didn't run on the PR to catch this ahead of time? https://github.com/sampsyo/bril/runs/6514638293?check_suite_focus=true

Pat-Lafon · 2022-05-19T20:53:06Z

The issue comes from a new version of Clippy being released today alongside the new stable Rust release, 1.61. It contains improvements on lints that were tripped when the workflow was rerun on push.

sampsyo · 2022-05-20T14:02:21Z

Ah, that would explain it! Darn you, Rust developers, for always improving Clippy! 😃

Cleaning up #197

Pat-Lafon added 2 commits May 7, 2022 16:22

comment about spans

905fdb0

First pass at linking

97a67e5

Pat-Lafon added 2 commits May 8, 2022 18:51

Renaming

d7d4795

change path to just be a string literal

aceb875

Pat-Lafon force-pushed the linking branch from 535cad4 to 83ef192 Compare May 19, 2022 03:50

Pat-Lafon added 2 commits May 18, 2022 23:59

Testing, documentation, and cleanup of import extension

4fbc832

Hopefully the final off-by-one bug with source positions

c794b92

Pat-Lafon force-pushed the linking branch from 83ef192 to c794b92 Compare May 19, 2022 03:59

sampsyo approved these changes May 19, 2022

View reviewed changes

sampsyo merged commit b08214a into sampsyo:main May 19, 2022

sampsyo added a commit that referenced this pull request May 20, 2022

Merge pull request #198 from Pat-Lafon/linking

3f7682c

Cleaning up #197

Pat-Lafon mentioned this pull request Jun 5, 2022

Docs cleanup #209

Merged

Pat-Lafon mentioned this pull request Jul 26, 2022

Richer Source Positions #232

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Linking Extension #197

Linking Extension #197

Pat-Lafon commented May 7, 2022

sampsyo commented May 8, 2022

Pat-Lafon commented May 9, 2022

sampsyo commented May 9, 2022

Pat-Lafon commented May 19, 2022

sampsyo left a comment

Pat-Lafon commented May 19, 2022

sampsyo commented May 19, 2022

sampsyo commented May 19, 2022

Pat-Lafon commented May 19, 2022

sampsyo commented May 20, 2022

Linking Extension #197

Linking Extension #197

Conversation

Pat-Lafon commented May 7, 2022

Motivation:

Goals:

Syntax:

Text form:

JSON form

Semantics

Path

Names

Mangling

loadbril

WIP

sampsyo commented May 8, 2022

Pat-Lafon commented May 9, 2022

sampsyo commented May 9, 2022

Pat-Lafon commented May 19, 2022

sampsyo left a comment

Choose a reason for hiding this comment

Pat-Lafon commented May 19, 2022

sampsyo commented May 19, 2022

sampsyo commented May 19, 2022

Pat-Lafon commented May 19, 2022

sampsyo commented May 20, 2022