Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Linking Extension #197

Merged
merged 6 commits into from
May 19, 2022
Merged

Linking Extension #197

merged 6 commits into from
May 19, 2022

Conversation

Pat-Lafon
Copy link
Contributor

This wip pr is an attempt at an extension to allow linking/importing Bril programs. My understanding is that this has been previously proposed in sampsyo/cs6120#42 / https://www.cs.cornell.edu/courses/cs6120/2019fa/blog/making-function-calls-work/ but there hasn't been an attempt to make it somewhat official(except accidentally in #7). Inspired by this work, I am proposing a modified version of the import syntax/loadbril idea.

Motivation:

This extension is motivated in two parts. First, as the number of Bril programs grow, it becomes more useful to reuse code from previous programs, especially programs that implement utility functions like were seen in #192. Second, programs built from multiple Bril files could be an interesting use case for the source position extension #161, #167 in reporting error messages #191. Currently, source positions assume a single file of origin which is not necessarily true for an arbitrary front-end compiler.

Goals:

  • Allow Bril programs to include other Bril programs in a c-like preprocessing step.
  • Be able to link in Bril programs (like those in the current repository) without modification.
  • Have some kind of psuedo-module system at the file level by mangling function names to ensure uniqueness.
    • Avoid function name clashes across imported files.
  • Allow nested and circular imports
  • Have the output program of loadbril be run-able by brili/brilirs/other Bril-based tools without modification.

Syntax:

Text form:

Grammar:

'from' PATH 'import' (FUNC ('as' FUNC)? ',')* ';'

Example:

from ../../benchmarks/bitwise-ops.bril import @AND, @OR as @LIB_OR, @loop_subroutine;

JSON form

{
  "functions": [
    ...
  ],
  "imports": [
    {
      "names": [
        {
          "name": "AND"
        },
        {
          "alias": "LIB_OR",
          "name": "OR"
        },
        {
          "name": "loop_subroutine"
        }
      ],
      "path": "../../benchmarks/bitwise-ops.bril"
    }
  ],

Semantics

Path

Programmers can provide an absolute file path or relative file path from the current file to the Bril program that they want to include. The file must contain an extension of either *.bril or *.json to specify how the program should be parsed. Ideally this would be any valid file path agnostic of platform but I haven't looked that into parsing paths so it's just r"([A-Za-z]|_|\.|[0-9]|\\|/|-)*", // path at the moment.

Names

From a file, names from that file can be "imported". Currently this is just valid function names but could be extended to support any extension with module level named things(like structs #138). To avoid clashes with other imported names or local names during usage, the programmer can provide an optional alias. Aliases are the way of getting around function name uniqueness. Attempts to use the same alias name as another imported name, imported alias, or local name will return an error. This allows the programmer to use write code like:

from ../../benchmarks/bitwise-ops.bril import @AND, @OR as @LIB_OR, @loop_subroutine;

# Does OR and throws out the result
@OR(a: int, b: int): int{
    v1: int = call @LIB_OR a b;
    zero: int = const 0;
    ret zero;
}

@test(){
    a: int = const 1;
    b: int = const 0;
    ans: int = call @AND a b;
    print ans;
    ans: int = call @LIB_OR a b;
    print ans;
    ans: int = call @OR a b;
    print ans;
}

@main(){
    call @test;
}

Mangling

After all imported paths have been resolved into programs and all names/aliases collected, the contained/used names in each program are mangled to ensure that names declared in a program(function declarations) are unique. The exact mangling is left undefined except that it produces valid names that brili/brilirs would expect but in practice I'm just adding a bunch of underscores to the absolute file name. The only names left unmangled are those explicitly expected by an interpreter which currently is only the top level main. Imported programs have their main functions mangled, so they can be used if they are aliased.

For example:

@Users___patricklafontaine___Desktop___Github___bril___bril-rs___loadbril___add.bril___OR(a: int, b: int): int {
  v1: int = call @Users___patricklafontaine___Desktop___Github___bril___benchmarks___bitwise-ops.bril___OR a b;
  zero: int = const 0;
  ret zero;
}

loadbril

loadbril takes a filename as input and writes the final program in json form to stdout. Thus, one can loadbril add.bril | brili or add any arbitrary program in between in a composable fashion. loadbril does not take the file itself in stdin like other tools such as bril2txt do because we need to know the file location of the input file. This tool uses the AbstractProgram representation so it should be as flexible as bril2txt are in terms of new extensions(besides any additional mangling that an extension like structs might need). Note, all functions are imported from a program, not just the ones that are used. The elimination of this dead code is left to other tools.

WIP

I'm seeking feedback on this extension especially because I've both deviated from the original proposal and that some parts were left unresolved as in sampsyo/cs6120#42. It might also be that this isn't the right way to go about linking for an IR. I think this approach works well in keeping everything modular but one could also add this as part of the interpreter or go about this another way.
Documentation, testing, and addressing the clippy lints are still to be done. I just wanted to hack something together to see what it would look like.

@sampsyo
Copy link
Owner

sampsyo commented May 8, 2022

Awesome; thanks for looking into this! Seems like a super fun extension to explore. I like that this can work as a language extension in the sense that it can be "compiled away" by the static linker.

Abstract syntax nits:

  • Let's maybe make the key function instead of name. That way, it may be possible to import other stuff (global variables? struct definitions?) in the future.

Text syntax nits:

  • To avoid needing to put any constraints on what characters are allowed to appear in paths, how about we just make this a string literal? That is, require it to be in double quotes. (Like JavaScript import statements.)

Conceptual comments:

  • I'm somewhat wary of allowing the ability to import text-format source code. Part of the ethos is that the text format is extremely second-class: it is a nice way to look at and author Bril files, but the canonical representation is always the JSON AST---so even fully-fledged Bril tools never need to consider that any particular text format exists. So despite the downsides, I'd recommend that the language itself only support importing from JSON files. Translating dependencies from text to JSON would be the responsibility of a generalized parser/linker.
  • Maybe this goes without saying, but when we define the semantics of import, it shouldn't have to do with linking. That is, the semantics should just describe providing access to the imported functions. Combining all the definitions, mangling, etc. would all be implementation details in a linker. An interpreter could implement the same semantics "directly" by loading up multiple modules and keeping the namespaces separate.
  • A surprisingly tricky issue is the interaction with the filesystem and the notion of "search paths" (i.e., what is the base directory for relative import paths?). Starting with the current file's location is certainly convenient, but it's also a little weird in light of how often we process files as raw streams. We could also consider generalizing the semantics a bit so we have a notion of a "search path" that defaults to the file's location but could also be specified separately. Or maybe it should have no default at all, and would always need to be specified manually to a linker/loader? The latter would change the feel from "import this specific file from my filesystem" to "import a module with this name from my designated directory full of libraries." The latter is more like how typical programming language import systems work (Rust, Python, JavaScript) and even somewhat like how DLLs (.so files on Unix) work, what with LD_LIBRARY_PATH and such.

Unimportant thoughts:

  • loadbril is great to have… but perhaps it should have a name that indicates even more clearly that it's a linker (because, these days, "loading" means something slightly different)? Like brild as a silly portmanteau with ld. 😃

Anyway, that's just a bunch of disorganized thoughts! Others are free to weigh in as well. 😃

@Pat-Lafon
Copy link
Contributor Author

  • Let's maybe make the key function instead of name. That way, it may be possible to import other stuff (global variables? struct definitions?) in the future.

Sure, I wasn't sure whether importing should be abstracted over all possible things to import as just globally "named" things but this probably makes more sense.

  • To avoid needing to put any constraints on what characters are allowed to appear in paths, how about we just make this a string literal? That is, require it to be in double quotes.

Yeah, this is a lot easier. I've made it r#""[^"]*""# which is the simplest form of string literal handling.

  • loadbril is great to have… but perhaps it should have a name that indicates even more clearly that it's a linker (because, these days, "loading" means something slightly different)? Like brild as a silly portmanteau with ld. 😃

Works for me

  • Maybe this goes without saying, but when we define the semantics of import, it shouldn't have to do with linking. That is, the semantics should just describe providing access to the imported functions. Combining all the definitions, mangling, etc. would all be implementation details in a linker. An interpreter could implement the same semantics "directly" by loading up multiple modules and keeping the namespaces separate.

As you can see, I was making the first pass with only brild in mind. I'll try to be more careful in dividing up what I want brild to do compared to what the language extension says.

  • I'm somewhat wary of allowing the ability to import text-format source code. Part of the ethos is that the text format is extremely second-class: it is a nice way to look at and author Bril files, but the canonical representation is always the JSON AST---so even fully-fledged Bril tools never need to consider that any particular text format exists. So despite the downsides, I'd recommend that the language itself only support importing from JSON files. Translating dependencies from text to JSON would be the responsibility of a generalized parser/linker.

Would it then be sufficient to just chop off the extension?

from ../../benchmarks/bitwise-ops import @AND, @OR as @LIB_OR, @loop_subroutine;

Then when creating the documentation for this extension, say that "a fully compliant implementation of this extension is only required to look for a JSON version of the module". I would like to leave the door open for brild or other tools to link not just JSON AST but also possibly the text format(which looking at the repository is the defacto file format for Bril programs), compliant typescript (as in test/ts) or other languages with a parser to Bril JSON.

  • A surprisingly tricky issue is the interaction with the filesystem and the notion of "search paths" (i.e., what is the base directory for relative import paths?). Starting with the current file's location is certainly convenient, but it's also a little weird in light of how often we process files as raw streams.

I was aware of this when I started(if your referring to how the initial file is passed by filename instead of as a stream). This was done to be consistent for nesting imports, as in when I import a file containing imports, it has a starting location of it's file location which a stream doesn't have.

Or maybe it should have no default at all, and would always need to be specified manually to a linker/loader? The latter would change the feel from "import this specific file from my filesystem" to "import a module with this name from my designated directory full of libraries."

Like this?

// either space or comma separated
brild ../../benchmarks/ /absolute/path/to/some/directory $BRIL_LIBRARY_PATH < add.bril

Then have

from bitwise-ops import @AND, @OR as @LIB_OR, @loop_subroutine;

Where brild tries each of the paths until it finds a path that contains the bitwise-ops.* file? (Since I'm viewing files as being modules.) (I personally find this passing of paths to be rather ugly for tooling to use but I can see that this is probably nicer language-wise than what I was going to do with something like https://docs.rs/shellexpand/latest/shellexpand/ to expand Bril import paths for environment variables.)

I've been concerned that there would be issues if two directories contain a file named bitwise-ops.json. I guess we could allow the module system to have a hierarchy? As in:

from benchmarks/bitwise-ops import @AND, @OR as @LIB_OR, @loop_subroutine;

like how in rust you might write

use benchmarks::bitwise_ops::{AND, OR as LIB_OR, loop_subroutine};

except with / separators to make it easier to resolve file paths.

@sampsyo
Copy link
Owner

sampsyo commented May 9, 2022

Yes, something more or less like this is what I was imagining! Some food for thought on top of the above:

  1. A simple policy to address the "two different files with the same name in different search-path directories" is to just take the first one. Not exactly elegant, but it is at least simple.
  2. Allowing paths within the search paths also seems just fine. Like, if your search path list contains /foo and /bar and you do from "baz/qux.json" import stuff, then we would look for files /foo/baz/qux.json and /bar/baz/qux.json.
  3. I think I'd slightly lean toward including actual (relative) filenames in the import statement over not-quite-filename strings, such as with leaving off the filename suffix. One nice consequence of using real filenames is that it naturally lends itself to extensibility. That is, we could specify that "proper" Bril programs could only support importing JSON files. But various preprocessor extensions could also check if files seem to look like Bril text, some source code format, bytecode files, and so on. And this could happen without any complicated rules about how we append strings to the import specifiers to obtain filenames. (On the other hand, JavaScript does exactly what you suggest, i.e., filenames without the .js, and that seems to work just fine for them? Even in the presence of preprocessors like TypeScript, which extend the set of suffixes that are searched.)

@Pat-Lafon
Copy link
Contributor Author

I believe all of the outstanding WIP parts have been resolved. There is now some documentation which describes the extension. There are some tests, which may not be all encompassing but I think hit the common cases of importing normally, having nested imports, having recursive imports(in this case, just importing yourself), and having two programs import the same program (inspired by diamond inheritance). brild has been added to the Rust workflows and thus it passes all of the static checks like fmt, clippy, and doc. There aren't any tests related to errors and brild may panic if it is given a program that isn't well-formed but that can be revised later if the tool finds use(which I guess means if I end up using it).

Something to note, brild by default takes in the Bril program via stdin like all Bril tools. This means that the original file does not have an associated filename. As such, you will have duplication of code as in recursive.bril since it doesn't know that the imported file is the same as the stdin program(two copies of the program, one with mangled names and the other without). Semantically this is fine and it is left to other tools to merge functions with different names but the same semantics. As is done with brilirs, I've provided a -f/--file flag as an alternative way to provide the source file which would not have this issue(since then the input file has a corresponding filename).

c794b92 sneaks in a bug fix for source positions.

Copy link
Owner

@sampsyo sampsyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me!! I may have some suggestions about how to make the docs even clearer, but we could do those in a separate PR. If this is good to go in your opinion, let's merge it!

@Pat-Lafon
Copy link
Contributor Author

Great, besides any changes you may have I believe this is good to go. It defines the language extension that I was looking to add and then supports it by adding it to the Rust parsing for Bril programs plus a static linker brild.

I may have some suggestions about how to make the docs even clearer, but we could do those in a separate PR.

Sure, whatever would be easiest for you.

@sampsyo
Copy link
Owner

sampsyo commented May 19, 2022

Great! Thanks again for getting this creative idea started!

@sampsyo sampsyo merged commit b08214a into sampsyo:main May 19, 2022
@sampsyo
Copy link
Owner

sampsyo commented May 19, 2022

Hmm, looks like the Clippy action failed… any chance you could take a look to see why this didn't run on the PR to catch this ahead of time? https://github.com/sampsyo/bril/runs/6514638293?check_suite_focus=true

@Pat-Lafon
Copy link
Contributor Author

The issue comes from a new version of Clippy being released today alongside the new stable Rust release, 1.61. It contains improvements on lints that were tripped when the workflow was rerun on push.

@sampsyo
Copy link
Owner

sampsyo commented May 20, 2022

Ah, that would explain it! Darn you, Rust developers, for always improving Clippy! 😃

sampsyo added a commit that referenced this pull request May 20, 2022
@Pat-Lafon Pat-Lafon mentioned this pull request Jun 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants