Skip to content
This repository has been archived by the owner on Jun 26, 2020. It is now read-only.

Added initial Rust codegen-meta implementation. #403

Merged
merged 26 commits into from
Jul 19, 2018

Conversation

data-pup
Copy link
Member

This PR contains the first steps towards implementing the DSL in Rust, as previously discussed in #342. This intends to replace the gen_types.py file, and will emit a file named new_types.rs in the same directory as the existing types.rs file, when the codegen crate is built.

Thanks for the mentoring @sunfishcode! Had a lot of fun putting this together.

@@ -0,0 +1,3 @@
//! Definitions for the base Cretonne language.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noticed a few instances of the name "Cretonne" snuck through, I can rectify these.

process::exit(1);
});

if let Err(err) = meta::gen_types::generate("new_types.rs", &out_dir) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This specifically would be the line that changes when we want to have the codegen-meta crate emit a types.rs file instead, and replace the version currently being emitted by the Python code in codegen/meta/ 😄

@data-pup
Copy link
Member Author

data-pup commented Jul 16, 2018

Build isn't quite done yet, but I noticed these problems in the 1.25 build. I can fix these today, should be fairly straightforward.

error[E0658]: non-reference pattern used to match a reference (see issue #42640)

no method named try_for_each found for type ...

As for the nightly job failing, I'm much less sure how to fix that:

LLVM ERROR: IO failure on output stream: No space left on device
error: Could not compile `syntex_syntax`.

Any other feedback on this PR in general would be very appreciated though :)

}

pub struct IntIterator {
index: usize,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could just be an u8.

3 => Some(Int::I64),
_ => None,
};
self.index += 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about release mode overflow? (Call .next() usize::MAX times and you get I8 again) You could replace the _ => None, with _ => return None, to prevent this.

@data-pup
Copy link
Member Author

Thanks @bjorn3! Didn't know about that early return trick for iterators, but that makes a lot of sense. That's cleaned up now 👍

Added a few other fixes, including fmt::Debug implementations for the different ValueType variants, to give the same output as the __repr__ methods in the original Python, as well as making changes for compatibility with v1.25.

Copy link
Member

@sunfishcode sunfishcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! Here are some comments from an initial read through the code.

authors = ["The Cranelift Project Developers"]
version = "0.15.0"
description = "DSL for cranelift-codegen code generator library"
license = "Apache-2.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cranelift's license is now "Apache-2.0 WITH LLVM-exception".

type Item = LaneType;
fn next(&mut self) -> Option<Self::Item> {
if let b @ Some(_) = self.bool_iter.next() {
b.map(LaneType::from)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be written as

      if let Some(b) = self.bool_iter.next() {
            LaneType::from(b)

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the method returns an option, this could also rewritten as:

      if let Some(b) = self.bool_iter.next() {
            Some(LaneType::from(b))

I can write it that way, so it isn't quite as opaque :)

/// Check if `x` is a power of two.
fn _is_power_of_two(x: u8) -> bool {
x > 0 && x & (x - 1) == 0
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know the Python code has this, but for Rust we should use the standard library's is_power_of_two.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, should I just go ahead and remove both this and the next_power_of_two functions from this file?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

}

res + 1
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for next_power_of_two.

name = "cranelift-codegen-meta"
authors = ["The Cranelift Project Developers"]
version = "0.15.0"
description = "DSL for cranelift-codegen code generator library"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a minor point, I'm thinking I'd like to move away from "DSL" terminology in general. It feels like part of the problem with the Python code is the DSL approach, which leans a little toward being its own specialized world and a little away from being idiomatic Python code. This makes working within the system more streamlined, but it also makes debugging the system and changing how the system works harder.

So how about "Metaprogram for cranelift-codegen code generator library"?

}
}

/// Return the name of this type for other Rust source files.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Rust" is a little less unambiguous now that the metaprogram is Rust too :-). I think it's enough to say "generated Rust source files" in the comment here.


/// Get the name of this vector type.
pub fn name(&self) -> String {
format!("{}X{}", self.base.name(), self.lanes,)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a lower-case 'x' in the python code, which I think is a little easier to read.

pub fn number(&self) -> u8 {
let b = f64::from(self.base.number());
let l = (self.lanes as f64).log2();
let num = 16_f64 * l + b;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do this computation without using floating point? I guess it's a little more work because we need a helper:

fn floor_log2(x: u64) -> u32 {
  63 - x.leading_zeros()
}

but it would mean that we won't need to add an exception for the float_arithmetic clippy lint that we enable in some configurations.

Also, we should give the "16" constant a name.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this a little bit. After re-reading this comment I think I had an idea:

// Vector types are encoded with the lane type in the low 4 bits and log2(lanes)
// in the high 4 bits, giving a range of 2-256 lanes.

The reason we are calculating 16 * log_2(number_of_lanes) is to shift that value into the top four bits. Maybe it would be more legible if we used << instead? It's probably also worth reiterating that numbering comment above the VectorType's number method too.

//! `lib/codegen/ir/types.rs`. The file provides constant definitions for the
//! most commonly used types, including all of the scalar types.
//!
//! This ensures that Python and Rust use the same type numbering.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ensures that the metaprogram and the generated program see the same type numbering :-).

/// Given a multi-line string, split it into a sequence of lines after
/// stripping a common indentation. This is useful for strings defined with
/// doc strings.
fn parse_multiline(s: &str) -> Vec<String> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was initially concerned that this is too Python-specific, but looking at the way Rust handles multi-line literals, we may indeed want to write Rust code for this in much the way we write Python:

    fmt.doc_comment("
                    An instruction format

                    Every opcode has a corresponding instruction format
                    which is represented by both the `InstructionFormat`
                    and the `InstructionData` enums.
                    ")

which would then want the same logic of stripping common leading whitespace, and so on. Does that sound right?

authors = ["The Cranelift Project Developers"]
version = "0.15.0"
description = "DSL for cranelift-codegen code generator library"
license = "Apache-2.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, please add a copy of the LICENSE file under lib/codegen-meta (this is a recent change).

@data-pup
Copy link
Member Author

data-pup commented Jul 17, 2018

Added a series of commits to address the various points in your notes :)

I wasn't sure about the doc_comment note above however. That makes sense to me, should I change anything regarding the parse_multilines function if so? Let me know if there are any other details I should change! Thanks again for the feedback.

(Update: Checked through the travis failures, I can fix the 1.25 issues today.)

Copy link
Member

@sunfishcode sunfishcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, with just one stylistic comment below. Also, there's a minor merge conflict with a version number change in one of the Cargo.toml files.

I'm thinking once that's squared away, I'd like to merge this. I expect we'll stick with new_types.rs for a little while yet, so we can get some experience with build-time crates in various settings, but if things go smoothly, then we can think about flipping that switch too :-).

fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
match self {
&ErrorInner::Msg(ref s) => write!(f, "{}", s),
&ErrorInner::IoError(ref e) => write!(f, "{}", e),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and elsewhere, you can change this to match *self, and then you don't need a & on the front of each match arm. This also makes it more consistent with the rest of the code in Cranelift.

@data-pup
Copy link
Member Author

Took care of the style issue mentioned :) Sounds good! I'll leave the generated file as 'new_types.rs`.

@sunfishcode sunfishcode merged commit 5e93243 into bytecodealliance:master Jul 19, 2018
@sunfishcode
Copy link
Member

Thanks! I'm excited about where we'll go with this :-)

@data-pup
Copy link
Member Author

Me too! It was a lot of fun to work on this. If you have any ideas for next steps, I'd love to try and help out with some of that as well! We can talk about that back in #342 😄

@sunfishcode sunfishcode mentioned this pull request Jul 20, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants