Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extended IR #147

Closed
6 tasks done
aalexandrov opened this issue Jan 18, 2016 · 3 comments
Closed
6 tasks done

Extended IR #147

aalexandrov opened this issue Jan 18, 2016 · 3 comments
Assignees
Labels

Comments

@aalexandrov
Copy link
Contributor

aalexandrov commented Jan 18, 2016

This is a meta-issue that should track the planned changes in the IR.

Throughout the discussion in this issue, we will use the following schema based on the MostConsistentClassifier workflow:

// product types
case class Profile(id: Long, name: String, surname: String)
case class Email(id: Long, ip: Int, content: String)
case class Server(ip: IP, isBlacklisted: Boolean)
case class Classifier(model: Long, weights: SparseVector[Double])

Motivation

Most of the transformation-based optimizations in Emma can only be applied if certain conditions apply. Deciding whether the conditions are satisfied typically involves some form of dataflow analysis on top of lifted code.

Example: Fold-Group Fusion

Consider the following code that produces two aggregates over the emails grouped by their ip value:

for {
  g <- emails.groupBy(_.ip)
} yield {
  val agg1 = g.values.sum()
  val agg2 = g.values.map(_.content.size()).sum()
  (g.key, agg1, agg2)
}

The code is eligible to fold-group fusion (FGF) as all g.values uses appear within the context of chain of (zero or more) map applications ending in a fold.

While (FGF) of the variation above works at the moment, the following variations will not be considered eligible, as at the moment there is no direct and simple way to consider constant propagation in the Scala expressions referenced in comprehended terms:

// structural typing on the argument
for {
  (key, values) <- emails.groupBy(_.ip)
} yield {
  val agg1 = values.sum()
  val agg2 = values.map(_.content.size()).sum()
  (key, agg1, agg2)
}

// projection aliases
for {
  g <- emails.groupBy(_.ip)
} yield {
  val values = g.values
  val agg1 = values.sum()
  val agg2 = values.map(_.content.size()).sum()
  (g.key, agg1, agg2)
}

Example: Join Ordering

Another scenario where a better IR might be beneficial is join ordering. Consider the following code

for {
  e <- emails
  s <- people
  r <- people
  if from(e.content) == s.email
  if to(e.content) == s.email
} yield {
  val x1 = e1 // depends on e and s
  val x2 = e2 // depends on x1 and r
  f(x2)
}

In order to reduce the amount of communication cost when this expression is compiled to a join cascade, we might want to apply x1 right after we join e and s, prune all extra fields, and only after that join with r.

In effect, this means that the head expressions is split into multiple parts and parts of it might be applied on top of a rhs of aggregator during the combinator rewrite phase.

Approach

In order to make the compiler oblivious to such code modifications, I propose to extend the current comprhension IR in the following strategy.

  • Target a limited subset of Scala expressions based on the usage patterns observed in emma-examples. Lift, normalize, and transform the whole quoted three rather than only the comprehended expressions.
  • Use a flavour of an SSA-like IR which simplifies answers based on use-def and def-use chains. A good candidate seems to be direct style in let-normal form for a good overview and a short introduction to this IR, please refer to Chapter 6 in the SSA Book.
  • Restructure the code in a way which will allow us to use it both for macro-based and reflection-based compilation.

Refactoring Plan

The refactoring of the code should be based on the following refactoring plan. Each item will be tracked by a separate issue:

@aalexandrov aalexandrov added the IR label Jan 18, 2016
@aalexandrov aalexandrov self-assigned this Jan 18, 2016
@aalexandrov aalexandrov added this to the Feb 2016 milestone Jan 18, 2016
@joroKr21
Copy link
Member

Actually we could use a somewhat hacky solution to bring the separate hierarchies together. Since Scala's Constant(Literal(_)) is polymorphic we could wrap the comprehension nodes in constants and then use a modified form of traversal that works on union types (or possibly do 2 passes - one for Scala nodes and one for Emma nodes).

aalexandrov added a commit that referenced this issue Jan 22, 2016
Resolves #153.

Adapted the ComprehensionModel nodes so all fields are immutable.

The only place where mutability is left is in the Generator type, as
otherwise we would have to also rewrite the FoldGroupFusion optimization,
which won't be trivial and at the moment seems unnecessary.
The Generator type should be fixed as we make further progress on #147.
@aalexandrov aalexandrov modified the milestones: Feb 2016, Mar 2016 Feb 26, 2016
@aalexandrov aalexandrov modified the milestones: Mar 2016, Apr 2016 Apr 4, 2016
@aalexandrov aalexandrov removed the MACROS label Apr 5, 2016
@aalexandrov aalexandrov modified the milestone: Apr 2016 Apr 5, 2016
@joroKr21
Copy link
Member

joroKr21 commented Dec 6, 2016

I think we can close this now

@aalexandrov
Copy link
Contributor Author

Alright. I've linked the issue on the wiki for future reference.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants