Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compile recursive iterators to sub-linear time complexity code. Now it is compiled to quadratic time complexity code. #25052

Open
AlexRadch opened this issue Feb 26, 2018 · 14 comments
Labels
Area-Compilers Feature Request help wanted The issue is "up for grabs" - add a comment if you are interested in working on it
Milestone

Comments

@AlexRadch
Copy link

AlexRadch commented Feb 26, 2018

Compile recursive iterators to sub-linear time complexity code. Now it is compiled to quadratic time complexity code.

Any version:

Recursive iterations code is compiled to quadratic time complexity code.

Steps to Reproduce:

  1. Write sub-linear time recursive code. For example next code:
        static IEnumerable<int> RangeRecursive1(int b, int e)
        {
            if (b > e)
                yield break;
            yield return b;
            foreach(var v in RangeRecursive1(b + 1, e))
                yield return v;
        }

Expected Behavior:

Sub-linear time complexity.

Actual Behavior:

Quadratic time complexity.

Such code is compiled to loop in loop in loop and so on. All that they do is to return values from recursive loops but have quadratic time complexity.

Work around

Use separated loop and replace recursive calls by linear loops. Here https://github.com/AlexRadch/YieldForEachDN I created such loop (I called it Hyperloop). It support IHyperloop interface with two methods:

    public interface IHyperloop<in T>
    {
        void Add(IEnumerable<T> source);
        void Add(IEnumerator<T> loop);
    }

Then any recursive calls can be replaced by adding loops to IHyperloop.

For example next iterator:

        static IEnumerable<int> RangeRecursive1(int b, int e)
        {
            if (b > e)
                yield break;
            yield return b;
            foreach(var v in RangeRecursive1(b + 1, e))
                yield return v;
        }

Can be converted to iterator without loops in loops in loops and so on:

        static IEnumerable<int> RangeRecursive1_Hl(int b, int e)
        {
            var hl = new Hyperloop<int>();
            hl.Add(RangeRecursive1_HlImp(hl, b, e));
            return hl;
        }

        static IEnumerable<int> RangeRecursive1_HlImp(IHyperloop<int> hl, int b, int e)
        {
            if (b > e)
                yield break;
            yield return b;
            hl.Add(RangeRecursive1_HlImp(hl, b + 1, e));
        }

Here RangeRecursive1_Hl is proxy that create Hyperloop and add RangeRecursive1_HlImp iterator to Hyperloop. The RangeRecursive1_HlImp code is similar to original code except that loop code replaced by adding loops to IHyperloop.

            foreach(var v in RangeRecursive1(b + 1, e))
                yield return v;

Code converted to

            hl.Add(RangeRecursive1_HlImp(hl, b + 1, e));

Suggestion:

The small code transformation can reduce quadratic time complexity to sub-linear time complexity in recursive iterators.

So I suggest detect recursive iterators (it is easy) and use such small code transformation in roslyn compiler to reduce time complexity.

@svick
Copy link
Contributor

svick commented Feb 26, 2018

It's not clear to me how exactly would the translation work. For example, consider this code:

IEnumerable<T> DumpAndIterateFromBack<T>(LinkedListNode<T> node)
{
    if (node == null)
        yield break;
        
    foreach (var x in DumpAndIterateFromBack(node.Previous))
        yield return x;

    Console.WriteLine(node.Value);
    yield return node.Value;
}

How would you translate it to use your Hyperloop?

@AlexRadch
Copy link
Author

@svick Now my Hyperloop implementation does not support loops with side effects like in your code but it can be extended to support such code. I will make support of code with side effects and show translation of your code.

@AlexRadch
Copy link
Author

AlexRadch commented Feb 26, 2018

@svick I added support of code with side effects in Hyperloop class.

So next code

        static IEnumerable<int> DumpAndIterateFromBack(int e)
        {
            if (e <= 0)
                yield break;

            foreach (var x in DumpAndIterateFromBack(e - 1))
                yield return x;

            Console.WriteLine(e);
            yield return e;
        }

can be translated to next code:

        static IEnumerable<int> DumpAndIterateFromBack_Hl(int e)
        {
            var hl = new Hyperloop<int>();
            hl.Add(DumpAndIterateFromBack_HlImp(hl, e));
            return hl;
        }

        static IEnumerable<int> DumpAndIterateFromBack_HlImp(IHyperloop<int> hl, int e)
        {
            if (e <= 0)
                yield break;

            hl.Add(DumpAndIterateFromBack_HlImp(hl, e - 1));
            yield return default(int); // will be skiped to support code with side effects

            Console.WriteLine(e);
            yield return e;
        }

So only the loop part:

            foreach (var x in DumpAndIterateFromBack(e - 1))
                yield return x;

Translated to

            hl.Add(DumpAndIterateFromBack_HlImp(hl, e - 1));
            yield return default(int); // will be skiped to support code with side effects

all other code is the same.

@KrisVandermotten
Copy link
Contributor

@AlexRadch
Copy link
Author

@KrisVandermotten I think here https://github.com/AlexRadch/YieldForEachDN I implemented the same ideas as in https://www.researchgate.net/publication/228625370_Iterators_revisited_Proof_rules_and_implementation/.
I think my implementation is more optimal by memory and by performance and with more clear code transformations.

@AlexRadch
Copy link
Author

@Wolfram-Schulte what are you think about my recursive iterators transformation for effective implementation?

@gafter gafter added the help wanted The issue is "up for grabs" - add a comment if you are interested in working on it label Aug 4, 2019
@gafter gafter added this to the Backlog milestone Aug 4, 2019
@gafter
Copy link
Member

gafter commented Aug 4, 2019

We would consider a community contribution that addresses this feature request provided the implementation is sufficiently simple and well isolated in the compilers.

@ranma42
Copy link

ranma42 commented Aug 11, 2021

I would like to try and tackle this.

My current plan would be to use something like GeneratedSequenceBase to "flatten" nested enumerables.

What is the best way to recognize that a sequence has been generated according to these rules (or, which would be the same to me, supports both the "single result" and the "sequence result")?
I am thinking of using an interface, but I am unsure whether it is ok (it would be a somewhat special one, as it would have a meaning for the compiler). On the upside doing it with a "core" interface would make it possible for F# to efficiently yield! a C# enumerable and for C# to yield foreach an F# sequence (efficiently as in flattened, to avoid recursion costs).

@CyrusNajmabadi
Copy link
Member

We would need a solution that came with no perf impact on any existing code.

@ranma42
Copy link

ranma42 commented Aug 11, 2021

Yup, the idea is to only change the codegen if the method contains a yield foreach

@ArcadeMode
Copy link
Contributor

Hello, I'd like to have a crack at this.

My current understanding:

When writing an iterator function that iterates over another iterator function and yield returns the nested iterator's results, then the resulting generated code performs two operations for each of the nested iterator's returns. This appears to internally be two statemachines where the inner state-change causes the outer to state change which is simply propagating values. This propagation takes as many operations as nesting is deep, which in the examples above can be arbitrarily deep due to recursion, resulting in quadratic complexity. But it is also present in simple nested usages of such foreach + yield returning methods.

If I understand the research paper's intuition what it does is not propagate values 'upward' but instead have the outer-most iterator point directly to the nested iterator which is 'currently' active, however deep it may be. Thus avoiding the costly propagation.

There is a suggestion to enable new yield foreach syntax. I could try and implement this syntax which would (probably?) allow me to rewrite the expression such that there is no such propagation happening by pointing the outer iterator directly to the inner one untill it has completed. Existing foreach with yield return would remain how they are (and thus still enable e.g. side effects to be implemented for such inner-iterator value changes).

Is my understanding correct and would the suggested approach be desired?

@CyrusNajmabadi
Copy link
Member

@ArcadeMode the critical thing is that any existing code that does a standard foreach over any existing IEnumerables (or anything foreach operates on) not have any negative impact whatsoever.

@ArcadeMode
Copy link
Contributor

I found the discussion in the csharplang repo and noticed that the suggested approaches are unsuitable. Its a marvelously complex issue for sure! The related discussion about supporting yield foreach as syntax sugar is also very interesting. But I digress,

(I am still getting familar with this part of roslyn so if im about to spew nonsense dont hesitate to tell me).

What I am thinking is to use yield foreach as a means to have the enumerator block enumerate the inner enumerator directly. It would effectively mean when the outer enumerator's MoveNext() is invoked and its at a yield foreach instruction, the call is forwarded to that enumerator's MoveNext() and its yield return is also forwarded directly. This would avoid the state-change propagation by tapping into the nested enumerator directly. Since most of the behavior is coded at the Lowering stage I feel this may be possible.

Furthermore this would not touch foreach and thus would actually be opt-in behavior for avoiding the time complexity explosion that can arrise here.

I would love your feedback on this.

@CyrusNajmabadi
Copy link
Member

Furthermore this would not touch foreach and thus would actually be opt-in behavior for avoiding the time complexity explosion that can arrise here.

Could you give an example of:

  1. what the user would write inside the iterator.
  2. what that code would actually get translated to.
  3. what the user would write when consuming the iterator.
  4. what that code would actually get translated to.

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Compilers Feature Request help wanted The issue is "up for grabs" - add a comment if you are interested in working on it
Projects
None yet
Development

No branches or pull requests

7 participants