app.function: top-level functions in notebook files #2293

akshayka opened this issue Sep 10, 2024 · 10 comments
@akshayka
Contributor

akshayka commented Sep 10, 2024

This issue is for making it possible to import functions directly from marimo notebooks:

from notebook import my_function

Only "pure" functions will be importable, ie functions that have no refs besides imported modules. Some care will need to be taken to handle modules.

A cell that includes a single function def, and nothing more, will be saved in the notebook file as:

@app.function
def my_function():
  ...
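A purity check along these lines could be sketched with the standard ast module: collect the names a function reads but does not bind, and require each one to be an imported module. This is a hypothetical illustration (free_names is not marimo API); marimo's actual analysis may differ:

```python
import ast
import builtins
import textwrap

def free_names(source: str) -> set[str]:
    # Hypothetical sketch: parse a single function definition and
    # collect the names it reads but neither binds nor receives as
    # arguments; builtins are excluded. A "pure" function in the
    # sense above would have only imported modules left over.
    fn = ast.parse(textwrap.dedent(source)).body[0]
    assert isinstance(fn, ast.FunctionDef)
    bound = {a.arg for a in fn.args.args} | {fn.name}
    loaded: set[str] = set()
    for node in ast.walk(fn):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                loaded.add(node.id)
            else:
                bound.add(node.id)
    return loaded - bound - set(dir(builtins))
```

A function whose only free name is an imported module like np would pass; one reading a notebook-level variable would not.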

Discussed on:

@patrick-kidger

I worry that privileging just single-cell functions is going to be a huge UX footgun. There are many other things I might reasonably want the same behaviour from. (Classes are the obvious example.) Moreover I will almost always want my function to depend upon other values:

SOME_CONSTANT = 42

def the_answer_to_life_the_universe_and_everything() -> int:
    return SOME_CONSTANT

Or -- I may simply write code that depends upon being able to import a function like this... and now find that I am unable to adjust this function's behaviour without breaking that import resolution.


Can I make an alternate suggestion? Marimo supports adding meaningful names for cells, so it could be arranged for the syntax import my_notebook; my_notebook.my_cell.my_function to work.

In particular this would:

  • Automatically evaluate all cells that my_cell depends upon.
  • Set __module__ and __qualname__ appropriately, so that even dynamic resolution via importlib.import_module works correctly.
  • ... and as such, basically support the full generality of things that we may wish to do in Python.
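On the second point: tools that resolve functions dynamically (pickle, importlib.import_module plus getattr) rely on those attributes. A minimal sketch of patching them, with reexport as a hypothetical helper name:

```python
def reexport(fn, module_name: str):
    # Hypothetical helper: repoint a function's metadata at the
    # notebook module so that dynamic resolution (pickle,
    # importlib.import_module + getattr) finds it again.
    fn.__module__ = module_name
    fn.__qualname__ = fn.__name__
    return fn

def my_function() -> int:
    return 42

reexport(my_function, "my_notebook")
```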

@akshayka
Contributor Author

akshayka commented Sep 13, 2024

> Or -- I may simply write code that depends upon being able to import a function like this... and now find that I am unable to adjust this function's behaviour without breaking that import resolution.

Oh, that's a very good point; that kind of action at a distance would be sad. I hadn't considered this.

Regarding

> so it could be arranged for the syntax import my_notebook; my_notebook.my_cell.my_function to work

It's definitely an interesting idea, and could readily be done by abstracting over the low-level API we have for programmatically running cells: https://docs.marimo.io/api/cell.html. AFAIK I'm currently the only consumer of that API; it's quite low level.

I've considered what you've suggested before, but the downside is that you won't get code completion, type information, etc. when typing out

my_notebook.my_cell.my_function

whereas top-level functions would play well with an LSP and other static analysis tools.

Given these tradeoffs, would you still prefer a cell-based API? We'd have to make a breaking change to the cell API to support this (what if a user had a function called run?), or perhaps we could expose it under a namespace, such as my_cell.functions.my_function ...

@patrick-kidger

Okay, refining this idea a little bit: it should be possible to define a module-level __getattr__ that runs the appropriate cells up to the point of providing that name. That would enable the simpler my_module.my_value based lookup, without having to name an individual cell.

This would then offer your originally suggested API, whilst supporting all kinds of Python types. (And neatly sidesteps questions about the Cell API.)

Autocompletion could be done via an if TYPE_CHECKING: block somewhere in the file.

@dmadisetti
Collaborator

dmadisetti commented Sep 14, 2024

Relevant: #1954

So in previous discussions, @app.function would primarily be used for serializing and creating cells with a name, and for denoting that a function is pure. Importing a function from a cell is disingenuous when it can rely on additional, cell-specific references that are not constants. For instance:

result = expensive_computation()
mutable_arr_in_scope = []

# Not clear how exposed_fn should be exposed,
# or what the behavior should be on state change.
# Even if convention decided, should be intuitive API
# Not worth complexity IMO
def exposed_fn() -> tuple:
    mutable_arr_in_scope.append(state_in_diff_cell())
    return result, len(mutable_arr_in_scope)

One fix might be to move constants (all caps and 'primitive') to the top-level script. There has been discussion about doing this for import-only cells (#1379); it might be feasible to do the same for constant-only cells:

import marimo

# cell 2 <-  (Auto generated comment? Use name if given)
import numpy as np

# cell 3
SCALE = 1

__generated_with = "0.0.0"
app = marimo.app()

@app.function
def lengthify(v, scale=SCALE, axis=-1):
   return scale * v / np.linalg.norm(v, axis=axis)

The previous discussion of @app.function made the UI very simple.

  1. If a cell with only a function definition hits the scope requirements, then it is automatically turned into an @app.function, and the cell name shows that (similar to a SQL cell showing the df name).
  2. If a cell with only a function definition does not hit the requirements (cell name conflicts with function name, scope refs are not modules or constants), then a "?" icon appears where the cell name should be, explaining why it is not turned into an @app.function.

@dmadisetti
Collaborator

Another implementation note: @app.function should be able to reference other @app.functions:

EPSILON = 1e-12

@app.function
def probs_ortho(v, u):
    return np.abs(np.dot(lengthify(v), lengthify(u))) <= EPSILON

@mscolnick mscolnick added this to the marimo 1.0.0 milestone Nov 25, 2024
akshayka pushed a commit that referenced this issue Dec 19, 2024
I was really interested in having tests in notebooks (re discussion
yesterday), so I played around with it a tiny bit this afternoon. As is,
pytest does not work out of the box since:

  1. the wrapped functions become Cell instances
  2. coroutine detection through the `_is_coroutine` spec should be an attribute

This change enables both of these minor adjustments. Now, pytest will pick up marimo `test_*` functions, but they will fail since directly invoking a cell is currently not allowed. I put in a little unit test to guard against a possible regression until a top-level fn spec is enabled (see #2293).

@dmadisetti
Collaborator

Had another idea about delimiting top-level declarations in the notebook.

As opposed to a comment, the lines of the notebook (or a statement count) could be registered:

import numpy as np # line 1, import count 1
import pandas as pd # line 2, import count 2
# line 3
import jax # line 4, import count 3
# line 5
SCALE = 1 # line 6, constant count 1
# line 7
__generated_with = "0.0.0"
import marimo
app = marimo.app()

@app.cell
def _():
   return mo.md("This cell is displayed first")

@app.cell
def _():
   return mo.md("## Imports")

# The imports by line number are displayed in the editor here
app.declare_imports(count=2)
# or app.declare_imports(lines=(1, 3)) # np, pd

app.declare_imports(count=1)
# or app.declare_imports(lines=[4]) # jax

@app.cell
def _():
   return mo.md("## Constants")

app.declare_constants(lines=[6]) # primitive values only
# app.declare_constants(count=1)

This removes the need to statically parse the file, as opposed to just running it.

A couple of functionality thoughts:

  • Imports not paired with a declare_x can be merged with the closest declaration, or automatically added as a block at the end of the notebook.
  • If using lines, lines after __generated_with cannot be indexed.
  • Invalid top-level blocks (e.g. a = custom_obj or logic) should be stripped (as is currently the case).

"import only" blocks are already detected, so implementing this change is

  1. building declare_import to build a cell from the notebook content (then verify import only)
  2. changing the serialization of the notebook

"primitive only" blocks, would be easy enough to detect (no refs, assign statements only)- maybe enforce defs in UPPER- but might be better suited as followup feature once the lessons of modules is learned.

@leventov

leventov commented Feb 10, 2025

> Which removes the need for statically parsing the file opposed to just running it.

But adds the burden on those who edit the files: if I add a line somewhere to the top of the file, suddenly something below in the file becomes invalid. This seems way too cumbersome for "normal" development a la #2288.

There is already kind of a standard for cell-delimiting comments, # %%: https://docs.spyder-ide.org/4/panes/editor.html#defining-code-cells, supported by VS Code and IntelliJ as well.

"Normal development" should be like:

  • Just normal top-level functions and variables
  • Classes are "split up" such that each method is a separate top-level block (which permits preceding it with # %%) a la fastcore.patch.
  • Tests and examples are in mo.cells.
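A file in the # %% style stays a plain, importable Python script; a minimal sketch of that layout (stdlib only, names are illustrative):

```python
# %% Imports
import math

# %% Constants
SCALE = 2

# %% A plain top-level function, importable like any module member
def scaled_hypot(x, y):
    return SCALE * math.hypot(x, y)
```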

Objective: dogfooding, develop marimo in marimo!

@dmadisetti
Collaborator

Yeah, I agree with the editing issue. I don't think there's a problem in just duplicating:

import numpy as np # Import only blocks, just duplicated
import pandas as pd

import jax

__generated_with = "0.0.0"
import marimo
app = marimo.app()

@app.cell
def _():
   return mo.md("## Imports")

@app.cell
def _():
  import numpy as np # Import only blocks, just duplicated
  import pandas as pd


@app.cell
def _():
   import jax

With manual edits to the top-level imports overriding the cell-level ones (and a cell created at the end if no corresponding block exists).

@akshayka
Contributor Author

I think the simplest solution for pure top-level functions proposed so far is to (1) serialize pure functions as app.function decorated functions, and (2) duplicate imports to the top of the notebook when an imported module is used in an otherwise pure function. This is readable, adds just minimal noise and works well with manual editing of notebook files.

The only downside I see is the footgun of accidentally making functions impure, as Patrick brought up, but perhaps we could address this in the editor user interface, as @dmadisetti suggested in a previous comment.

@leventov's suggestion of enabling normal development is interesting. In fact Pluto.jl has a #%% style file format which enables development of Pluto in Pluto, although Pluto.jl does not have the equivalent of our mo.cells.

I think there are two issues that have been brought up in this discussion.

(1) When a user writes a cell containing only a pure function in the marimo editor, how should we serialize it as a "top-level function" in the notebook file, so it can be easily used by other Python modules? That is the topic of this GitHub issue.
(2) When a user types functions, classes, variables — arbitrary Python code — in the file itself, should we (a) preserve them on save from the marimo editor and (b) make them visible in the marimo editor?

@leventov

leventov commented Feb 12, 2025

> (2) When a user types functions, classes, variables — arbitrary Python code — in the file itself, should we (a) preserve them on save from the marimo editor and (b) make them visible in the marimo editor?

I would add:
(2) When a user types functions, classes, variables — arbitrary Python code — in the file itself, should we [...] (c) present their outputs in marimo editor?
(1) Whether to permit caching the results of top-level functions, and if yes, how. Prior relevant discussions: #2653, #3270.

Doing AI evals, or something a la https://www.braintrust.dev/docs/guides/playground, would be a breeze in marimo if caching were optionally supported for top-level functions.

I'm now much less sure about the "XTDB" suggestion that I made in the first message of #3176. On the other hand, the arguments to these functions could be Pickled and be thought of as "cell inputs" for module_hash computation. This approach probably won't "scale" (for example because the local persistence subsystem as currently envisioned won't store last_accessed and thus cannot do LRU; no insight into the parameters and thus not possible to prune the cache "intelligently" along these params), but for AI evals it can often be "good enough", so why not enable it, such as via App(default_cache_top_level_fns=True).

mscolnick pushed a commit that referenced this issue Feb 12, 2025
## 📝 Summary

Adds top level functions

- `@app.fn` (but maybe `app.function` is better) to wrap functions, register them, but leave them directly exposed
- Serialize to `@app.fn` if there are no dependencies

Just an intro PR for #2293

For completion of #2293, I think the outstanding changes are:
- [ ] some UI aspect (maybe a new language toggle?)
- [ ] Top level imports
- [ ] Looser requirements for serialization (can use imports and other app.fns)

But this is hidden under `App(_toplevel_fn=True)`

## 🔍 Description of Changes

see the smoke test or:

```python
@app.fn
def self_ref_fib(n: int) -> int:
    if n == 0:
        return 0
    if n == 1:
        return 1
    return self_ref_fib(n - 1) + self_ref_fib(n - 2)


@app.fn(disabled=True, hide_code=True)
def divide(x, y):
    return y / x


@app.fn
def subtraction(a: "int", b: "int") -> "int":
    return a - b


@app.fn
# Comments inbetween
def multiply(a, b) -> "int":
    return a * b


@app.fn
def addition(a: int, b: int) -> int:
    # int is considered no good, re-eval
    return a + b
```


## 📋 Checklist

- [ ] I have read the [contributor
guidelines](https://github.com/marimo-team/marimo/blob/main/CONTRIBUTING.md).
- [ ] For large changes, or changes that affect the public API: this
change was discussed or approved through an issue, on
[Discord](https://marimo.io/discord?ref=pr), or the community
[discussions](https://github.com/marimo-team/marimo/discussions) (Please
provide a link if applicable).
- [ ] I have added tests for the changes made.
- [ ] I have run the code and verified that it works as expected.

mscolnick pushed a commit that referenced this issue Feb 18, 2025
## 📝 Summary

Followup to #3755 for #2293 allowing for "top level imports"

For completion of #2293, I think UI changes are needed for enabling this behavior. Notably:

   - Indicate when in function mode (maybe top level import too)
   - Provide hints when pushed out of function mode
   - Maybe allow the user to opt out of function mode?
 
+ docs

This also increases security risk, since code is run outside of the runtime. This was always possible, but now marimo can save in a format that could skip the marimo runtime altogether on restart.

There are opportunities here. marimo could lean into this, and leverage
external code running as a chance to hook in (almost a plugin system for
free)

But there are also issues, since a missing dep could stop the notebook from running at all (which goes against the "batteries included" ethos). This can be mitigated with static analysis rather than just an import (markdown does this, for instance), or marimo can re-serialize the notebook in the "safe" form if it comes across issues on import.

## 🔍 Description of Changes

Includes a bit of a refactor to codegen, since there were a fair number of changes.
Allows top-level imports of "import only" cells. The contents are pasted at the top of the file, with a bit of care not to break header extraction.

```python
# Normal headers are retained
# Use a notice to denote where generated imports start
# Notice maybe needs some copy edit

# 👋 This file was generated by marimo. You can edit it, and tweak
# things- just be conscious that some changes may be overwritten if opened in
# the editor. For instance top level imports are derived from a cell, and not
# the top of the script. This notice signifies the beginning of the generated
# import section.

# Could also make this app.imports? But maybe increasing surface area for no reason
import numpy
# Note, import cells intentionally do not have a `return`
# for static analysis feature below

import marimo


__generated_with = "0.11.2"
app = marimo.App(_toplevel_fn=True)


@app.cell
def import_cell():
    # Could also make this app.imports? But maybe increasing surface area for no reason
    import numpy
    # Note, import cells intentionally do not have a `return`
    # for static analysis feature below

```

Top level refs (this includes `@app.function`s) are ignored in the
signatures. E.g.

```python
import marimo as mo

# ...

@app.cell
def md_cell():
    mo.md("Hi")
    return 
```

Since I was in there anyway, I added static analysis to avoid returning dangling defs.

```python
@app.cell
def cell_with_dangling_def():
    a = 1
    b = 2
    return (a,) # No longer returns b since it's not used anywhere. Allowing for linters like ruff to complain.

@app.cell
def ref_cell(a):
    a + 1
    return 
```

LMK if this is too far-reaching and we can break it up / refactor. A bit more opinionated than the last PR.

Tests border more on smoke tests than unit tests, but hit the key issues I was worried about. I can break them down more granularly if needed. Also LMK if you can think of some more edge cases.
