app.function: top-level functions in notebook files #2293

akshayka opened this issue Sep 10, 2024 · 10 comments
@akshayka
Contributor

akshayka commented Sep 10, 2024

This issue is for making it possible to import functions directly from marimo notebooks:

from notebook import my_function

Only "pure" functions will be importable, ie functions that have no refs besides imported modules. Some care will need to be taken to handle modules.

A cell that includes a single function def, and nothing more, will be saved in the notebook file as:

@app.function
def my_function():
  ...
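A purity check along these lines could be sketched with the standard ast module: collect the names a function reads but does not bind, and require each one to be an imported module. This is a hypothetical illustration (free_names is not marimo API); marimo's actual analysis may differ:

```python
import ast
import builtins
import textwrap

def free_names(source: str) -> set[str]:
    # Hypothetical sketch: parse a single function definition and
    # collect the names it reads but neither binds nor receives as
    # arguments; builtins are excluded. A "pure" function in the
    # sense above would have only imported modules left over.
    fn = ast.parse(textwrap.dedent(source)).body[0]
    assert isinstance(fn, ast.FunctionDef)
    bound = {a.arg for a in fn.args.args} | {fn.name}
    loaded: set[str] = set()
    for node in ast.walk(fn):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                loaded.add(node.id)
            else:
                bound.add(node.id)
    return loaded - bound - set(dir(builtins))
```

A function whose only free name is an imported module like np would pass; one reading a notebook-level variable would not.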

Discussed on:

@patrick-kidger

I worry that privileging just single-cell functions is going to be a huge UX footgun. There are many other things I might reasonably want the same behaviour from. (Classes are the obvious example.) Moreover I will almost always want my function to depend upon other values:

SOME_CONSTANT = 42

def the_answer_to_life_the_universe_and_everything() -> int:
    return SOME_CONSTANT

Or -- I may simply write code that depends upon being able to import a function like this... and now find that I am unable to adjust this function's behaviour without breaking that import resolution.


Can I make an alternate suggestion? Marimo supports adding meaningful names for cells, so it could be arranged for the syntax import my_notebook; my_notebook.my_cell.my_function to work.

In particular this would:

  • Automatically evaluate all cells that my_cell depends upon.
  • Set __module__ and __qualname__ appropriately, so that even dynamic resolution via importlib.import_module works correctly.
  • ... and as such, basically support the full generality of things that we may wish to do in Python.
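On the second point: tools that resolve functions dynamically (pickle, importlib.import_module plus getattr) rely on those attributes. A minimal sketch of patching them, with reexport as a hypothetical helper name:

```python
def reexport(fn, module_name: str):
    # Hypothetical helper: repoint a function's metadata at the
    # notebook module so that dynamic resolution (pickle,
    # importlib.import_module + getattr) finds it again.
    fn.__module__ = module_name
    fn.__qualname__ = fn.__name__
    return fn

def my_function() -> int:
    return 42

reexport(my_function, "my_notebook")
```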

@akshayka
Contributor Author

akshayka commented Sep 13, 2024

> Or -- I may simply write code that depends upon being able to import a function like this... and now find that I am unable to adjust this function's behaviour without breaking that import resolution.

Oh, that's a very good point; that kind of action at a distance would be sad. I hadn't considered this.

Regarding

> so it could be arranged for the syntax import my_notebook; my_notebook.my_cell.my_function to work

It's definitely an interesting idea, and could readily be done by abstracting over the low-level API we have for programmatically running cells: https://docs.marimo.io/api/cell.html. AFAIK I'm currently the only consumer of that API; it's quite low level.

I've considered what you've suggested before, but the downside is that you won't get code completion, type information, etc. when typing out

my_notebook.my_cell.my_function

whereas top-level functions would play well with an LSP and other static analysis tools.

Given these tradeoffs, would you still prefer a cell-based API? We'd have to make a breaking change to the cell API to support this (what if a user had a function called run?), or perhaps we could expose it under a namespace, such as my_cell.functions.my_function ...

@patrick-kidger

Okay, refining this idea a little bit: it should be possible to define a module-level __getattr__ that runs the appropriate cells up to the point of providing that name. That would enable the simpler my_module.my_value based lookup, without having to name an individual cell.

This would then offer your originally suggested API, whilst supporting all kinds of Python types. (And neatly sidesteps questions about the Cell API.)

Autocompletion could be done via an if TYPE_CHECKING: block somewhere in the file.

@dmadisetti
Collaborator

dmadisetti commented Sep 14, 2024

Relevant: #1954

So in previous discussions, @app.function would primarily be used for serializing and creating cells with a name, and for denoting that a function is pure. Importing a function from a cell is disingenuous when it can rely on additional, cell-specific references that are not constants. For instance:

result = expensive_computation()
mutable_arr_in_scope = []

# Not clear how exposed_fn should be exposed,
# or what the behavior should be on state change.
# Even if convention decided, should be intuitive API
# Not worth complexity IMO
def exposed_fn() -> tuple:
    mutable_arr_in_scope.append(state_in_diff_cell())
    return result, len(mutable_arr_in_scope)

One fix might be to move constants (all caps and 'primitive') to the top-level script. There has been discussion about doing this for import-only cells (#1379); it might be feasible to do the same for constant-only cells:

import marimo

# cell 2 <-  (Auto generated comment? Use name if given)
import numpy as np

# cell 3
SCALE = 1

__generated_with = "0.0.0"
app = marimo.app()

@app.function
def lengthify(v, scale=SCALE, axis=-1):
   return scale * v / np.linalg.norm(v, axis=axis)

The previous discussion of @app.function made the UI very simple.

  1. If a cell with only a function definition hits the scope requirements, then it is automatically turned into an @app.function, and the cell name shows that (similar to a SQL cell showing the df name).
  2. If a cell with only a function definition does not hit the requirements (cell name conflicts with function name, scope refs are not modules or constants), then a "?" icon appears where the cell name should be, explaining why it is not turned into an @app.function.

@dmadisetti
Collaborator

Another implementation note: @app.function should be able to reference other @app.functions:

EPSILON = 1e-12

@app.function
def probs_ortho(v, u):
    return np.abs(np.dot(lengthify(v), lengthify(u))) <= EPSILON

@mscolnick mscolnick added this to the marimo 1.0.0 milestone Nov 25, 2024
akshayka pushed a commit that referenced this issue Dec 19, 2024
I was really interested in having tests in notebooks (re discussion
yesterday), so I played around with it a tiny bit this afternoon. As is,
pytest does not work out of the box since:

  1. the wrapped functions become Cell instances
  2. coroutine detection through the `_is_coroutine` spec should be an attribute

This change enables both of these minor adjustments. Now, pytest will pick up marimo `test_*` functions, but they will fail since directly invoking a cell is currently not allowed. I put in a little unit test to guard against a possible regression until a top-level fn spec is enabled (see #2293).

@dmadisetti
Collaborator

Had another idea about delimiting top-level declarations in the notebook.

As opposed to a comment, the lines of the notebook (or a statement count) could be registered:

import numpy as np # line 1, import count 1
import pandas as pd # line 2, import count 2
# line 3
import jax # line 4, import count 3
# line 5
SCALE = 1 # line 6, constant count 1
# line 7
__generated_with = "0.0.0"
import marimo
app = marimo.app()

@app.cell
def _():
   return mo.md("This cell is displayed first")

@app.cell
def _():
   return mo.md("## Imports")

# The imports by line number are displayed in the editor here
app.declare_imports(count=2)
# or app.declare_imports(lines=(1, 3)) # np, pd

app.declare_imports(count=1)
# or app.declare_imports(lines=[4]) # jax

@app.cell
def _():
   return mo.md("## Constants")

app.declare_constants(lines=[6]) # primitive values only
# app.declare_constants(count=1)

This removes the need to statically parse the file, as opposed to just running it.

A couple of functionality thoughts:

  • Imports not paired with a declare_x can be merged with the closest declaration, or automatically added as a block at the end of the notebook.
  • If using lines, lines after __generated_with cannot be indexed.
  • Invalid top-level blocks (e.g. a = custom_obj or logic) should be stripped (as is currently the case).

"import only" blocks are already detected, so implementing this change is

  1. building declare_import to build a cell from the notebook content (then verify import only)
  2. changing the serialization of the notebook

"primitive only" blocks, would be easy enough to detect (no refs, assign statements only)- maybe enforce defs in UPPER- but might be better suited as followup feature once the lessons of modules is learned.

@leventov

leventov commented Feb 10, 2025

> Which removes the need for statically parsing the file opposed to just running it.

But adds the burden on those who edit the files: if I add a line somewhere to the top of the file, suddenly something below in the file becomes invalid. This seems way too cumbersome for "normal" development a la #2288.

There is already kind of a standard for cell-delimiting comments, # %%: https://docs.spyder-ide.org/4/panes/editor.html#defining-code-cells, supported by VS Code and IntelliJ as well.

"Normal development" should be like:

  • Just normal top-level functions and variables
  • Classes are "split up" such that each method is a separate top-level block (which permits preceding it with # %%) a la fastcore.patch.
  • Tests and examples are in mo.cells.
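A file in the # %% style stays a plain, importable Python script; a minimal sketch of that layout (stdlib only, names are illustrative):

```python
# %% Imports
import math

# %% Constants
SCALE = 2

# %% A plain top-level function, importable like any module member
def scaled_hypot(x, y):
    return SCALE * math.hypot(x, y)
```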

Objective: dogfooding, develop marimo in marimo!

@dmadisetti
Collaborator

Yeah, I agree with the editing issue. I don't think there's a problem in just duplicating:

import numpy as np # Import only blocks, just duplicated
import pandas as pd

import jax

__generated_with = "0.0.0"
import marimo
app = marimo.app()

@app.cell
def _():
   return mo.md("## Imports")

@app.cell
def _():
  import numpy as np # Import only blocks, just duplicated
  import pandas as pd


@app.cell
def _():
   import jax

With manual edits to the top-level imports overriding the cell-level ones (and a cell created at the end if no corresponding block exists).

@akshayka
Contributor Author

I think the simplest solution for pure top-level functions proposed so far is to (1) serialize pure functions as app.function decorated functions, and (2) duplicate imports to the top of the notebook when an imported module is used in an otherwise pure function. This is readable, adds just minimal noise and works well with manual editing of notebook files.

The only downside I see is the footgun of accidentally making functions impure, as Patrick brought up, but perhaps we could address this in the editor user interface, as @dmadisetti suggested in a previous comment.

@leventov's suggestion of enabling normal development is interesting. In fact Pluto.jl has a #%% style file format which enables development of Pluto in Pluto, although Pluto.jl does not have the equivalent of our mo.cells.

I think there are two issues that have been brought up in this discussion.

(1) When a user writes a cell containing only a pure function in the marimo editor, how should we serialize it as a "top-level function" in the notebook file, so it can be easily used by other Python modules? That is the topic of this GitHub issue.
(2) When a user types functions, classes, variables — arbitrary Python code — in the file itself, should we (a) preserve them on save from the marimo editor and (b) make them visible in the marimo editor?

@leventov

leventov commented Feb 12, 2025

> (2) When a user types functions, classes, variables — arbitrary Python code — in the file itself, should we (a) preserve them on save from the marimo editor and (b) make them visible in the marimo editor?

I would add:
(2) When a user types functions, classes, variables — arbitrary Python code — in the file itself, should we [...] (c) present their outputs in marimo editor?
(1) Whether to permit caching the results of top-level functions, and if yes, how. Prior relevant discussions: #2653, #3270.

Doing AI evals, or something a la https://www.braintrust.dev/docs/guides/playground, would be a breeze in marimo if caching were optionally supported for top-level functions.

I'm now much less sure about the "XTDB" suggestion that I made in the first message of #3176. On the other hand, the arguments to these functions could be Pickled and be thought of as "cell inputs" for module_hash computation. This approach probably won't "scale" (for example because the local persistence subsystem as currently envisioned won't store last_accessed and thus cannot do LRU; no insight into the parameters and thus not possible to prune the cache "intelligently" along these params), but for AI evals it can often be "good enough", so why not enable it, such as via App(default_cache_top_level_fns=True).

mscolnick pushed a commit that referenced this issue Feb 12, 2025
## 📝 Summary

Adds top level functions

- `@app.fn` (but maybe `app.function` is better) to wrap functions, register them, but leave them directly exposed
- Serialize to `@app.fn` if there are no dependencies

Just an intro PR for #2293

For completion of #2293, I think the outstanding changes are:
- [ ] some UI aspect (maybe a new language toggle?)
- [ ] Top level imports
- [ ] Looser requirements for serialization (can use imports and other app.fns)

But this is hidden under `App(_toplevel_fn=True)`

## 🔍 Description of Changes

see the smoke test or:

```python
@app.fn
def self_ref_fib(n: int) -> int:
    if n == 0:
        return 0
    if n == 1:
        return 1
    return self_ref_fib(n - 1) + self_ref_fib(n - 2)


@app.fn(disabled=True, hide_code=True)
def divide(x, y):
    return y / x


@app.fn
def subtraction(a: "int", b: "int") -> "int":
    return a - b


@app.fn
# Comments inbetween
def multiply(a, b) -> "int":
    return a * b


@app.fn
def addition(a: int, b: int) -> int:
    # int is considered no good, re-eval
    return a + b
```


## 📋 Checklist

- [ ] I have read the [contributor
guidelines](https://github.com/marimo-team/marimo/blob/main/CONTRIBUTING.md).
- [ ] For large changes, or changes that affect the public API: this
change was discussed or approved through an issue, on
[Discord](https://marimo.io/discord?ref=pr), or the community
[discussions](https://github.com/marimo-team/marimo/discussions) (Please
provide a link if applicable).
- [ ] I have added tests for the changes made.
- [ ] I have run the code and verified that it works as expected.

mscolnick pushed a commit that referenced this issue Feb 18, 2025
## 📝 Summary

Followup to #3755 for #2293 allowing for "top level imports"

For completion of #2293, I think UI changes are needed for enabling this behavior. Notably:

   - Indicate when in function mode (maybe top level import too)
   - Provide hints when pushed out of function mode
   - Maybe allow the user to opt out of function mode?
 
+ docs

This also increases security risk, since code is run outside of the runtime. This was always possible, but now marimo can save in a format that could skip the marimo runtime altogether on restart.

There are opportunities here. marimo could lean into this, and leverage
external code running as a chance to hook in (almost a plugin system for
free)

But there are also issues, since a missing dep could stop the notebook from running at all (which goes against the "batteries included" ethos). This can be mitigated with static analysis rather than just an import (markdown does this, for instance), or marimo can re-serialize the notebook in the "safe" form if it comes across issues on import.

## 🔍 Description of Changes

Includes a bit of a refactor to codegen, since there were a fair number of changes.
Allows top-level imports of "import only" cells. The contents are pasted at the top of the file, with a bit of care not to break header extraction.

```python
# Normal headers are retained
# Use a notice to denote where generated imports start
# Notice maybe needs some copy edit

# 👋 This file was generated by marimo. You can edit it, and tweak
# things- just be conscious that some changes may be overwritten if opened in
# the editor. For instance top level imports are derived from a cell, and not
# the top of the script. This notice signifies the beginning of the generated
# import section.

# Could also make this app.imports? But maybe increasing surface area for no reason
import numpy
# Note, import cells intentionally do not have a `return`
# for static analysis feature below

import marimo


__generated_with = "0.11.2"
app = marimo.App(_toplevel_fn=True)


@app.cell
def import_cell():
    # Could also make this app.imports? But maybe increasing surface area for no reason
    import numpy
    # Note, import cells intentionally do not have a `return`
    # for static analysis feature below

```

Top level refs (this includes `@app.function`s) are ignored in the
signatures. E.g.

```python
import marimo as mo

# ...

@app.cell
def md_cell():
    mo.md("Hi")
    return 
```

Since I was in there anyway, I added static analysis to avoid returning dangling defs.

```python
@app.cell
def cell_with_dangling_def():
    a = 1
    b = 2
    return (a,) # No longer returns b since it's not used anywhere. Allowing for linters like ruff to complain.

@app.cell
def ref_cell(a):
    a + 1
    return 
```

LMK if this is too far-reaching and we can break it up / refactor. A bit more opinionated than the last PR.

Tests border more on smoke tests than unit tests, but hit the key issues I was worried about. I can break them down more granularly if needed. Also LMK if you can think of some more edge cases.
