-
Notifications
You must be signed in to change notification settings - Fork 394
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
app.function: top-level functions in notebook files #2293
Comments
I worry that privileging just single-cell functions is going to be a huge UX footgun. There are many other things I might reasonably want the same behaviour from. (Classes are the obvious example.) Moreover I will almost always want my function to depend upon other values: SOME_CONSTANT = 42
def the_answer_to_life_the_universe_and_everything() -> int:
return SOME_CONSTANT Or -- I may simply write code that depends upon being able to import a function like this... and now find that I am unable to adjust this function's behaviour without breaking that import resolution. Can I make an alternate suggestion? Marimo supports adding meaningful names for cells, so it could be arranged for the syntax In particular this would:
|
Oh, that's a very good point; that kind of action at a distance would be sad. I hadn't considered this. Regarding
It's definitely an interesting idea, and would be readily done by abstracting over the low-level API we have for programmatically running cells: https://docs.marimo.io/api/cell.html. AFAIK right now I'm the only consumer of that API, it's quite low level. I've considered what you've suggested before, but the downside is that you won't get code completion, type information, etc. when typing out my_notebook.my_cell.my_function whereas top-level functions would play well with an LSP and other static analysis tools. Given these tradeoffs, would you still prefer a cell-based API? We'd have to make a breaking change to the cell API to support this (what if a user had a function called |
Okay, refining this idea a little bit: it should be possible to define a module-level This would then offer your originally suggested API, whilst supporting all kinds of Python types. (And neatly sidesteps questions about the Autocompletion could be done via an |
Relevant: #1954 So in previous discussions result = expensive_computation()
mutable_arr_in_scope = []
# Not clear how exposed_fn should be exposed,
# or what the behavior should be on state change.
# Even if convention decided, should be intuitive API
# Not worth complexity IMO
def exposed_fn() -> int:
mutable_arr_in_scope.append(state_in_diff_cell())
return result, len(mutable_arr_in_scope) One fix might be to move constants (all caps and are 'primitive') to the top level script. There has been discussion about doing this for import only cells (#1379), it might be feasible to do this for constant only cells import marimo
# cell 2 <- (Auto generated comment? Use name if given)
import numpy as np
# cell 3
SCALE = 1
__generated_with = "0.0.0"
app = marimo.app()
@app.function
def lengthify(v, scale=SCALE, axis=-1):
return scale * v / np.linalg.norm(v, axis=axis) The previous discussion of
|
Another implementation realization. EPSILON = 1e-12
@app.function
def probs_ortho(v, u):
return np.abs(np.dot(lengthify(v), lengthify(u))) <= EPSILON |
I was really interested in having tests in notebooks (re discussion yesterday), so I played around with it a tiny bit this afternoon. As is, pytest does not work out of the box since: 1. the wrapped functions become Cell instances 2. coroutine detection through the `_is_coroutine` spec should be as an attribute This change enables both of these minor adjustments. Now, pytest will pick up marimo `test_*` functions, but they will fail since directly invoking a cell is currently not allowed. I put in a little unit test to fight against a possible regression until such time when a top level fn spec is enabled (see #2293). @akshayka OR @mscolnick --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Had another idea about delimiting top level declarations in the notebook. Opposed to a comment, the lines of the notebook (or statement count) could be registered: import numpy as np # line 1, import count 1
import pandas as pd # line 2, import count 2
# line 3
import jax # line 4, import count 3
# line 5
SCALE = 1 # line 6, constant count 1
# line 7
__generated_with = "0.0.0"
import marimo
app = marimo.app()
@app.cell
def _():
return mo.md("This cell is displayed first")
@app.cell
def _():
return mo.md("## Imports")
# The the imports by line number are displayed in editor here
app.declare_imports(count=2)
# or app.declare_imports(lines=(1, 3)) # np, pd
app.declare_imports(count=1)
# or app.declare_imports(lines=[4]) # jax
@app.cell
def _():
return mo.md("## Constants")
app.declare_constants(lines=[6]) # primitive values only
# app.declare_constants(count=1) Which removes the need for statically parsing the file opposed to just running it. Couple functionality thoughts:
"import only" blocks are already detected, so implementing this change is
"primitive only" blocks, would be easy enough to detect (no refs, assign statements only)- maybe enforce defs in UPPER- but might be better suited as followup feature once the lessons of modules is learned. |
But adds the burden on those who edit the files: if I add a line somewhere to the top of the file, suddenly something below in the file becomes invalid. This seems way too cumbersome for "normal" development a la #2288. There is already kind of a standard for cell-delimiting comments, "Normal development" should be like:
Objective: dogfooding, develop marimo in marimo! |
Yeah, I agree with the editing issue. I don't think there's a problem in just duplicating: import numpy as np # Import only blocks, just duplicated
import pandas as pd
import jax
__generated_with = "0.0.0"
import marimo
app = marimo.app()
@app.cell
def _():
return mo.md("## Imports")
@app.cell
def _():
import numpy as np # Import only blocks, just duplicated
import pandas as pd
@app.cell
def _():
import jax With manual editing overriding the top level imports over the cell level (and last cell creation if no corresponding block exists) |
I think the simplest solution for pure top-level functions proposed so far is to (1) serialize pure functions as The only downside I see is the footgun of accidentally making functions impure, as Patrick brought up, but perhaps we could address this in the editor user interface, as @dmadisetti suggested in a previous comment. @leventov's suggestion of enabling normal development is interesting. In fact Pluto.jl has a I think there are two issues that have been brought up in this discussion. (1) When a user writes a cell containing only a pure function in the marimo editor, how should we serialize it as a "top-level function" in the notebook file, so it can be easily used by other Python modules? That is the topic of this GitHub issue. |
I would add: Doing AI evals or a la https://www.braintrust.dev/docs/guides/playground would be a breeze in Marimo if caching was optionally supported for top-level functions. I'm now much less sure about the "XTDB" suggestion that I made in the first message of #3176. On the other hand, the arguments to these functions could be Pickled and be thought of as "cell inputs" for |
## 📝 Summary Adds top level functions - `@app.fn` (but maybe `app.function` is better) to wrap functions, register them but leave them directly exposed - Serialize to `@app.fn` if there are no dependencies Just an intro PR for #2293 For completion of 2293, I think out standing changes are: - [ ] some UI aspect (maybe a new language toggle?) - [ ] Top level imports - [ ] Looser requirements for serialization (can use imports and other app.fns) But this is hidden under `App(_toplevel_fn=True)` ## 🔍 Description of Changes see the smoke test or: ```python @app.fn def self_ref_fib(n: int) -> int: if n == 0: return 0 if n == 1: return 1 return self_ref_fib(n - 1) + self_ref_fib(n - 2) @app.fn(disabled=True, hide_code=True) def divide(x, y): return y / x @app.fn def subtraction(a: "int", b: "int") -> "int": return a - b @app.fn # Comments inbetween def multiply(a, b) -> "int": return a * b @app.fn def addition(a: int, b: int) -> int: # int is considered no good, re-eval return a + b ``` <!-- Detail the specific changes made in this pull request. Explain the problem addressed and how it was resolved. If applicable, provide before and after comparisons, screenshots, or any relevant details to help reviewers understand the changes easily. --> ## 📋 Checklist - [ ] I have read the [contributor guidelines](https://github.com/marimo-team/marimo/blob/main/CONTRIBUTING.md). - [ ] For large changes, or changes that affect the public API: this change was discussed or approved through an issue, on [Discord](https://marimo.io/discord?ref=pr), or the community [discussions](https://github.com/marimo-team/marimo/discussions) (Please provide a link if applicable). - [ ] I have added tests for the changes made. - [ ] I have run the code and verified that it works as expected. ## 📜 Reviewers <!-- Tag potential reviewers from the community or maintainers who might be interested in reviewing this pull request. Your PR will be reviewed more quickly if you can figure out the right person to tag with @ --> @akshayka OR @mscolnick --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
## 📝 Summary Followup to #3755 for #2293 allowing for "top level imports" For completion of #2293, I thin UI changes and needed for enabling this behavior. Notably: - Indicate when in function mode (maybe top level import too) - Provide hints when pushed out of function mode - Maybe allow the user to opt out of function mode? \+ docs This also increases security risk since code is run outside of runtime. This was always possible, but now marimo can save in a format that could skip the marimo runtime all together on restart. There are opportunities here. marimo could lean into this, and leverage external code running as a chance to hook in (almost a plugin system for free) But also issues, since a missing dep could stop the notebook from running at all (goes against the "batteries included" ethos). This can be mitigated with static analysis over just an import (markdown does this for instance), or marimo can re-serialize the notebook in the "safe" form, if it comes across issues in import. ## 🔍 Description of Changes Includes a bit of a refactor to codegen since there were a fair amount of changes. Allows top level imports of "import only" cells. The contents are pasted at the top of the file, with a bit of care not to break header extraction. ```python # Normal headers are retained # Use a notice to denote where generated imports start # Notice maybe needs some copy edit # 👋 This file was generated by marimo. You can edit it, and tweak # things- just be conscious that some changes may be overwritten if opened in # the editor. For instance top level imports are derived from a cell, and not # the top of the script. This notice signifies the beginning of the generated # import section. # Could also make this app.imports? But maybe increasing surface area for no reason import numpy # Note, import cells intentionally do not have a `return` # for static analysis feature below import marimo __generated_with = "0.11.2" app = marimo.App(_toplevel_fn=True) @app.cell def import_cell(): # Could also make this app.imports? But maybe increasing surface area for no reason import numpy # Note, import cells intentionally do not have a `return` # for static analysis feature below ``` Top level refs (this includes `@app.function`s) are ignored in the signatures. E.g. ```python import marimo as mo # ... @app.cell def md_cell(): mo.md("Hi") return ``` Since I was also in there, I added static analysis to ignore returning dangling defs. ```python @app.cell def cell_with_dangling_def(): a = 1 b = 2 return (a,) # No longer returns b since it's not used anywhere. Allowing for linters like ruff to complain. @app.cell def ref_cell(a): a + 1 return ``` LMK if too far reaching and we can break it up/ refactor. A bit more opinionated than the last PR Test border more on being more smoke tests than unit tests, but hit the key issues I was worried about. I can break them down more granularly if needed. Also LMK if you can think of some more edgecases. ## 📜 Reviewers <!-- Tag potential reviewers from the community or maintainers who might be interested in reviewing this pull request. Your PR will be reviewed more quickly if you can figure out the right person to tag with @ --> @akshayka OR @mscolnick --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
This issue is for making it possible to import functions directly from marimo notebooks:
Only "pure" functions will be importable, ie functions that have no refs besides imported modules. Some care will need to be taken to handle modules.
A cell that includes a single function def, and nothing more, will be saved in the notebook file as:
Discussed on:
The text was updated successfully, but these errors were encountered: