merge from main (this time will work)
lucianosrp committed Oct 6, 2024
2 parents 50b9db6 + 3e0405d commit 96b5d60
Showing 86 changed files with 2,186 additions and 653 deletions.
46 changes: 46 additions & 0 deletions .github/workflows/downstream_tests.yml
@@ -86,3 +86,49 @@ jobs:
run: |
cd scikit-lego
pytest -n auto --disable-warnings --cov=sklego -m "not cvxpy and not formulaic and not umap"
shiny:
strategy:
matrix:
python-version: ["3.12"]
os: [ubuntu-latest]

runs-on: ${{ matrix.os }}
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
- name: Install uv
uses: astral-sh/setup-uv@v3
with:
enable-cache: "true"
cache-suffix: ${{ matrix.python-version }}
cache-dependency-glob: "**requirements*.txt"
- name: clone-shiny
run: |
git clone https://github.com/posit-dev/py-shiny.git
cd py-shiny
git log
- name: install-basics
run: uv pip install --upgrade tox virtualenv setuptools --system
- name: install-shiny-dev
run: |
cd py-shiny
uv pip install -e ".[dev,test]" --system
- name: install-narwhals-dev
run: |
uv pip uninstall narwhals --system
uv pip install -e . --system
- name: show-deps
run: uv pip freeze
- name: Run pytest
run: |
cd py-shiny
python tests/pytest/asyncio_prevent.py
pytest
- name: Run mypy
run: |
cd py-shiny
uv pip install mypy --system
mypy shiny
6 changes: 4 additions & 2 deletions .github/workflows/extremes.yml
@@ -119,6 +119,8 @@ jobs:
kaggle kernels output "marcogorelli/variable-brink-glacier"
- name: install-polars
run: python -m pip install *.whl
- name: install-pandas-nightly
run: pip install --pre --extra-index-url https://pypi.anaconda.org/scientific-python-nightly-wheels/simple pandas
- name: install-reqs
run: uv pip install --upgrade tox virtualenv setuptools pip -r requirements-dev.txt --system
- name: uninstall pyarrow
@@ -127,8 +129,8 @@ jobs:
# run: uv pip install --extra-index-url https://pypi.fury.io/arrow-nightlies/ --pre pyarrow --system
- name: uninstall pandas
run: uv pip uninstall pandas --system
- name: install-pandas-nightly
run: uv pip install --prerelease=allow --pre --extra-index-url https://pypi.anaconda.org/scientific-python-nightly-wheels/simple pandas --system
- name: show-deps
run: uv pip freeze
- name: uninstall numpy
run: uv pip uninstall numpy --system
- name: install numpy nightly
5 changes: 5 additions & 0 deletions .pre-commit-config.yaml
@@ -46,3 +46,8 @@ repos:
args: [--skip-errors]
additional_dependencies:
- black==22.12.0
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.6.0
hooks:
- id: name-tests-test
exclude: ^tests/utils\.py
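The `name-tests-test` hook added above checks test-file names. The sketch below approximates its behaviour in plain Python; it is not the hook's source. It assumes the hook's documented default (pytest-style names matching `.*_test\.py`), that the hook only receives Python files under a `tests/` directory, that `__init__.py` and `conftest.py` are exempt, and that pre-commit applies `exclude` as a regex search on the repository-relative path. `misnamed_tests` is a hypothetical helper name.

```python
import os
import re

# Assumed defaults: the hook only sees Python files under a tests/ directory,
# and its default (pytest-style) pattern requires names ending in "_test.py".
FILES_RE = re.compile(r"(^|/)tests/.+\.py$")
EXCLUDE_RE = re.compile(r"^tests/utils\.py")  # the `exclude` from the config above
TEST_NAME_RE = re.compile(r".*_test\.py")
EXEMPT = {"__init__.py", "conftest.py"}  # assumed exemptions, not exhaustive


def misnamed_tests(paths: list[str]) -> list[str]:
    """Return the paths a name-tests-test-style check would flag (hypothetical helper)."""
    flagged = []
    for path in paths:
        # pre-commit filters candidate files with `files`/`exclude` before the hook runs.
        if not FILES_RE.search(path) or EXCLUDE_RE.search(path):
            continue
        base = os.path.basename(path)
        if base in EXEMPT:
            continue
        # The whole file name must match the test-name pattern.
        if not TEST_NAME_RE.fullmatch(base):
            flagged.append(path)
    return flagged


print(misnamed_tests([
    "tests/frame/select_test.py",
    "tests/utils.py",
    "tests/conftest.py",
    "tests/helpers.py",
    "narwhals/utils.py",
]))  # → ['tests/helpers.py']
```

With the `exclude` entry in place, the shared `tests/utils.py` helper module is skipped instead of being flagged for not ending in `_test.py`.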
6 changes: 5 additions & 1 deletion README.md
@@ -13,11 +13,12 @@
Extremely lightweight and extensible compatibility layer between dataframe libraries!

- **Full API support**: cuDF, Modin, pandas, Polars, PyArrow
- **Lazy-only support**: Dask
- **Interchange-level support**: Ibis, Vaex, anything else which implements the DataFrame Interchange Protocol

Seamlessly support all, without depending on any!

- ✅ **Just use** a subset of **the Polars API**, no need to learn anything new
- ✅ **Just use** [a subset of **the Polars API**](https://narwhals-dev.github.io/narwhals/api-reference/), no need to learn anything new
- ✅ **Zero dependencies**, Narwhals only uses what
the user passes in so your library can stay lightweight
- ✅ Separate **lazy** and eager APIs, use **expressions**
@@ -117,6 +118,9 @@ Narwhals has been featured in several talks, podcasts, and blog posts:
- [Talk Python to me Podcast](https://youtu.be/FSH7BZ0tuE0)
Ahoy, Narwhals are bridging the data science APIs

- [Python Bytes Podcast](https://www.youtube.com/live/N7w_ESVW40I?si=y-wN1uCsAuJOKlOT&t=382)
Episode 402, topic #2

- [Super Data Science: ML & AI Podcast](https://www.youtube.com/watch?v=TeG4U8R0U8U)
Narwhals: For Pandas-to-Polars DataFrame Compatibility

1 change: 1 addition & 0 deletions docs/api-reference/narwhals.md
@@ -22,6 +22,7 @@ Here are the top-level functions available in Narwhals.
- maybe_align_index
- maybe_convert_dtypes
- maybe_get_index
- maybe_reset_index
- maybe_set_index
- mean
- mean_horizontal
1 change: 1 addition & 0 deletions docs/api-reference/series_str.md
@@ -13,6 +13,7 @@
- slice
- starts_with
- strip_chars
- to_datetime
- tail
show_source: false
show_bases: false
6 changes: 6 additions & 0 deletions docs/assets/logo.svg
4 changes: 2 additions & 2 deletions docs/backcompat.md
@@ -74,8 +74,8 @@ users of `narwhals.stable.v1` will have their code unaffected.
Which should you use? In general we recommend:

- When prototyping, use `import narwhals as nw`, so you can iterate quickly.
- Once you're happy with what you've got and what to release something production-ready and stable,
when switch out your `import narwhals as nw` usage for `import narwhals.stable.v1 as nw`.
- Once you're happy with what you've got and want to release something production-ready and stable,
then switch out your `import narwhals as nw` usage for `import narwhals.stable.v1 as nw`.

## Exceptions

81 changes: 41 additions & 40 deletions docs/extending.md
@@ -2,7 +2,7 @@

## List of supported libraries (and how to add yours!)

Currently, Narwhals supports the following libraries as inputs:
Currently, Narwhals has full API support for the following libraries:

| Library | 🔗 Link 🔗 |
| ------------- | ------------- |
@@ -12,46 +12,13 @@ Currently, Narwhals supports the following libraries as inputs:
| Modin | [github.com/modin-project/modin](https://github.com/modin-project/modin) |
| PyArrow ⇶ | [arrow.apache.org/docs/python](https://arrow.apache.org/docs/python/index.html) |

If you want your own library to be recognised too, you're welcome open a PR (with tests)!
Alternatively, if you can't do that (for example, if you library is closed-source), see
the next section for what else you can do.

To check which methods are supported for which backend in depth, please refer to the
[API completeness page](api-completeness/index.md).

## Extending Narwhals

We love open source, but we're not "open source absolutists". If you're unable to open
source you library, then this is how you can make your library compatible with Narwhals.

Make sure that, in addition to the public Narwhals API, you also define:

- `DataFrame.__narwhals_dataframe__`: return an object which implements public methods
from `Narwhals.DataFrame`
- `DataFrame.__narwhals_namespace__`: return an object which implements public top-level
functions from `narwhals` (e.g. `narwhals.col`, `narwhals.concat`, ...)
- `DataFrame.__native_namespace__`: return a native namespace object which must have a
`from_dict` method
- `LazyFrame.__narwhals_lazyframe__`: return an object which implements public methods
from `Narwhals.LazyFrame`
- `LazyFrame.__narwhals_namespace__`: return an object which implements public top-level
functions from `narwhals` (e.g. `narwhals.col`, `narwhals.concat`, ...)
- `LazyFrame.__native_namespace__`: return a native namespace object which must have a
`from_dict` method
- `Series.__narwhals_series__`: return an object which implements public methods
from `Narwhals.Series`

If your library doesn't distinguish between lazy and eager, then it's OK for your dataframe
object to implement both `__narwhals_dataframe__` and `__narwhals_lazyframe__`. In fact,
that's currently what `narwhals._pandas_like.dataframe.PandasLikeDataFrame` does. So, if you're stuck,
take a look at the source code to see how it's done!

Note that the "extension" mechanism is still experimental. If anything is not clear, or
doesn't work, please do raise an issue or contact us on Discord (see the link on the README).
It also has lazy-only support for [Dask](https://github.com/dask/dask), and interchange-only support
for [DuckDB](https://github.com/duckdb/duckdb) and [Ibis](https://github.com/ibis-project/ibis).

## Levels
### Levels

Narwhals comes with two levels of support: "full" and "interchange".
Narwhals comes with two levels of support ("full" and "interchange"), and we are working on defining
a "lazy-only" level too.

Libraries for which we have full support can benefit from the whole
[Narwhals API](https://narwhals-dev.github.io/narwhals/api-reference/).
@@ -91,4 +58,38 @@ def func(df: Any) -> Schema:
return df.schema
```
is also supported, meaning that, in addition to the libraries mentioned above, you can
also pass Ibis, Vaex, PyArrow, and any other library which implements the protocol.
also pass Ibis, DuckDB, Vaex, and any library which implements the protocol.

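For context on interchange-level support: a library implements the DataFrame Interchange Protocol by exposing a `__dataframe__()` method that returns a library-agnostic view of its data. A minimal sketch using pandas' own implementation of the protocol, assuming pandas 1.5 or later:

```python
import pandas as pd

df_pd = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# `__dataframe__()` returns an interchange object exposing a small,
# library-agnostic API (column names, row/column counts, column access, ...).
xchg = df_pd.__dataframe__()

print(list(xchg.column_names()))  # ['a', 'b']
print(xchg.num_rows(), xchg.num_columns())  # 3 2
```

Metadata such as column names is all a schema-only function needs, which is why interchange-level support is limited to that kind of inspection.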
### Extending Narwhals

If you want your own library to be recognised too, you're welcome to open a PR (with tests)!
Alternatively, if you can't do that (for example, if your library is closed-source), read on
for what else you can do.

We love open source, but we're not "open source absolutists". If you're unable to open
source your library, then this is how you can make your library compatible with Narwhals.

Make sure that, in addition to the public Narwhals API, you also define:

- `DataFrame.__narwhals_dataframe__`: return an object which implements public methods
from `Narwhals.DataFrame`
- `DataFrame.__narwhals_namespace__`: return an object which implements public top-level
functions from `narwhals` (e.g. `narwhals.col`, `narwhals.concat`, ...)
- `DataFrame.__native_namespace__`: return a native namespace object which must have a
`from_dict` method
- `LazyFrame.__narwhals_lazyframe__`: return an object which implements public methods
from `Narwhals.LazyFrame`
- `LazyFrame.__narwhals_namespace__`: return an object which implements public top-level
functions from `narwhals` (e.g. `narwhals.col`, `narwhals.concat`, ...)
- `LazyFrame.__native_namespace__`: return a native namespace object which must have a
`from_dict` method
- `Series.__narwhals_series__`: return an object which implements public methods
from `Narwhals.Series`

If your library doesn't distinguish between lazy and eager, then it's OK for your dataframe
object to implement both `__narwhals_dataframe__` and `__narwhals_lazyframe__`. In fact,
that's currently what `narwhals._pandas_like.dataframe.PandasLikeDataFrame` does. So, if you're stuck,
take a look at the source code to see how it's done!

Note that this "extension" mechanism is still experimental. If anything is not clear, or
doesn't work, please do raise an issue or contact us on Discord (see the link on the README).
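To make the list of hooks above more concrete, here is a heavily simplified skeleton of a hypothetical closed-source library exposing them. Every class name (`MyFrame`, `MySeries`, `MyCompliantFrame`, ...) is invented for illustration, and the compliant objects are empty stubs: a real implementation must provide the public `narwhals` methods each hook promises, so these stubs alone are not enough for Narwhals to operate on. The sketch only shows where each dunder method lives and what it returns.

```python
from __future__ import annotations

from types import SimpleNamespace
from typing import Any


class MyCompliantFrame:
    """Stub for what the frame hooks return.

    A real implementation must provide the public methods of
    `narwhals.DataFrame` / `narwhals.LazyFrame`; this stub does not.
    """

    def __init__(self, native: MyFrame) -> None:
        self._native = native


class MyCompliantSeries:
    """Stub for what `__narwhals_series__` returns (public `narwhals.Series` methods)."""

    def __init__(self, native: MySeries) -> None:
        self._native = native


class MyCompliantNamespace:
    """Stub for the top-level functions (`col`, `concat`, ...)."""


class MyFrame:
    """Hypothetical dataframe type from a closed-source library."""

    def __init__(self, data: dict[str, list[Any]]) -> None:
        self.data = data

    # This library has no lazy/eager split, so it exposes both hooks.
    def __narwhals_dataframe__(self) -> MyCompliantFrame:
        return MyCompliantFrame(self)

    def __narwhals_lazyframe__(self) -> MyCompliantFrame:
        return MyCompliantFrame(self)

    def __narwhals_namespace__(self) -> MyCompliantNamespace:
        return MyCompliantNamespace()

    def __native_namespace__(self) -> SimpleNamespace:
        # The native namespace must have a `from_dict` method.
        return SimpleNamespace(from_dict=MyFrame)


class MySeries:
    """Hypothetical series type from the same library."""

    def __init__(self, values: list[Any]) -> None:
        self.values = values

    def __narwhals_series__(self) -> MyCompliantSeries:
        return MyCompliantSeries(self)


df = MyFrame({"a": [1, 2, 3]})
rebuilt = df.__native_namespace__().from_dict({"b": [4, 5]})
print(type(rebuilt).__name__, rebuilt.data)  # MyFrame {'b': [4, 5]}
```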
5 changes: 5 additions & 0 deletions docs/how_it_works.md
@@ -75,6 +75,7 @@ from narwhals.utils import parse_version
pn = PandasLikeNamespace(
implementation=Implementation.PANDAS,
backend_version=parse_version(pd.__version__),
dtypes=nw.dtypes,
)
print(nw.col("a")._call(pn))
```
@@ -101,13 +102,15 @@ import pandas as pd
pn = PandasLikeNamespace(
implementation=Implementation.PANDAS,
backend_version=parse_version(pd.__version__),
dtypes=nw.dtypes,
)

df_pd = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df = PandasLikeDataFrame(
df_pd,
implementation=Implementation.PANDAS,
backend_version=parse_version(pd.__version__),
dtypes=nw.dtypes,
)
expression = pn.col("a") + 1
result = expression._call(df)
@@ -196,6 +199,7 @@ import pandas as pd
pn = PandasLikeNamespace(
implementation=Implementation.PANDAS,
backend_version=parse_version(pd.__version__),
dtypes=nw.dtypes,
)

df_pd = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
@@ -210,6 +214,7 @@ backend, and it does so by passing a Narwhals-compliant namespace to `nw.Expr._c
pn = PandasLikeNamespace(
implementation=Implementation.PANDAS,
backend_version=parse_version(pd.__version__),
dtypes=nw.dtypes,
)
expr = (nw.col("a") + 1)._call(pn)
print(expr)
10 changes: 7 additions & 3 deletions docs/index.md
@@ -2,11 +2,15 @@

![](assets/image.png)

Extremely lightweight compatibility layer between Polars, pandas, and more.
Extremely lightweight and extensible compatibility layer between dataframe libraries!

Seamlessly support both, without depending on either!
- **Full API support**: cuDF, Modin, pandas, Polars, PyArrow
- **Lazy-only support**: Dask
- **Interchange-level support**: Ibis, Vaex, anything else which implements the DataFrame Interchange Protocol

- ✅ **Just use** a subset of **the Polars API**, no need to learn anything new
Seamlessly support all, without depending on any!

- ✅ **Just use** [a subset of **the Polars API**](https://narwhals-dev.github.io/narwhals/api-reference/), no need to learn anything new
- ✅ **Zero dependencies**, Narwhals only uses what
the user passes in so your library can stay lightweight
- ✅ Separate **lazy** and eager APIs, use **expressions**
4 changes: 2 additions & 2 deletions docs/installation.md
@@ -13,7 +13,7 @@ Then, if you start the Python REPL and see the following:
```python
>>> import narwhals
>>> narwhals.__version__
'1.8.4'
'1.9.1'
```
then installation worked correctly!

@@ -69,4 +69,4 @@ If you run `python t.py` then your output should look like the above. This is th
function - as we'll soon see, we can do much more advanced things.
Let's learn about what you just did, and what Narwhals can do for you!

Note: these examples are only using pandas and Polars. Please see the following to find the [supported libriaries](extending.md).
Note: these examples are only using pandas and Polars. Please see the following to find the [supported libraries](extending.md).
33 changes: 33 additions & 0 deletions docs/roadmap_and_related.md
@@ -29,3 +29,36 @@ Array counterpart to the DataFrame API, see [here](https://data-apis.org/array-a
Allows C extension modules to safely share pointers to C data structures with Python code and other C modules, encapsulating the pointer with a name and optional destructor to manage resources and ensure safe access, see [here](https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html) for details.

Narwhals supports exporting a DataFrame via the Arrow PyCapsule Interface.

### Ibis

Pitched as "The portable Dataframe library", Ibis provides a Pythonic frontend
to various SQL (as well as Polars LazyFrame) engines. Some differences with Narwhals are:

- Narwhals' main use case is for library maintainers wanting to support
different dataframe libraries without depending on any whilst keeping
things as lightweight as possible. Ibis is more targeted at end users
and aims to be thought of as a Dataframe library akin to
pandas / Polars / etc.
- Narwhals allows you to write a "Dataframe X in, Dataframe X out" function.
Ibis allows materialising to pandas, Polars (eager), and PyArrow, but has
no way to get back to the input type exactly (e.g. there's no way to
start with a Polars LazyFrame and get back a Polars LazyFrame)
- Narwhals respects input data types as much as possible, Ibis doesn't
support Categorical (nor does it distinguish between fixed-size-list and
list)
- Narwhals separates between lazy and eager APIs, with the eager API
providing very fine control over dataframe operations (slicing rows and
columns, iterating over rows, getting values out of the dataframe as
Python scalars). Ibis is more focused on lazy execution
- Ibis supports SQL engines (and can translate to SQL),
Narwhals is more focused on traditional dataframes where row-order is defined
(although we are brainstorming a lazy-only level of support)
- Narwhals is extremely lightweight and comes with zero required dependencies,
Ibis requires pandas and PyArrow for all backends
- Narwhals supports Dask, whereas Ibis has deprecated support for it

Although people often ask about the two tools, we consider them to be
very different and not in competition. Further efforts to clarify the
distinction are welcome 🙏!

18 changes: 11 additions & 7 deletions mkdocs.yml
@@ -19,11 +19,12 @@ nav:
- extending.md
- how_it_works.md
- Roadmap and related projects: roadmap_and_related.md
- API Completeness:
- api-completeness/index.md
- Supported DataFrame methods: api-completeness/dataframe.md
- Supporteda Expr methods: api-completeness/expr.md
- Supported Series methods: api-completeness/series.md
# Commented-out until https://github.com/narwhals-dev/narwhals/issues/1004 is addressed
# - API Completeness:
# - api-completeness/index.md
# - Supported DataFrame methods: api-completeness/dataframe.md
# - Supported Expr methods: api-completeness/expr.md
# - Supported Series methods: api-completeness/series.md
- API Reference:
- api-reference/narwhals.md
- api-reference/dataframe.md
@@ -47,8 +48,8 @@ nav:
theme:
name: material
font: false
favicon: assets/image.png
logo: assets/image.png
favicon: assets/logo.svg
logo: assets/logo.svg
features:
- content.code.copy
- content.code.annotate
@@ -75,6 +76,9 @@ theme:
toggle:
icon: material/brightness-4
name: Switch to system preference
extra_css:
- https://unpkg.com/katex@0/dist/katex.min.css
- css/mkdocstrings.css

plugins:
- search