
chore: increase PySpark min version to 3.5.0 #1744

Merged
merged 2 commits into from
Jan 7, 2025

Conversation

EdAbati
Collaborator

@EdAbati EdAbati commented Jan 7, 2025

What type of PR is this? (check all applicable)

  • πŸ’Ύ Refactor
  • ✨ Feature
  • πŸ› Bug Fix
  • πŸ”§ Optimization
  • πŸ“ Documentation
  • βœ… Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

Pros: makes it easier to implement expressions that come out of the box with PySpark 3.5.
Cons: may hurt adoption, since we would only support the latest PySpark version (but PySpark 3.5.0 is already over a year old).

I think it is fine to focus on API coverage first and worry about supporting older versions later.

@EdAbati EdAbati changed the title chore: increase pyspark min version chore: increase PySpark min version to 3.5.0 Jan 7, 2025
@@ -99,7 +99,7 @@ jobs:
           cache-suffix: ${{ matrix.python-version }}
           cache-dependency-glob: "pyproject.toml"
       - name: install-not-so-old-versions
-        run: uv pip install tox virtualenv setuptools pandas==2.0.3 polars==0.20.8 numpy==1.24.4 pyarrow==15.0.0 "pyarrow-stubs<17" pyspark==3.4.0 scipy==1.8.0 scikit-learn==1.3.0 dask[dataframe]==2024.10 tzdata --system
+        run: uv pip install tox virtualenv setuptools pandas==2.0.3 polars==0.20.8 numpy==1.24.4 pyarrow==15.0.0 "pyarrow-stubs<17" pyspark==3.5.0 scipy==1.8.0 scikit-learn==1.3.0 dask[dataframe]==2024.10 tzdata --system
Collaborator Author


As a bonus we can also test pyspark 3.5.0 with numpy <2.0 without adding a new step (useful for std and var)

@EdAbati EdAbati marked this pull request as ready for review January 7, 2025 07:19
Member

@MarcoGorelli MarcoGorelli left a comment


thanks @EdAbati !

separately, we should also add this minimum to MINIMUM_VERSIONS in narwhals/utils.py
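A minimal sketch of what a version floor like this can look like, assuming a `MINIMUM_VERSIONS`-style mapping of the kind the comments above mention for `narwhals/utils.py` (the dictionary name, helper names, and error message here are hypothetical, not narwhals' actual API):

```python
# Hypothetical sketch: gate a backend on a minimum supported version.
# MINIMUM_VERSIONS, parse_version, and validate_backend_version are
# illustrative names, not the real narwhals implementation.

MINIMUM_VERSIONS: dict[str, tuple[int, ...]] = {"pyspark": (3, 5, 0)}


def parse_version(version: str) -> tuple[int, ...]:
    # "3.5.0" -> (3, 5, 0); ignores any non-numeric suffix for simplicity
    return tuple(int(part) for part in version.split(".")[:3] if part.isdigit())


def validate_backend_version(backend: str, version: str) -> None:
    minimum = MINIMUM_VERSIONS[backend]
    if parse_version(version) < minimum:
        minimum_str = ".".join(str(part) for part in minimum)
        raise ValueError(
            f"{backend}>={minimum_str} is required, got {version!r}"
        )
```

With this in place, `validate_backend_version("pyspark", "3.4.0")` raises, while `"3.5.0"` and later pass, so the check only needs updating in one spot when the floor moves again.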

@MarcoGorelli MarcoGorelli added the dependencies Pull requests that update a dependency file label Jan 7, 2025
@MarcoGorelli MarcoGorelli merged commit 3e42edd into narwhals-dev:main Jan 7, 2025
24 checks passed
Member

@FBruzzesi FBruzzesi left a comment


Quite satisfying diff! Thanks @EdAbati !
Could you also address the MIN_VERSIONS dictionary in narwhals/utils.py?

Edit: just read @MarcoGorelli comment πŸ˜‚

@EdAbati EdAbati deleted the raise-min-pyspark-version branch January 7, 2025 22:08