python/README.md  +10 −10
@@ -2,20 +2,20 @@
This project provides extensions to the [Apache Spark project](https://spark.apache.org/) in Scala and Python:

-**[Diff](https://github.com/G-Research/spark-extension/blob/v2.11.0/DIFF.md):** A `diff` transformation and application for `Dataset`s that computes the differences between
+**[Diff](https://github.com/G-Research/spark-extension/blob/v2.12.0/DIFF.md):** A `diff` transformation and application for `Dataset`s that computes the differences between
two datasets, i.e. which rows to _add_, _delete_ or _change_ to get from one dataset to the other.

-**[Histogram](https://github.com/G-Research/spark-extension/blob/v2.11.0/HISTOGRAM.md):** A `histogram` transformation that computes the histogram DataFrame for a value column.
+**[Histogram](https://github.com/G-Research/spark-extension/blob/v2.12.0/HISTOGRAM.md):** A `histogram` transformation that computes the histogram DataFrame for a value column.

-**[Global Row Number](https://github.com/G-Research/spark-extension/blob/v2.11.0/ROW_NUMBER.md):** A `withRowNumbers` transformation that provides the global row number w.r.t.
+**[Global Row Number](https://github.com/G-Research/spark-extension/blob/v2.12.0/ROW_NUMBER.md):** A `withRowNumbers` transformation that provides the global row number w.r.t.
the current order of the Dataset, or any given order. In contrast to the existing SQL function `row_number`, which
requires a window spec, this transformation provides the row number across the entire Dataset without scaling problems.

-**[Inspect Parquet files](https://github.com/G-Research/spark-extension/blob/v2.11.0/PARQUET.md):** The structure of Parquet files (the metadata, not the data stored in Parquet) can be inspected similar to [parquet-tools](https://pypi.org/project/parquet-tools/)
+**[Inspect Parquet files](https://github.com/G-Research/spark-extension/blob/v2.12.0/PARQUET.md):** The structure of Parquet files (the metadata, not the data stored in Parquet) can be inspected similar to [parquet-tools](https://pypi.org/project/parquet-tools/)
or [parquet-cli](https://pypi.org/project/parquet-cli/) by reading from a simple Spark data source.
This simplifies identifying why some Parquet files cannot be split by Spark into scalable partitions.

-**[Install Python packages into PySpark job](https://github.com/G-Research/spark-extension/blob/v2.11.0/PYSPARK-DEPS.md):** Install Python dependencies via PIP or Poetry programatically into your running PySpark job (PySpark ≥ 3.1.0):
+**[Install Python packages into PySpark job](https://github.com/G-Research/spark-extension/blob/v2.12.0/PYSPARK-DEPS.md):** Install Python dependencies via PIP or Poetry programatically into your running PySpark job (PySpark ≥ 3.1.0):

```python
# noinspection PyUnresolvedReferences
@@ -94,7 +94,7 @@ Running your Python application on a Spark cluster will still require one of the
to add the Scala package to the Spark environment.

```shell script
-pip install pyspark-extension==2.11.0.3.4
+pip install pyspark-extension==2.12.0.3.4
```
Note: Pick the right Spark version (here 3.4) depending on your PySpark version.
@@ -108,7 +108,7 @@ from pyspark.sql import SparkSession
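For context on the `diff` transformation whose docs link is bumped to v2.12.0 in the first hunk, here is a minimal PySpark sketch along the lines of the linked DIFF.md; the `gresearch.spark.diff` import, the `diff()` method and the example columns are assumptions taken from that documentation, not part of this change:

```python
# Minimal sketch: diff two DataFrames on their id column (API as described in DIFF.md; assumed).
from pyspark.sql import SparkSession
# noinspection PyUnresolvedReferences
from gresearch.spark.diff import *  # assumed to add a diff() method to DataFrame

spark = SparkSession.builder.getOrCreate()

left = spark.createDataFrame([(1, "one"), (2, "two"), (3, "three")], ["id", "value"])
right = spark.createDataFrame([(1, "one"), (2, "Two"), (4, "four")], ["id", "value"])

# One row per id, with a "diff" column marking N (no change), C (changed), D (deleted), I (inserted)
left.diff(right, "id").show()
```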
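Likewise for the `histogram` transformation from the same hunk; the argument order (thresholds, then value column, then aggregate columns) is an assumption based on HISTOGRAM.md:

```python
# Minimal sketch: bucket scores into threshold-based bins per user
# (argument order thresholds, value column, aggregate columns is assumed from HISTOGRAM.md).
from pyspark.sql import SparkSession
# noinspection PyUnresolvedReferences
import gresearch.spark  # assumed to add a histogram() method to DataFrame

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([("a", 120), ("a", 310), ("b", 95)], ["user", "score"])

# One row per user, one bucket column per threshold boundary
df.histogram([100, 200, 300], "score", "user").show()
```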
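For the `withRowNumbers` transformation, a sketch assuming the Python API exposes it as `with_row_numbers`, following the linked ROW_NUMBER.md:

```python
# Minimal sketch: add a global row number without a window spec
# (the snake_case name with_row_numbers is assumed from ROW_NUMBER.md).
from pyspark.sql import SparkSession
# noinspection PyUnresolvedReferences
import gresearch.spark  # assumed to add a with_row_numbers() method to DataFrame

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([(v,) for v in ["a", "b", "c", "d"]], ["value"])

# Adds a consecutive row_number column across the entire DataFrame
df.with_row_numbers().show()
```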
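For the Parquet inspection feature, a sketch assuming a `parquet_metadata` reader method as described in PARQUET.md; the path is a placeholder:

```python
# Minimal sketch: read Parquet metadata (not the data) as a DataFrame
# (the parquet_metadata reader method is assumed from PARQUET.md; the path is a placeholder).
from pyspark.sql import SparkSession
# noinspection PyUnresolvedReferences
import gresearch.spark.parquet  # assumed to add parquet_metadata() and friends to DataFrameReader

spark = SparkSession.builder.getOrCreate()

# One row per Parquet file: schema, row count, block and compression details
spark.read.parquet_metadata("/path/to/parquet").show()
```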
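The truncated `python` block at the end of the first hunk belongs to the PYSPARK-DEPS feature; a hedged sketch of such a call, assuming an `install_pip_package` method on the session and a purely illustrative package name:

```python
# Minimal sketch: install a package into the running PySpark job (PySpark >= 3.1.0)
# (install_pip_package is assumed from PYSPARK-DEPS.md; the package name is illustrative only).
from pyspark.sql import SparkSession
# noinspection PyUnresolvedReferences
import gresearch.spark  # assumed to add install_pip_package() to SparkSession

spark = SparkSession.builder.getOrCreate()

# Installs the package on the driver and executors of this job
spark.install_pip_package("emoji")
```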
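Regarding the note in the second hunk about picking the right Spark version: the package version encodes both the spark-extension release (2.12.0) and the Spark minor version (here 3.4). A small sketch for checking the installed PySpark version before choosing the suffix; the version pattern for other Spark releases is an assumption based on that note:

```python
# Minimal sketch: derive the matching pyspark-extension version suffix from the installed PySpark
# (the 2.12.0.<spark-major>.<spark-minor> pattern is assumed from the pip line above).
import pyspark

major, minor = pyspark.__version__.split(".")[:2]
print(f"pip install pyspark-extension==2.12.0.{major}.{minor}")
```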