[Alpha] - Python native-typing support (second installment)
Release notes 0.16.0a1 (pre-release)
Changes
Since the last alpha release...
Multi-image support
Many users have pointed out that it doesn’t make sense to force all tasks to use the same image, especially when workflows can span Spark tasks and GPU-based ML tasks. For instance, if you publish a separate Spark image with the tag spark-<sha>, you can get a task to use it instead of the default by adding the following to the task decorator:
container_image='{{.image.default.fqn}}:spark-{{.image.default.version}}'
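To make the template above more concrete, here is a minimal sketch of how the two placeholders might expand. It assumes that `{{.image.default.fqn}}` stands for the default image's name without a tag and that `{{.image.default.version}}` stands for its tag. The helper `render_container_image`, the registry name, and the version string are all hypothetical and only illustrate the substitution; this is not flytekit's actual implementation.

```python
# Illustration only: a simplified stand-in for the template expansion,
# not the code flytekit itself uses.


def render_container_image(template: str, default_fqn: str, default_version: str) -> str:
    """Fill in the two placeholders that appear in the container_image template."""
    return (
        template
        .replace("{{.image.default.fqn}}", default_fqn)
        .replace("{{.image.default.version}}", default_version)
    )


TEMPLATE = "{{.image.default.fqn}}:spark-{{.image.default.version}}"

# Hypothetical default image name and version (for example, a git sha).
image = render_container_image(TEMPLATE, "registry.example.com/myapp", "abc123")
print(image)
```

Under these assumptions, the Spark task would run in an image that shares the default image's name but carries the `spark-` prefixed tag.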
Branches
Branching logic has been fully implemented and a minor bug patched in flytepropeller, so you can now use conditionals in your workflows. See the cookbook for an example.
Tasks
Hive tasks have been ported to the new flytekit API, and backend plugin changes were made to make Hive queries more customizable. Please see the issue for the full discussion of the changes. Sidecar tasks have also been ported over; please see the examples. The ability to specify task resources has also been added.
Spark
Spark is a supported task type. This example provides more details about how to use Spark in flytekit. In Spark-enabled clusters, simply decorate a function like hello_spark with:
from flytekit.taskplugins.spark import Spark
# this configuration is applied to the spark cluster
@task(spark_conf={
    ...
})
def hello_spark(partitions: int) -> float:
    ...
Within this function, users can access the Spark session as follows:
session = flytekit.current_context().spark_session
Spark tasks can also simply return a pyspark.DataFrame, which is a supported return object for FlyteSchema types. This only works when FlyteSchema is declared as the return type. Refer to this example for more.
Dataclasses
Flytekit users often want to use custom data objects in their tasks and workflows. Flyte supports both dictionaries and JSON natively, but until now there was no simple interface for returning this information. Users who want custom data classes can now use Python dataclasses in their tasks and workflows. Refer to this example for more information.
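As a rough illustration of the kind of object this enables, the sketch below defines a dataclass and round-trips it through JSON using only the standard library. The `TrainingConfig` class and `make_config` function are hypothetical. In flytekit the function would additionally carry the task decorator, and flytekit's own serialization may impose requirements that are not shown here.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class TrainingConfig:  # hypothetical custom data object
    learning_rate: float
    epochs: int
    tags: list


def make_config(epochs: int) -> TrainingConfig:
    """Stand-in for a task body that returns a dataclass instead of loose values."""
    return TrainingConfig(learning_rate=0.01, epochs=epochs, tags=["baseline"])


config = make_config(5)

# A dataclass maps naturally onto a JSON object, which is why it fits
# alongside the dictionary and JSON support mentioned above.
payload = json.dumps(asdict(config))
restored = TrainingConfig(**json.loads(payload))
print(restored == config)
```

The typed fields give downstream tasks a declared structure to rely on, which a plain dictionary does not.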
Minor changes
When returning files in tasks, the filename is now preserved instead of becoming a random string of characters. The filename is also preserved when an incoming file is downloaded. Empty dicts and functions with no outputs are now handled better.
Cookbook
Some of the links above lead to the new cookbook ("2nd edition") we've been working on. Please keep in mind that the content there isn't complete - we'll be pushing to it frequently. Documentation is always a work in progress. Please let us know if you like or dislike the new format. We are moving to it because the plugins it uses enable literate programming, and we feel that style is especially well suited to this repo, but let us know if you disagree.
Setup
Please continue to refer to the instructions from the initial alpha0 release for the iteration steps.
As always, please give us feedback on this release, or on anything else Flyte-related, in the open source Slack (#flytekit or any other channel).
Thanks!