[Alpha] - Python native-typing support (second installment)

Pre-release
@wild-endeavor wild-endeavor released this 16 Dec 01:05
0cebfd0

Release notes 0.16.0a1

Changes

Since the last alpha release...

Multi-image support

Many users have pointed out that it doesn't make sense to force all tasks to use the same image, especially when a workflow can span Spark tasks and GPU-based ML tasks. For instance, if you publish a separate Spark image with the tag spark-<sha>, you can get a task to use it instead of the default by adding the following to the task decorator:

container_image='{{.image.default.fqn}}:spark-{{.image.default.version}}'
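To make the template concrete, here is a minimal sketch of how the two placeholders might resolve. It assumes that fqn stands for the default image name without its tag and that version stands for the tag, which is consistent with the spark-<sha> example above. The helper resolve_image_template and the sample values are hypothetical and are not part of flytekit; the real substitution happens inside Flyte's tooling.

```python
# Hypothetical helper, NOT part of flytekit: mimics how the placeholders
# in a container_image template might be filled in.
def resolve_image_template(template: str, fqn: str, version: str) -> str:
    return (
        template
        .replace("{{.image.default.fqn}}", fqn)
        .replace("{{.image.default.version}}", version)
    )


TEMPLATE = "{{.image.default.fqn}}:spark-{{.image.default.version}}"

# Assumed values, for illustration only.
resolved = resolve_image_template(
    TEMPLATE,
    fqn="docker.io/example/myrepo",
    version="abc123",
)
print(resolved)  # docker.io/example/myrepo:spark-abc123
```

Under these assumptions, a task carrying this template would run on the spark-prefixed tag of the same repository, while tasks without container_image keep using the default image.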

Branches

Branching logic has been fully implemented and a minor bug patched in flytepropeller, so you can now call conditionals in your workflow. See the cookbook for an example.

Tasks

Hive tasks have been ported to the new flytekit API, and the backend plugin has been changed to make Hive queries more customizable. Please see the issue for the full discussion of the changes. Sidecar tasks have also been ported over; please see the examples. The ability to specify task resources has also been added.

Spark

Spark is a supported task type. This example provides more details about how to use Spark in flytekit. In Spark-enabled clusters, simply decorate a function like hello_spark with:

from flytekit.taskplugins.spark import Spark

# this configuration is applied to the spark cluster
@task(spark_conf={
    ...
})
def hello_spark(partitions: int) -> float:
    ...

Within this function, users can access the Spark session as follows:

session = flytekit.current_context().spark_session

Spark tasks can also return a pyspark.DataFrame directly as the value of a FlyteSchema output. This only works when FlyteSchema is declared as the return type. Refer to this example for more.

Dataclasses

Flytekit users often want to use custom data objects in their workflows and tasks. Flyte supports both dictionaries and JSON natively, but until now there was no simple interface for returning this kind of information. Users who want custom data objects can now use Python dataclasses in their tasks and workflows. Refer to this example for more information.
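As a rough illustration of why dataclasses fit well here, the sketch below round-trips a dataclass through JSON using only the Python standard library. It does not use flytekit and does not show how flytekit itself serializes dataclasses; the class Datum and the helpers to_json and from_json are hypothetical names chosen for this example.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Datum:
    # Hypothetical custom data object a task might want to return.
    x: int
    y: str


def to_json(d: Datum) -> str:
    # Flatten the dataclass to a plain dict, then encode it as JSON.
    return json.dumps(asdict(d))


def from_json(s: str) -> Datum:
    # Decode the JSON object and rebuild the typed dataclass from its fields.
    return Datum(**json.loads(s))


original = Datum(x=5, y="five")
payload = to_json(original)
restored = from_json(payload)
print(restored == original)  # True
```

The point of the sketch is that a flat dataclass maps cleanly onto a JSON object, so the typed Python object and its serialized form carry the same information.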

Minor changes

When a task returns a file, the filename is now preserved instead of becoming a random string of characters. The filename is also preserved when an incoming file is downloaded. Empty dicts and functions with no outputs are also handled better.

Cookbook

Some of the links above lead to the new cookbook ("2nd edition") we've been working on. Please keep in mind that the content there isn't complete yet; we'll be pushing to it frequently, and documentation is always a work in progress. Please let us know if you like or dislike the new format. We are moving to it because the plugins used enable literate programming, and we feel that style is especially well suited for this repo, but let us know if you disagree.

Setup

Please continue to refer to the instructions from the initial alpha0 release for the iteration steps.

As always, please give us feedback on this release, or on anything else Flyte-related, in the open source Slack (#flytekit or any other channel).

Thanks!