Enhance the Guided Onboarding #1936

Closed
anna-geller opened this issue Aug 23, 2023 · 7 comments · Fixed by #1940

@anna-geller
Member

anna-geller commented Aug 23, 2023

Feature description

Some folks getting started with Kestra still don't know how to use Python in combination with other tasks.

It's worth changing the onboarding example to this one (or a similar one):

id: hello-world
namespace: dev

inputs:
  - name: user
    type: STRING
    defaults: Kestra user

tasks:
  - id: log
    type: io.kestra.core.tasks.log.Log
    message: Hey there, {{ inputs.user }}!

  - id: api
    type: io.kestra.plugin.fs.http.Request
    uri: https://dummyjson.com/products

  - id: python
    type: io.kestra.plugin.scripts.python.Script
    docker:
      image: python:slim
    beforeCommands:
      - pip install polars
    warningOnStdErr: false
    script: |
      import polars as pl
      data = {{outputs.api.body | jq('.products') | first}}
      df = pl.from_dicts(data)
      df.glimpse()
      df.select(["brand", "price"]).write_csv("{{outputDir}}/products.csv")

  - id: sqlQuery
    type: io.kestra.plugin.jdbc.duckdb.Query
    inputFiles:
      in.csv: "{{ outputs.python.outputFiles['products.csv'] }}"
    sql: |
      SELECT brand, round(avg(price), 2) as avg_price
      FROM read_csv_auto('{{workingDir}}/in.csv', header=True)
      GROUP BY brand
      ORDER BY avg_price DESC;
    store: true  

triggers:
  - id: everyMinute
    type: io.kestra.core.models.triggers.types.Schedule
    cron: "*/1 * * * *"

Why this example?

  1. It shows common tasks: calling REST APIs, running Python scripts and SQL queries
  2. Running Python scripts and installing custom dependencies -- this example is lightweight because it only installs Polars, which is only a ~20 MB library
  3. Passing data between tasks using outputs
  4. Using outputDir and workingDir
  5. Showing how to use the jq() function to manipulate outputs to get the data you want
  6. Showing outputs and the Preview feature -- the final DuckDB task stores its query result, which is shown as a nicely formatted markdown table when you click on the preview
  7. Showing how standard tasks (Request task) can work together with Script and Query tasks
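
To make points 3 and 5 concrete, here is a minimal plain-Python sketch of the data hand-off between the tasks, run outside of Kestra. It is only an illustration, not how Kestra executes the flow: it assumes the API body has the shape `{"products": [...]}`, and the sample rows and helper names (`extract_products`, `to_csv`, `avg_price_by_brand`) are hypothetical.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical stand-in for the API response body.
# Assumption: the real body has the shape {"products": [...]}.
body = json.dumps({
    "products": [
        {"brand": "Apple", "price": 549, "title": "Phone A"},
        {"brand": "Apple", "price": 899, "title": "Phone B"},
        {"brand": "Samsung", "price": 1249, "title": "Phone C"},
    ]
})


def extract_products(raw_body):
    """Roughly what `jq('.products') | first` yields: the products array."""
    return json.loads(raw_body)["products"]


def to_csv(rows, columns):
    """Keep only the chosen columns, like df.select([...]).write_csv(...)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def avg_price_by_brand(csv_text):
    """Mirror the SQL: average price per brand, rounded, highest first."""
    prices = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        prices[row["brand"]].append(float(row["price"]))
    averages = [
        (brand, round(sum(values) / len(values), 2))
        for brand, values in prices.items()
    ]
    return sorted(averages, key=lambda item: item[1], reverse=True)


products = extract_products(body)
csv_text = to_csv(products, ["brand", "price"])
result = avg_price_by_brand(csv_text)
print(result)
```

In the flow itself, the CSV is passed between tasks as a file via `outputs.python.outputFiles`; here it is just a string, so the sketch stays self-contained.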

It would also be great to point to Blueprints at the end of the Onboarding Guide, to show people how they can continue with Kestra after running this first flow.

@anna-geller
Member Author

It might be worth doing this after the new Editor work, so we can first showcase how to do it this way (inline) and then how to refactor it to use the Python and Query tasks with these scripts stored as Namespace Files.

@tchiotludo
Member

We could do it in that release; it's quick to do.
Just a suggestion: I would keep pandas, which is better known, instead of Polars.

@Ben8t
Member

Ben8t commented Aug 23, 2023

+1
Just one small concern about the initial Docker pull: users would need to wait almost a minute before seeing any logs 🤔

@anna-geller
Member Author

anna-geller commented Aug 23, 2023

we can keep the hello task + the trigger

FWIW should be max. 20s for the slim image, no?

EDIT: updated the flow in the issue to use the inputs, show first logs quickly + schedule

@Ben8t
Member

Ben8t commented Aug 23, 2023

I think this is a good compromise; indeed, the slim image shouldn't take that long.

@tchiotludo
Member

tchiotludo commented Aug 23, 2023

In fact, there is some impact: we append descriptions during the guided tour. Here are the 2 files:

Can you please send a PR with that helper text?

@anna-geller
Member Author

Will do tomorrow, thanks for the pointer to where the descriptions are defined.
