vdk-events: improve Productionizing Jupyter Notebooks README #2896

Merged
69 changes: 69 additions & 0 deletions events/productionizing-jupyter-notebooks/README.md
@@ -0,0 +1,69 @@
# Productionizing Jupyter Notebooks with Versatile Data Kit (VDK)

## Table of Contents
- [Goals](#goals)
- [Agenda](#agenda)
- [Purpose](#purpose)
- [Background](#background)
  * [Objective](#objective)
- [Tutorial](#tutorial)
- [Exercises](#exercises)
- [Lessons Learned](#lessons-learned)
- [Feedback](#feedback)
- [Where to Find Us](#where-to-find-us)

## Presenter
[Duygu Hasan](https://github.com/duyguHsnHsn) <a href='https://www.linkedin.com/in/duygu-hasan/'><img src="https://img.shields.io/badge/LinkedIn-0077B5"></a>

## Goals
- Understand the challenges associated with transitioning notebooks to production.
- Learn how VDK addresses these challenges.

## Agenda
- Discuss strategies for productionizing Jupyter Notebooks.
- Showcase hands-on examples of these challenges in the Jupyter UI.

## Purpose
This scenario demonstrates how to operationalize Jupyter notebooks with the Versatile Data Kit (VDK) Jupyter integration. By the end of it, you will know how to:
* Create a VDK data job from within a Jupyter notebook.
* Write a data workflow in a notebook and prepare it for a production environment.

## Background
### Objective
All of the following steps run inside a Jupyter notebook:
1. **Retrieve Data:** Extract data from the specified URL using pandas.
2. **Data Cleansing:** Remove the records associated with 'testuser'.
3. **Score Classification:** Assign each score to one of a set of predefined categories.
4. **Data Ingestion:** Use the VDK `job_input` object to ingest the processed data.
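
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the tutorial's actual notebook code: the column names (`user`, `score`), the category boundaries and labels, and the helper names `clean_and_classify` and `ingest` are all assumptions made for the example, and a small in-memory DataFrame stands in for the data that the tutorial retrieves from its URL. The only VDK call used is `job_input.send_tabular_data_for_ingestion`.

```python
import pandas as pd


def clean_and_classify(df: pd.DataFrame) -> pd.DataFrame:
    """Drop 'testuser' rows and bucket scores (column names are assumed)."""
    # Data cleansing: remove records associated with 'testuser'.
    cleaned = df[df["user"] != "testuser"].copy()
    # Score classification: the bin edges and labels are illustrative only.
    cleaned["score_category"] = pd.cut(
        cleaned["score"],
        bins=[0, 50, 80, 100],
        labels=["low", "medium", "high"],
        include_lowest=True,
    ).astype(str)
    return cleaned.reset_index(drop=True)


def ingest(job_input, df: pd.DataFrame, destination_table: str) -> None:
    """Send the rows to VDK; job_input is the object VDK provides to the job."""
    job_input.send_tabular_data_for_ingestion(
        rows=df.values.tolist(),
        column_names=df.columns.to_list(),
        destination_table=destination_table,
    )


# Stand-in for the retrieval step. In the notebook, the data would instead
# be loaded with pandas from the tutorial's URL (for example via pd.read_csv).
raw = pd.DataFrame(
    {"user": ["alice", "testuser", "bob"], "score": [92, 10, 45]}
)
processed = clean_and_classify(raw)
```

Inside a VDK notebook, `ingest(job_input, processed, "scores")` would then send the cleaned rows to a table; the table name `scores` is hypothetical. How `job_input` is obtained in a notebook is covered in the getting started guide for the VDK Jupyter integration.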

### Versatile Data Kit Jupyter Integration
For detailed instructions on working with the VDK Jupyter integration, see the [getting started notebook](../../projects/vdk-plugins/vdk-jupyter/getting-started.ipynb).

## Tutorial
### [LAUNCH TUTORIAL](https://mybinder.org/v2/gh/versatile-data-kit-demo/productionizing-jupyter-notebooks/HEAD?labpath=tutorial-job%2F10_notebook.ipynb)

## Exercises
The `tutorial-job` directory contains the ready-to-use code from this demo. Explore it to get hands-on experience with the objectives and the VDK Jupyter integration discussed in this guide.

## Lessons Learned
In this scenario, you have:
* Explored the capabilities of the VDK Jupyter integration.
* Retrieved, cleaned, and processed data using Jupyter and VDK tools.
* Learned how to ingest data through VDK from a Jupyter environment.
* Learned how to make notebooks ready for production with VDK.

**Congratulations!**

## Feedback
Please share your feedback:
[Productionizing Jupyter Notebooks Survey](https://forms.office.com/pages/responsepage.aspx?id=yjiRs-48Skuk1s2D2d1i8AGV0VaygrpPnt7Tz5bBbeBUNFA5NkU3QzlNWEQyUFJCTTQwRUszWk9GUS4u)

If you like the [project](https://github.com/vmware/versatile-data-kit), give us a star on GitHub.

## Where to Find Us
- [YouTube](https://www.youtube.com/channel/UCasf2Q7X8nF7S4VEmcTHJ0Q/about)
- [Twitter](https://twitter.com/vdkproject)
- [GitHub](https://github.com/vmware/versatile-data-kit)
- Relevant Links
  - [An Overview of Versatile Data Kit](https://towardsdatascience.com/an-overview-of-versatile-data-kit-a812cfb26de7)
  - [Community meeting: Productionizing Jupyter Notebooks with Versatile Data Kit](https://www.youtube.com/watch?v=U6M6UzsoiqY)