Update README for standalone-datacatalog #156

Merged (3 commits) on Oct 11, 2023
40 changes: 39 additions & 1 deletion standalone-datacatalog/README.md
@@ -1,3 +1,41 @@
# The `standalone-datacatalog` Kedro starter

For more information, see the [Kedro documentation about this starter](https://docs.kedro.org/en/stable/notebooks_and_ipython/kedro_as_a_data_registry.html).
This starter, formerly known as `mini-kedro`, sets up a lightweight Kedro project that uses the Kedro [Data Catalog](https://docs.kedro.org/en/stable/data/index.html) as a registry for data without using any of the other features of Kedro.

The starter comprises a minimal setup to use the traditional [Iris dataset](https://www.kaggle.com/uciml/iris).

## Usage

To create a new project based on this starter:

```bash
kedro new --starter=standalone-datacatalog
```

You can give the project any name you choose. Once created, the project contains the following:

* A `conf` directory, which contains an example `DataCatalog` configuration (`catalog.yml`):

```yaml
# conf/base/catalog.yml
example_dataset_1:
  type: pandas.CSVDataset
  filepath: folder/filepath.csv

example_dataset_2:
  type: spark.SparkDataset
  filepath: s3a://your_bucket/data/01_raw/example_dataset_2*
  credentials: dev_s3
  file_format: csv
  save_args:
    if_exists: replace
```

* A `data` directory, which contains an example dataset identical to the one used by the [`pandas-iris`](https://github.com/kedro-org/kedro-starters/tree/main/pandas-iris) starter

* An example Jupyter notebook, which shows how to instantiate the `DataCatalog` and interact with the example dataset:

```python
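# Load a dataset by the name it is registered under in catalog.yml
df = catalog.load("example_dataset_1")
# save() takes the dataset name and the data to write, and returns None;
# `df_2` is a placeholder for a Spark DataFrame you have already created
catalog.save("example_dataset_2", df_2)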
```
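
The snippet above assumes a `catalog` object already exists. As a minimal sketch of one way to create it, the catalog can be built from a plain dictionary with `DataCatalog.from_config`. The helper names `catalog_config` and `build_catalog` are hypothetical, and this is not necessarily how the bundled notebook does it. It assumes `kedro` and a `kedro-datasets` release that provides `pandas.CSVDataset` are installed.

```python
def catalog_config(csv_path):
    """Return a catalog configuration mirroring `example_dataset_1`.

    `catalog_config` is a hypothetical helper, not part of Kedro.
    """
    return {
        "example_dataset_1": {
            "type": "pandas.CSVDataset",
            "filepath": str(csv_path),
        }
    }


def build_catalog(config):
    """Instantiate a Kedro DataCatalog from a configuration dict.

    `build_catalog` is a hypothetical helper. The import is done lazily
    so that the configuration helper above can be used without Kedro.
    """
    from kedro.io import DataCatalog  # requires kedro to be installed

    return DataCatalog.from_config(config)
```

In a notebook, `catalog = build_catalog(catalog_config("folder/filepath.csv"))` would then allow `catalog.load("example_dataset_1")`, provided the CSV file exists at that path.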