[kobotoolbox] Download Kobo form metadata #29

Open
rudokemper opened this issue Dec 11, 2024 · 0 comments
Labels: connectors (Connector scripts for ETL from upstream data sources), feature (New specs for new behavior)

rudokemper commented Dec 11, 2024

Feature Request

Let's consider fetching the form metadata (or whatever KoboToolbox calls it) from https://kf.kobotoolbox.org/api/v2/assets/{formId}/ and storing it as a table with a _metadata suffix.
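A minimal sketch of the fetch step, using only the Python standard library. The function names, the `Authorization: Token ...` header, and the `Accept` header are assumptions for illustration and should be checked against the KoboToolbox API docs and our existing connector code; only the URL pattern and the _metadata suffix come from this issue.

```python
import json
import urllib.request

DEFAULT_SERVER = "https://kf.kobotoolbox.org"  # server URL used in this issue


def build_asset_url(form_id: str, server: str = DEFAULT_SERVER) -> str:
    """Return the asset URL that serves the form metadata for form_id."""
    return f"{server.rstrip('/')}/api/v2/assets/{form_id}/"


def build_request(form_id: str, api_token: str, server: str = DEFAULT_SERVER) -> urllib.request.Request:
    """Prepare the GET request without sending it."""
    # Assumption: token auth and JSON content negotiation via these headers.
    headers = {"Authorization": f"Token {api_token}", "Accept": "application/json"}
    return urllib.request.Request(build_asset_url(form_id, server), headers=headers)


def fetch_form_metadata(form_id: str, api_token: str, server: str = DEFAULT_SERVER) -> dict:
    """Download the form metadata and parse it as JSON."""
    request = build_request(form_id, api_token, server)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def metadata_table_name(base_table: str) -> str:
    """Name of the table holding the metadata, per the suffix proposed here."""
    return f"{base_table}_metadata"
```

Splitting the request construction from the network call keeps the URL and header logic testable without hitting the API.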

User Stories & Acceptance Criteria

Here's an interesting Kobo use case that illustrates the value of downloading the JSON returned by the {formId}/ API endpoint, but that also introduces some complications when it comes to actually storing the data in a database.


Background

In one of our projects in Africa, we are working with a language that uses characters like # and | to represent click sounds. They have some select_from_list_name fields where the range of values is provided in the form XML (each value represented as a string and a system name).

Kobo can't use those characters in system names, so in the builder (which the partner used to construct their form), most of those characters are replaced by _ in the system name. Something like //ru|do# becomes _ru_do_.

When we download the Kobo form responses and visualize the data in Superset, there is no way to reconstruct the proper string with the replaced characters. (In the past, I have written custom SQL queries for Superset that turn values like village-name_country into Village name, Country; here there is no way to do that.)
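To illustrate why the original string can't be recovered from the system name alone, here is a small sketch with a hypothetical sanitizing rule. This is not Kobo's actual algorithm (which I haven't looked at); it just collapses each run of non-alphanumeric characters into a single underscore, which happens to reproduce the example above. The second label is made up.

```python
import re


def sanitize(label: str) -> str:
    """Hypothetical stand-in for the builder's replacement rule (an assumption)."""
    return re.sub(r"[^A-Za-z0-9]+", "_", label)


original = "//ru|do#"  # label from this issue
other = "#ru/do|"      # a different, made-up label

print(sanitize(original))  # _ru_do_
print(sanitize(other))     # _ru_do_
```

Two different labels end up with the same system name, so no SQL on the response data alone can tell which one was meant.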

What we could do and why this gets complicated

In the {formId}/ API endpoint response, for questions with a select_from_list_name, there is a list_name ID provided for the list. Then, for each of the values associated with list_name, there is a label key that stores the original string.

If we were to store the form metadata, we could use that label on the front end by matching on the ID.
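A rough sketch of that lookup, assuming the metadata has entries under content.survey with a name and a label array (as in the excerpt further down in this issue). The helper names and the "example_list" value are hypothetical, and the real response may keep choice labels elsewhere.

```python
from typing import Any


def build_label_lookup(metadata: dict[str, Any]) -> dict[str, str]:
    """Map each system name to its first label (the original string)."""
    lookup: dict[str, str] = {}
    for entry in metadata.get("content", {}).get("survey", []):
        name = entry.get("name")
        labels = entry.get("label") or []
        if name and labels:
            lookup[name] = labels[0]
    return lookup


def display_value(system_name: str, lookup: dict[str, str]) -> str:
    """Return the original label, falling back to the system name."""
    return lookup.get(system_name, system_name)


sample_metadata = {
    "content": {
        "survey": [
            # "example_list" is a placeholder; the real value is redacted here.
            {"name": "_ru_do_", "label": ["//ru|do#"], "list_name": "example_list"},
        ]
    }
}

lookup = build_label_lookup(sample_metadata)
print(display_value("_ru_do_", lookup))  # //ru|do#
```

Falling back to the system name means values without a matching label still render, just without the original characters.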

It might get a little hairy getting this form metadata into a flattened db though (if that's the way we go about this); there are many nested objects and arrays in the JSON, and the keys are not all unique. Just this select_from_list_name example is structured in the following way in the API response...

    "content": {
        "schema": "1",
        "survey": [
            {
                "name": "_ru_do_",
                "$kuid": "[REDACTED]",
                "label": [
                    "//ru|do#"
                ],
                "list_name": "[REDACTED]",
                "$autovalue": "_ru_do_"
            },

...So if we were to wholesale write flattened column names for this, we'd end up with something non-unique and pretty meaningless like

name__survey__content
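As a quick sketch of the problem, here is a hypothetical naive flattener (not our actual flattening code) that names columns leaf-first and ignores array positions. The second survey entry is made up.

```python
def naive_flatten(node, path=()):
    """Flatten nested JSON into {column_name: value}, ignoring array positions."""
    flat = {}
    if isinstance(node, dict):
        for key, value in node.items():
            flat.update(naive_flatten(value, path + (key,)))
    elif isinstance(node, list):
        for item in node:
            # Every array item shares the same path, so their keys collide.
            flat.update(naive_flatten(item, path))
    else:
        # Leaf-first naming: ("content", "survey", "name") -> "name__survey__content"
        flat["__".join(reversed(path))] = node
    return flat


sample = {
    "content": {
        "schema": "1",
        "survey": [
            {"name": "_ru_do_", "label": ["//ru|do#"]},
            {"name": "second_question", "label": ["Second question"]},  # made up
        ],
    }
}

flat = naive_flatten(sample)
print(flat["name__survey__content"])  # second_question
```

The first entry's name is silently overwritten by the second, which is why the column name alone carries no useful meaning.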

So it looks like we'd probably need some business logic for transforming columns per specific section of the Kobo form metadata; those sections are going to be the same across different forms. For example, for anything in the content.survey object, we could use the value of name to build a field name like...

list_name__{content.survey.name}
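One possible reading of that naming rule, as a sketch only. The function name, the choice to skip $-prefixed keys, and the way label arrays are joined are all assumptions, and "example_list" is a placeholder for the redacted value.

```python
def survey_columns(metadata: dict) -> dict:
    """Build {key}__{name} columns for each entry under content.survey."""
    columns = {}
    for entry in metadata.get("content", {}).get("survey", []):
        name = entry.get("name")
        if not name:
            continue  # nothing to key the column on
        for key, value in entry.items():
            # Assumption: skip the name itself and internal "$" keys like $kuid.
            if key == "name" or key.startswith("$"):
                continue
            if isinstance(value, list):
                value = ", ".join(str(item) for item in value)
            columns[f"{key}__{name}"] = value
    return columns


sample_metadata = {
    "content": {
        "survey": [
            {
                "name": "_ru_do_",
                "label": ["//ru|do#"],
                "list_name": "example_list",  # placeholder value
                "$autovalue": "_ru_do_",
            }
        ]
    }
}

columns = survey_columns(sample_metadata)
print(columns["list_name___ru_do_"])  # example_list
```

Note that a name starting with an underscore produces three underscores in a row (list_name___ru_do_), so the separator may need rethinking if we want to parse these column names back apart.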

Obviously more schematizing needs to be done here, but just sharing my initial findings to inform further work on this issue.

@rudokemper rudokemper added the feature New specs for new behavior label Dec 11, 2024
@rudokemper rudokemper changed the title [kobotoolbox] Download Kobo form metadata [frizzle/kobotoolbox] Download Kobo form metadata Dec 11, 2024
@rudokemper rudokemper self-assigned this Dec 16, 2024
@rudokemper rudokemper changed the title [frizzle/kobotoolbox] Download Kobo form metadata [connectors/kobotoolbox] Download Kobo form metadata Dec 16, 2024
@rudokemper rudokemper added the connectors Connector scripts for ETL from upstream data sources label Jan 3, 2025
@rudokemper rudokemper changed the title [connectors/kobotoolbox] Download Kobo form metadata [kobotoolbox] Download Kobo form metadata Jan 3, 2025
@rudokemper rudokemper added this to the Nia Tero 2025 milestone Feb 12, 2025