Add example for American Community Survey data #364

Closed

@jaanli jaanli commented Feb 14, 2024

I think it would be interesting to add an example to let folks build on the American Community Survey data.

There are a lot of variables and use cases enabled by the new data filter extension!

Full list of variables:

https://github.com/jaanli/exploring_american_community_survey_data/blob/main/american_community_survey/models/public_use_microdata_sample/generated/enum_types_mapped_renamed/housing_units_united_states_first_tranche_enum_mapped_renamed.sql

Will start this PR in case others are able to help.

Example GIF of the filter extension: https://s13.gifyu.com/images/SCGH2.gif

@jaanli
Author

jaanli commented Feb 14, 2024

Running into a bug, filed an issue: #363

@jaanli
Author

jaanli commented Feb 14, 2024

@kylebarron just added the full American Community Survey data; initial income filter example:

image

Any ideas for other filters / lonboard features to try if this example may be worth including? Happy to make it more readable / user friendly as I think this data deserves more use.

I really like your demo of housing shapes and was hoping to do something similar. ChatGPT has tons of ideas, but they are hard to parse through.

Added an example ChatGPT prompt that describes all the variables available here: https://github.com/jaanli/exploring_american_community_survey_data/blob/main/prompts/exploring-new-york-city.md (it suggests a scavenger hunt instead of an analysis / geospatial exploration :p)

Member

@kylebarron kylebarron left a comment

Thanks for the contribution!

I'm not sure where the best place for this example notebook is.

This is a great example because it shows the end-to-end process of accessing, reshaping and visualizing this specific dataset... but on the other hand it's not something that I know how to maintain. The majority of the code is in preparing the data (and in steps very specific to Census data at that). Ideally the example notebooks in this repo have the maximum ratio of lonboard-specific visualization to data preparation, so that they're as approachable as possible for as many users as possible. (I can imagine someone unfamiliar with Census data easily getting overwhelmed.)

Perhaps the best course of action is to have a page in the docs website linking to external examples. Then there's a good distinction that the external examples are good showcases but aren't "officially maintained". What would you think about, say, linking to a notebook in one of your repos?

Comment on lines +1 to +3
.venv/
*.venv
.venv
Member

I think these are redundant; line 2 is probably sufficient for all cases.
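
A quick way to sanity-check this claim is with Python's `fnmatch`, which uses shell-style globbing similar to gitignore's. This is only an approximation: gitignore has extra rules for slashes and negation that `fnmatch` does not model, and the helper name `matched_by` is made up for this sketch.

```python
from fnmatch import fnmatch


def matched_by(name: str, pattern: str) -> bool:
    """Approximate gitignore matching for a single path component.

    A trailing slash in gitignore restricts a pattern to directories.
    Here it is simply stripped, so directory-only semantics are NOT modeled.
    """
    return fnmatch(name, pattern.rstrip("/"))


# `*` can match an empty string, so "*.venv" also covers a bare ".venv".
covered = {
    name: matched_by(name, "*.venv")
    for name in [".venv", "project.venv", "venv"]
}
```

Under this approximation, `.venv` and `project.venv` are both covered by `*.venv`, while a directory named plain `venv` is not.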

@@ -41,7 +41,7 @@
},
{
"cell_type": "code",
-"execution_count": 1,
+"execution_count": 2,
Member

Can you revert the changes to this notebook? It looks like you ran the notebook again and saved it but didn't make changes.

@jaanli
Author

jaanli commented Feb 15, 2024

I agree! I like that solution. The hardest part is indeed the transformation of the data and orchestration of dbt + duckdb, and it is overwhelming even with GPT-4 :(

I'm down to try linking to it as an external notebook. The downside of linking externally is that it is hard to guarantee links will remain valid; perhaps a GitHub action / continuous integration test could help lint documentation for broken links?
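
A minimal sketch of what such a link check could look like, using only the Python standard library. The function names (`extract_urls`, `is_reachable`, `find_broken_links`) are hypothetical and not part of lonboard or any existing tool; a CI job would read the docs files and fail if `find_broken_links` returns anything.

```python
import re
import urllib.error
import urllib.request
from typing import Callable

# Rough URL matcher: stops at whitespace and common closing delimiters.
URL_PATTERN = re.compile(r"https?://[^\s)>\]\"']+")


def extract_urls(text: str) -> list[str]:
    """Return the unique http(s) URLs in `text`, in order of first appearance."""
    seen: dict[str, None] = {}
    for match in URL_PATTERN.finditer(text):
        # Drop trailing sentence punctuation that is not part of the URL.
        seen.setdefault(match.group(0).rstrip(".,;:"), None)
    return list(seen)


def is_reachable(url: str, timeout: float = 10.0) -> bool:
    """Send a HEAD request and report whether the server answered without an error.

    Some servers reject HEAD requests, so a real check may need a GET fallback.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, TimeoutError, ValueError):
        return False


def find_broken_links(text: str, check: Callable[[str], bool] = is_reachable) -> list[str]:
    """Return the URLs in `text` for which `check` reports failure."""
    return [url for url in extract_urls(text) if not check(url)]
```

Passing `check` as a parameter keeps the network call swappable, so the extraction logic can be tested offline.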

But I'm not sure how to get the example to display, either in GitHub's web view of Jupyter notebooks or with nbviewer.org (link).

Let me know if there's a place to add an external link and I can revise the PR accordingly! 🙏

@kylebarron
Member

kylebarron commented Feb 16, 2024

Absolutely. For most visualizations, 80% of the work is just in preparing the data.

I'm thinking of a page in the docs like the deck.gl "showcase page". Next week we have an onsite and I'll be really busy, but a good task for the following week is for me to create that page.

There's virtually no online notebook hosting platform where the map will also render; the map needs a running Python session. None of our notebook examples in our docs site render the map either. In the future we want to test saving the notebooks as html with the map data embedded, but that doesn't always work.

@jaanli
Author

jaanli commented Feb 17, 2024

Cool! I think there might be a solution to explore longer term, especially as Mosaic adds spatial support and Observable Framework adds support for Parquet: observablehq/framework#834 (comment)

Enjoy the offsite and stoked to add this to the showcase down the line!

@kylebarron
Member

Mosaic adds spatial support

Do you have a link for this?

@jaanli
Author

jaanli commented Feb 27, 2024

Yup! Here it is @kylebarron : https://uwdata.github.io/mosaic/examples/nyc-taxi-rides.html

I also plan on trying Observable Framework's geospatial support with Protomaps to get something similar: https://bdon.github.io/observable-framework-maps/example-map

image

Tested Observable Framework over the weekend on this ACS historical data; I think the lonboard filter extension would also be great for seeing patterns in it: https://jaanli.github.io/american-community-survey/income

high_quality

@kylebarron kylebarron mentioned this pull request Mar 1, 2024
kylebarron added a commit that referenced this pull request Mar 21, 2024
Just a start for #364 cc @jaanli

This is copied from geoarrow-rust for now. The idea is to have a grid of
some sort, with a title, some text, and an image per example.

<img width="1379" alt="image"
src="https://github.com/developmentseed/lonboard/assets/15164633/e810d4ee-cc04-4d6d-a4aa-3d7b65f86f61">
@kylebarron
Member

kylebarron commented Mar 21, 2024

In #401 I created a new top-level examples page with an image or gif per notebook. On that page I added the gif you posted here, as well as linked to your profile and notebook permalink in this PR. I think this is a better long term solution than adding those notebooks into this repo directly.

Feel free to make a PR to edit that page if you want to make a change!

@kylebarron kylebarron closed this Mar 21, 2024
"\n",
"# If you had a direct way to map each exploded point to its PUMA, you'd fill puma_to_point_index here\n",
"# For demonstration, let's assume each point in points_for_people is already associated with a PUMA:\n",
"for i, puma in enumerate(puma_indices): # This assumes puma_indices is aligned with points_for_people\n",
Member

By the way I tried to run this notebook and got

NameError: name 'puma_indices' is not defined
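
The error suggests that `puma_indices` is never assigned before the loop runs. The notebook's actual data structures are not shown here, so the following is only a sketch of one way to define it, assuming the number of exploded points per PUMA is known. The `points_per_puma` dictionary, its PUMA codes and counts, and the loop body are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical input: how many exploded points were generated for each PUMA.
points_per_puma = {"03701": 3, "03702": 1, "03703": 2}

# One entry per exploded point, in the order the points were generated,
# so that puma_indices[i] is the PUMA of points_for_people[i].
puma_indices = [
    puma
    for puma, count in points_per_puma.items()
    for _ in range(count)
]

# Inverse mapping (PUMA -> positions of its points), filled by a loop
# shaped like the one in the notebook.
puma_to_point_index = defaultdict(list)
for i, puma in enumerate(puma_indices):
    puma_to_point_index[puma].append(i)
```

This only works if the points were generated PUMA by PUMA in the same order as `points_per_puma`; otherwise the alignment assumed in the notebook's comment does not hold.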
