Add example for American Community Survey data #364
Running into a bug, filed an issue: #363
@kylebarron I just added the full American Community Survey data, along with an initial income filter example. Any ideas for other filters / lonboard features to try, if this example may be worth including? Happy to make it more readable / user-friendly, as I think this data deserves more use. I really like your demo of housing shapes and was hoping to do something similar. ChatGPT has tons of ideas, but they are hard to parse through. I added an example ChatGPT prompt that describes all the variables available here: https://github.com/jaanli/exploring_american_community_survey_data/blob/main/prompts/exploring-new-york-city.md (it suggests a scavenger hunt instead of an analysis / geospatial exploration :p)
…it of public use microdata sample
…ry public use microdata area
Thanks for the contribution!
I'm not sure where the best place is for this example notebook to live.
This is a great example because it shows the end-to-end process of accessing, reshaping, and visualizing this specific dataset... but on the other hand, it's not something that I know how to maintain. The majority of the code is in preparing the data (and in steps very specific to Census data at that). Ideally, the example notebooks in this repo have the maximum ratio of lonboard-specific visualization to data preparation, so that they are as approachable as possible for as many users as possible. (I can imagine someone unfamiliar with Census data easily getting overwhelmed.)
Perhaps the best course of action is to have a page in the docs website linking to external examples. Then there's a good distinction that the external examples are good showcases but aren't "officially maintained". What would you think about, say, linking to a notebook in one of your repos?
.venv/
*.venv
.venv
I think these are redundant; line 2 is probably sufficient for all cases.
@@ -41,7 +41,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 2,
Can you revert the changes to this notebook? It looks like you ran the notebook again and saved it but didn't make changes.
I agree! I like that solution. The hardest part is indeed the transformation of the data and the orchestration of dbt + duckdb, and it is overwhelming even with GPT-4 :( I'm down to try linking to it as an external notebook. The downside of linking externally is that it is hard to guarantee the links will remain valid; perhaps a GitHub Action / continuous integration test could help lint the documentation for broken links? But I'm not sure how to get the example to display, either in GitHub's web view of Jupyter notebooks or with nbviewer.org (link). Let me know if there's a place to add an external link and I can revise the PR accordingly! 🙏
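A minimal sketch of the broken-link check floated above, assuming the docs are plain text or Markdown files that a CI job can read. The function names (`extract_links`, `is_reachable`, `find_broken_links`) and the `docs-link-check` user agent are hypothetical, not part of lonboard or of any existing workflow in this repo:

```python
import re
import urllib.request

# Rough pattern for http(s) URLs; stops at whitespace, brackets, and quotes.
URL_PATTERN = re.compile(r"https?://[^\s<>()\"']+")


def extract_links(text: str) -> list[str]:
    """Return the unique http(s) URLs found in text, in first-seen order."""
    seen: dict[str, None] = {}
    for match in URL_PATTERN.finditer(text):
        # Drop sentence punctuation that often trails a URL in prose.
        url = match.group(0).rstrip(".,;:!?")
        seen.setdefault(url, None)
    return list(seen)


def is_reachable(url: str, timeout: float = 10.0) -> bool:
    """Send a HEAD request and report whether it succeeded.

    Some servers reject HEAD requests, so a False result here is a hint
    to look at the link, not proof that it is dead.
    """
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "docs-link-check"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError):
        # URLError and HTTPError are OSError subclasses; ValueError covers malformed URLs.
        return False


def find_broken_links(text: str, check=is_reachable) -> list[str]:
    """Return the links in text for which check(url) is false."""
    return [url for url in extract_links(text) if not check(url)]
```

A CI step could read each docs file, call `find_broken_links`, and fail the job if the result is non-empty. Passing a custom `check` keeps the extraction logic testable without network access.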
Absolutely. For most visualizations, 80% of the work is just in preparing the data. I'm thinking of a page in the docs like the deck.gl "showcase page". Next week we have an onsite and I'll be really busy, but a good task for the following week is for me to create that page. There's virtually no online notebook hosting platform where the map will also render; the map needs a running Python session. None of the notebook examples on our docs site render the map either. In the future we want to test saving the notebooks as HTML with the map data embedded, but that doesn't always work.
Cool! I think there might be a solution to explore longer term, especially as Mosaic adds spatial support and with Observable Framework’s support for Parquet: observablehq/framework#834 (comment) Enjoy the offsite and stoked to add this to the showcase down the line!
Do you have a link for this?
Yup! Here it is, @kylebarron: https://uwdata.github.io/mosaic/examples/nyc-taxi-rides.html I also plan on trying Observable Framework's geospatial support with Protomaps to get something similar: https://bdon.github.io/observable-framework-maps/example-map I tested Observable Framework over the weekend for this ACS historical data; I think the lonboard filter extension would be great for this too, to see patterns: https://jaanli.github.io/american-community-survey/income
Just a start for #364 cc @jaanli This is copied from geoarrow-rust for now. The idea is to have a grid of some sort, with a title, some text, and an image per example. <img width="1379" alt="image" src="https://github.com/developmentseed/lonboard/assets/15164633/e810d4ee-cc04-4d6d-a4aa-3d7b65f86f61">
In #401 I created a new top-level examples page with an image or gif per notebook. On that page I added the gif you posted here, as well as linked to your profile and notebook permalink in this PR. I think this is a better long-term solution than adding those notebooks into this repo directly. Feel free to make a PR to edit that page if you want to make a change!
"\n",
"# If you had a direct way to map each exploded point to its PUMA, you'd fill puma_to_point_index here\n",
"# For demonstration, let's assume each point in points_for_people is already associated with a PUMA:\n",
"for i, puma in enumerate(puma_indices): # This assumes puma_indices is aligned with points_for_people\n",
By the way I tried to run this notebook and got
NameError: name 'puma_indices' is not defined
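One possible way to avoid this error is to build `puma_indices` before the loop. The sketch below assumes the notebook knows how many points were generated for each PUMA; `people_per_puma`, its PUMA codes, and the loop body are hypothetical stand-ins, since the notebook's actual data structures are not shown here:

```python
from collections import defaultdict

# Hypothetical input: number of exploded points generated for each PUMA code.
people_per_puma = {"03701": 3, "03702": 1, "03703": 2}

# One PUMA code per exploded point, in the order the points were generated,
# so that puma_indices[i] is the PUMA of point i.
puma_indices = [
    puma for puma, count in people_per_puma.items() for _ in range(count)
]

# Map each PUMA code to the positions of its points (assumed loop body).
puma_to_point_index = defaultdict(list)
for i, puma in enumerate(puma_indices):
    puma_to_point_index[puma].append(i)
```

This only works if the points really are generated PUMA by PUMA in the same order as `people_per_puma`; otherwise the alignment assumption noted in the notebook comment does not hold.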
I think it would be interesting to add an example to let folks build on the American Community Survey data.
There are a lot of variables and use cases enabled by the new data filter extension!
Full list of variables:
https://github.com/jaanli/exploring_american_community_survey_data/blob/main/american_community_survey/models/public_use_microdata_sample/generated/enum_types_mapped_renamed/housing_units_united_states_first_tranche_enum_mapped_renamed.sql
Will start this PR in case others are able to help.
Example GIF of the filter extension: https://s13.gifyu.com/images/SCGH2.gif