Initial Discussion #1
Might be interesting to talk to the OpenCorporates people about this.

Clay Johnson, by email, in reply to Adam Becker (Fri, May 24, 2013, 2:26 PM)
Also worth noting is the NIGP Code, which is a parallel to NAICS. In …

Clay Johnson, by email, following up on his own message (Fri, May 24, 2013, 2:28 PM)
Ah, that's the one I couldn't remember. Maybe we should expand this project so it isn't standard-specific, and instead house as many different kinds of codes as possible, with instructions for adding others.
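One way to keep the project standard-agnostic is to key every entry by both its classification system and its code, so NAICS, NIGP, or anything else can live side by side. The sketch below is a hypothetical Python data structure, not anything from the naics-api repo; `CodeRegistry`, `add_system`, and system names such as `"naics-2012"` are illustrative. Under this assumption, "instructions for adding others" reduce to supplying (code, title) pairs for a new system name.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Code:
    """One entry in some classification system (NAICS, NIGP, ...)."""
    system: str  # hypothetical system identifier, e.g. "naics-2012" or "nigp"
    code: str
    title: str


@dataclass
class CodeRegistry:
    """Stores codes from any number of systems, keyed by (system, code)."""
    codes: Dict[Tuple[str, str], Code] = field(default_factory=dict)

    def add_system(self, system: str, rows: Iterable[Tuple[str, str]]) -> None:
        # Adding a new standard only requires (code, title) pairs.
        for code, title in rows:
            self.codes[(system, code)] = Code(system, code, title)

    def get(self, system: str, code: str) -> Optional[Code]:
        # The same code string can mean different things in different systems.
        return self.codes.get((system, code))

    def systems(self) -> List[str]:
        return sorted({system for system, _ in self.codes})
```

Keying on the pair rather than the bare code avoids collisions between systems that happen to share numeric codes.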
I'm interested. Given that once again government IP has been sold to a firm (NIGP) to be privatized, this is a fight too, which makes it all the more fun. Is this going to be like the DC Code, where the first step is getting free access to the public NIGP data?
Commodity (three-digit) NIGP codes can be found in this Word document: …

Clay Johnson, by email, in reply to Spike (Fri, May 24, 2013, 6:44 PM)
So, this is kind of ridiculous, because @adamjacobbecker and I were randomly chatting while I was writing a scraper for... a NAICS API. @louh started this project, and I'm working on scraping content: https://github.com/louh/naics-api

Current deployed API example: http://naics-api.herokuapp.com/v0/q?year=2012&code=519120

I'm also working on getting the text content scraped: see issue codeforamerica/naics-api#5 and my pull request codeforamerica/naics-api#6.
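The example query above takes two parameters, `year` and `code`. Below is a minimal Python client for that URL shape using only the standard library. The response format isn't described in this thread, so `fetch_code` simply returns whatever JSON comes back, and the 2013 Heroku deployment may no longer be reachable. Both function names are illustrative.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL from the deployed example above (a 2013 Heroku app; it may no longer be up).
BASE_URL = "http://naics-api.herokuapp.com/v0/q"


def build_query_url(year: int, code: str, base_url: str = BASE_URL) -> str:
    """Build a query URL of the same shape as the example: ?year=...&code=..."""
    return f"{base_url}?{urlencode({'year': year, 'code': code})}"


def fetch_code(year: int, code: str, timeout: float = 10.0):
    """Request one code and decode the JSON body (schema not documented here)."""
    with urlopen(build_query_url(year, code), timeout=timeout) as resp:
        return json.load(resp)
```

Keeping URL construction separate from the network call makes it easy to point the same client at a local deployment via `base_url`.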
I am a fan of hosted APIs. You can embed the service into an app, and as issues are clarified, everyone's app is updated. But, as @adamjacobbecker said, performance is key. If you go that route, you have to make sure it can handle heavy use, possibly for autocompleting search terms.

I would need to give this a bit more thought, but what if the API is just a JSON file hosted on Amazon, and then, as @daguar did, we create simple wrappers/scrapers in a few languages that can easily sync? Instead of depending on a fully online service, there is also a local cache. The local library can try to sync daily/weekly/monthly, and the user of the library gets nice methods, like:

`NAICS::get_json(:code_id)`

or more specific ones: ...

and even nice things like: ...

And the option to sync: ...

Ultimately, this is how most developers would use the service, and they would not have to write their own custom wrappers for every single implementation.
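The sync-plus-local-cache idea can be sketched in a few lines. This is a hypothetical Python analogue of the `NAICS::get_json(:code_id)` method described above, not an existing library. It assumes the hosted dump is a JSON object mapping code to record, and `fetch` stands in for whatever downloads the file from Amazon (or anywhere else). The weekly `max_age_seconds` default is an arbitrary pick among the daily/weekly/monthly options.

```python
import json
import time
from pathlib import Path
from typing import Callable, Optional


class NaicsCache:
    """Local cache of a hosted NAICS JSON dump (assumed layout: {code: record})."""

    def __init__(self, path, fetch: Callable[[], dict],
                 max_age_seconds: float = 7 * 24 * 3600):
        self.path = Path(path)
        self.fetch = fetch  # hypothetical: downloads and decodes the hosted JSON file
        self.max_age_seconds = max_age_seconds
        self._data: Optional[dict] = None

    def is_stale(self) -> bool:
        # Missing or older-than-max-age files count as stale.
        if not self.path.exists():
            return True
        return time.time() - self.path.stat().st_mtime > self.max_age_seconds

    def sync(self, force: bool = False) -> None:
        """Download a fresh copy if the local file is missing, too old, or forced."""
        if force or self.is_stale():
            data = self.fetch()
            self.path.write_text(json.dumps(data))
            self._data = data

    def _load(self) -> dict:
        if self._data is None:
            self.sync()  # fills self._data only if a download happened
            if self._data is None:
                self._data = json.loads(self.path.read_text())
        return self._data

    def get_json(self, code_id: str) -> Optional[dict]:
        return self._load().get(code_id)
```

Because `fetch` is injected, the same class can be exercised offline; a production version would probably also fall back to the existing local file when a sync fails.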
The thing about NAICS is that it changes only every five years. Once data …

Lou Huang, by email, in reply to Eddie A Tejeda (Sat, May 25, 2013, 11:34 AM)
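Because the classification changes only every five years, a client could key its cache by edition year instead of expiring entries on a timer. The sketch below assumes that the five-year cycle that began in 1997 (1997, 2002, 2007, 2012, ...) continues. `naics_edition_for` and `cache_key` are illustrative names.

```python
FIRST_EDITION = 1997   # first NAICS edition
REVISION_INTERVAL = 5  # assumption: revisions keep arriving every five years


def naics_edition_for(year: int) -> int:
    """Return the latest NAICS edition year at or before `year`."""
    if year < FIRST_EDITION:
        raise ValueError("NAICS did not exist before 1997")
    return FIRST_EDITION + REVISION_INTERVAL * ((year - FIRST_EDITION) // REVISION_INTERVAL)


def cache_key(year: int, code: str) -> str:
    # Entries keyed by edition never need invalidating within that edition.
    return f"naics-{naics_edition_for(year)}:{code}"
```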
I really like the idea of the data just living in a JSON file somewhere, as long as it remains small enough. Somehow I thought the database would be a lot bigger, but Lou's current DB is under 0.5 MB. @eddietejeda, are there any precedents for libraries like the one you describe? How would we cache everything locally? One issue I can see is searchability: how fast will a full-text search of all codes be? Will it still be fast enough when/if we add NIGP codes? Prove me wrong, but I feel like we can only get away with our database being a .json file for so long.
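On the full-text search question: with a dataset this size, an in-memory inverted index can answer multi-word queries. The index maps each word to the set of codes whose titles contain it, so a query intersects a few small sets instead of scanning every title. Here is a minimal Python sketch over code titles only. It does no ranking, stemming, or fuzzy matching, and the class name is illustrative. Adding NIGP codes would just mean indexing more titles.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> List[str]:
    """Lowercase and split on anything that isn't a letter or digit."""
    return TOKEN_RE.findall(text.lower())


class TitleIndex:
    """In-memory inverted index: word -> set of codes whose title contains it."""

    def __init__(self, titles: Dict[str, str]):
        self.titles = titles
        self.postings: Dict[str, Set[str]] = defaultdict(set)
        for code, title in titles.items():
            for word in tokenize(title):
                self.postings[word].add(code)

    def search(self, query: str) -> List[str]:
        """Return codes whose titles contain every word in the query (AND search)."""
        words = tokenize(query)
        if not words:
            return []
        # .get avoids inserting empty sets into the defaultdict for unknown words.
        hits = set.intersection(*(self.postings.get(w, set()) for w in words))
        return sorted(hits)
```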
I'm a big fan of a dump+API approach. If you want to do autocomplete, you REALLY don't want to be doing that against a remote service you don't control. I think the best approach is an API endpoint that returns a hash, so you can check whether your local collection needs updating.

By email, in reply to Adam Becker (May 26, 2013, 10:51 AM)
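The hash-check idea can be sketched with Python's `hashlib`. The server would publish a digest of the current dump, and the client would download the full file only when its local copy hashes differently. The endpoint serving `remote_digest` is hypothetical (nothing in this thread defines one), and SHA-256 is just one reasonable choice of hash.

```python
import hashlib
from typing import Optional


def dump_digest(data: bytes) -> str:
    """SHA-256 hex digest of the raw dump bytes."""
    return hashlib.sha256(data).hexdigest()


def needs_update(local_dump: Optional[bytes], remote_digest: str) -> bool:
    """Compare the local copy against the digest from a (hypothetical) hash endpoint."""
    if local_dump is None:
        return True  # nothing cached yet
    return dump_digest(local_dump) != remote_digest
```

Each check then transfers only a 64-character hex string, and the full dump is fetched only when it has actually changed.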
@adamjacobbecker I'm also missing descriptions and keywords right now, which @daguar is writing a scraper for. The full text in their PDF (I just did a straight copy-paste of its contents into a text file) is about 1.5 MB. We're likely to move toward a database that returns results as JSON. For now, codes and titles are just stored as JSON so I can test out the API. +1 to @daguar's suggestion above, although having a remote service available will be great for anyone who is writing a small script and doesn't want the extra step of a local deployment. But ultimately, I'm doing the NAICS work to make it easier to write a better business registration tool for Las Vegas (e.g. OpenCounter), so if we wanted to do autocomplete (which we might), we'd also have our own local API for that.
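For local autocomplete, one simple option is a sorted list of lowercased titles plus binary search. Every title starting with the typed prefix then sits in one contiguous run. The Python sketch below uses the standard `bisect` module. It only matches from the start of a title, so in practice it would likely be paired with a word-level index for matches mid-title. Class and method names are illustrative.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple


class PrefixAutocomplete:
    """Autocomplete over titles using a sorted list and binary search."""

    def __init__(self, titles: Dict[str, str]):
        # Sort (lowercased title, code) pairs once so prefix matches are contiguous.
        self.entries: List[Tuple[str, str]] = sorted(
            (title.lower(), code) for code, title in titles.items()
        )
        self.keys = [title for title, _ in self.entries]

    def complete(self, prefix: str, limit: int = 10) -> List[str]:
        """Return up to `limit` codes whose titles start with `prefix`."""
        prefix = prefix.lower()
        start = bisect_left(self.keys, prefix)  # first title >= prefix
        results = []
        for title, code in self.entries[start:start + limit]:
            if not title.startswith(prefix):
                break  # past the contiguous run of matches
            results.append(code)
        return results
```

Each keystroke costs one binary search plus at most `limit` comparisons, which stays fast even if the title list grows considerably.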
I think this work to make NAICS more accessible is great, but I want to raise what I think is a substantive point against building on top of NAICS. In practice, NAICS is a self-reported industrial classification of the contractor, not necessarily a classification of what is being purchased. In addition to NAICS, the federal contracting data has Product and Service Codes (PSC), which detail the category of the item purchased. A big contractor like General Dynamics may self-report its industrial classification as whatever is on its tax form or D&B registration, but it could be selling the government something classified under a totally different industrial category. It's one thing if NAICS lacks granularity, but if it isn't describing the actual thing being bought, then I don't think it's very useful.
@kaitlin This limitation of NAICS is probably why it has been extended into a system called NAPCS (North American Product Classification System), to address what you're talking about here. I'm personally not at all familiar with NAPCS, but a cursory glance through some pages seems to indicate that (1) it is still a draft and (2) it's not in widespread use. Perhaps the goal is to eventually replace NIGP? I'm just spitballing here, but if this system eventually gets adopted, it would be great to already have a NAICS API to build from.
Paging @GovInTrenches 📟 🔔
@cjoh has a screenshot of my favorite search on FedBizOpps: we queried the system for "web design" and the first result was for a trebuchet. No lie.
Sorry, National Day of Civic Hacking is taking up all my bandwidth. SmartChicago should be able to provide hosting for the app. Once NDoCH ends, I should also be able to put more research time in. I agree about the API. I'd also be more than happy to do the necessary grunt work to get the NIGP stuff.
No worries, thanks @GovInTrenches. Just wanted to make sure there was nothing big we were missing, since it seems like you're familiar with the issue.
National Day of Civic Hacking story: a group here is playing around with business license data that Las Vegas released for the hackathon. The data stores some very generic business type names, but has more fine-grained detail in the form of NAICS codes. So I pointed them to the naics-api repo. Then, to make it work a bit better for them, I built search functionality:
It still sucks compared to what it eventually needs to do; it hasn't surpassed the Census search functionality yet. But it's a start, and I'd just like to humblebrag, because I did this and I don't even consider myself a real programmer. If there's anything anyone wants to do to make it better, please do.
Whooo, here is a demo:
Background
Desired outcomes
I currently have 3 in mind:
And down the road...
Next Steps
Would love to have some more discussion around this before anything else. Let's keep discussion in this thread for now.