On FT_branch_1, for SAP, normalize date format and/or deduplicate data #138
Comments
115hr2 seems to have duplicates as well. Three of these are dated May 15, 2018, and the two links go to the same document. I am curious what the first statement, dated June 26, 2018, refers to. It is possible that this is a consequence of the way the websites presented the data, and maybe the White House site itself had duplicate data. If that is the case, there is nothing we can do about the data. We should consider adding a note in the table that duplicates may be a result of poor data from the White House website.
Josh suggested changing the date format to YYYY-MM-DD, so I changed it from Month DD, YYYY to YYYY-MM-DD. If we want the older format, I can change it back.
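For reference, the date conversion described above could look roughly like the following. This is only a sketch, not the code on FT_branch_1: the function name `normalize_sap_date` is hypothetical, and it assumes scraped dates arrive either as English month names ("March 5, 2019") or already in YYYY-MM-DD form.

```python
from datetime import datetime

# Formats we expect to see in scraped SAP data (an assumption, not confirmed
# against the real scraper output): "March 5, 2019" and "2019-03-05".
KNOWN_FORMATS = ("%B %d, %Y", "%Y-%m-%d")


def normalize_sap_date(raw: str) -> str:
    """Return the date as YYYY-MM-DD, or raise ValueError if it is unrecognized."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # try the next known format
    raise ValueError(f"Unrecognized date format: {raw!r}")
```

Note that `%B` matches month names in the current locale, so this assumes an English (or default C) locale.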
Thanks for explaining. We'll leave it the way Josh suggested.
I suspected that might be the case. Thank you for checking. I deleted the table this morning, hoping to re-create it and test, but now I'm stuck in a loop of migration problems :-(.
For the two items you show for HR2, there is still some duplication: the 'Moving Forward Act' is for 116HR2, not 115HR2. Do you know why this is picked up for 115hr2?
Yes, data quality for the Trump administration (from the White House website) is poor. I re-scraped, and they have removed most of it. We have many bills with no SAP pdf files. I am working on a workaround for this.
Closing. We're now working on updated branches.
It appears that the Statements of Administration Policy here are duplicates, but the data is formatted differently between them (we want the date format of the first one: March 5, 2019) and the pdf link only appears on the second one:
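One possible way to collapse duplicates like these is sketched below, under loud assumptions: the record fields `bill_number`, `date_issued`, and `link` are hypothetical names (not the project's actual schema), and two records are treated as the same statement when they share a bill number and a date once the date is normalized. That rule would wrongly merge two distinct statements issued for the same bill on the same day, so treat it as a starting point only.

```python
from datetime import datetime


def _iso(raw):
    """Normalize 'March 5, 2019' or '2019-03-05' to '2019-03-05'."""
    for fmt in ("%B %d, %Y", "%Y-%m-%d"):
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {raw!r}")


def dedupe_statements(statements):
    """Merge records sharing (bill_number, normalized date).

    The first record seen for a key is kept; a missing pdf link is filled
    in from a later duplicate that has one.
    """
    merged = {}
    for s in statements:
        key = (s["bill_number"].lower(), _iso(s["date_issued"]))
        record = merged.setdefault(
            key, {"bill_number": key[0], "date_issued": key[1], "link": None}
        )
        if not record["link"] and s.get("link"):
            record["link"] = s["link"]
    return list(merged.values())


# Made-up sample data: same statement scraped twice, link only on the second.
sample = [
    {"bill_number": "116hr9999", "date_issued": "March 5, 2019", "link": None},
    {"bill_number": "116HR9999", "date_issued": "2019-03-05",
     "link": "https://example.com/sap.pdf"},
]
result = dedupe_statements(sample)
```

Here `result` holds a single merged record that carries both the normalized date and the pdf link.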