Create enriched relatedBills JSON #7
Comments
To start, create the JSON above with only the billCongressTypeNumber and titles:
116s130: [
  { billCongressTypeNumber: '116hr201',
    titles: ['Shared Title 1', 'Shared Title 2', etc.]
  }...
], |
I'm a little confused about the billCongressTypeNumber. This top level key is also a billCongressTypeNumber right? Where is that value coming from? As for the titles, I imagine I can use the titles index for this? Or would it make sense to use the billsmeta.json? |
This is an extension of what you've done in In {
"116hjres58": {"same_titles": ["116hjres58"]},
...
"116hjres56": {"same_titles": ["116hjres56", "115hjres142"]},
...
}
This will become:
{
  "116hjres58": [{
    billCongressTypeNumber: "116hjres58",
    titles: ["Title 1", "Title 2"] // in this case, it is all of the titles of this bill, since this is the 'identity' item.
  }],
  ...
  "116hjres56": [
    {
      billCongressTypeNumber: "116hjres56",
      titles: ["Title of this bill", "Another title of this bill"] // again, this is the 'identity' item for 116hjres56
    },
    {
      billCongressTypeNumber: "115hjres142",
      titles: ["Shared Title 1", "Shared Title 2"] // this is the list of titles that are common between 116hjres56 and 115hjres142
    }
  ],
  ...
} |
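A minimal sketch of that transformation, under some assumptions: the titles index is taken to map each title to a list of bill numbers (as the loop over `titlesIndex.items()` later in this thread suggests), and the same-titles index has the `{"same_titles": [...]}` shape shown above. The function name `enrich_same_titles` and the sample titles are hypothetical, not part of the project.

```python
def enrich_same_titles(same_titles_index, titles_index):
    """Turn {bill: {"same_titles": [...]}} into {bill: [{billCongressTypeNumber, titles}, ...]}."""
    # Invert the titles index: bill number -> set of that bill's titles.
    titles_by_bill = {}
    for title, bills in titles_index.items():
        for bill in bills:
            titles_by_bill.setdefault(bill, set()).add(title)

    enriched = {}
    for bill, entry in same_titles_index.items():
        own_titles = titles_by_bill.get(bill, set())
        related = []
        for other in entry["same_titles"]:
            # For the 'identity' item (other == bill) this is every title of the bill;
            # otherwise it is only the titles the two bills have in common.
            shared = own_titles & titles_by_bill.get(other, set())
            related.append({"billCongressTypeNumber": other, "titles": sorted(shared)})
        enriched[bill] = related
    return enriched


# Made-up sample data for illustration only.
titles_index = {
    "Title A": ["116hjres56", "115hjres142"],
    "Title B": ["116hjres56"],
    "Title C": ["116hjres58"],
}
same_titles_index = {
    "116hjres58": {"same_titles": ["116hjres58"]},
    "116hjres56": {"same_titles": ["116hjres56", "115hjres142"]},
}
enriched = enrich_same_titles(same_titles_index, titles_index)
```

With this sample, the entry for `116hjres56` holds its identity item with both of its titles, followed by `115hjres142` with only the shared title.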
So far I'm up to this:
def getSameTitles():
    titlesIndex = loadTitlesIndex()
    sameTitlesIndex = {}
    for title, bills in titlesIndex.items():
        for bill in bills:
            if not sameTitlesIndex.get(bill):
                sameTitlesIndex[bill] = []
                for bill in bills:
                    billObj = {
                        'billCongressTypeNumber': bill,
                        'titles': [title]
                    }
                    objlist = sameTitlesIndex.get(bill, [])
                    objlist.append(billObj)
            # else:
            #     current_same_titles = sameTitlesIndex[bill].get('same_titles')
            #     if current_same_titles:
            #         combined_bills = list(set(current_same_titles + bills))
            #         sameTitlesIndex[bill]['same_titles'] = combined_bills
I think I am running into the same problem where I am creating a new object for every bill. Here's my logic so far:
- if the bill number is present, append its title to the titles list |
Start with the To do this, you should Then make your own that looks like this and save it to relatedBills.json:
You can do this with a
Start by saving that Dict (with empty arrays as values) into |
Ahhh, I see. Then loop over the titlesIndex, creating an object for each of the related billnums and then append it to the list for that billnum? |
Yes. You can do it with the loops you have above, but starting with a full list of bills can help you think about what is happening in each loop. You'll still have to handle multiple bill numbers in multiple titles. Think through what is happening at each stage. Work it out on paper with a small sample set of titlesIndex.json:
What happens:
|
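One way to avoid creating a new object for every (title, bill) pair is to key the intermediate structure by both bill numbers, so each title is appended to a single entry per pair. The sketch below is an illustration under assumptions, not the refactor from #8: it assumes titlesIndex.json maps each title to a list of bill numbers, and the names `build_related_bills` and `sample_titles_index` are hypothetical.

```python
from collections import defaultdict


def build_related_bills(titles_index):
    """Build {bill: [{billCongressTypeNumber, titles}, ...]} from {title: [bills]}."""
    # shared[bill][other] -> list of titles that bill and other have in common.
    shared = defaultdict(lambda: defaultdict(list))
    for title, bills in titles_index.items():
        unique_bills = list(dict.fromkeys(bills))  # drop repeats, keep order
        for bill in unique_bills:
            for other in unique_bills:
                # When other == bill this accumulates the 'identity' item.
                shared[bill][other].append(title)

    # Flatten the nested dicts into the list-of-objects shape described in the issue.
    return {
        bill: [
            {"billCongressTypeNumber": other, "titles": titles}
            for other, titles in others.items()
        ]
        for bill, others in shared.items()
    }


# Small made-up sample, in the spirit of working it out on paper first.
sample_titles_index = {
    "Title A": ["116hjres56", "115hjres142"],
    "Title B": ["116hjres56"],
}
related = build_related_bills(sample_titles_index)
by_number = {item["billCongressTypeNumber"]: item["titles"] for item in related["116hjres56"]}
```

The result can then be written out with `json.dump` to relatedBills.json. Because the inner loop runs over every pair of bills under a title, the cost grows quadratically with the number of bills sharing one title, which is worth checking on the real index.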
Made a PR with the updated logic. I believe it's most of the way there. I really appreciate your help! |
I refactored the function to create relatedBills.json in one pass, with an outer and inner loop. My comments on the changes are here: #8 Closing. |
This is an extension of #6 and should be combined into PR #5
Generate a JSON that contains rich information about bill similarity. For each bill, there will be a list of objects for related bills. The JSON will have billnumbers as the keys, and the value will be an array of objects corresponding to related bills. In each object will be information about what the two bills share. So, for example: