Process journal papers and add content to MTE #45
Note: the MTE schema currently has an […]. We will also need to decide how to generate a […].
I categorized the 18 papers that passed the initial filtering process by mission to ensure that we have at least one paper for each of the MERA, MERB, MPF, and PHX missions. Some papers may appear in more than one mission list. For example, the paper […] appears in more than one list.
MERA:
MERB:
MPF:
PHX:
Thanks, Steven! It looks like there are 14 unique papers here. Are the other 4 that passed the filter worth including?
@wkiri The other 4 papers are MSL papers. Sorry, I forgot to mention them. MSL:
@stevenlujpl Great, thanks for the clarification!
@wkiri I tested the changes I made to the MTE codebase, and they seem to be working fine. Please see the commits above for details about the code changes. The changes are currently checked into the
I tested this approach using the 18 journal papers that passed the initial filtering process. The MPF jsonl, DB, and PDS4 bundle files can be found at the following locations in my /home dir. The bundle
This approach requires only a few minor changes in the codebase (as shown in the commits above) because it doesn't require changes to the DB schema. The drawback is that the […]. The current MTE website code won't work with the DB files generated from journal papers, primarily because of the lack of the
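For orientation, here is a minimal sketch of the one-record-per-line .jsonl layout referred to throughout this thread; the field names (doc_id, title, year, doi, abstract, text) and file name are illustrative assumptions rather than the actual MTE schema:

```python
import json

# Hypothetical per-document record; the real MTE field names may differ.
record = {
    "doc_id": "doc-0001",        # placeholder document identifier
    "title": "Example journal paper title",
    "year": "2003",
    "doi": "10.1000/xyz123",     # placeholder DOI
    "abstract": "...",
    "text": "...",
}

# Append one JSON object per line.
with open("mpf_journal.jsonl", "a") as f:   # hypothetical file name
    f.write(json.dumps(record) + "\n")

# Read it back: each non-empty line is one document.
with open("mpf_journal.jsonl") as f:
    docs = [json.loads(line) for line in f if line.strip()]
print(f"{len(docs)} documents loaded")
```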
@stevenlujpl This is great progress! Thank you! Can you place the generated […]? The changes seem fine in general. I have two questions:
It seems I overlooked that "abstract" is not included in the final .csv files that are delivered. I should correct this in the schema diagram and in the README. I agree that the sqlite DB is an intermediate product, so it's okay for it to have more information even if it isn't used later. As you note, however, the website does use the DB directly. It makes sense to prioritize getting the journal paper content into PDS4 bundles first and, if time remains, then update the website (it's not on the critical path for the time remaining).
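A hedged sketch of how the delivered .csv could be exported from the intermediate sqlite DB without the abstract column; the table name ("documents") and column list here are hypothetical, not the real MTE schema:

```python
import csv
import sqlite3

# Export only the delivered columns to .csv, leaving "abstract" behind
# in the intermediate sqlite DB.
DELIVERED_COLUMNS = ["doc_id", "title", "year", "doi"]   # assumed column names

conn = sqlite3.connect("mte.db")                         # hypothetical DB path
rows = conn.execute(
    "SELECT {} FROM documents".format(", ".join(DELIVERED_COLUMNS))
).fetchall()
conn.close()

with open("documents.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(DELIVERED_COLUMNS)
    writer.writerows(rows)
```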
It should be possible to retain the year field for documents indexed in the ADS database.
It seems from the ADS website search results that the DOI fields are already formatted as URLs. I will double-check the format of the DOIs returned by directly querying the ADS database. These are great suggestions. I will work on them now. I copied the .jsonl files to the following locations in
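A minimal sketch of double-checking those fields by querying the ADS search API directly (the /v1/search/query endpoint with an fl field list); the query string, row count, and token handling here are illustrative assumptions, not the query used for the MTE filtering runs:

```python
import requests

ADS_TOKEN = "..."  # personal ADS API token (required)

resp = requests.get(
    "https://api.adsabs.harvard.edu/v1/search/query",
    headers={"Authorization": f"Bearer {ADS_TOKEN}"},
    params={
        "q": 'title:"Mars Exploration Rover landing sites"',  # example query
        "fl": "bibcode,title,year,doi",
        "rows": 5,
    },
)
resp.raise_for_status()

for doc in resp.json()["response"]["docs"]:
    # As noted below, ADS returns "doi" as a list of bare DOI strings,
    # not as https://doi.org/... URLs.
    print(doc.get("year"), doc.get("doi"), doc.get("title"))
```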
@wkiri I couldn't test the
@wkiri Do you know how to use a DOI to form a URL? The DOIs returned by the ADS database aren't formatted as URLs. For example, the DOI returned for "Analysis of MOLA data for the Mars Exploration Rover landing sites" is
I just googled, and it seems we can use this pattern
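Most likely the pattern in question is the standard DOI resolver, which forms a URL by prefixing the bare DOI with https://doi.org/; a minimal sketch (the DOI below is a placeholder, not the one returned for the MOLA paper):

```python
def doi_to_url(doi: str) -> str:
    """Turn a bare DOI into a resolvable URL via the doi.org resolver."""
    return "https://doi.org/" + doi.strip()

print(doi_to_url("10.1000/xyz123"))  # -> https://doi.org/10.1000/xyz123
```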
@wkiri I've added the
I created a brat site for the MPF JSONL file here:
The PHX output is available here:
@wkiri Do we need to add a few more journal papers for MPF?
Yes, we should try to add the 4 JGR + 1 Science papers that are referenced in https://github.com/wkiri/MTE/tree/master/ref/MPF#readme
The MER-A output is available here:
The MER-B output is available here:
To make review easier, I have pruned the documents for each mission in the "journals" directory under brat to include only the documents to be reviewed.
@wkiri I've processed the 6 MPF documents you added. Five documents were processed successfully, and one document ([…]) was not. I've copied the jsonl file to the following location:
I also copied the jsonl files from the initial MERA, MERB, MPF, and PHX runs to
@stevenlujpl Thank you, that was fast! I'll look at these tomorrow.
@stevenlujpl These look great! They are now available at
@wkiri Great! Thanks for sharing the brat URL. There are targets and relations, which look promising. Please let me know if you need help reviewing them (even after this week).
@wkiri I have updated the MTE parser and bundle generation scripts based on what we discussed on Monday. Please see the following steps for generating a PDS4 bundle with both LPSC and journal papers:
I tested the scripts with 5 LPSC papers and 1 journal paper, and verified the results manually and with the PDS4 validate tool. I didn't find any problems. I am attaching the jsonl, DB, and bundle files in the following .zip file. Please take a look and let me know if you find any problems. Thanks.
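A minimal sketch of a manual spot check on the generated sqlite DB (separate from the PDS4 validate tool): list the tables and count rows in each. The DB path is hypothetical, and table names are read from the DB itself, so nothing is assumed beyond it being an ordinary sqlite file:

```python
import sqlite3

conn = sqlite3.connect("mte.db")  # hypothetical path to the generated DB
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
)]
for table in tables:
    (count,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    print(f"{table}: {count} rows")
conn.close()
```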
@stevenlujpl This sounds great!!! Thanks for pulling it all together. I haven't looked at the .zip file yet but will try to do so tomorrow. For the full process, I believe there will be 2 steps between 4 and 5 in which we run ingest_sqlite.py and update_sqlite.py twice: once with LPSC annotations and once with journal paper annotations.
The per-mission LPSC .jsonl files are:
See
@wkiri I've added the script to insert mte_parser fields into an existing jsonl file. I also processed the per-mission LPSC .jsonl files to insert mte_parser fields. The updated per-mission LPSC .jsonl files are at the following locations:
Please take a look and let me know if you find any problems. Thanks.
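The general pattern for such a script (read each jsonl record, add the new fields, write the updated records out) might look like the sketch below; the mte_parser value and the file names are hypothetical, not the actual script:

```python
import json

def add_fields(in_path: str, out_path: str, extra: dict) -> None:
    """Add (or overwrite) fields on every record of a .jsonl file."""
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            if not line.strip():
                continue
            record = json.loads(line)
            record.update(extra)
            fout.write(json.dumps(record) + "\n")

# Hypothetical usage with assumed file names and field value.
add_fields("mera_lpsc.jsonl", "mera_lpsc_updated.jsonl",
           {"mte_parser": "journal_parser"})
```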
The first step is to try parsing the journal documents @stevenlujpl already downloaded.
For some documents, we may need to process them multiple times for each mission whose targets are mentioned (see issue #22).