
Locality and GeoCoordDetail batch import tool #4548

Merged: 108 commits into production, Jun 18, 2024
Conversation

@melton-jason (Contributor) commented Feb 15, 2024

Fixes #4319

The Locality and GeoCoordDetail batch import tool allows users to match Locality records, update their Latitude1, Longitude1, and Datum fields, and associate new GeoCoordDetails with the matched Locality records.

The primary motivation for the tool is that georeferencing for existing records in the database can take place outside of Specify (such as with CoGe), and there is currently no easy way to repatriate that data back into Specify and update those records.

This process is facilitated by requiring the guids of the Locality records in the CSV; Specify uses these guids to find the corresponding Locality record(s).
With the Locality record, the tool does the following:

  • If any latitude1, longitude1, and/or datum fields have values for the Locality record in the CSV, Specify will overwrite the current values in the database with those specified in the CSV
  • If any other column in the CSV matches a non-relationship field name in the GeoCoordDetail table, Specify will delete the previous GeoCoordDetail associated with the Locality record and create a new one populated with field values from the CSV
  • If there are any other columns in the CSV, Specify will display a warning indicating there are unknown columns but allow the user to proceed; if they proceed, the unknown columns are ignored
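
The column-routing rules above can be sketched as follows. This is a hedged simplification: the field sets and helper name are hypothetical illustrations, not the actual Specify 7 backend code, and the GeoCoordDetail field list shown is only a subset.

```python
# Hypothetical sketch of the column routing described above; the real
# implementation and the full GeoCoordDetail field list differ.
LOCALITY_FIELDS = {"latitude1", "longitude1", "datum"}
GEOCOORDDETAIL_FIELDS = {"maxuncertaintyest", "errorpolygon", "originalcoordsystem"}  # illustrative subset

def classify_columns(header):
    """Split CSV columns into Locality updates, GeoCoordDetail fields,
    and unknown columns (warned about, then ignored)."""
    locality, geocoord, unknown = [], [], []
    for col in header:
        name = col.strip().lower()
        if name == "guid":
            continue  # used only for matching records, never updated
        elif name in LOCALITY_FIELDS:
            locality.append(col)
        elif name in GEOCOORDDETAIL_FIELDS:
            geocoord.append(col)
        else:
            unknown.append(col)
    return locality, geocoord, unknown
```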

Demo

Currently, access to the tool can be found in the User Tools menu:

locality_import_button

The primary interface is very similar to that of importing a file to be used as a DataSet in the WorkBench. Specify supports different encodings and delimiters, and displays the first 100 rows of the file:

coge_preview
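
The preview step can be sketched roughly like this; a minimal, assumed implementation for illustration only (the actual Specify code differs):

```python
import csv
import io

def preview_rows(raw: bytes, encoding: str = "utf-8", delimiter: str = ",", limit: int = 100):
    """Decode an uploaded file with the chosen encoding/delimiter and
    return the header plus up to `limit` rows for display."""
    text = raw.decode(encoding)
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader, [])
    rows = [row for _, row in zip(range(limit), reader)]
    return header, rows
```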

The Import File button can be pressed to initiate the parsing/upload.

If there are any unknown columns in the dataset, a warning will first be displayed to the user so they can decide how to proceed:

coge_column_warning

If any bad values are discovered during parsing, Specify will display a downloadable report stating the line on which each error was found, along with a message explaining how to resolve the parsing error:

parse_error

If the parsing and upload are successful, Specify will display a brief results page stating how many records of each table were affected, along with the option to create a RecordSet of the modified Locality records:

upload_results

Checklist

  • Self-review the PR after opening it to make sure the changes look good
    and self-explanatory (or properly documented)
  • Add automated tests
  • Add relevant issue to release milestone

Testing instructions

For testing purposes, here is an example set of data for use on the KUFish database:
kufish_coge_sample.csv

Other datasets have been made for FITZ_NHMD, fwri, KUBirds, and lsumzmammals and are available in the following folder in Google Drive:
https://drive.google.com/drive/folders/1B0hKQMBaX82nBUOB95uEa3H0MIrGrBv7?usp=drive_link

A foundation for datasets for use in other databases can be made by creating a Locality -> GeoCoordDetail query and exporting the results. As an example, here is the query used to populate most of kufish_coge_sample.csv:
https://kufish51623-edge.test.specifysystems.org/specify/query/191/

There should be no errors in the kufish_coge_sample dataset presently, so it can be modified to introduce the desired errors and edge cases.

Generally, the cases which would need to be tested are:

  • Ensure an error message is displayed to the user when a guid in the dataset does not match any Locality
  • Data in unknown columns is properly ignored and not accounted for when uploading
  • If a Locality has an existing GeoCoordDetail, it should be deleted when at least one GeoCoordDetail field in the dataset has a value for that Locality
  • If there are no values in any GeoCoordDetail field for a Locality in the dataset and the Locality contains a GeoCoordDetail in Specify, the GeoCoordDetail should not be overwritten
  • Ensure that only the Locality fields which contain data in the dataset get updated for a Locality in the dataset (i.e., if longitude1 and datum have values in the dataset but latitude1 is either not a column or empty, ensure that latitude1 is never overwritten)
  • Ensure parsing error messages are intuitive enough to diagnose and resolve the problem with the dataset
    • Potential ways to introduce invalid values include:
      • Entering a value in a field which exceeds the maximum length for the field (the maximum length for a field can be found in the Schema Configuration)
      • Entering an incorrect type of value for the field (such as entering a letter into a number/integer field)
      • Not following a UIFormatter for a field

@melton-jason melton-jason added this to the 7.9.4 milestone Feb 15, 2024
@melton-jason melton-jason removed the request for review from maxpatiiuk February 21, 2024 21:13
@combs-a (Collaborator) left a comment

image

Was doing some light testing on this and got this error. These coordinates should be fine in Specify, since Locality is fine with it. Looks like the issue is just the >= in the parse_latlong function in parse.py when it should just be >!
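
The off-by-one comparison described here can be illustrated with a minimal sketch; this is a hypothetical reconstruction, and the actual parse_latlong in parse.py is more involved:

```python
# Hypothetical reconstruction of the boundary bug; the real code differs.
def is_valid_latitude_buggy(value: float) -> bool:
    return not (abs(value) >= 90)  # wrongly rejects exactly 90

def is_valid_latitude_fixed(value: float) -> bool:
    return not (abs(value) > 90)  # only values beyond the pole are invalid
```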

Good work so far btw Jason! While I haven't done full testing yet, what I've gotten through so far is great. 👍

@melton-jason melton-jason requested review from a team June 18, 2024 16:43
@melton-jason (Contributor, Author) commented

Hi @specify/ux-testing!
I have merged production into this PR. Notably, this merge pulled in the extensive changes from #4764.

Also, if you encountered the issue from #4998 in this PR before, it should now be fixed as well.


Was doing some light testing on this and got this error. These coordinates should be fine in Specify, since Locality is fine with it. Looks like the issue is just the >= in the parse_latlong function in parse.py when it should just be >!

From #4548 (review)

Both the WorkBench and the Locality Update Tool utilize the same parsing/validating code. (So this issue is really the same as #4914.)
This has been fixed as well!

@lexiclevenger (Collaborator) left a comment

(Quoting the testing instructions from the PR description.)

The latitude/longitude issue has been fixed! However, it looks like cells that are empty are overwriting fields that contain data.

Before Import:
Screenshot 2024-06-18 at 12 32 58 PM

Imported (blank) value:
Screenshot 2024-06-18 at 12 33 19 PM

After Import:
Screenshot 2024-06-18 at 12 33 36 PM

Link to record: https://fwri1924-coge-import.test.specifysystems.org/specify/view/locality/35357/?recordsetid=1920
Imported file: fwri_coge_sample - fwri_coge_sample (8).csv

@emenslin (Collaborator) left a comment

(Quoting the testing instructions from the PR description.)

Looks great!

@lexiclevenger lexiclevenger self-requested a review June 18, 2024 19:18
@lexiclevenger (Collaborator) left a comment

My previous comment was the expected behavior; everything looks good!

@Areyes42 Areyes42 self-requested a review June 18, 2024 19:58
@Areyes42 (Contributor) left a comment

(Quoting the testing instructions from the PR description.)

Looks good! I also tested previous issues, and it seems like they all got fixed!

@combs-a (Collaborator) left a comment

(Quoting the testing instructions from the PR description.)

Some of the error messages have typos and similar issues, such as the one below:
6_18_batchlocal1

Additionally, I couldn't do the Locality import on a user account that is a Specify7 Admin but didn't have a role defined in the collection:

18_batchadmin.mp4

Still, the majority of the functionality is there and looking good, Jason. Nice work! 🎉

@melton-jason melton-jason requested a review from combs-a June 18, 2024 20:26
@melton-jason (Contributor, Author) commented Jun 18, 2024

@specify/ux-testing
The typos and "Additionally, I couldn't do the Locality import on a user account that is a Specify7 Admin but didn't have a role defined in the collection" have been fixed!

Previously, the backend was checking whether the user was a Specify 6 admin, not a Specify 7 admin...

@combs-a (Collaborator) left a comment

Looks good now! The issue with institution admins without an admin role in the collection not being able to upload is fixed. 🎉

@melton-jason melton-jason merged commit 0aed686 into production Jun 18, 2024
9 checks passed
@melton-jason melton-jason deleted the coge-import branch June 18, 2024 21:18
@maxpatiiuk (Member) left a comment

Sorry that I didn't get to reviewing this yesterday.
Looking great!

}

return (
<ProtectedAction action="%" resource="%">
Member

Uh, interesting.
Is this to restrict this to institutional admins only?
If so, could you put a short comment here?

};

export type LocalityUpdateTaskStatus =
| 'ABORTED'
Member

For consistency, we usually name states in PascalCase, not UPPER_CASE.

readonly rowNumber: number;
};

export type LocalityUpdateTaskStatus =
Member

No need to duplicate it; it can be inferred:

Suggested change
export type LocalityUpdateTaskStatus =
export type LocalityUpdateTaskStatus = LocalityUpdateState['state'];

Comment on lines +33 to +34
'ABORTED',
{ readonly taskstatus: 'ABORTED'; readonly taskinfo: string }
Member

Hmm, this duplicates the fields: the State<> helper already adds {state: 'ABORTED'}, and then you redundantly add taskstatus.
Ideally you would pick one or the other.
If you want to keep the taskstatus name, then get rid of State<>.

} else if (key === 'multipleLocalitiesWithGuid') {
return localityText.multipleLocalitiesWithGuid({
guid: payload.guid as string,
localityIds: (payload.localityIds as RA<number>).join(', '),
Member

call formatConjunction instead of join?

@@ -1,6 +1,6 @@
export type MergeStatus = 'ABORTED' | 'FAILED' | 'MERGING' | 'SUCCEEDED';
export type MergingStatus = 'ABORTED' | 'FAILED' | 'MERGING' | 'SUCCEEDED';
Member

(optional) use PascalCase for statuses?

queryResource === undefined || queryResource.isNew()
? undefined
: queryResource.get('name');
function LoadingDialog(): JSX.Element {
Member

(optional) Add an optional header parameter to <LoadingScreen /> since we needed that in several places,
then look for usages of loadingBar in dialogs that can be replaced with it (there should be one in the React WorkBench, I think).

Comment on lines +96 to +99
const resolvedRecords =
typeof rawSortFunction === 'function'
? Object.entries(recordCounts).sort(sortFunction(rawSortFunction))
: Object.entries(recordCounts);
Member

useMemo?


Successfully merging this pull request may close these issues.

Create a tool to repatriate data from CoGe and other georeferencing utilities