Locality and Geocoorddetail batch import tool #4548
Was doing some light testing on this and got this error. These coordinates should be fine in Specify, since Locality is fine with it. Looks like the issue is just the `>=` in the `parse_latlong` function in `parse.py` when it should just be `>`!
Good work so far btw Jason! While I haven't done full testing yet, what I've gotten through so far is great. 👍
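To illustrate the boundary problem described above, here is a minimal TypeScript sketch. It is not the actual `parse_latlong` code (which lives in the Python `parse.py`); the function names and the 90/180 limits are assumptions made for the example.

```typescript
// Hypothetical sketch only: the real check is in the Python parse_latlong function.
type CoordinateKind = 'latitude' | 'longitude';

// Assumed limits for the example.
const LIMITS: Record<CoordinateKind, number> = { latitude: 90, longitude: 180 };

// Strict comparison: a value exactly on the boundary (e.g. 180) stays valid.
function isOutOfRange(value: number, kind: CoordinateKind): boolean {
  return Math.abs(value) > LIMITS[kind];
}

// The reported behaviour: >= also rejects the boundary value itself.
function isOutOfRangeWithBug(value: number, kind: CoordinateKind): boolean {
  return Math.abs(value) >= LIMITS[kind];
}
```

With the strict comparison, a longitude of exactly 180 is accepted, while the `>=` variant flags it as an error.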
Hi @specify/ux-testing! Also, if you encountered the Issue from #4998 in this PR before, it should now additionally be fixed.
From #4548 (review): both the WorkBench and the Locality Update Tool utilize the same parsing/validating code, so this issue is really the same as #4914.
Generally, the cases which would need to be tested are:
- Ensure an error message is displayed to the user when there is no Locality which has a guid which exists in the dataset
- Data in unknown columns is properly ignored and not accounted for when uploading
- If a Locality contains existing data for `GeoCoordDetails`, it should be deleted if there is at least one `GeoCoordDetail` field in the dataset
- If there are no values in any GeoCoordDetail field for a Locality in the dataset and the Locality contains a GeoCoordDetail in Specify, the GeoCoordDetail should not be overwritten
- Ensure that only the Locality fields which contain data in the dataset get updated for a Locality in the dataset (i.e., if `longitude1` and `datum` have values in the dataset but `latitude1` is either not a column or empty, ensure that `latitude1` is never overwritten)
- Ensure parsing error messages are intuitive enough to diagnose and resolve the problem with the dataset
  - Potential ways to introduce invalid values include:
    - Entering a value in a field which exceeds the maximum length for the field (the maximum length for a field can be found in the Schema Configuration)
    - Entering an incorrect type of value for the field (such as entering a letter into a number/integer field)
    - Not following a UIFormatter for a field
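The update rules in the list above can be sketched roughly as follows. This is an illustration under assumptions, not the tool's backend code: the `Row` type, the helper names, and the GeoCoordDetail column names used in the usage note are hypothetical.

```typescript
// Hypothetical representation of one parsed CSV row: column name -> cell text.
type Row = Readonly<Record<string, string | undefined>>;

const LOCALITY_FIELDS = ['latitude1', 'longitude1', 'datum'] as const;

// A cell counts as "having data" only if it exists and is not blank.
function hasValue(value: string | undefined): value is string {
  return value !== undefined && value.trim() !== '';
}

// Only Locality fields with data in the row are included, so a missing or
// empty latitude1 never overwrites the stored value.
function localityUpdates(row: Row): Record<string, string> {
  const updates: Record<string, string> = {};
  for (const field of LOCALITY_FIELDS) {
    const value = row[field];
    if (hasValue(value)) updates[field] = value;
  }
  return updates;
}

// The existing GeoCoordDetail is replaced only when at least one
// GeoCoordDetail column in the row has a value.
function shouldReplaceGeoCoordDetail(
  row: Row,
  geoCoordDetailFields: readonly string[]
): boolean {
  return geoCoordDetailFields.some((field) => hasValue(row[field]));
}
```

For a row with `longitude1` and `datum` filled but an empty `latitude1`, `localityUpdates` returns only the two filled fields, and `shouldReplaceGeoCoordDetail` returns `false` when every GeoCoordDetail column passed in is blank.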
The latitude/longitude issue has been fixed! However, it looks like cells that are empty are overwriting fields that contain data.
Link to record: https://fwri1924-coge-import.test.specifysystems.org/specify/view/locality/35357/?recordsetid=1920
Imported file: fwri_coge_sample - fwri_coge_sample (8).csv
Looks great!
My previous comment was the expected behavior; everything looks good!
Looks good! I also tested previous issues, and it seems like they all got fixed!
Some of the error messages have typos/etc, such as below:
Additionally, I couldn't do the Locality import on a user account that is a Specify7 Admin but didn't have a role defined in the collection:
18_batchadmin.mp4
Still, the majority of functionality is there and looking good Jason. Nice work! 🎉
@specify/ux-testing Previously, the backend was checking if the user was a Specify 6 admin, not a Specify 7 admin...
Looks good now! The issue with institution admins w/o an admin role in the collection not being able to upload is fixed now. 🎉
sorry that I didn't get to reviewing this yesterday.
looking great!
}

return (
  <ProtectedAction action="%" resource="%">
uh interesting
is this to restrict this to institutional admins only?
if so, could you put a short comment here?
};

export type LocalityUpdateTaskStatus =
  | 'ABORTED'
for consistency, we usually name states in PascalCase not UPPER_CASE
  readonly rowNumber: number;
};

export type LocalityUpdateTaskStatus =
no need to duplicate it. can infer it
Suggested change:
- export type LocalityUpdateTaskStatus =
+ export type LocalityUpdateTaskStatus = LocalityUpdateState['state'];
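As a small illustration of the suggestion, an indexed access type can derive the status union from the state union, so the list of statuses is written only once. The member shapes below are hypothetical stand-ins, not the PR's actual `LocalityUpdateState` definition.

```typescript
// Hypothetical state union; the real LocalityUpdateState has other members/fields.
type LocalityUpdateState =
  | { readonly state: 'ABORTED'; readonly taskinfo: string }
  | { readonly state: 'SUCCEEDED'; readonly affectedCount: number };

// Inferred from the union above: 'ABORTED' | 'SUCCEEDED'.
type LocalityUpdateTaskStatus = LocalityUpdateState['state'];

// Adding a member to LocalityUpdateState automatically widens the status type.
function summarize(current: LocalityUpdateState): string {
  const status: LocalityUpdateTaskStatus = current.state;
  return current.state === 'ABORTED'
    ? `${status}: ${current.taskinfo}`
    : `${status}: ${current.affectedCount} records`;
}
```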
  'ABORTED',
  { readonly taskstatus: 'ABORTED'; readonly taskinfo: string }
hmm, this duplicates the fields
the `State<>` helper adds a `{state:'ABORTED'}`, and then you redundantly add `taskstatus`
Ideally you would pick one or another
If you want to keep the taskstatus name, then get rid of `State<>`
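A rough sketch of the two options, assuming a `State<>` helper shaped like the one below (a hypothetical re-creation for this example; the codebase's own helper may be defined differently):

```typescript
// Hypothetical stand-in for the State<> helper: it adds the `state` discriminant.
type State<NAME extends string, FIELDS extends object = {}> = {
  readonly state: NAME;
} & Readonly<FIELDS>;

// Redundant: both `state` and `taskstatus` carry the same literal.
type AbortedRedundant = State<'ABORTED', { taskstatus: 'ABORTED'; taskinfo: string }>;

// Option 1: keep State<> and drop `taskstatus`.
type AbortedWithState = State<'ABORTED', { taskinfo: string }>;

// Option 2: keep the `taskstatus` name and drop State<>.
type AbortedWithTaskStatus = {
  readonly taskstatus: 'ABORTED';
  readonly taskinfo: string;
};

const redundant: AbortedRedundant = { state: 'ABORTED', taskstatus: 'ABORTED', taskinfo: 'cancelled' };
const viaState: AbortedWithState = { state: 'ABORTED', taskinfo: 'cancelled' };
const viaTaskStatus: AbortedWithTaskStatus = { taskstatus: 'ABORTED', taskinfo: 'cancelled' };
```

Either option leaves a single discriminant per object, which keeps narrowing unambiguous.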
} else if (key === 'multipleLocalitiesWithGuid') {
  return localityText.multipleLocalitiesWithGuid({
    guid: payload.guid as string,
    localityIds: (payload.localityIds as RA<number>).join(', '),
call formatConjunction instead of join?
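To show the difference the reviewer is pointing at, here is an English-only stand-in. `formatConjunctionSketch` is a hypothetical name for this example and is not the codebase's `formatConjunction`, which is presumably locale-aware (for instance, built on `Intl.ListFormat`).

```typescript
// Hypothetical, English-only stand-in for a conjunction formatter.
function formatConjunctionSketch(items: readonly string[]): string {
  if (items.length <= 1) return items.join('');
  if (items.length === 2) return `${items[0]} and ${items[1]}`;
  // Three or more items: comma-separated with "and" before the last one.
  return `${items.slice(0, -1).join(', ')}, and ${items[items.length - 1]}`;
}

const localityIds: readonly number[] = [12, 15, 31];

// Plain join: "12, 15, 31"
const joined = localityIds.join(', ');

// Conjunction: "12, 15, and 31"
const conjoined = formatConjunctionSketch(localityIds.map(String));
```

The conjunction form reads more naturally inside a sentence-style error message than a bare comma-separated list.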
@@ -1,6 +1,6 @@
-export type MergeStatus = 'ABORTED' | 'FAILED' | 'MERGING' | 'SUCCEEDED';
+export type MergingStatus = 'ABORTED' | 'FAILED' | 'MERGING' | 'SUCCEEDED';
(optional) use PascalCase for statuses?
queryResource === undefined || queryResource.isNew()
  ? undefined
  : queryResource.get('name');
function LoadingDialog(): JSX.Element {
(optional) add optional `header` parameter to `<LoadingScreen />` since we needed that in several places
and then look for usages of `loadingBar` in dialog that can be replaced with it (there would be one in the react workbench I think)
const resolvedRecords =
  typeof rawSortFunction === 'function'
    ? Object.entries(recordCounts).sort(sortFunction(rawSortFunction))
    : Object.entries(recordCounts);
useMemo?
Fixes #4319
The Locality and GeoCoordDetail batch import tool allows users to match and update the Latitude1, Longitude1, and Datum fields for Locality records and associate new GeoCoordDetails with the matched Locality records.
The primary motivation for the tool is that georeferencing for existing records in the database can take place outside of Specify (such as with CoGe), and there is currently no easy way to repatriate the data back into Specify and update those records.
This process is facilitated by requiring the guids of the Locality records in the CSV. Specify will use these guids to match/find the corresponding Locality record(s).
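The guid matching step could look roughly like the sketch below. The types and the `matchByGuid` helper are hypothetical; only the `multipleLocalitiesWithGuid` key (with its `guid` and `localityIds` payload) appears in the PR's diff, and the `noLocalityMatchingGuid` name is an assumption for the no-match case.

```typescript
// Hypothetical minimal view of a Locality record.
type Locality = { readonly id: number; readonly guid: string };

type MatchResult =
  | { readonly type: 'matched'; readonly localityId: number }
  // Assumed name for the "no Locality has this guid" error.
  | { readonly type: 'noLocalityMatchingGuid'; readonly guid: string }
  | {
      readonly type: 'multipleLocalitiesWithGuid';
      readonly guid: string;
      readonly localityIds: readonly number[];
    };

// Look up a CSV row's guid among the known Locality records.
function matchByGuid(guid: string, localities: readonly Locality[]): MatchResult {
  const ids = localities
    .filter((locality) => locality.guid === guid)
    .map((locality) => locality.id);
  const [first, ...rest] = ids;
  if (first === undefined) return { type: 'noLocalityMatchingGuid', guid };
  if (rest.length > 0)
    return { type: 'multipleLocalitiesWithGuid', guid, localityIds: ids };
  return { type: 'matched', localityId: first };
}
```

A row whose guid matches exactly one Locality proceeds to the update step; the other two outcomes would be reported back to the user as errors.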
With the Locality record, the tool does the following:
- If the `latitude1`, `longitude1`, and/or `datum` fields have values for the Locality record in the CSV, Specify will overwrite the current values in the database with those specified in the CSV

Demo
Currently, access to the tool can be found in the User Tools menu:
The primary interface is very similar to that of importing a file to be used as a DataSet in the WorkBench. Specify supports different encodings and delimiters, and displays the first 100 rows of the file:
The `Import File` button can be pressed to initiate the parsing/upload.
If there are any unknown columns in the dataset, a warning will first be displayed to the user so they can decide how to proceed:
If there are any bad values discovered when parsing the results, Specify will display a downloadable message stating which line the error was found on along with a message with more information about how to resolve the parsing error:
If the parsing and upload is successful, Specify will display a brief results page stating how many records of each table were affected, along with the option to create a RecordSet of modified Locality Records:
Checklist
and self-explanatory (or properly documented)
Testing instructions
For testing purposes, here is an example set of data for use on the `KUFish` database: kufish_coge_sample.csv
Other datasets have been made for `FITZ_NHMD`, `fwri`, `KUBirds`, and `lsumzmammals` and are available in the following folder in the Google Drive: https://drive.google.com/drive/folders/1B0hKQMBaX82nBUOB95uEa3H0MIrGrBv7?usp=drive_link
A foundation for other datasets for use in other databases can be made by creating a Locality -> GeoCoordDetail query and exporting the results. For an example, here is the query used to populate most of `kufish_coge_sample.csv`: https://kufish51623-edge.test.specifysystems.org/specify/query/191/
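There should be no errors in the `kufish_coge_sample` dataset presently, so modifications can be made to produce the desired errors and edge cases.
Generally, the cases which would need to be tested are:
- Ensure an error message is displayed to the user when there is no Locality which has a guid which exists in the dataset
- Data in unknown columns is properly ignored and not accounted for when uploading
- If a Locality contains existing data for `GeoCoordDetails`, it should be deleted if there is at least one `GeoCoordDetail` field in the dataset
- If there are no values in any GeoCoordDetail field for a Locality in the dataset and the Locality contains a GeoCoordDetail in Specify, the GeoCoordDetail should not be overwritten
- Ensure that only the Locality fields which contain data in the dataset get updated for a Locality in the dataset (i.e., if `longitude1` and `datum` have values in the dataset but `latitude1` is either not a column or empty, ensure that `latitude1` is never overwritten)
- Ensure parsing error messages are intuitive enough to diagnose and resolve the problem with the dataset
  - Potential ways to introduce invalid values include:
    - Entering a value in a field which exceeds the maximum length for the field (the maximum length for a field can be found in the Schema Configuration)
    - Entering an incorrect type of value for the field (such as entering a letter into a number/integer field)
    - Not following a UIFormatter for a field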