
Batch Edit: Support for editing basic fields #5417

Open · wants to merge 91 commits into base: production

Conversation

sharadsw
Contributor

@sharadsw sharadsw commented Nov 26, 2024

Fixes #5413
Fixes #6209
Uses code from #4929

⚠️ Note: This PR affects database migrations. See migration testing instructions.

This PR has pretty much the same code as #4929, but updated to production. To minimize the scope for testing, batch edit for relationships is disabled in this PR and will be enabled in a follow-up PR (batch edit for relationships). There are still some merge conflicts resulting from newer code in prod, which will be resolved in the next PR.

Checklist

  • Self-review the PR after opening it to make sure the changes look good
    and self-explanatory (or properly documented)
  • Add automated tests
  • Add relevant issue to release milestone

Testing instructions

  • Create a query
  • Click on Batch Edit. This will create a new Batch Edit dataset with columns involving relationships disabled (readonly)
  • Change values you wish to edit
  • Click Validate
  • Verify changed values are highlighted as updated cells
  • Click Commit. This will apply your changes
  • Re-run the original query and verify the values changed as expected
  • Click rollback
  • Verify values were rolled back to original

Regression tests

@sharadsw sharadsw changed the title from Batch Edit: Support for editing basic fields to [WIP] Batch Edit: Support for editing basic fields on Nov 26, 2024
@sharadsw
Contributor Author

sharadsw commented Dec 2, 2024

POTENTIAL IMPROVEMENTS?

  • Data Mapper in Batch Edit?
    • Should the Data Mapper be accessible in Batch Edit datasets? Since Batch Edit datasets can only be created from a Query, it seems confusing that users can still access the Data Mapper from a Batch Edit dataset with most of the mapper functionality disabled. It seems the mapper is only needed for changing Batch Edit Preferences. Could move the preferences dialog to the Editor page instead.

@realVinayak
Contributor

realVinayak commented Dec 2, 2024

Nice observation. My fault for keeping a hidden feature secret: another use for the data mapper is to control “ignore when blank”-type behavior on a per-field basis. That's actually supported in that PR (even for the batch edit data mapper, you can modify this per-field setting; ask Grant for more context).

The only things not read-only are:

  1. Batch edit prefs
  2. Field behavior (never ignore, …)
  3. must match (also, could you leave a note somewhere to make trees must match by default?)

Search for usages of “readonlyspec”.

For the purposes of this PR, you can theoretically not add the data mapper at all (you also don't need batch edit prefs; they don't do anything when there are no relationships).

Contributor

@realVinayak realVinayak left a comment


Nice so far!

Some more user issues that I remember:

  1. Currently, there is no way to select specific rows and batch edit just them. This might be a requirement, maybe not. There is a workaround: the current implementation supports batch edit out of a record set, so the user can:
  • Make a query.
  • Select the records and make a record set out of them.
  • Query out of the record set, and batch edit on those results.

Quite a lot of steps. Check if that's acceptable for now; I've no clue if there have been any new expectations regarding BE. I think a decent implementation for this would be to follow the implementation in #5300 (I actually left a similar comment on there).

  2. Data sorting. Currently, if you have one table (so no relationships), sorting (even on multiple columns) should always be respected (UX guys: test this). If you have relationships, and you also sort on some of the relationship's columns, it's essentially arbitrary (no guarantees regarding what the user sees). From the code perspective, it is not arbitrary (see if you can figure those places out). I know that we don't support relationships, but this point actually applies regardless, since the same procedure is applied; we just ignore relationships in the upload plan. This might be a UX issue; a way to work around it is to sort in the workbench instead (in the spreadsheet).

  3. Potential performance issues, as mentioned in a comment below in stored_queries/views.py. IIRC, I once tried batch editing Andy's entire voucher collection (45k CO records with determination and cataloger) and it took 1.2 minutes to construct the dataset (I think that's too slow; I deferred profiling it for later, maybe we need to do that).

EDIT: All the points are really optional, but I would definitely recommend testing the performance and seeing if usability needs to be improved.

Review threads:
  • specifyweb/specify/api.py (resolved)
  • specifyweb/specify/datamodel.py (resolved)
  • specifyweb/stored_queries/format.py (outdated, resolved)
  • specifyweb/stored_queries/views.py (resolved)
  • specifyweb/workbench/upload/tomany.py (outdated, resolved)
  • specifyweb/workbench/upload/predicates.py (outdated, resolved)
  • specifyweb/stored_queries/batch_edit.py (outdated, resolved)
  • specifyweb/stored_queries/batch_edit.py (resolved)
  • specifyweb/workbench/upload/upload_table.py (resolved)
@sharadsw
Contributor Author

@realVinayak Thank you! I have made notes of your points in different issues

Collaborator

@emenslin emenslin left a comment


I did not have time to do a whole lot of testing, but I did notice that on a locality data set the lat and long values show as updated even when you haven't edited them yet. I ran into this in the first batch edit PR, but I don't know which comment I mentioned it in. This also happened with agent as the base table, with some of the blank agent title records, but I imagine it is the same issue.

02-05_14.13.mp4

Cannot open the interactions page

02-05_14.24.mp4

Also, the default form definitions are not showing up when you try to add a new one

02-05_13.59.mp4

Production:

02-05_14.28.mp4

@sharadsw
Contributor Author

sharadsw commented Feb 7, 2025

NOTES:

  • The lat/long error is related to floating-point precision and comparing wrapper classes with primitives. When latitude and longitude are compared to their previous values, the comparison that happens is Decimal('33.1493800000') == 33.14938, which can never be true and is therefore detected as a change (see the sketch below this list). Will create a different issue for this since no data is lost or actually changed after upload; after upload, 33.14938 gets stored as 33.1493800000 anyway.
  • The agent title error happens when a cell has data that doesn't belong to its picklist (the error must exist for other picklists as well). After 5b1474b, if a cell has data that doesn't belong to its picklist, the cell is left empty. This causes a change to be detected instead of catching a picklist validation error.
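For reference, a minimal Python sketch of why the comparison misfires, plus one possible normalization (numerically_equal is a hypothetical helper for illustration, not the PR's is_equal):

from decimal import Decimal

# The stored value comes back from the ORM as a Decimal, while the dataset cell is a float.
# Direct (or stringified) comparison flags the cell as changed even though nothing was edited.
Decimal('33.1493800000') == 33.14938                # False
str(Decimal('33.1493800000')) == str(33.14938)      # False: '33.1493800000' != '33.14938'

# Normalizing both sides to float before comparing numeric fields avoids the false positive.
def numerically_equal(old, new) -> bool:
    if old is None or new is None:
        return old is None and new is None
    return float(old) == float(new)

numerically_equal(Decimal('33.1493800000'), 33.14938)  # True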

@sharadsw
Contributor Author

sharadsw commented Feb 7, 2025

@emenslin Cannot reproduce the interactions dialog and app resource error locally on the same db and also other dbs. Can you try making a new copy of the original db for testing?

The agent title error should now be fixed, and there should be a validation error for records whose value doesn't belong to the picklist. However, there might be cases where an agent title record has the same title as what the picklist has but a different value. In such cases there won't be a validation error, but the record will be updated to the correct picklist entry despite making no changes.

Copy link
Collaborator

@combs-a combs-a left a comment


  • Verify changed values are highlighted as updated cells
  • Verify values were rolled back to original

Didn't get to the regression test since I was testing other fields.

I wasn't able to recreate this issue a second time for some reason, but I had a case in which batch edit edited a preparation count above the amount it had available.

i.e.: Used a prep record which had a count of 3 with only 2 available preps. A gift preparation record was created, and the quantity of this gift preparation was 2. Opening this in batch edit, I was able to set the quantity on the gift preparation to 5 and save with no warning.

Checking the previous records, the quantity on the gift prep was 5, and then the original preparation record now had a count of 5. Rolling back the batch edit did not fix the preparation record.

Here's the audit log for that; though the old and new values are not displayed, I know for sure that it was 3 before that batch edit, and there shouldn't have been two edits, just one from adding gift preps.


Link to the prep record: https://sdnhmherps2824batch2-issue-5413.test.specifysystems.org/specify/view/preparation/80361/

@emenslin
Collaborator

emenslin commented Feb 7, 2025

Relating to the lat/long issue, I know you said you already wrote up an issue for it, but is there any way it could be fixed in this PR, or at least in the same release as batch edit? I know no values were actually changed, but I feel like this could be very confusing and concerning to users. Although I used a simple example in the video, this still happens if you make just one change to the data set; users would have no way of really knowing whether they made a change or whether it just looked like they did.

@sharadsw
Contributor Author

sharadsw commented Feb 7, 2025

@emenslin Yeah we can do it in the same release/PR. My guess is that the problem exists in other places as well (even for non-decimal values) and we need a more nuanced way to compare old and new values. Will add it to this PR if it's a small fix

@realVinayak
Contributor

we need a more nuanced way to compare old and new values.

Correct, but we already do (for API)! See

def fld_change_info(obj, field, val) -> Optional[FieldChangeInfo]:
    if field.name != 'timestampmodified':
        value = prepare_value(field, val)
        if isinstance(field, FloatField) or isinstance(field, DecimalField):
            value = None if value is None else float(value)
        old_value = getattr(obj, field.name)
        if str(old_value) != str(value): # ugh
            return FieldChangeInfo(field_name=field.name, old_value=old_value, new_value=value)
    return None

Should just be a mechanical change to use it here. Also, the

if isinstance(field, DateTimeField) and isinstance(val, str):

branch is irrelevant to us (trace out why).

EDIT: I just saw you pushed 4814f11. Would still recommend the change from api.py (to make things consistent).
/////////////////

There is actually a separate bug here in general. Previously, Elizabeth mentioned this error in the #4929 review: #4929 (review). That is, some localities were flagged as new. This happens because, when parsing longitude and latitude fields, we also artificially set the long1text and lat1text fields. That is, you'd observe that if we do a workbench upload but just set the latitude1 and longitude1 fields, long1text and lat1text will also be set.

Relevant code is here:

def parse_latlong(field: Field, value: str) -> ParseResult:
    parsed = parse_coord(value)
    if parsed is None:
        return ParseFailure('coordinateBadFormat', {'value': value})
    coord, unit = parsed
    if field.name.startswith('lat') and abs(coord) > 90:
        return ParseFailure("latitudeOutOfRange", {'value': value})
    if field.name.startswith('long') and abs(coord) > 180:
        return ParseFailure('longitudeOutOfRange', {'value': value})
    return ParseSucess({field.name.lower(): coord,
                        'originallatlongunit': unit,
                        field.name.lower().replace('itude', '') + 'text': parse_string(value)})

Generically, parsing 1 field can result in multiple new fields. This is why ParseResult (which contains the result for just one field) has the upload attribute as a dict. This is also why we have a nested loop when getting attributes that belong to an uploadTable.

Relevant code:

attrs = {
    fieldname_: value
    for parsedField in self.parsedFields
    for fieldname_, value in parsedField.upload.items()
}
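For illustration, a minimal sketch (plain dicts standing in for the real ParseResult objects; the field names and values are illustrative, echoing the example below) of how one parsed cell fans out into several upload attributes and how the comprehension above flattens them:

# Each entry stands in for parsedField.upload of one parsed column.
parsed_uploads = [
    {"latitude1": 34.07, "lat1text": "34.07"},   # parsing "34.07" fans out into two fields
    {"localityname": "Some place"},              # an ordinary single-field column
]

attrs = {
    fieldname_: value
    for upload in parsed_uploads
    for fieldname_, value in upload.items()
}
# attrs == {'latitude1': 34.07, 'lat1text': '34.07', 'localityname': 'Some place'}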

Using the above, you might think "Oh, if we add extra lat1text and long1text, then doesn't this mean we would fail to match existing localities which have those fields set as null". That is, say {"latitude1": 34.07} is parsed as {"latitude1": 34.07, "lat1text": "34.07"}, then during match, we would fail to match this to a locality with {"latitude1": 34.07, "lat1text": None}. And, you'd be correct -- that's a bug. However, things weren't always this way. The above faulty behavior was introduced in #4548. Before that, the extra lat1text (and long1text) were just added to the filter part. Relevant ancient code:

def parse_latlong(field, value: str, column: str) -> Union[ParseResult, ParseFailure]:
    parsed = parse_coord(value)
    if parsed is None:
        return ParseFailure('coordinateBadFormat', {'value': value}, column)
    coord, unit = parsed
    if field.name.startswith('lat') and abs(coord) >= 90:
        return ParseFailure('latitudeOutOfRange', {'value': value}, column)
    if field.name.startswith('long') and abs(coord) >= 180:
        return ParseFailure('longitudeOutOfRange', {'value': value}, column)
    text_filter = {field.name.replace('itude', '') + 'text': parse_string(value)}
    return ParseResult(
        text_filter,
        {field.name: coord, 'originallatlongunit': unit, **text_filter},
        None,
        column,
        None
    )

I'd encourage you to trace that code out from v7.9.0 (and see how it is just added to the filter part); this is also the reason why we have separate filter and upload attributes on ParseResult.

All the above is precursor to the bug (in batch-edit):

When you have latitude1 and/or longitude1, the relevant text fields will also be edited, but the user would not know that they were (since we don't report that). I'm not sure about the best way to fix this. Can this be discussed internally to determine whether this needs to be fixed at all? I'm neutral on it:

  1. It changes data without user's permission (bad).
  2. It fixes possible incorrect data (good).

////////////////////////

Now, for the bug in #4929 (review).

You can probably see what the source of Elizabeth's bug was. IIRC, it was definitely because lat1text and long1text were empty (but we were matching with artificially added values, which were not null). Can this be tested again when batch-edit supports relationships?

So, in conclusion, three bugs are related to locality.

  1. Decimal and value equality check. Can be fixed with that code from api.py.
  2. Extra fields are edited, without the user knowing.
  3. Extra fields are added when matching (partially caused by the Locality and Geocoorddetail batch import tool, #4548), which causes some localities to not be matched.

@sharadsw
Contributor Author

@combs-a I'm not able to recreate it. I run into a validation error if I try to set the preparation quantity greater than the count or the availability:
Screenshot 2025-02-13 at 2 43 19 PM

What I did was:

  • Create a CO with preparation count 5
  • Create 2 Gift records
    • One with preparation qty 3
    • One with preparation qty 2
  • Tried to update preparation qty > 5 in Batch edit

@sharadsw
Contributor Author

@realVinayak Doesn't look like fld_change_info can be used here without refactoring it to do something similar to is_equal from 4814f11. I'm happy to keep is_equal for now and note this in #6170

def fld_change_info(obj, field, val) -> Optional[FieldChangeInfo]:
    if field.name != 'timestampmodified':
        value = prepare_value(field, val)
        if isinstance(field, FloatField) or isinstance(field, DecimalField):
            value = None if value is None else float(value)
        old_value = getattr(obj, field.name)
        if str(old_value) != str(value): # ugh
            return FieldChangeInfo(field_name=field.name, old_value=old_value, new_value=value)
    return None

  • The string comparison is unreliable when comparing an object with a primitive. Not sure how we can handle it while keeping it generic.
    • We get a scope change error when comparing discipline/scope values, since the comparison becomes Discipline object (3) == 3 => false
  • For Decimal values, the old value remains a Decimal and the new value is a float, so it still gets detected as a change for lat/long.
    • i.e. str(Decimal('33.1493800000')) == str(33.14938) => '33.1493800000' == '33.14938' => false (see the sketch below)
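For reference, a minimal sketch of both failure modes (the Discipline class below is a hypothetical stand-in that mimics Django's default "Model object (pk)" string form, not Specify's actual model):

from decimal import Decimal

# Hypothetical stand-in for a related-model instance.
class Discipline:
    def __init__(self, id):
        self.id = id
    def __str__(self):
        return f"Discipline object ({self.id})"

# Object vs. primitive: the string forms never match, so a scope "change" is reported.
str(Discipline(3)) == str(3)                    # False: 'Discipline object (3)' != '3'

# Decimal vs. float: the string forms differ even though the values are numerically equal.
str(Decimal('33.1493800000')) == str(33.14938)  # False: '33.1493800000' != '33.14938'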

Made a note of the lat/long text issues in #6126, thanks!

@sharadsw
Contributor Author

@specify/testing The lat/long decimal issue should now be fixed. The lat/long text update and matching issues are added to #6126 and will be tested when we enable relationships.

@sharadsw sharadsw requested a review from combs-a February 13, 2025 21:29
@realVinayak
Contributor

realVinayak commented Feb 13, 2025

Ah yes, you're right -- it's more involved than I thought, sorry for wasting your time. I assumed that the previous code is bug-free, but it is not (I tested this on sp7demofish, and latitudes are always considered changed when saving a locality form -- that's another bug, unrelated to batch edit). I agree, just keep what you have in that case.

Probably also need to document the lat/long text update (plus other fields that get updated; refer to the code for it). Maybe the Discourse documentation is enough, or maybe there should be a warning when you use locality as the base table.

Just for reference (no changes needed):

RE: We get a scope change error when comparing discipline/scope values since the comparison becomes

Can replace field.name with field.attname and that'll fix this. attname will convert discipline to discipline_id (which will be 3).
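For reference, a minimal Django sketch (hypothetical models, not Specify's datamodel) of the name vs. attname distinction:

from django.db import models

class Discipline(models.Model):
    name = models.CharField(max_length=64)
    class Meta:
        app_label = 'demo'

class Collection(models.Model):
    discipline = models.ForeignKey(Discipline, on_delete=models.CASCADE)
    class Meta:
        app_label = 'demo'

fk = Collection._meta.get_field('discipline')
fk.name     # 'discipline'    -> getattr(obj, 'discipline') is a Discipline instance
fk.attname  # 'discipline_id' -> getattr(obj, 'discipline_id') is the raw integer id,
            #                    which compares cleanly against the 3 from the dataset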

Collaborator

@combs-a combs-a left a comment


@sharadsw I've tried to recreate my issue again a few times and wasn't able to either; it should be kept in mind, but it might've just been something very specific.

  • Verify changed values are highlighted as updated cells
  • Re-run the original query and verify the values changed as expected
  • Verify values were rolled back to original

Regression tests

Lat/long has been mostly fixed, but there is a similar problem with integers: ints are being converted to floats, and that marks them as updated cells when they shouldn't be (see the sketch below).
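A minimal illustration, assuming the cell value passes through a float conversion before the string comparison:

# An unchanged integer count round-trips as a float, so the string comparison no longer matches.
str(3) == str(float(3))   # False: '3' != '3.0', so the cell is reported as updated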



Also, I tried out COs that use COTs with the AlphaNumByYear type, and batch edit flagged the correct format as invalid, and incorrect formats as valid. Is fixing this within the scope of the PR? Wanted to check since the same thing works fine in the Workbench.

The incorrect format is AAAAAA, which was the default catno format before I changed it for FunnyTurtles.


Labels: Migration (PRs that contain migration)
Project status: Dev Attention Needed
9 participants