
Revit_Toolkit: remove whitespaces when matching names on Push #574

Open
pawelbaran opened this issue Mar 3, 2020 · 11 comments
Labels
type:bug Error or unexpected behaviour


@pawelbaran
Member

Description:

At the moment, BHoM properties are matched with Revit types based on their names. Sometimes this fails because of whitespace differences in either name (e.g. HEB200 vs HEB 200). It would be good to ignore whitespace when matching.
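A minimal sketch of whitespace-insensitive name matching, assuming the Revit type names are available as a plain list of strings. The helpers `normalise_name` and `find_matching_type` are hypothetical and not part of Revit_Toolkit; this only illustrates the idea.

```python
import re
from typing import Iterable, Optional


def normalise_name(name: str) -> str:
    """Remove all whitespace (leading, trailing and internal) from a name."""
    return re.sub(r"\s+", "", name)


def find_matching_type(bhom_name: str, revit_type_names: Iterable[str]) -> Optional[str]:
    """Return the first Revit type name equal to bhom_name once whitespace is ignored, else None."""
    target = normalise_name(bhom_name)
    for candidate in revit_type_names:
        if normalise_name(candidate) == target:
            return candidate
    return None


revit_types = ["HEA 200", "HEB 200", "IPE 200"]
print(find_matching_type("HEB200", revit_types))  # HEB 200
print(find_matching_type("HE200B", revit_types))  # None
```

Returning the first hit silently hides ambiguity if two Revit types normalise to the same string; whether that case should raise an error instead is a design choice.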

@pawelbaran pawelbaran added the type:bug Error or unexpected behaviour label Mar 3, 2020
@pawelbaran pawelbaran added this to the BHoM 3.1 β RC milestone Mar 3, 2020
@pawelbaran pawelbaran self-assigned this Mar 3, 2020
@pawelbaran
Member Author

At a quick glance, this looks like a more convoluted problem than expected, so I am pushing it to 3.2, to be resolved together with #582.

@pawelbaran
Member Author

I could resolve it right away, but I started thinking about more intelligent name matching than simply removing whitespace: it would be great if HEB200 matched not only HEB 200 but also HE200B etc. I wonder whether it would be worth implementing a more intelligent string matching mechanism in general, to handle typos or minor mismatches. This could be useful in other toolkits, but also e.g. in the method search.

What do you think @al-fisher @IsakNaslundBh @FraserGreenroyd?
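One common way to tolerate typos is a Levenshtein (edit distance) threshold. The sketch below is a plain-Python illustration, not an existing BHoM method; `levenshtein` and `fuzzy_equal` are hypothetical names and the default threshold of 2 is an arbitrary assumption. HE200B is two edits away from HEB200 (delete the B, re-insert it at the end), so it only matches with a threshold of at least 2, which also admits genuinely different sections.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if characters are equal)
            ))
        previous = current
    return previous[-1]


def fuzzy_equal(a: str, b: str, max_edits: int = 2) -> bool:
    """Treat two names as equal if, ignoring whitespace, they differ by at most max_edits edits."""
    return levenshtein("".join(a.split()), "".join(b.split())) <= max_edits


print(levenshtein("HEB200", "HE200B"))  # 2
print(fuzzy_equal("HEB200", "HE200B"))  # True
print(fuzzy_equal("HEB200", "HEB220"))  # True, although this is a different section
```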

@FraserGreenroyd
Contributor

At what point is HEB200 == HE200B? That's more than a mismatch to me, and it should not be the responsibility of code to fix. The problem is that if the toolkit looks at something, thinks "Oh, I can fix that" and does so, the user risks getting a result they didn't intend while everything appears to be working fine.

I would agree to removing spaces at most, but not to any other changes to the string. I would rather error out to the user and let them fix it, to make sure they get the workflow they intended and not the workflow we think might be right.
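A sketch of this stricter behaviour, assuming whitespace removal is the only normalisation allowed: anything else raises an error so the user fixes the name rather than the code guessing. `match_or_raise` is a hypothetical helper, and listing close candidates via Python's standard `difflib.get_close_matches` is an added assumption, not something proposed in the thread.

```python
import difflib
from typing import List


def match_or_raise(bhom_name: str, revit_type_names: List[str]) -> str:
    """Match names ignoring whitespace only; otherwise fail loudly with suggestions."""
    def key(s: str) -> str:
        return "".join(s.split())  # drop all whitespace, change nothing else

    matches = [name for name in revit_type_names if key(name) == key(bhom_name)]
    if len(matches) == 1:
        return matches[0]
    if len(matches) > 1:
        raise ValueError(f"'{bhom_name}' is ambiguous, it matches {matches}")
    # Suggestions are only reported to the user, never applied automatically
    suggestions = difflib.get_close_matches(bhom_name, revit_type_names, n=3)
    raise ValueError(
        f"No Revit type named '{bhom_name}'. Close candidates: {suggestions}. "
        "Please rename the property explicitly."
    )


print(match_or_raise("HEB200", ["HEB 200", "IPE 200"]))  # HEB 200
```

Calling `match_or_raise("HE200B", ["HEB 200", "IPE 200"])` raises a `ValueError` instead of returning a guess.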

@pawelbaran
Member Author

HEB200 and HE200B are two commonly used names for the same section. My gut feeling is that we can find many more such cases.

@al-fisher
Member

al-fisher commented Nov 23, 2020

Yes, agreed: this is worth putting some thought into generalising. I think the key is not to "hard code" the matching assumptions, as for given workflows it will be really valuable to be able to override, extend or customise what matches to what.

I think we'll ultimately need a specific string comparison option that allows user input, not dissimilar to the wildcard and regex work @alelom has recently been looking at for the file adapter, as well as the configs for the diffing work.

In fact, I think this is effectively a Comparer Config specific to string comparison. @alelom @pawelbaran
This would allow simple settings such as "ignore whitespace" and "allow character permutations", as well as, in the future, more complex lookups such as typos, alternate spellings and synonyms, perhaps based on datasets that the user can replace.

We can then create very simple standard configs (combinations of settings) and/or datasets of common equivalent strings to support the most common workflows.
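A rough sketch of what a string-specific comparer config could look like. `StringComparerConfig`, its fields and `strings_match` are hypothetical names, not an existing BHoM type; the `synonyms` table stands in for a user-replaceable dataset, and "allow character permutations" is interpreted here, as an assumption, as comparing the characters regardless of order.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class StringComparerConfig:
    ignore_whitespace: bool = True
    ignore_case: bool = False
    allow_character_permutations: bool = False
    # User-replaceable dataset: alias -> canonical name, both given in normalised form
    synonyms: Dict[str, str] = field(default_factory=dict)


def _normalise(s: str, config: StringComparerConfig) -> str:
    if config.ignore_whitespace:
        s = "".join(s.split())
    if config.ignore_case:
        s = s.lower()
    return config.synonyms.get(s, s)  # map a known alias onto its canonical name


def strings_match(a: str, b: str, config: StringComparerConfig) -> bool:
    na, nb = _normalise(a, config), _normalise(b, config)
    if na == nb:
        return True
    if config.allow_character_permutations:
        return Counter(na) == Counter(nb)  # same characters, in any order
    return False


strict = StringComparerConfig()
print(strings_match("HEB 200", "HEB200", strict))  # True
print(strings_match("HE200B", "HEB200", strict))  # False

with_synonyms = StringComparerConfig(synonyms={"HE200B": "HEB200"})
print(strings_match("HE 200 B", "HEB200", with_synonyms))  # True

permissive = StringComparerConfig(allow_character_permutations=True)
print(strings_match("HE002B", "HEB200", permissive))  # True, which may not be intended
```

The "standard configs" would then simply be preset instances of such a class, and the last line shows why permissive settings should be opt-in.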

@pawelbaran
Member Author

This sounds like a Milestone workshop to me, to get others' thoughts too?

@FraserGreenroyd
Contributor

> This sounds like a Milestone workshop to me, to get others' thoughts too?

Agreed

@al-fisher
Member

Sounds good

@IsakNaslundBh
Contributor

Agree with all of the above. This also links in to BHoM/BHoM_Datasets#60, which is another place where the exact same issue arises: sections having slightly different names in slightly different contexts.

There I had the idea of storing hard-coded alternative names on the sections, but if we can fix it with cleverer string comparison, that would be even better.
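For comparison, a sketch of the hard-coded alternative mentioned here: each section carries its own list of alternative names and matching is exact against all of them. The `Section` class, the `alternative_names` field and `find_section` are hypothetical, not the actual BHoM_Datasets schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Section:
    name: str
    alternative_names: List[str] = field(default_factory=list)  # hypothetical curated aliases

    def all_names(self) -> List[str]:
        return [self.name, *self.alternative_names]


def find_section(query: str, sections: List[Section]) -> Optional[Section]:
    """Return the first section whose name or any alternative name equals the query exactly."""
    for section in sections:
        if query in section.all_names():
            return section
    return None


library = [Section("HEB200", ["HE200B"]), Section("IPE200")]
print(find_section("HE200B", library).name)  # HEB200
```

The trade-off versus a comparer config is that every alias has to be curated per section, but no guessing happens at match time.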

@pawelbaran pawelbaran added this to the BHoM 4.1 β RC milestone Jan 11, 2021
@pawelbaran pawelbaran modified the milestone: BHoM 4.1 β RC Mar 24, 2021
@pawelbaran pawelbaran removed this from the BHoM 4.1 β RC milestone Apr 1, 2021
@vietle-bh
Contributor

This issue seems relevant to the recent discussion on fuzzy string matching!

https://github.com/BHoM/AGS_Toolkit/blob/ebcc28ff5232fcddff0380e939b45130a54feec2/AGS_Engine/Compute/Ratios/FuzzyMatching.cs#L49

@pawelbaran
Member Author

> This issue seems relevant to the recent discussion on fuzzy string matching!
>
> https://github.com/BHoM/AGS_Toolkit/blob/ebcc28ff5232fcddff0380e939b45130a54feec2/AGS_Engine/Compute/Ratios/FuzzyMatching.cs#L49

Love it, thanks @vietle-bh!

Projects
None yet
Development

No branches or pull requests

5 participants