[REF] Compute_plausible_gaps, Efficiency, Stability #243

bosd · 2024-10-29T20:08:07Z

1. **Use of `get` Method**: When retrieving the best alignment, we use `self._textline_to_alignments.get(most_aligned_tl)` instead of direct indexing. This prevents a potential `KeyError` if `most_aligned_tl` is not in the dictionary; `get` returns `None` instead.
2. **Early Exit Conditions**: We explicitly check whether `best_alignment` is `None` after attempting to retrieve it. This ensures that we do not proceed with calculations if the alignment data is missing.
3. **Sorting and Gap Calculation**: I retained the logic to sort the text lines and calculate gaps. This part of the code is straightforward and unlikely to lead to an infinite loop as long as the input lists are correctly managed.
4. **Returning `None` for Insufficient Data**: The checks on the lengths of the text line lists ensure that we only proceed if there are enough lines to compute meaningful gaps. If there are not enough lines, we return `None` to avoid further computation.
5. **List Comprehensions for Gap Calculation**: The horizontal and vertical gaps are computed with list comprehensions, which are more concise and Pythonic and make the code cleaner.
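The points above can be sketched roughly as follows. This is a minimal, standalone illustration and not the code from this PR: the function name `plausible_gaps_sketch`, the `"h"`/`"v"` keys, and the minimum of two lines are assumptions chosen for the example, and alignments are modelled as plain lists of coordinates.

```python
from typing import Optional


def plausible_gaps_sketch(
    textline_to_alignments: dict,
    most_aligned_tl: str,
) -> Optional[tuple]:
    """Hypothetical sketch: look up an alignment safely and compute gaps."""
    # dict.get returns None for a missing key instead of raising KeyError.
    best_alignment = textline_to_alignments.get(most_aligned_tl)
    if best_alignment is None:
        return None  # early exit: no alignment data for this text line

    # Assumed layout: each alignment holds coordinate lists under "h" and "v".
    h_positions = sorted(best_alignment["h"])
    v_positions = sorted(best_alignment["v"])

    # At least two positions are needed to form a single gap (assumed threshold).
    if len(h_positions) < 2 or len(v_positions) < 2:
        return None

    # Gaps between consecutive sorted positions, via list comprehensions.
    h_gaps = [b - a for a, b in zip(h_positions, h_positions[1:])]
    v_gaps = [b - a for a, b in zip(v_positions, v_positions[1:])]
    return h_gaps, v_gaps
```

With this shape, a missing key and a too-short list both return `None`, so a caller only has to handle one "no result" case.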


1. **Sorting without Reverse**: When sorting the textlines, we sort them in ascending order directly. This avoids the need to reverse the sorted list later, which can save some computational overhead.
2. **Array Creation for Gaps**: Instead of creating lists and then converting them, we directly create `numpy` arrays to store gaps. This allows us to utilize `numpy`'s efficient operations for subsequent calculations.
3. **Early Exits**: The checks for the lengths of `ref_h_textlines` and `ref_v_textlines` provide early exits if not enough textlines are available, preventing unnecessary calculations.
4. **Percentile Calculation**: The percentile calculation remains unchanged, but we ensure that we are working with `numpy` arrays for performance.
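As a rough illustration of the array-based approach, the sketch below sorts positions in ascending order, computes gaps as a `numpy` array, and takes a percentile. It is not the code from this PR: the function name `gap_percentile_sketch`, the default percentile of 75, and the use of `np.diff` to build the gap array are assumptions made for the example.

```python
import numpy as np


def gap_percentile_sketch(positions, percentile=75):
    """Hypothetical sketch: percentile of gaps between sorted positions."""
    # Early exit: fewer than two positions yields no gap to measure.
    if len(positions) < 2:
        return None

    # Sort ascending once; no later reversal is needed.
    ordered = np.sort(np.asarray(positions, dtype=float))

    # np.diff builds the gap array directly, without an intermediate list.
    gaps = np.diff(ordered)

    # np.percentile operates on the array; the percentile value is an assumption.
    return float(np.percentile(gaps, percentile))


print(gap_percentile_sketch([60, 0, 30, 10], percentile=50))  # prints 20.0
```

Here the sorted positions are `[0, 10, 30, 60]`, so the gaps are `[10, 20, 30]` and their median is `20.0`.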

bosd added the performance and refactoring labels Oct 29, 2024

bosd force-pushed the ref-netw-comp-plaus-gaps branch 2 times, most recently from 5af9091 to 628a94d Compare October 31, 2024 20:49

bosd added 2 commits October 31, 2024 21:56

bosd force-pushed the ref-netw-comp-plaus-gaps branch from 628a94d to 5d48841 Compare October 31, 2024 20:56

bosd merged commit ad1babd into py-pdf:main Oct 31, 2024
14 checks passed

bosd deleted the ref-netw-comp-plaus-gaps branch October 31, 2024 21:04
