
Added framework to test HTML reports #2283

Merged
merged 21 commits into from
Feb 26, 2025
Conversation

shubham-yb
Contributor

@shubham-yb shubham-yb commented Feb 3, 2025

Describe the changes in this pull request

Added the Python script to compare 2 HTML reports.
Added the expected files for PG and Oracle Assessment tests.

Note:
The HTML diffs might not always be easy to read. The goal is simply to surface the fact that something in the HTML report has changed, so that the developer is aware of it.
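The comparison idea described above can be sketched with the standard library alone. This is a hedged, minimal illustration, not the PR's actual script (which parses the reports with BeautifulSoup); `normalize` and `diff_reports` here are hypothetical stand-ins for the script's helpers:

```python
import difflib
import re

def normalize(text):
    """Collapse runs of whitespace so formatting-only changes
    do not show up as diffs (hypothetical helper)."""
    return re.sub(r"\s+", " ", text).strip()

def diff_reports(expected_html, actual_html):
    """Return unified-diff lines between two normalized HTML strings."""
    expected = [normalize(l) for l in expected_html.splitlines() if normalize(l)]
    actual = [normalize(l) for l in actual_html.splitlines() if normalize(l)]
    return list(difflib.unified_diff(
        expected, actual, fromfile="expected", tofile="actual", lineterm=""))
```

Identical inputs produce an empty diff; any textual change produces `+`/`-` lines pointing the developer at the affected region.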

Describe if there are any user-facing changes

How was this pull request tested?

Does your PR have changes that can cause upgrade issues?

| Component | Breaking changes? |
| --- | --- |
| MetaDB | Yes/No |
| Name registry json | Yes/No |
| Data File Descriptor Json | Yes/No |
| Export Snapshot Status Json | Yes/No |
| Import Data State | Yes/No |
| Export Status Json | Yes/No |
| Data .sql files of tables | Yes/No |
| Export and import data queue | Yes/No |
| Schema Dump | Yes/No |
| AssessmentDB | Yes/No |
| Sizing DB | Yes/No |
| Migration Assessment Report Json | Yes/No |
| Callhome Json | Yes/No |
| YugabyteD Tables | Yes/No |
| TargetDB Metadata Tables | Yes/No |

Sample output when the reports are matching:

[screenshot]

Sample output if the migration complexity changes:

[screenshot]

Sample output if there are mismatches in tags:

[screenshot]

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Collaborator

@sanyamsinghal sanyamsinghal left a comment


There are a lot of special cases in the extract/parse HTML functions here. Can you please add more descriptive comments in those functions explaining both the "what" and the "why"?

Collaborator

@sanyamsinghal sanyamsinghal left a comment


Looks good mostly, one important comment regarding unique set of tags.

Comment on lines 97 to 109
```python
data = {
    "title": normalize_text(soup.title.string) if soup.title and soup.title.string else "No Title",
    "headings": extract_and_normalize_texts(soup.find_all(['h1', 'h2', 'h3', 'h4', 'h5', 'h6'])),
    "paragraphs": extract_paragraphs(soup),
    "tables": sort_table_data(soup.find_all("table")),
    "links": {
        k: v for k, v in sorted(
            {normalize_text(a.get("href") or ""): normalize_text(a.text) for a in soup.find_all("a")}.items()
        )
    },
    "spans": extract_and_normalize_texts(soup.find_all("span")),
    "divs": extract_divs(soup)
}
```
Collaborator


I believe we are checking all of these tags.
What's missing: if there is some info in the HTML (now, or later in the future) inside some tag other than these, it would be skipped.

Would comparing the unique set of tags in both HTML reports fix that?

Contributor Author


Added a function compare_html_tags() which takes care of this. Added a screenshot in the description as well.

Collaborator


Great @shubham-yb. A small nit: instead of "File1" or "File2" in the output, let's use "expected file" or "actual file"?

@sanyamsinghal
Collaborator

@shubham-yb A few guidelines for merging the PR:

  1. Always Squash and Merge.
  2. Make sure you edit the final commit message to make it precise and concise. (All the commits of the branch don't need to appear in the final commit message.)

Collaborator

@sanyamsinghal sanyamsinghal left a comment


LGTM, just a few final comments.

Good job on adding the framework and thanks for improving the code further.

Comment on lines 137 to 158
```python
def compare_html_tags(html_data1, html_data2):
    """Compare the unique tags in the two HTML reports."""

    def get_unique_tags(html_content):
        """Extracts all unique tag names from the given HTML content."""
        soup = BeautifulSoup(html_content, 'html.parser')
        return {tag.name for tag in soup.find_all()}

    tags1 = get_unique_tags(html_data1)
    tags2 = get_unique_tags(html_data2)

    missing_tags_in_file1 = tags2 - tags1  # Tags in file2 but missing in file1
    missing_tags_in_file2 = tags1 - tags2  # Tags in file1 but missing in file2

    differences = {}

    if missing_tags_in_file1:
        differences["missing_tags_in_file1"] = "\n".join(missing_tags_in_file1)
    if missing_tags_in_file2:
        differences["missing_tags_in_file2"] = "\n".join(missing_tags_in_file2)

    return differences

def compare_html_reports(file1, file2):
    """Compares two HTML reports and prints structured differences."""
    with open(file1, "r", encoding="utf-8") as f1, open(file2, "r", encoding="utf-8") as f2:
        html_data1 = extract_html_data(f1.read())
        html_data2 = extract_html_data(f2.read())
```
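The set arithmetic used in `compare_html_tags` works as follows (a toy illustration, not code from the PR):

```python
# Toy illustration of the set differences computed above.
tags1 = {"div", "p", "span"}    # tags found in file1 (expected)
tags2 = {"div", "p", "table"}   # tags found in file2 (actual)

missing_tags_in_file1 = tags2 - tags1  # present only in file2
missing_tags_in_file2 = tags1 - tags2  # present only in file1
```

Here `missing_tags_in_file1` is `{"table"}` and `missing_tags_in_file2` is `{"span"}`; tags common to both reports never appear in the diff.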
Collaborator


One high-level comment: what we (the consumers of the script) are interested in knowing is the difference between the expected and the actual report.

Assuming here: file1 = expected, file2 = actual.
Wording the output from the perspective of file2 (the actual report) will be easier to read.

Can we display the output like:

Missing tags in actual report:
...

Extra/new tags in actual report:
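The suggested wording could look something like this (a hypothetical sketch; `format_tag_differences` is not a function from this PR):

```python
def format_tag_differences(expected_tags, actual_tags):
    """Word tag differences from the perspective of the actual report."""
    lines = []
    missing = expected_tags - actual_tags   # expected, but absent from actual
    extra = actual_tags - expected_tags     # present only in actual
    if missing:
        lines.append("Missing tags in actual report: " + ", ".join(sorted(missing)))
    if extra:
        lines.append("Extra/new tags in actual report: " + ", ".join(sorted(extra)))
    return lines
```

For example, `format_tag_differences({"div", "p"}, {"div", "span"})` reports `p` as missing and `span` as extra, from the actual report's point of view.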

@@ -129,44 +133,85 @@ def generate_diff_list(list1, list2, section_name, file1_path, file2_path):
```python
def dict_to_list(dict_data):
    """Convert dictionary to list of formatted strings."""
    return [f"{k} -> {v}" for k, v in dict_data.items()]

def compare_html_tags(html_data1, html_data2):
```
Collaborator


Comparing unique tags is one check.

How about one enhancement: also match the counts for each tag (the number of times each tag appears)?
I guess it should be a small change; the library might provide that.
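Counting tag occurrences is indeed a small change. With BeautifulSoup one could do `Counter(tag.name for tag in soup.find_all())`; a standard-library-only sketch of the same idea (the `TagCounter` class is hypothetical, not from this PR):

```python
from collections import Counter
from html.parser import HTMLParser

class TagCounter(HTMLParser):
    """Counts how many times each tag opens in an HTML document."""
    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1

def tag_counts(html_content):
    """Return a Counter mapping tag name -> number of occurrences."""
    parser = TagCounter()
    parser.feed(html_content)
    return parser.counts
```

Comparing two such `Counter` objects (e.g. with subtraction) would flag cases where a tag exists in both reports but appears a different number of times.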

@shubham-yb shubham-yb merged commit 1cd7f89 into main Feb 26, 2025
66 checks passed
@shubham-yb shubham-yb deleted the shubham/html-comparison branch February 26, 2025 15:50
3 participants