-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added framework to test HTML reports #2283
Conversation
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lot of special cases are there in extract/parsing html function here. can you please add more descriptive comments in the functions explaining "what and why" both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good mostly, one important comment regarding unique set of tags.
data = { | ||
"title": normalize_text(soup.title.string) if soup.title and soup.title.string else "No Title", | ||
"headings": extract_and_normalize_texts(soup.find_all(['h1', 'h2', 'h3', 'h4', 'h5', 'h6'])), | ||
"paragraphs": extract_paragraphs(soup), | ||
"tables": sort_table_data(soup.find_all("table")), | ||
"links": { | ||
k: v for k, v in sorted( | ||
{normalize_text(a.get("href") or ""): normalize_text(a.text) for a in soup.find_all("a")}.items() | ||
) | ||
}, | ||
"spans": extract_and_normalize_texts(soup.find_all("span")), | ||
"divs": extract_divs(soup) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we are checking for all these tags.
What is missing if there is some info in html(now or later in future) which is inside some tag apart from these...
Comparision of unique set of tags in both the html reports should fix that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a function compare_html_tags() which takes care of this. Added a screenshot in the description as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great @shubham-yb. A small nit, instead of File1 or File2 in output lets use expected file
or actual file
?
@shubham-yb Few guidelines for merging the PR
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just few final comments.
Good job on adding the framework and thanks for improving the code further.
def compare_html_tags(html_data1, html_data2): | ||
"""Compare the unique tags in the two HTML reports.""" | ||
|
||
def compare_html_reports(file1, file2): | ||
"""Compares two HTML reports and prints structured differences.""" | ||
with open(file1, "r", encoding="utf-8") as f1, open(file2, "r", encoding="utf-8") as f2: | ||
html_data1 = extract_html_data(f1.read()) | ||
html_data2 = extract_html_data(f2.read()) | ||
def get_unique_tags(html_content): | ||
"""Extracts all unique tag names from the given HTML content.""" | ||
soup = BeautifulSoup(html_content, 'html.parser') | ||
return {tag.name for tag in soup.find_all()} | ||
|
||
tags1 = get_unique_tags(html_data1) | ||
tags2 = get_unique_tags(html_data2) | ||
|
||
missing_tags_in_file1 = tags2 - tags1 # Tags in file2 but missing in file1 | ||
missing_tags_in_file2 = tags1 - tags2 # Tags in file1 but missing in file2 | ||
|
||
differences = {} | ||
|
||
if missing_tags_in_file1: | ||
differences["missing_tags_in_file1"] = "\n".join(missing_tags_in_file1) | ||
if missing_tags_in_file2: | ||
differences["missing_tags_in_file2"] = "\n".join(missing_tags_in_file2) | ||
|
||
return differences |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One high level comment - what we(consumer of the script) interested to know is the difference between expected and actual report.
Assuming here, file1 - expected, file2- actual
Wording the output from the perspective of file2(actual) will be easier to read.
can we display in output like:
Missing tags in actual report:
...
Extra/new tags in actual report:
@@ -129,44 +133,85 @@ def generate_diff_list(list1, list2, section_name, file1_path, file2_path): | |||
def dict_to_list(dict_data): | |||
"""Convert dictionary to list of formatted strings.""" | |||
return [f"{k} -> {v}" for k, v in dict_data.items()] | |||
|
|||
def compare_html_tags(html_data1, html_data2): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comparing unique tags is one.
How about one enhancement - also match the counts for each tags(number of times each tag appeared)
I guess it should be small change, lib might be providing that.
Describe the changes in this pull request
Added the Python script to compare 2 HTML reports.
Added the expected files for PG and Oracle Assessment tests.
Note:
The HTML diffs might not look very readable at times. The idea is to bring up the fact that something has changed in the HTML report as well so that the developer is aware.
Describe if there are any user-facing changes
How was this pull request tested?
Does your PR have changes that can cause upgrade issues?
Sample output when the reports are matching:
Sample output if the migration complexity changes:
Sample output if there are mismatches in tags: