Added the test scripts for resumption #2117

Merged: 27 commits merged on Jan 22, 2025

@shubham-yb (Contributor) commented Dec 25, 2024

Describe the changes in this pull request

  • Added the test framework for resumption tests of import data file and offline import data
  • Added test cases for a large table and for a large number of tables (import data file)
  • Added a test case for PG offline import data resumption covering datatypes, indexes, partitions, case sensitivity / reserved words, and multiple schemas

Describe if there are any user-facing changes

N/A

How was this pull request tested?

Also made the corresponding changes to the Jenkins pipeline.


@shubham-yb marked this pull request as ready for review on December 27, 2024 11:45
@shubham-yb (Contributor Author)

Does your PR have changes that can cause upgrade issues?

| Component | Breaking changes? |
|---|---|
| MetaDB | No |
| Name registry json | No |
| Data File Descriptor Json | No |
| Export Snapshot Status Json | No |
| Import Data State | No |
| Export Status Json | No |
| Data .sql files of tables | No |
| Export and import data queue | No |
| Schema Dump | No |
| AssessmentDB | No |
| Sizing DB | No |
| Migration Assessment Report Json | No |
| Callhome Json | No |
| YugabyteDB Tables | No |
| TargetDB Metadata Tables | No |

@shubham-yb (Contributor Author)

```python
"""
Runs the yb-voyager command with support for resumption testing.
"""
for attempt in range(1, resumption['max_restarts'] + 1):
```
Collaborator:

Let's get/define all the configs at the beginning. That makes it easier to see which configuration options are involved.

```python
max_restarts = resumption['max_restarts']
min_interrupt_seconds = resumption['min_interrupt_seconds']
...
```
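A minimal sketch of the reviewer's suggestion, assuming the resumption settings arrive as a plain dict like `resumption` above. Only `max_restarts` and `min_interrupt_seconds` appear in the PR; `max_interrupt_seconds` and the `ResumptionConfig` class are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResumptionConfig:
    """All resumption settings gathered in one place (hypothetical helper)."""
    max_restarts: int           # number of interrupt/resume cycles
    min_interrupt_seconds: int  # earliest point at which a run may be interrupted
    max_interrupt_seconds: int  # hypothetical: latest point at which a run may be interrupted

    @classmethod
    def from_dict(cls, resumption: dict) -> "ResumptionConfig":
        # Read every key up front so a missing option fails fast with a KeyError
        cfg = cls(
            max_restarts=int(resumption["max_restarts"]),
            min_interrupt_seconds=int(resumption["min_interrupt_seconds"]),
            max_interrupt_seconds=int(resumption["max_interrupt_seconds"]),
        )
        if cfg.min_interrupt_seconds > cfg.max_interrupt_seconds:
            raise ValueError("min_interrupt_seconds must not exceed max_interrupt_seconds")
        return cfg


cfg = ResumptionConfig.from_dict(
    {"max_restarts": 3, "min_interrupt_seconds": 20, "max_interrupt_seconds": 60}
)
print(cfg.max_restarts, cfg.min_interrupt_seconds, cfg.max_interrupt_seconds)
```

Reading every key in one place also means a missing option fails immediately, instead of partway through a long test run.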

```python
if not output:  # Exit if output is empty (end of process output)
    break
full_output += output
if time.time() - start_time > 5:
```
Collaborator:

Why break? What is 5: seconds? minutes?

Contributor Author:

This was written so that we get the output in real time. Looking back, that isn't very helpful for automation, so I have changed the implementation.
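One way to avoid an unexplained magic number such as `5` is to put the unit in the name and let `subprocess.run()` enforce the limit. This is a hedged sketch, not the PR's actual replacement: `run_and_capture` and `COMMAND_TIMEOUT_SECONDS` (including its value) are hypothetical, and output is collected once the command exits rather than streamed, which suits automated runs.

```python
import subprocess
import sys

# Hypothetical limit; the unit is part of the name, so "5 what?" never comes up.
COMMAND_TIMEOUT_SECONDS = 600


def run_and_capture(cmd, timeout_seconds=COMMAND_TIMEOUT_SECONDS):
    """Run cmd to completion and return (returncode, stdout, stderr).

    If cmd runs longer than timeout_seconds, subprocess.run() kills it and
    re-raises subprocess.TimeoutExpired, so no hand-written polling loop is needed.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_seconds)
    return result.returncode, result.stdout, result.stderr


code, out, err = run_and_capture([sys.executable, "-c", "print('hello')"])
print(code, out.strip())
```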

```python
# Final import retry logic
print("\n--- Final attempt to complete the import ---")

for _ in range(2):
```
Collaborator:

Why two attempts at the end?

Contributor Author:

The two attempts and the sleep were meant to guard against intermittent issues or system overload. I have removed them.
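The overall interrupt-and-resume flow, with the final run attempted exactly once and no sleep, might look like this. It is a sketch that assumes each run is interrupted at a random point between a minimum and maximum delay; `run_with_interruptions`, `start_command`, and `max_interrupt_seconds` are hypothetical names, and `start_command` is the kind of separate start-up helper that both the loop and the final run can call.

```python
import random
import subprocess
import sys


def start_command(cmd):
    """Start cmd without waiting for it (hypothetical helper shared by every run)."""
    return subprocess.Popen(cmd)


def run_with_interruptions(cmd, max_restarts, min_interrupt_seconds, max_interrupt_seconds):
    """Interrupt cmd up to max_restarts times, then let one final run finish.

    Returns the exit code of whichever run completed on its own. The final
    run is attempted exactly once, with no extra retries and no sleep.
    """
    for attempt in range(1, max_restarts + 1):
        process = start_command(cmd)
        # Pick a random interrupt point between the configured bounds (seconds)
        delay = random.uniform(min_interrupt_seconds, max_interrupt_seconds)
        try:
            # The command finished before the interrupt point: nothing to resume
            return process.wait(timeout=delay)
        except subprocess.TimeoutExpired:
            process.terminate()
            process.wait()
            print(f"Attempt {attempt}: interrupted after {delay:.1f}s")

    print("--- Final attempt to complete the import ---")
    returncode = start_command(cmd).wait()
    if returncode != 0:
        print(f"Final import failed with exit code {returncode}")
    return returncode


rc = run_with_interruptions(
    [sys.executable, "-c", "import time; time.sleep(0.5)"],
    max_restarts=2,
    min_interrupt_seconds=0.1,
    max_interrupt_seconds=0.2,
)
print("exit code:", rc)
```

Returning the exit code instead of calling `sys.exit()` inside the helper keeps it easy to test; the caller decides how to fail the pipeline.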

```python
try:
    print("\nVoyager command output:")

    process = subprocess.Popen(
```
Collaborator:

nit: use a separate function for starting the command (it can be called in the for-loop above as well).

```python
    )

    # Capture and print output
    for line in iter(process.stdout.readline, ''):
```
Collaborator:

In the for-loop above, we're reading both stderr and stdout; here we're only reading stdout. Any particular reason? It would be good to be consistent here (call a common function that captures stdout/stderr).

Collaborator:

Also, until when will you keep reading? How long will the loop run?

Contributor Author:

The loop was meant to run until the command exits, printing the output in real time.
I have implemented a separate function that runs the command and captures both stdout and stderr.
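A sketch of one common helper that runs the command and captures stdout and stderr together, assuming it is acceptable to merge the two streams. Redirecting stderr into stdout (`stderr=subprocess.STDOUT`) means a single loop reads everything, and the loop ends when `readline()` returns an empty string, i.e. at EOF, which normally happens once the command exits. `run_command_streaming` is a hypothetical name, not the function added in the PR.

```python
import subprocess
import sys


def run_command_streaming(cmd):
    """Run cmd, echo its combined stdout/stderr line by line, return (returncode, lines).

    stderr is redirected into the stdout pipe, so a single loop sees both
    streams. readline() returns "" only at EOF, i.e. once every writer has
    closed the pipe, which normally means the command has exited.
    """
    process = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout
        text=True,
        bufsize=1,  # line-buffered on the reading side
    )
    captured = []
    for line in iter(process.stdout.readline, ""):
        line = line.rstrip("\n")
        captured.append(line)
        print(line, flush=True)  # echo in real time
    process.stdout.close()
    returncode = process.wait()
    return returncode, captured


rc, lines = run_command_streaming(
    [sys.executable, "-c", "import sys; print('from stdout'); print('from stderr', file=sys.stderr)"]
)
print("exit code:", rc)
```

The interleaving reflects when the child flushes each stream, so block-buffered stdout output can show up later than stderr output written after it.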

```python
    for line in iter(process.stderr.readline, ''):
        print(line.strip())
        sys.stdout.flush()
    time.sleep(30)
```
Collaborator:

Why sleep?

Contributor Author:

It was added to guard against intermittent failures or system overload. I have removed it.

```python
    print("Final import failed after 2 attempts.")
    sys.exit(1)

def validate_row_counts(row_count, export_dir):
```
Collaborator:

Note for the future: you can create a common Python file that holds helper functions like this one.
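As a hedged example of the kind of helper that could live in such a shared Python module, here is a row-count comparison. The name `compare_row_counts`, its signature, and the dict-of-counts inputs are assumptions; the PR's own `validate_row_counts(row_count, export_dir)` may work differently, and the counts in the usage lines are illustrative values.

```python
def compare_row_counts(expected, actual):
    """Return (table, expected_count, actual_count) for every table that differs.

    Hypothetical shared helper: both arguments map qualified table names to
    row counts; a table missing from `actual` is reported with None.
    """
    mismatches = []
    for table, expected_count in sorted(expected.items()):
        actual_count = actual.get(table)
        if actual_count != expected_count:
            mismatches.append((table, expected_count, actual_count))
    return mismatches


# Illustrative values only
expected = {"public.boston": 2500000, "schema2.case": 5000000}
actual = {"public.boston": 2500000, "schema2.case": 4999999}
for table, want, got in compare_row_counts(expected, actual):
    print(f"Row count mismatch for {table}: expected {want}, got {got}")
```

A caller would typically fail the test whenever the returned list is non-empty.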

```
@@ -0,0 +1,133 @@
#!/bin/bash
```
Collaborator:

I'm assuming the ONLY change here is that you're specifying ROW_COUNT, essentially making generate_series dynamic.

Contributor Author:

Yes, correct.
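To illustrate what making generate_series dynamic amounts to, here is the idea rendered in Python (the PR itself uses a bash script). The `(id, name)` column list, the `ROW_COUNT` default, and `build_insert_sql` are hypothetical; only the use of a row count to drive `generate_series` comes from the discussion.

```python
import os


def build_insert_sql(table, row_count):
    """Build an INSERT ... SELECT that generates row_count rows with generate_series.

    Hypothetical illustration: the (id, name) column list is invented. The
    table name is interpolated directly, so use this only with trusted input.
    """
    row_count = int(row_count)
    if row_count < 0:
        raise ValueError("row_count must be non-negative")
    return (
        f"INSERT INTO {table} (id, name) "
        f"SELECT i, 'name_' || i FROM generate_series(1, {row_count}) AS i;"
    )


# ROW_COUNT mirrors the variable discussed in the PR; the default is hypothetical.
row_count = os.environ.get("ROW_COUNT", "1000")
print(build_insert_sql('schema2."Table"', row_count))
```

Quoting `"Table"` matters here because `table` is a reserved word in PostgreSQL.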

```
schema2.Case_Sensitive_Table: 5000000
schema2.case: 5000000
schema2.Table: 5000000
public.boston: 2500000
```
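The expected counts above follow a simple `schema.table: count` layout. Assuming one pair per line, a small hypothetical parser like `parse_row_counts` could load them for comparison while keeping case-sensitive names such as `schema2.Case_Sensitive_Table` verbatim.

```python
def parse_row_counts(text):
    """Parse lines such as 'schema2.case: 5000000' into {table: count}.

    Hypothetical helper: assumes one 'qualified_table: integer' pair per line,
    skips blank lines and '#' comments, and splits on the last colon so the
    table name (including its case) is kept verbatim.
    """
    counts = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        table, _, count = line.rpartition(":")
        counts[table.strip()] = int(count)  # int() tolerates surrounding spaces
    return counts


sample = """\
schema2.Case_Sensitive_Table: 5000000
public.boston: 2500000
"""
print(parse_row_counts(sample))
```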
Collaborator @makalaaneesh, Jan 7, 2025:

Where is the code that generates data for all these other tables (boston/cust/emp/etc.)? I only see code for table/case/Case_Sensitive_Table.

Contributor Author:

Those are generated via the partitions test schema/data.

Collaborator @makalaaneesh left a comment:

LGTM, with one minor comment.

```python
try:
    process.terminate()
    process.wait(timeout=10)
except subprocess.TimeoutExpired:
```
Collaborator:

Let's add some logs/prints to show whether we terminated the process or killed it.
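A sketch of the requested logging, assuming the same 10-second grace period as the snippet above: `stop_process` (a hypothetical name) reports whether the process exited after `terminate()` (SIGTERM on POSIX) or had to be force-killed with `kill()`. The `"terminated"`/`"killed"` return values are likewise illustrative.

```python
import subprocess
import sys


def stop_process(process, grace_seconds=10):
    """Stop process and log whether it exited on terminate() or had to be killed."""
    process.terminate()
    try:
        process.wait(timeout=grace_seconds)
        print(f"Process {process.pid} terminated gracefully (exit code {process.returncode})")
        return "terminated"
    except subprocess.TimeoutExpired:
        print(f"Process {process.pid} did not exit within {grace_seconds}s; killing it")
        process.kill()
        process.wait()
        print(f"Process {process.pid} killed (exit code {process.returncode})")
        return "killed"


proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
print(stop_process(proc))
```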

@shubham-yb (Contributor Author)

@shubham-yb merged commit a6dd701 into main on Jan 22, 2025
66 of 67 checks passed
@shubham-yb deleted the shubham/resumption branch on January 22, 2025 09:01