-
Notifications
You must be signed in to change notification settings - Fork 696
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
QA data loader for alembic testing #3273
Conversation
20a8808
to
6e17498
Compare
Note that this doesn't use |
d1a57e3
to
136df17
Compare
Codecov Report
@@ Coverage Diff @@
## alembic #3273 +/- ##
===========================================
+ Coverage 85.14% 85.32% +0.17%
===========================================
Files 36 37 +1
Lines 2201 2330 +129
Branches 239 257 +18
===========================================
+ Hits 1874 1988 +114
- Misses 269 283 +14
- Partials 58 59 +1
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @heartsucker - I did some testing and investigating in staging VMs - dropping a few comments inline. Note that I haven't yet been able to get a clean upload of data, but I'll return to this after a round of 0.7.0 QA and resume where I left off
securedrop/qa_loader.py
Outdated
|
||
def new_journalist(): | ||
# Make a diceware-like password | ||
pw = ' '.join([random_chars(3, nullable=False) for _ in range(7)]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, since the random characters here can be \n
or \t
, we can get something like:
password = '>e, \n\n\t ]ec olq <um v~_ xl)'
which has len(password.split()) < 7
and thus will raise a NonDicewarePassword
exception.
Instead, you could pick from this list of characters such that no NonDicewarePassword
exception is raised
securedrop/qa_loader.py
Outdated
|
||
|
||
def new_source(): | ||
fid_len = random.randint(0, 32) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we get a fid_len
of 0, we will start getting IntegrityError
s since we have a unique constraint on filesystem_id
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that we also want to avoid very small non-zero fid_len
as there is a high probability of hitting IntegrityError
for them also (I can attest for this random seed fid_len = random.randint(4, 32)
worked no problem)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I included empty FID's because that's allowed by the DB and could (but very, very likely won't) cause problems during future migrations.
description='Loads data into the database for testing upgrades') | ||
parser.add_argument('-m', '--multiplier', type=positive_int, default=100, | ||
help=('Factor to multiply the loaded data by ' | ||
'(default 100)')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sweet!
cleanly in production-like instances, we have a helper script that is designed to load | ||
semi-randomized data into the database. You will need to modify the script ``qa_loader.py`` to | ||
include sample data. This sample data should intentionally include edge cases that might behave | ||
strangely such as data whose nullability is only enforced by the application or missing files. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For release managers and release testers, let's add a bit of docs saying:
- Provision staging or prod VMs.
sudo su
cd /var/www/securedrop
./qa_loader.py
securedrop/qa_loader.py
Outdated
|
||
filename = random_chars(20, nullable=False, chars=string.ascii_lowercase) | ||
filename = '1-' + filename + '-msg.gpg' | ||
f_len = random.randint(1, 1024*1024*500) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see why you're doing this, but sadly this produces a bunch of very large files and my test VMs ran out of space very rapidly 😇.
What about reducing the number of files that have very large sizes? e.g. by drawing from a steep exponential distribution int(math.floor(random.expovariate(10.0) * 1024 * 1024 * 500))
? We'll get mostly smaller files but (with low probability) a few bigger ones thrown in - this should significantly reduce disk space needed and still be a realistic test (note: I haven't tested this yet and it likely needs tweaking)
136df17
to
67c6c14
Compare
@redshiftzero Can you review this locally? I'm getting merge conflicts with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was able to run through this successfully in staging VMs on top of the commits I added, so this is shaping up nicely. Details of my changes are in the commit messages, let me know if you disagree with any of them - the premise to my modifications here is that a person doing QA should be able to both smoothly run the QA loader and also be able to access the journalist interface and click around and not see any 500s. If you haven't done that with this QA data, you definitely should prior to us merging this as there are a couple of other things that may seem odd to testers, e.g. the random journalist designations, take a look at let me know what you think. It's important we resolve this now because we'll otherwise have to resolve it during the QA period if something is confusing.
There's one last case to handle (#1189), there is a comment inline about that, fortunately I think there is a nice path forward there 😄
3. Provision staging VMs | ||
4. ``vagrant ssh app-staging`` | ||
5. ``sudo su`` | ||
6. ``cd /var/www/securedrop && ./qa-loader.py`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: ./qa-loader.py
-> ./qa_loader.py
securedrop/qa_loader.py
Outdated
SOURCE_COUNT * multiplier + multiplier): | ||
new_abandoned_submission(config, sid) | ||
|
||
# TODO add submissions without sources for #1189 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we do want this - new_abandoned_submission
almost handles this situation, it looks like we just need to delete the source(s) that the abandoned submissions are associated with. Using raw SQL to do the deletion should do the trick since the submission cascade deletion is enforced at the ORM level.
To do so, they must have pending set to False (by default it is True, and such sources will not appear in the journalist interface), as well as a filesystem_id that is not None. Any source that has a filesystem_id of None will cause a 500 server error, so this is a reasonable assumption to make for existing instances.
Since we generate a fake file for replies and submissions, we should limit the number of replies generated per iteration to 3 (previously it was 100). In addition, we need to use a very steep file size distribution, or the total size of the files generated is unwieldly (i.e. fills up the staging VM).
a42b5d2
to
820ac1b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Latest changes look good! I added one commit (ca0ffb5) to simplify the dangling submissions case a bit. I verified this dangling submissions case via models.db.session.query(models.Submission).filter_by(source_id=None).all()
to see the submissions are present as expected.
Approved, and I'll let you do the merge honors here since the feature branch is not protected and I made the last change.
I just noticed in the UI that you can't add emoji/reactions to a review. Only commends. So. 👍👍👍👍👍👍👍👍👍👍👍👍👍👍👍 |
Status
Ready for review
Description of Changes
Toward #3244
Adds a master dataloader that dumps data in to the database / storage dir to allow us to test migrations.
Testing
Spin up a staging machine and
./qa_loader.py
Checklist
If you made changes to the server application code:
make ci-lint
) and tests (make -C securedrop test
) pass in the development container