
QA data loader for alembic testing #3273

Merged
merged 11 commits into from
May 24, 2018

Conversation


@heartsucker commented Apr 16, 2018

Status

Ready for review

Description of Changes

Toward #3244

Adds a master data loader that dumps data into the database / storage dir to allow us to test migrations.

Testing

Spin up a staging machine and run ``./qa_loader.py``.

Checklist

If you made changes to the server application code:

  • Linting (make ci-lint) and tests (make -C securedrop test) pass in the development container

@heartsucker requested a review from conorsch as a code owner April 16, 2018 15:25
@heartsucker requested a review from a user April 16, 2018 15:25
@heartsucker (Contributor, Author)

Note that this doesn't use factory_boy as suggested earlier because when I did it that way, it was honestly far more obtuse and harder to understand than a few hand-rolled functions.

@codecov-io commented Apr 16, 2018

Codecov Report

Merging #3273 into alembic will increase coverage by 0.17%.
The diff coverage is n/a.


@@             Coverage Diff             @@
##           alembic    #3273      +/-   ##
===========================================
+ Coverage    85.14%   85.32%   +0.17%     
===========================================
  Files           36       37       +1     
  Lines         2201     2330     +129     
  Branches       239      257      +18     
===========================================
+ Hits          1874     1988     +114     
- Misses         269      283      +14     
- Partials        58       59       +1
Impacted Files Coverage Δ
qa_loader.py 88.37% <0%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5c04c22...136df17.

@redshiftzero (Contributor) left a comment:

Hey @heartsucker - I did some testing and investigating in staging VMs and am dropping a few comments inline. Note that I haven't yet been able to get a clean upload of data, but I'll return to this after a round of 0.7.0 QA and resume where I left off.


def new_journalist():
    # Make a diceware-like password
    pw = ' '.join([random_chars(3, nullable=False) for _ in range(7)])
Contributor:

So, since the random characters here can be \n or \t, we can get something like:

password = '>e, \n\n\t ]ec olq <um v~_ xl)'

which has len(password.split()) < 7 and thus will raise a NonDicewarePassword exception.

Instead, you could pick from this list of characters such that no NonDicewarePassword exception is raised
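The fix suggested above can be sketched like this; `SAFE_CHARS` and `diceware_like_password` are illustrative stand-ins, not the actual helpers in qa_loader.py:

```python
import random
import string

# Printable characters with no whitespace, so ' '.join(...) can never
# produce a password whose split() yields fewer than 7 words.
SAFE_CHARS = string.ascii_letters + string.digits + string.punctuation

def diceware_like_password(words=7, word_len=3):
    """Build a diceware-like password of space-separated pseudo-words."""
    return ' '.join(
        ''.join(random.choice(SAFE_CHARS) for _ in range(word_len))
        for _ in range(words)
    )

pw = diceware_like_password()
```

Because the character pool contains no whitespace, `len(pw.split()) == 7` always holds, so the NonDicewarePassword exception cannot be raised.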



def new_source():
    fid_len = random.randint(0, 32)
Contributor:

If we get a fid_len of 0, we will start getting IntegrityErrors since we have a unique constraint on filesystem_id

Contributor:

Note that we also want to avoid very small non-zero fid_len values, since there is a high probability of hitting IntegrityError for them too (I can attest that with this random seed, fid_len = random.randint(4, 32) worked with no problems).

Contributor (Author):

I included empty FIDs because that's allowed by the DB and could (but very, very likely won't) cause problems during future migrations.
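If we want to keep very short (even empty) filesystem IDs in the test data while avoiding the IntegrityErrors discussed above, one option is to retry on collision. A minimal sketch, with the database's unique constraint approximated by an in-memory set (all names here are illustrative, not qa_loader.py code):

```python
import random
import string

_seen_fids = set()  # stand-in for the DB's unique constraint on filesystem_id

def random_fid():
    """Draw a filesystem ID of length 0..32, empty string included."""
    fid_len = random.randint(0, 32)
    return ''.join(random.choice(string.ascii_lowercase) for _ in range(fid_len))

def unique_fid(max_tries=100):
    """Retry until an unused ID is found; short IDs collide often."""
    for _ in range(max_tries):
        fid = random_fid()
        if fid not in _seen_fids:
            _seen_fids.add(fid)
            return fid
    raise RuntimeError('could not find an unused filesystem_id')
```

This keeps the edge case (an empty FID can still occur exactly once) without tripping the unique constraint on repeated loads.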

    description='Loads data into the database for testing upgrades')
parser.add_argument('-m', '--multiplier', type=positive_int, default=100,
                    help=('Factor to multiply the loaded data by '
                          '(default 100)'))
Contributor:

sweet!

cleanly in production-like instances, we have a helper script that is designed to load
semi-randomized data into the database. You will need to modify the script ``qa_loader.py`` to
include sample data. This sample data should intentionally include edge cases that might behave
strangely, such as data whose nullability is only enforced by the application, or missing files.
Contributor:

For release managers and release testers, let's add a bit of docs saying:

  1. Provision staging or prod VMs.
  2. ``sudo su``
  3. ``cd /var/www/securedrop``
  4. ``./qa_loader.py``


filename = random_chars(20, nullable=False, chars=string.ascii_lowercase)
filename = '1-' + filename + '-msg.gpg'
f_len = random.randint(1, 1024 * 1024 * 500)
Contributor:

I see why you're doing this, but sadly this produces a bunch of very large files and my test VMs ran out of space very rapidly 😇.

What about reducing the number of files that have very large sizes, e.g. by drawing from a steep exponential distribution: int(math.floor(random.expovariate(10.0) * 1024 * 1024 * 500))? We'll get mostly smaller files but (with low probability) a few bigger ones thrown in. This should significantly reduce the disk space needed and still be a realistic test (note: I haven't tested this yet and it likely needs tweaking).
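The suggested distribution is easy to sanity-check: `random.expovariate(10.0)` has mean 1/10, so the average generated file would be around 0.1 × 500 MB = 50 MB, with a long but thin upper tail. A small sketch of the idea (untested against the actual loader, as noted above):

```python
import math
import random

def random_file_size():
    # expovariate(10.0) has mean 0.1, so sizes average roughly 50 MB;
    # most draws are much smaller, with rare large outliers.
    return int(math.floor(random.expovariate(10.0) * 1024 * 1024 * 500))

sizes = [random_file_size() for _ in range(1000)]
```

Compared to `random.randint(1, 1024 * 1024 * 500)` (mean ~250 MB per file), this cuts the expected total disk usage by roughly a factor of five while still exercising large-file paths occasionally.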

@heartsucker (Contributor, Author)

@redshiftzero Can you review this locally? I'm getting merge conflicts with develop during CI, and I'd rather just deal with that rebase once on the alembic branch.

@redshiftzero self-assigned this May 21, 2018
@redshiftzero (Contributor) left a comment:

I was able to run through this successfully in staging VMs on top of the commits I added, so this is shaping up nicely. Details of my changes are in the commit messages; let me know if you disagree with any of them. The premise of my modifications is that a person doing QA should be able both to smoothly run the QA loader and to click around the journalist interface without seeing any 500s. If you haven't done that with this QA data, you definitely should before we merge, as there are a couple of other things that may seem odd to testers, e.g. the random journalist designations; take a look and let me know what you think. It's important we resolve this now, because otherwise we'll have to resolve it during the QA period if something is confusing.

There's one last case to handle (#1189); there is a comment inline about that. Fortunately, I think there is a nice path forward there 😄

3. Provision staging VMs
4. ``vagrant ssh app-staging``
5. ``sudo su``
6. ``cd /var/www/securedrop && ./qa-loader.py``
Contributor:

Nit: ./qa-loader.py -> ./qa_loader.py

                 SOURCE_COUNT * multiplier + multiplier):
    new_abandoned_submission(config, sid)
# TODO add submissions without sources for #1189
Contributor:

I think we do want this - new_abandoned_submission almost handles this situation, it looks like we just need to delete the source(s) that the abandoned submissions are associated with. Using raw SQL to do the deletion should do the trick since the submission cascade deletion is enforced at the ORM level.
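A minimal sqlite3 sketch of why raw SQL sidesteps the ORM-level cascade; the schema here is a simplified assumption, not the real SecureDrop schema. Deleting the source row with a plain SQL statement means no ORM session is involved, so the Python-side cascade never fires and the submission rows are left dangling (note that SQLite does not enforce foreign keys unless the pragma is enabled):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript('''
    CREATE TABLE sources (id INTEGER PRIMARY KEY);
    CREATE TABLE submissions (
        id INTEGER PRIMARY KEY,
        source_id INTEGER REFERENCES sources(id)
    );
    INSERT INTO sources (id) VALUES (1);
    INSERT INTO submissions (id, source_id) VALUES (1, 1), (2, 1);
''')

# Raw SQL delete: the ORM cascade that would remove the submissions
# never runs, so both submission rows survive with a stale source_id.
conn.execute('DELETE FROM sources WHERE id = 1')
dangling = conn.execute(
    'SELECT COUNT(*) FROM submissions WHERE source_id = 1').fetchone()[0]
```

This is exactly the edge case #1189 is about: rows the application can no longer reach through their parent, which a migration still has to handle.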

heartsucker and others added 5 commits May 23, 2018 16:01
To do so, they must have pending set to False (by default it is
True, and such sources will not appear in the journalist interface),
as well as a filesystem_id that is not None. Any source that has
a filesystem_id of None will cause a 500 server error, so this is
a reasonable assumption to make for existing instances.
Since we generate a fake file for replies and submissions, we
should limit the number of replies generated per iteration to 3
(previously it was 100). In addition, we need to use a very steep
file size distribution, or the total size of the files generated
is unwieldy (i.e. it fills up the staging VM).
@redshiftzero (Contributor) left a comment:

Latest changes look good! I added one commit (ca0ffb5) to simplify the dangling submissions case a bit. I verified this dangling submissions case via models.db.session.query(models.Submission).filter_by(source_id=None).all() to see the submissions are present as expected.

Approved, and I'll let you do the merge honors here since the feature branch is not protected and I made the last change.

@heartsucker (Contributor, Author)

I just noticed in the UI that you can't add emoji/reactions to a review, only comments. So.

👍👍👍👍👍👍👍👍👍👍👍👍👍👍👍

@heartsucker merged commit 5689235 into alembic May 24, 2018
@heartsucker deleted the qa-data-loader branch May 24, 2018 07:17