Add test harness to ensure alembic up/downgrades work as intended #3244

Closed
heartsucker opened this issue Apr 7, 2018 · 7 comments · Fixed by #3211

@heartsucker
Contributor

heartsucker commented Apr 7, 2018

Feature request

Description

Subtask of #1419

We need a test framework that ensures that every alembic database upgrade and downgrade applies cleanly.

As a note, whatever we use to load data into the database cannot depend on models.py: the models always describe alembic's head revision, while an upgrade test needs to populate the out-of-date schema that exists before the migration runs. This means we have to hand-craft the code that does the inserts and selects we want.

Initial Proposal

Branch: migration-test-harness

This proposal uses dynamic module loading and pytest parametrization to ensure that every new migration is exercised by the same tests. If the test module for a migration is not present, running make test will error out because the harness won't be able to import it. (A sketch of the helper functions the harness assumes appears after the list below.)

  • Test harness:
    @pytest.mark.parametrize('migration', ALL_MIGRATIONS)
    def test_upgrade_with_data(alembic_config, config, migration):
        migrations = list_migrations(alembic_config, migration)
        if len(migrations) == 1:
            # Degenerate case where there is no data for the first migration
            return

        # Upgrade to one migration before the target
        target = migrations[-1]
        upgrade(alembic_config, target)

        # Dynamic module import
        name = 'migration_{}'.format(migration)
        mod_name = 'tests.migrations.{}'.format(name)
        mod = __import__(mod_name, fromlist=['UpgradeTester'])

        # Load the test data
        upgrade_tester = mod.UpgradeTester(config=config)
        upgrade_tester.load_data()

        # Upgrade to the target
        upgrade(alembic_config, migration)

        # Make sure it applied "cleanly" for some definition of clean
        upgrade_tester.check_upgrade()

    @pytest.mark.parametrize('migration', ALL_MIGRATIONS)
    def test_downgrade_with_data(alembic_config, config, migration):
        migrations = list_migrations(alembic_config, migration)
        if len(migrations) == 1:
            # Degenerate case where there is no data for the first migration
            return

        # Upgrade to the target
        upgrade(alembic_config, migration)

        # Dynamic module import
        name = 'migration_{}'.format(migration)
        mod_name = 'tests.migrations.{}'.format(name)
        mod = __import__(mod_name, fromlist=['DowngradeTester'])

        # Load the test data
        downgrade_tester = mod.DowngradeTester(config=config)
        downgrade_tester.load_data()

        # Downgrade to previous migration
        downgrade(alembic_config, '-1')

        # Make sure it applied "cleanly" for some definition of clean
        downgrade_tester.check_downgrade()
  • Migration tester: https://github.com/freedomofpress/securedrop/blob/cebd70fbccf866783b67406efba6023661f1b363/securedrop/tests/migrations/migration_faac8092c123.py
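
The harness above references ALL_MIGRATIONS, list_migrations(), upgrade() and downgrade() without defining them. The following is a minimal sketch of what those helpers could look like, built on alembic's Config, ScriptDirectory and command APIs; the exact semantics of list_migrations() are reconstructed from the comments in the snippet and may not match the branch.

    from alembic import command
    from alembic.config import Config
    from alembic.script import ScriptDirectory

    def _revisions(alembic_config):
        # walk_revisions() yields Script objects head -> base; reverse to
        # get them in the order they are applied.
        script = ScriptDirectory.from_config(alembic_config)
        return [rev.revision for rev in script.walk_revisions()][::-1]

    # Hypothetical: built once at import time so pytest can parametrize over it.
    ALL_MIGRATIONS = _revisions(Config('alembic.ini'))

    def list_migrations(alembic_config, target):
        # One reading consistent with the tests above: 'base' plus every
        # revision strictly before `target`, so migrations[-1] is the revision
        # immediately preceding the target, and a length of 1 means `target`
        # is the very first migration (the degenerate case).
        revisions = _revisions(alembic_config)
        return ['base'] + revisions[:revisions.index(target)]

    def upgrade(alembic_config, revision):
        command.upgrade(alembic_config, revision)

    def downgrade(alembic_config, revision):
        command.downgrade(alembic_config, revision)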

User Stories

As a dev, I want to know that my DB migration scripts work in a prod-like environment.

@heartsucker
Contributor Author

Tagging @conorsch and @redshiftzero for review of this proposal before I completely implement it.

@redshiftzero
Contributor

Here are some more detailed thoughts about what I think needs to be in place testing-wise prior to shipping database migrations. The testing story here could balloon into a pretty significant task, so first, here are the possible approaches I have in mind, ordered from least to most effort:

  1. Pre-release manual testing of database migrations ("manual" here meaning we load a bunch of submissions in by hand; this is insufficient as it is too time-consuming)
  2. Pre-release testing of database migrations using a script to load a large quantity of data (better). In my opinion this is the minimum we need in place to ship database migrations for SecureDrop (thoughts welcome if anyone disagrees)
  3. Automated testing (i.e. running in CI on each PR) of upgrades/downgrades/upgrades of the latest migration applying cleanly without realistic data (even better, helps in reviewing PRs)
  4. Automated testing (i.e. running in CI on each PR) of upgrades/downgrades of the latest migrations - and verifying that the data migration occurred safely (which Alembic is not going to take care of for us automatically). This is tricky to do elegantly because the asserts should really be that the data is present and unchanged in the expected columns (I'm not aware of a project that is doing this in an automated fashion...). The motivation here would be to provide a safety net for autogenerated Alembic migrations that are destructive for some schema changes (e.g. an attempt to rename a column will be an ADD/DROP with all data destroyed in the autogenerated migration). That said, this is the kind of thing that we should be catching in PR review.

So if we took option 2 and implemented a simple data script that uses the state of the models on the latest release to load the data (we would need to update this script for pre-release testing of any release involving a migration), the testing/release story would look something like this:

Pre-release testing for release N

  1. QA participant provisions VMs on release N-1.
  2. QA participant uses data import script (to be written*) on app server for release N to fill the database with realistic data.
  3. QA participant upgrades VMs to version N using test debian packages.
  4. QA participant verifies that no failures occurred: they can still log in successfully as both a source and as a journalist, the same number of sources and submissions are present in the journalist interface, etc.

If we know that a particularly tricky migration step is occurring in a particular release, then we can include additional checks verifying the data integrity in that table, etc. as part of the pre-release test plan.

Per-PR testing for PRs with migrations

Reviewers would inspect migrations added in PRs to ensure that:

  1. There is a migration written.

  2. Schema migrations have downgrades implemented and they apply cleanly (test via upgrade, downgrade, upgrade again) -> it would be excellent to automate this part with what you have in your branch (a minimal sketch follows this list). For a first pass this does not strictly need to involve data - see my comment here.

  3. Schema migrations that are potentially destructive (i.e. those that are not just adding a new column or table) have an appropriate data migration written.
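
A minimal sketch of the schema-only check in point 2, assuming the alembic_config fixture and the upgrade()/downgrade() helpers from the harness proposal above; no data is involved, it only asserts that each migration applies, reverts and re-applies without raising:

    import pytest

    @pytest.mark.parametrize('migration', ALL_MIGRATIONS)
    def test_upgrade_downgrade_upgrade(alembic_config, migration):
        # Schema-only round trip: apply the migration, step back one
        # revision, then apply it again.
        upgrade(alembic_config, migration)
        downgrade(alembic_config, '-1')
        upgrade(alembic_config, migration)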

tl;dr I claim the highest-priority code we need for shipping database migrations is not an automated approach but the realistic data loading script for pre-release upgrade testing (indeed, even if we had an automated approach, we'd still want a realistic data import script for pre-release testing).

@heartsucker
Contributor Author

I should have fully written out the functions in this example code instead of stubbing them with pass. They would be something like:

def load_data(self):
    with self.app.app_context():
        db.engine.execute(text('''INSERT INTO ...'''))
        self.loaded_ids = dict(...)
        ...

def check_upgrade(self):
    with self.app.app_context():
        for row in db.engine.execute(text('''SELECT id, username FROM ...''')):
            assert row['id'] in self.loaded_ids.keys()
            ...

This would be option 3 you described above, and I think it's absolutely necessary. Manual QA is also going to be useful, but we have cases the UI won't catch, such as what happens when we start enforcing foreign keys but a row violates them. Also, the data dump script would need to print to the terminal exactly what the QA reviewer should look for (e.g., this source has this name, this many submissions, and this many stars), and at that point we might as well just script the checks, no?

QA participant uses data import script (to be written*) on app server for release N to fill the database with realistic data.

If we did do this, we would just call the script load_test_data.py, and it would only load data that matches the application at its current commit. If someone wants to load old data, they would have to check out the repo at a previous commit and run it there. The reason is that if we change a library, add a library, or update code in the models, the script will either break or load "wrong" data, rendering it useless. So the loader script isn't written for some target release N per se; it's written for the state of the DB now.

This is also a problem because if release N contains multiple migrations since N-1, we won't be able to see how the loader interacts with these more complex cases. It also means we can't easily automate these tests, so a dev won't know whether their migration is good until the QA period.

CI is super slow right now, and building Debian packages is fairly slow. Tests that can be run with pytest tests/test_alembic.py and give an answer in 30 seconds are preferable. Also, if an error is caught with your method during QA, the code/test cycle is going to be very slow and would depend on some amount of git stash and git stash pop to alter the dummy loader script at the old release's commit and change what data is loaded, in order to narrow down the errors. I don't imagine this being a good workflow.

I could add a test called test_sequential_migrations_with_data that starts at <base> then applies migration 1, loads data, applies migration 2, loads data... all the way to head. This would cover more cases than the script you suggested would.
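
A minimal sketch of what that sequential test could look like, reusing the hypothetical helpers from the harness proposal (ALL_MIGRATIONS ordered base to head, upgrade(), and the per-migration UpgradeTester modules):

    def test_sequential_migrations_with_data(alembic_config, config):
        first, rest = ALL_MIGRATIONS[0], ALL_MIGRATIONS[1:]

        # Nothing can be loaded before the first migration (empty database).
        upgrade(alembic_config, first)

        for migration in rest:
            # Each UpgradeTester loads data against the schema as it stands
            # just before its own migration, then checks it afterwards.
            mod = __import__('tests.migrations.migration_{}'.format(migration),
                             fromlist=['UpgradeTester'])
            tester = mod.UpgradeTester(config=config)
            tester.load_data()
            upgrade(alembic_config, migration)
            tester.check_upgrade()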

My tl;dr is that a dummy data script is not going to catch enough edge cases and is going to be harder to maintain than a series of tests that are isolated to the scope of "a single migration."

@heartsucker
Contributor Author

During a call with @redshiftzero we agreed that both the manual test via a data dump and the per-migration tests are hard requirements for alembic migrations to ship.

Proposal for data dump (a rough loader sketch follows the list below)

  • Large volume of submissions (hundreds) including documents on the file system and PGP keys
  • Large number of journalists / admins
  • Intentionally including edge cases such as users with strange characters in their names
  • Rows with "broken" foreign keys
    • Anywhere we expect an FK that isn't enforced by the DB, we add rows that violate the implicit relationship
    • This might be ok to leave out if these are explicitly covered by per-migration tests
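
A rough sketch of the shape such a dump script could take (raw SQL keeps it usable regardless of how models.py evolves); the table and column names are assumptions rather than the real SecureDrop schema, and documents on the file system and PGP keys are elided:

    from sqlalchemy import create_engine, text

    # Assumed database location on the app server.
    engine = create_engine('sqlite:////var/lib/securedrop/db.sqlite')

    # Intentionally awkward names: quotes, commas, unicode, padding.
    EDGE_CASE_NAMES = ["o'brien", "comma, name", "jöurnalist", "  padded  "]

    with engine.begin() as conn:
        # Large number of journalists / admins.
        for i in range(50):
            conn.execute(
                text("INSERT INTO journalists (username, is_admin) "
                     "VALUES (:username, :is_admin)"),
                {'username': EDGE_CASE_NAMES[i % len(EDGE_CASE_NAMES)] + str(i),
                 'is_admin': i % 10 == 0})

        # Hundreds of sources, each with several submissions.
        for i in range(300):
            result = conn.execute(
                text("INSERT INTO sources (journalist_designation) "
                     "VALUES (:designation)"),
                {'designation': "designation {} with 'quotes'".format(i)})
            source_id = result.lastrowid
            for j in range(5):
                conn.execute(
                    text("INSERT INTO submissions (source_id, filename) "
                         "VALUES (:source_id, :filename)"),
                    {'source_id': source_id,
                     'filename': '{}-msg.gpg'.format(j + 1)})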

@redshiftzero
Contributor

Most issues caused by the content in a column (e.g. special characters in a field) will only come up if we start doing tricky data migrations. In the next ~6 months of the project we will likely only be adding columns and tables and won't hit these issues. But for the sake of future maintainers of the project, I agree we should have a representative data dump script so that there are no surprises. Some specific thoughts:

  • One "broken" FK case that is important: for SecureDrop instances that had submissions prior to the release of SecureDrop 0.4, the database will contain rows in the submissions table that have no corresponding source (good news: now we have a nice story for cleaning this up in a future revision). A raw-SQL sketch of such a row follows this list. (ref: Handle Submissions with no Sources in currently running instances. #1189)

  • In terms of submission quantity, if a SecureDrop has been running since 0.3, no one has deleted anything and the instance is getting a submission rate of 10 submissions per day (disclaimer: this rate is pulled out of a hat and is not based on any special knowledge that I have), then there would be ~12k submissions. For the sake of admins, I hope no one is doing this, but it is worth putting this submission quantity into the test database such that we are testing the upper limit.

  • (for data migrations) journalist_designation can include special characters: ', - (this one has caused me woes in the past... 😞 )
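
A raw-SQL sketch of the first case: a submissions row whose source_id points at a source that no longer exists. The table and column names are assumptions about the schema, not taken from models.py.

    from sqlalchemy import create_engine, text

    # Assumed database location on the app server.
    engine = create_engine('sqlite:////var/lib/securedrop/db.sqlite')

    with engine.begin() as conn:
        # No row in sources has id 999999, so this mimics a pre-0.4
        # submission left behind after its source disappeared.
        conn.execute(
            text("INSERT INTO submissions (source_id, filename) "
                 "VALUES (:source_id, :filename)"),
            {'source_id': 999999, 'filename': '1-orphaned-msg.gpg'})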

Note for the interested observer: an older wordlist was used up until 0.4 (ref: #1235) so there might be long-lived source codenames that are actually numbers from the old wordlist. But since we're not directly storing the codename this shouldn't be an issue.

@heartsucker
Contributor Author

heartsucker commented Apr 12, 2018

@redshiftzero So there's something just a tiny bit awkward about using factory_boy. We're going to run it on the staging server, but we don't want to add it as a dependency of the app code. If we put it in develop-requirements.{in,txt}, then it won't be on the staging server. That would require the release manager to copy the requirements file over, install it, and then run the loader script. I can't think of a better workflow at the moment. Do you have any suggestions?

@redshiftzero
Contributor

While I think a single manual step like this wouldn't be a terrible thing since it's for pre-release QA, recall that the app-test role is used in the staging environment, so we can add this dependency to the test requirements and it will get installed in staging.
