
Implement complete pipeline #19

Closed
philswatton opened this issue Jan 30, 2023 · 8 comments

@philswatton
Contributor

Implement a pipeline that does the following:

  • Takes some configuration as input (e.g. seed, drop percentages, transforms, etc.)
  • Computes similarity metrics
  • Trains networks on both models
  • Trains transfer attacks (on both?)
  • Computes attack success metrics
  • Stores metrics computed

It may be the case that portions of this pipeline are best kept separate from one another (or run as partially separate pipelines). For example, we will probably want to be training networks and attacks while still implementing the similarity metrics.
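The stages above can be sketched as a sequence of steps driven by a single config. This is a minimal illustrative sketch only: `PipelineConfig`, `run_pipeline`, and the stub stage functions are hypothetical names, not the project's actual API, and each stub stands in for a real implementation.

```python
from dataclasses import dataclass


@dataclass
class PipelineConfig:
    # Hypothetical config fields matching the inputs listed above.
    seed: int = 0
    drop_percentages: tuple = (0.0, 0.1)
    transforms: tuple = ()


def run_pipeline(config, steps):
    """Run each named stage in order, collecting the metrics it returns."""
    results = {}
    for name, step in steps:
        results[name] = step(config)
    return results


# Stub stages standing in for the real implementations.
def compute_similarity(cfg):
    return {"otdd": 0.0}


def train_models(cfg):
    return {"acc_a": 0.0, "acc_b": 0.0}


def train_transfer_attacks(cfg):
    return {"attack_trained": True}


def attack_success_metrics(cfg):
    return {"transfer_rate": 0.0}


if __name__ == "__main__":
    cfg = PipelineConfig(seed=42)
    metrics = run_pipeline(cfg, [
        ("similarity", compute_similarity),
        ("models", train_models),
        ("attacks", train_transfer_attacks),
        ("attack_metrics", attack_success_metrics),
    ])
    # Stored metrics: one entry per stage.
    print(sorted(metrics))
```

Splitting the `steps` list is what makes the "partially separate pipelines" option cheap: a metrics-only run and a models-only run are just different step lists over the same config.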

This was referenced Feb 15, 2023
@philswatton philswatton self-assigned this Feb 16, 2023
@philswatton
Contributor Author

A general note for how we want this to work:

  • Several configs for different components of the experiment
    • Dataset config
      • Controls alterations between A and B, and input seed
      • We'll need to specify many datasets, while creating some (3-5?) with the same configuration but with a different seed
      • Open question: do we take sets of options (dropping, transform1, transform2) and extract combinations of alterations, or should we specify the experiments we want to perform in advance?
      • The datasets will be created separately in the similarity measurement pipeline and in the model + attack pipeline
    • Metric config
      • Which similarity measures we want to use
      • Initially constant across datasets. If we start looking into e.g. different labels or different datasets, we'll need to revisit that constancy, as some measures will no longer be appropriate, or will only be appropriate under certain conditions (e.g. PAD requires the same features; OT and OTDD will require the Gromov-Wasserstein distance instead of the Wasserstein distance if the features are not the same)
    • Model config
      • Unsure if required, but if we start relaxing the extent to which models are the same we'll need to look into this
  • We'll need functions for handling the configs above
  • We'll also need two scripts to handle everything:
    • A metrics script that produces the datasets, computes the distance measures, and stores the results
    • A models script that produces the datasets, farms out model training for each A and B dataset, then transfers an attack from A to B and from B to A, computes the attack success metrics, and stores the results
  • Since we'll have two sets of results stored, we'll also need to make sure we can join up the two results in a third script
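One way to resolve the open question above is to expand a grid of alteration specs into concrete dataset configs, crossing each spec with several seeds so that identical setups differ only by seed (the 3-5 replicates mentioned above). A minimal sketch, assuming hypothetical names throughout:

```python
from itertools import product


def expand_dataset_configs(alterations, n_seeds=3, base_seed=0):
    """Cross each alteration spec with n_seeds seeds.

    Each returned dict is one concrete dataset config: the same
    alteration repeated under different seeds gives the replicates.
    """
    configs = []
    for alteration, rep in product(alterations, range(n_seeds)):
        configs.append({"alteration": alteration, "seed": base_seed + rep})
    return configs


# Two alteration specs crossed with three seeds each.
alterations = [{"drop": 0.1}, {"transform": "rotate"}]
configs = expand_dataset_configs(alterations, n_seeds=3)
print(len(configs))  # 2 alterations x 3 seeds = 6 configs
```

Because the metrics script and the models script both need the same datasets, generating them from this shared expansion (rather than by hand in each script) keeps the two pipelines in sync and makes the third join-up script a simple merge on the config fields.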

This was referenced Feb 23, 2023
@philswatton
Contributor Author

With #29 this is mostly done. The two main tasks remaining are:

@philswatton philswatton mentioned this issue Mar 3, 2023
@lannelin
Contributor

[from meeting]
we are running training on HPC
where are we running metric calculations?
where are we running attacks?

@philswatton
Contributor Author

#17 and #18 are now done. #40 has been opened to deal with adding attack scripts to the pipeline. We also want to work out where in the pipeline we are computing the similarity metrics (as above).

@philswatton
Contributor Author

#38 is now done, meaning we're free to start doing experiment groups with transforms.

Still to go is:

  • Adding transfer attacks to HPC (#40)
  • Working out where to compute the similarity metrics (and possibly whether we should also log them to wandb)

Not pipeline work, but also necessary:

Optimising the training regime (#26). It's not required for a full pipeline (and thus not required for this issue), but it is relevant to making good use of one.

@philswatton
Contributor Author

Metrics calculation location opened as #42.

@philswatton
Contributor Author

With #40 done, #42 is the last piece of pipeline work remaining. Will be looking at #26 before that.

@philswatton
Contributor Author

With #42 done, this issue is now finished. I've opened #57 to cover the need to actually use the pipeline to produce the final results.
