
343 streamline perf #344

Merged: 28 commits merged from 343-streamline-perf-and-tune into master on Nov 19, 2024

Conversation

evaham1 (Collaborator) commented Nov 13, 2024

No description provided.

evaham1 (Collaborator, Author) commented Nov 13, 2024

perf.assess.mixo.plsda() and perf.assess.mixo.splsda()

Have created perf.assess.mixo.plsda() and an identical, exported perf.assess.mixo.splsda(). These functions are essentially stripped-down versions of perf.mixo.plsda()/perf.mixo.splsda(): instead of looping across components 1 to ncomp, they only run performance assessment for ncomp.

Unit tests: created to ensure that the new perf.assess() functions give exactly the same result as perf() for the same data, context and ncomp. To achieve this, set.seed() had to be added inside the component for loop in the perf() function. The unit tests also cover running in series and in parallel.
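Why setting the seed inside the component loop matters can be shown with a minimal sketch. This is not mixOmics code — `assess_component`, `perf_all` and `perf_assess` are hypothetical stand-ins — but it demonstrates that re-seeding per component makes the final component's result identical whether or not the earlier components were looped over:

```python
import random

def assess_component(comp, seed):
    # Stand-in for one round of cross-validated performance assessment.
    # Re-seeding here fixes the fold assignment for this component,
    # independently of any components assessed before it.
    random.seed(seed + comp)
    return [random.random() for _ in range(3)]

def perf_all(ncomp, seed=42):
    # perf()-style behaviour: assess components 1..ncomp in a loop.
    return {comp: assess_component(comp, seed) for comp in range(1, ncomp + 1)}

def perf_assess(ncomp, seed=42):
    # perf.assess()-style behaviour: assess only the final component.
    return {ncomp: assess_component(ncomp, seed)}

# Because the seed is (re)set inside the loop body, the result for the
# final component matches the full loop's result for that component.
assert perf_all(4)[4] == perf_assess(4)[4]
```

Had the seed been set once before the loop instead, the random state reaching component `ncomp` would depend on how many components were assessed first, and the two results would diverge.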

Plotting: no plots are made for perf.assess() if validation = 'loo', or if validation = 'Mfold' with nrep = 1. Otherwise a plot is made with just one point on the x axis (at ncomp), e.g.
Screenshot 2024-11-13 at 2 22 46 pm
We need to decide whether this is an informative plot: if so, make sure it works without repeats; if not, simply note that such plots can only be generated when repeated CV is used.

evaham1 (Collaborator, Author) commented Nov 15, 2024

perf.assess.mixo.pls() and perf.assess.mixo.spls()

Have created perf.assess.mixo.pls() and an identical, exported perf.assess.mixo.spls(). These functions are essentially stripped-down versions of perf.mixo.pls()/perf.mixo.spls(). Ideally, to improve runtime, I would have removed the looping over component values 1:ncomp; however, after testing, this gave different results for the final error metrics. This is particularly the case for Q2, which is calculated using the RSS from component ncomp-1, but testing suggested it also holds for other error metrics, although the source of the component dependency is not clear (it could also be due to the seed setting within the loop, again something I played around with but couldn't fix in a non-loop manner). As the function is quite intricate, I decided to leave the loops as is, despite the inflated runtime, and simply subset the results to keep only the error metrics relating to the exact ncomp.
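The Q2 dependency can be made concrete. Using the standard PLS cross-validation formula Q2_h = 1 - PRESS_h / RSS_(h-1), the metric for component h needs the residual sum of squares of the *previous* component, so the per-component loop cannot simply be skipped. A small sketch with made-up PRESS/RSS values (not taken from mixOmics):

```python
# Hypothetical values: rss[h] is the residual sum of squares after
# fitting h components (rss[0] is the null, zero-component model);
# press[h] is the cross-validated PRESS for component h.
rss   = [100.0, 60.0, 40.0, 35.0]
press = [None,  70.0, 55.0, 42.0]

# Q2 for component h uses rss[h - 1], i.e. the fit of the PREVIOUS
# component -- computing Q2 for the final component alone is impossible
# without first computing the earlier components' RSS.
q2 = [1 - press[h] / rss[h - 1] for h in range(1, len(rss))]
```

With these numbers the final component's Q2 is 1 - 42/40 = -0.05, a quantity that cannot be obtained from the final component's fit in isolation.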

Unit tests: created to ensure that the new perf.assess() functions give exactly the same result as perf() for the same data, context and ncomp. Added additional testing for the different modes of pls and spls, and also checked the feature stability outputs of perf.spls() and perf.assess.spls().

Plotting: plotting from perf.assess() is similar to perf() for PLS objects, even when nrep = 1.
- nrep = 10: Screenshot 2024-11-15 at 3 08 38 pm
- nrep = 1: Screenshot 2024-11-15 at 3 09 44 pm
- validation = loo: Screenshot 2024-11-15 at 3 10 17 pm

evaham1 (Collaborator, Author) commented Nov 18, 2024

perf.assess.sgccda()

Have created perf.assess.sgccda(), built on perf.sgccda(). Ideally, to improve runtime, I would have removed the calculations over multiple components (done in this function via many lapply() calls), but due to the complexity and possible inter-dependency between components, I kept all of the code in the function and just added lines at the end that filter the results to retain only the information for the component used in the input model, whilst preserving the output data structure.
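The filter-at-the-end approach can be sketched as a small recursive subset that keeps only the final component's entry while preserving the nested output shape. This is an illustrative Python sketch with hypothetical metric names, not the mixOmics implementation:

```python
def keep_final_component(result, ncomp):
    # result maps metric name -> per-component values (lists indexed by
    # component, or nested dicts of the same shape). Keep only the entry
    # for the final component, but preserve the output structure.
    out = {}
    for metric, values in result.items():
        if isinstance(values, dict):
            out[metric] = keep_final_component(values, ncomp)
        else:
            out[metric] = values[ncomp - 1 : ncomp]  # 1-based component index
    return out

# Hypothetical full perf()-style output across 3 components:
full = {
    "error.rate": {"BER": [0.30, 0.22, 0.20], "overall": [0.28, 0.21, 0.19]},
    "auc": [0.80, 0.88, 0.91],
}
print(keep_final_component(full, 3))
# → {'error.rate': {'BER': [0.2], 'overall': [0.19]}, 'auc': [0.91]}
```

Keeping each value as a length-one list (rather than a scalar) is what lets downstream code that expects the perf()-style structure keep working unchanged.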

Unit tests: created to ensure that the new perf.assess.sgccda() gives exactly the same result as perf() for the same data, context and ncomp.

Plotting: plotting for block-plsda and block-splsda objects run with perf.assess() only works if nrep > 1 and validation = 'Mfold'.
Screenshot 2024-11-18 at 4 05 12 pm

evaham1 (Collaborator, Author) commented Nov 18, 2024

perf.assess.mint.plsda() and perf.assess.mint.splsda()

Have created perf.assess.mint.(s)plsda(), built on perf.mint.plsda(). Fixed the for loops so that metrics are only calculated for the single component corresponding to ncomp, although for AUC the extra data slots are still generated.
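Restricting the loop while still generating the extra slots might look like the following sketch. All names here are hypothetical illustrations of the pattern, not mixOmics code: the loop iterates over just the final component, but the AUC container keeps one slot per component:

```python
def assess_metrics(model, ncomp):
    # Illustrative sketch: compute metrics only for the final component,
    # but still allocate a slot per component in the AUC output,
    # mirroring how the extra AUC data slots are still generated.
    auc = [None] * ncomp              # slots for components 1..ncomp
    error_rate = {}
    for comp in [ncomp]:              # was: for comp in range(1, ncomp + 1)
        auc[comp - 1] = model["auc"][comp - 1]
        error_rate[comp] = model["error"][comp - 1]
    return {"auc": auc, "error.rate": error_rate}

# Hypothetical per-component metrics from a fitted 3-component model:
model = {"auc": [0.7, 0.8, 0.9], "error": [0.3, 0.2, 0.1]}
out = assess_metrics(model, 3)
# → out["auc"] == [None, None, 0.9]; out["error.rate"] == {3: 0.1}
```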

Unit tests: created to ensure that the new perf.assess.mint.plsda() gives exactly the same result as perf() for the same data, context and ncomp.

Plotting: plotting does not work for perf.assess.mint.plsda(), likely due to the lack of repeats in the LOO CV method; this is consistent with which configurations allow plotting in the other perf.assess() functions.

@evaham1 evaham1 changed the title 343 streamline perf and tune 343 streamline perf Nov 18, 2024
@evaham1 evaham1 self-assigned this Nov 19, 2024
evaham1 (Collaborator, Author) commented Nov 19, 2024

After discussing with KA, it makes sense to remove the plotting functionality for these objects, as the plots are not informative when they have nothing to compare against. Instead, simply keep the performance metrics for the model in question, which can serve as a simple readout of the final model's performance.

  • perf.assess.plsda()
  • perf.assess.pls()
  • perf.assess.sgccda()
  • perf.assess.mint.plsda()

Also did a couple more checks to make sure the performance metrics output by perf.assess() are identical to those output by perf().

@evaham1 evaham1 merged commit 06d7d92 into master Nov 19, 2024
10 of 11 checks passed
@evaham1 evaham1 deleted the 343-streamline-perf-and-tune branch November 19, 2024 06:09
Successfully merging this pull request may close these issues.

Refactoring: streamline perf() and tune() functions to do performance assessment on just input model