
R package

dcevid edited this page May 29, 2020 · 3 revisions

What needs to be implemented?

  • Fix the initial-weights feature, which lets us downweight some observations. The weights should change each observation's subsampling probability.
  • Print a progress bar while the trees are being built
  • Add an option to apply local centering first, i.e. fit the marginal means and then work with the residuals
  • Handle the naming of variables
    • For specific functionals, name the output columns after the columns of Y
    • In the predict output, name the rows after the rownames of newdata
    • Decide how to handle empty colnames and rownames
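
One way to read the weights item above is that each subsample is drawn without replacement and heavier observations are more likely to be included. The sketch below illustrates that idea only. It is written in Python to stay self-contained and is not the package's R or C++ code. The function name `weighted_subsample` is hypothetical, and the sampling scheme (Efraimidis-Spirakis keys) is an assumption, not necessarily what drf should use.

```python
import math
import random


def weighted_subsample(weights, size, rng):
    """Draw `size` distinct indices; heavier observations are favoured.

    Assumption: Efraimidis-Spirakis weighted sampling without replacement.
    Observations with weight 0 are never drawn.
    """
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    candidates = [i for i, w in enumerate(weights) if w > 0]
    if size > len(candidates):
        raise ValueError("not enough observations with positive weight")
    # Key = log(u) / w with u in (0, 1]; the `size` largest keys win.
    keyed = [(math.log(1.0 - rng.random()) / weights[i], i) for i in candidates]
    keyed.sort(reverse=True)
    return sorted(i for _, i in keyed[:size])


def inclusion_counts(weights, size, n_draws, seed=0):
    """Count how often each observation lands in a subsample."""
    rng = random.Random(seed)
    counts = [0] * len(weights)
    for _ in range(n_draws):
        for i in weighted_subsample(weights, size, rng):
            counts[i] += 1
    return counts


if __name__ == "__main__":
    # Observation 4 is downweighted, observation 5 is excluded entirely.
    print(inclusion_counts([1.0, 1.0, 1.0, 1.0, 0.1, 0.0], size=3, n_draws=2000))
```

With this scheme a weight of zero removes an observation from every tree, and a small weight makes it rare but not impossible, which is one possible meaning of "downweight".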

Things to check and to fix

  • Use a better measure of variable importance
  • Check code parallelization
  • Handle factor variables
    • Aggregate variable importance across the different levels
    • Think about the 10% rule when some levels are quite rare: do we want to split on them?
    • Implement Wager's dimensionality reduction for factor variables?
    • With one-hot encoding, note that the two levels of a binary variable are equivalent, so we can keep just one dummy. With more than two levels we need as many dummies as levels to keep the levels symmetric.
    • Be careful: in the current form, the probability that a factor is chosen in the mtry draw increases with its number of levels
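
The last two bullets can be made concrete with a small calculation. The sketch below is in Python and is illustrative only, not package code. It assumes that mtry columns are drawn uniformly without replacement from the one-hot encoded design matrix, which may not match the actual implementation. The function names are hypothetical. It computes the probability that at least one dummy column of a factor is among the candidates.

```python
from math import comb


def n_dummies(n_levels):
    """Dummy columns per factor: one for a binary factor, one per level otherwise."""
    if n_levels < 2:
        raise ValueError("a factor needs at least two levels")
    return 1 if n_levels == 2 else n_levels


def prob_factor_in_mtry(n_other, n_levels, mtry):
    """P(at least one dummy of the factor is among the mtry drawn columns).

    Assumption: mtry columns are drawn uniformly without replacement from
    the n_other remaining columns plus the factor's dummy columns.
    """
    k = n_dummies(n_levels)
    p = n_other + k
    m = min(mtry, p)
    # Complement of "all m drawn columns come from the n_other columns".
    return 1.0 - comb(p - k, m) / comb(p, m)


if __name__ == "__main__":
    # Nine other columns, mtry = 3, factor with 2, 5 or 10 levels.
    for levels in (2, 5, 10):
        print(levels, round(prob_factor_in_mtry(9, levels, 3), 3))
    # prints roughly 0.3, 0.769, 0.913
```

Under this assumption a binary factor is treated like any single numeric column (probability mtry / p), while a ten-level factor is almost always among the candidates. One possible remedy is to draw mtry over the original variables rather than over the encoded columns.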

Optional stuff

  • Implement other kernels for FourierMMD.
  • Make it modular, so that a user can write their own two-sample test, which drf then calls
    • Allow this to be done from R rather than C++
    • Have an option to do initialization
  • Functionals to add
    • Prediction regions
    • Conditional pairs plot
  • Check whether the Fastfood approximation makes sense when d >> p. This case might be useful in some applications.
  • Missing-data support
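
The modular two-sample test item above amounts to a callback design: the split search takes any function that compares two samples and returns a number, with larger values meaning the samples look more different. The sketch below shows that design in Python for a single numeric predictor and a univariate response. All names (`best_split`, `mean_difference`, `gaussian_mmd`) are hypothetical, and the code illustrates the interface only. It is not drf's actual splitting criterion.

```python
import math


def mean_difference(left, right):
    """Toy two-sample statistic: squared difference of the sample means."""
    return (sum(left) / len(left) - sum(right) / len(right)) ** 2


def gaussian_mmd(left, right, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD with a Gaussian kernel."""
    def kernel(a, b):
        return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))

    def average(us, vs):
        return sum(kernel(u, v) for u in us for v in vs) / (len(us) * len(vs))

    return average(left, left) + average(right, right) - 2.0 * average(left, right)


def best_split(x, y, statistic, min_child=1):
    """Scan thresholds on x and keep the one maximising statistic(left, right)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    best_value, best_threshold = -math.inf, None
    for pos in range(min_child, len(order) - min_child + 1):
        lo, hi = x[order[pos - 1]], x[order[pos]]
        if lo == hi:
            continue  # no threshold separates tied x values
        left = [y[i] for i in order[:pos]]
        right = [y[i] for i in order[pos:]]
        value = statistic(left, right)  # user-supplied two-sample test
        if value > best_value:
            best_value, best_threshold = value, (lo + hi) / 2
    return best_threshold, best_value


if __name__ == "__main__":
    x = list(range(10))
    y = [0.1, -0.2, 0.0, 0.2, -0.1, 5.1, 4.9, 5.0, 5.2, 4.8]
    print(best_split(x, y, mean_difference)[0])
    print(best_split(x, y, gaussian_mmd)[0])
```

Swapping the statistic does not touch the search loop, which is the point of the modular design. Whether such a callback can be evaluated from R at acceptable speed is a separate question that this sketch does not address.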