loss_accuracy returns 0 for mean_dropout_loss #535

Closed
JeffreyRStevens opened this issue Dec 29, 2022 · 9 comments
Labels
invalid ❕ This doesn't seem right, potential bug R 🐳 Related to R

Comments

@JeffreyRStevens

I would like to use loss_accuracy as my loss function in model_parts(), but whenever I use it, the mean_dropout_loss is always 0. I have tried loss_accuracy for regression, classification, and multiclass classification (see the reprex below). Am I using it correctly?

library(DALEX)
#> Welcome to DALEX (version: 2.4.2).
#> Find examples and detailed introduction at: http://ema.drwhy.ai/
library(ranger)
df <- mtcars[, c('mpg', 'cyl', 'disp', 'hp', 'vs')]
# Regression
reg <- lm(mpg ~ ., data = df)
explainer_reg <- explain(reg, data = df[,-1], y = df[,1])
#> Preparation of a new explainer is initiated
#>   -> model label       :  lm  (  default  )
#>   -> data              :  32  rows  4  cols 
#>   -> target variable   :  32  values 
#>   -> predict function  :  yhat.lm  will be used (  default  )
#>   -> predicted values  :  No value for predict function target column. (  default  )
#>   -> model_info        :  package stats , ver. 4.2.2 , task regression (  default  ) 
#>   -> predicted values  :  numerical, min =  12.56206 , mean =  20.09062 , max =  27.04625  
#>   -> residual function :  difference between y and yhat (  default  )
#>   -> residuals         :  numerical, min =  -4.019038 , mean =  1.010303e-14 , max =  6.976988  
#>   A new explainer has been created!
feature_importance(explainer_reg, loss_function = loss_accuracy)
#>       variable mean_dropout_loss label
#> 1 _full_model_                 0    lm
#> 2          cyl                 0    lm
#> 3         disp                 0    lm
#> 4           hp                 0    lm
#> 5           vs                 0    lm
#> 6   _baseline_                 0    lm
# Classification
classif <- glm(vs ~ ., data = df, family = binomial)
explainer_classif <- explain(classif, data = df[,-5], y = df[,5])
#> Preparation of a new explainer is initiated
#>   -> model label       :  lm  (  default  )
#>   -> data              :  32  rows  4  cols 
#>   -> target variable   :  32  values 
#>   -> predict function  :  yhat.glm  will be used (  default  )
#>   -> predicted values  :  No value for predict function target column. (  default  )
#>   -> model_info        :  package stats , ver. 4.2.2 , task classification (  default  ) 
#>   -> predicted values  :  numerical, min =  7.696047e-06 , mean =  0.4375 , max =  0.9920295  
#>   -> residual function :  difference between y and yhat (  default  )
#>   -> residuals         :  numerical, min =  -0.9474062 , mean =  -1.483608e-12 , max =  0.5318376  
#>   A new explainer has been created!
feature_importance(explainer_classif, loss_function = loss_accuracy)
#>       variable mean_dropout_loss label
#> 1 _full_model_                 0    lm
#> 2          mpg                 0    lm
#> 3          cyl                 0    lm
#> 4         disp                 0    lm
#> 5           hp                 0    lm
#> 6   _baseline_                 0    lm
# Multiclass classification
multiclass <- ranger(cyl ~ ., data = df, probability = TRUE)
explainer_multiclass <- explain(multiclass, data = df[,-2], y = df[,2])
#> Preparation of a new explainer is initiated
#>   -> model label       :  ranger  (  default  )
#>   -> data              :  32  rows  4  cols 
#>   -> target variable   :  32  values 
#>   -> predict function  :  yhat.ranger  will be used (  default  )
#>   -> predicted values  :  No value for predict function target column. (  default  )
#>   -> model_info        :  package ranger , ver. 0.14.1 , task multiclass (  default  ) 
#>   -> model_info        :  Model info detected multiclass task but 'y' is a numeric .  (  WARNING  )
#>   -> model_info        :  By deafult multiclass tasks supports only factor 'y' parameter. 
#>   -> model_info        :  Consider changing to a factor vector with true class names.
#>   -> model_info        :  Otherwise I will not be able to calculate residuals or loss function.
#>   -> predicted values  :  predict function returns multiple columns:  3  (  default  ) 
#>   -> residual function :  difference between 1 and probability of true class (  default  )
#>   -> residuals         :  the residual_function returns an error when executed (  WARNING  ) 
#>   A new explainer has been created!
feature_importance(explainer_multiclass, loss_function = loss_accuracy)
#>       variable mean_dropout_loss  label
#> 1 _full_model_                 0 ranger
#> 2          mpg                 0 ranger
#> 3         disp                 0 ranger
#> 4           hp                 0 ranger
#> 5           vs                 0 ranger
#> 6   _baseline_                 0 ranger

Created on 2022-12-29 with reprex v2.0.2

When I try other loss functions (e.g., loss_root_mean_square for regression, loss_one_minus_auc for classification), they return non-zero values.

feature_importance(explainer_reg, loss_function = loss_root_mean_square)
#>       variable mean_dropout_loss label
#> 1 _full_model_          2.844520    lm
#> 2           vs          2.861546    lm
#> 3           hp          3.328176    lm
#> 4         disp          4.201312    lm
#> 5          cyl          4.498485    lm
#> 6   _baseline_          7.777811    lm
feature_importance(explainer_classif, loss_function = loss_one_minus_auc)
#>       variable mean_dropout_loss label
#> 1 _full_model_        0.03571429    lm
#> 2          mpg        0.04603175    lm
#> 3         disp        0.04642857    lm
#> 4           hp        0.31785714    lm
#> 5          cyl        0.36031746    lm
#> 6   _baseline_        0.51884921    lm

Created on 2022-12-29 with reprex v2.0.2

Is there something different about how loss_accuracy is used?

I'm using DALEX v2.4.2, R v4.2.2, and RStudio v2022.12.0+353 on Ubuntu 22.04.1.

@hbaniecki
Member

hbaniecki commented Dec 29, 2022

Hi, I think loss_accuracy was never really used, and it seems to be invalid. Or at least, for it to work you would have to change predict_function to return a class (i.e., 0/1) instead of probabilities, see

#' @rdname loss_functions
#' @export
loss_accuracy <- function(observed, predicted, na.rm = TRUE)
  mean(observed == predicted, na.rm = na.rm)
attr(loss_accuracy, "loss_name") <- "Accuracy"

I think that loss_accuracy could use model_performance_accuracy:

tp = sum((observed == 1) * (predicted >= cutoff))
fp = sum((observed == 0) * (predicted >= cutoff))
tn = sum((observed == 0) * (predicted < cutoff))
fn = sum((observed == 1) * (predicted < cutoff))

DALEX/R/model_performance.R, lines 166 to 168 at b855207:

model_performance_accuracy <- function(tp, fp, tn, fn) {
(tp + tn)/(tp + fp + tn + fn)
}

and it should probably also be a decreasing measure, 1 - Accuracy, as with 1 - AUC.
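
For reference, a minimal sketch of that first suggestion (the explainer name and the 0.5 cutoff here are illustrative assumptions, not from the thread): give explain() a predict_function that thresholds the glm probabilities into 0/1 classes, so that the == comparison inside loss_accuracy() compares classes with classes.

explainer_classif_cls <- explain(
  classif,
  data = df[, -5],
  y = df[, 5],
  predict_function = function(model, newdata) {
    # return hard 0/1 classes instead of probabilities
    as.numeric(predict(model, newdata, type = "response") >= 0.5)
  }
)
feature_importance(explainer_classif_cls, loss_function = loss_accuracy)

Note that accuracy is an increasing measure (higher is better), so dropping an important variable lowers the score rather than raising it, which is why the decreasing 1 - Accuracy variant is suggested above.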

@hbaniecki hbaniecki added R 🐳 Related to R invalid ❕ This doesn't seem right, potential bug labels Dec 29, 2022
@JeffreyRStevens
Author

Thanks! So putting that all together, something like this?

loss_one_minus_accuracy <- function(observed, predicted, na.rm = TRUE, cutoff = 0.5) {
  # confusion-matrix counts at the given probability cutoff
  tp <- sum((observed == 1) * (predicted >= cutoff), na.rm = na.rm)
  fp <- sum((observed == 0) * (predicted >= cutoff), na.rm = na.rm)
  tn <- sum((observed == 0) * (predicted < cutoff), na.rm = na.rm)
  fn <- sum((observed == 1) * (predicted < cutoff), na.rm = na.rm)
  # decreasing measure: 1 - accuracy
  1 - (tp + tn) / (tp + fp + tn + fn)
}
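
As a usage sketch with the classification explainer from the reprex above (permutation importances vary from run to run, so no output is shown):

set.seed(1)
model_parts(explainer_classif, loss_function = loss_one_minus_accuracy)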

@hbaniecki
Member

@JeffreyRStevens yes, would you like to make a PR?

@JeffreyRStevens
Author

I would be happy to. Would you like me to do anything with loss_accuracy() or just add loss_one_minus_accuracy()?

@hbaniecki
Member

perhaps also remove loss_accuracy() since it's wrong @pbiecek?

@pbiecek
Member

pbiecek commented Dec 30, 2022

@hbaniecki what's wrong with loss_accuracy?
It should work for classification models that return classes, and it is supposed to be compatible with the yardstick approach of validating models both with scores and with classes.
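
A minimal illustration of that contract: loss_accuracy() compares observed and predicted with ==, so it behaves as expected for hard class labels but returns 0 for probability scores, which is exactly the behavior in the reprex above.

loss_accuracy(observed = c(0, 1, 1, 0), predicted = c(0, 1, 0, 0))
#> [1] 0.75
# probabilities are virtually never exactly equal to 0 or 1, so every
# comparison is FALSE and the mean is 0
loss_accuracy(observed = c(0, 1, 1, 0), predicted = c(0.1, 0.9, 0.4, 0.2))
#> [1] 0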

@pbiecek
Member

pbiecek commented Dec 30, 2022

Currently loss_accuracy does not assume that predicted is a number, so if you are going to add loss_one_minus_accuracy then it should have a consistent contract.

suggested approach:

  • use a different name (to avoid conflicts with loss_accuracy)
  • be precise in the documentation

@pbiecek
Member

pbiecek commented Dec 30, 2022

Maybe add model_performance_one_minus_accuracy and use this function?

@hbaniecki
Member

TODO

model_parts(explainer, loss_function = get_loss_yardstick(reverse=TRUE))
model_parts(explainer, loss_function = get_loss_accuracy(cutoff=0.5)) # returns DALEX::loss_one_minus_acc
model_parts(explainer, loss_function = DALEX::loss_one_minus_acc) # baseline cutoff=0.5

hbaniecki added a commit that referenced this issue Jan 8, 2023
pbiecek pushed a commit that referenced this issue Jan 26, 2023
* add loss_one_minus_accuracy #535

* fix typo, update doc

* warn -> warning

* update package version

* add more tests

* fix checks

* fix tests