Bias and model transparency #108
Comments
I recently read the paper Model Cards for Model Reporting, which proposes that an ML model be accompanied by documentation detailing the model's limitations and performance characteristics. Perhaps this model [pun unintended] would work in the context of web-based ML as well. In fact, it would be quite a natural fit in my view. Here are a couple of proof-of-concept model cards from Google's Cloud Vision API: face detection and object detection. Edit: More examples of model cards from another project (https://mediapipe.dev/) at https://google.github.io/mediapipe/solutions/models. Adding @mmitchellai, one of the paper's authors, for comments.
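For context, the paper organizes a model card around a fixed set of sections (model details, intended use, factors, metrics, evaluation and training data, ethical considerations, caveats). Below is a minimal sketch of what that could look like as machine-readable metadata; the schema, field names, and every value are hypothetical illustrations, not taken from the Cloud Vision cards:

```python
from dataclasses import dataclass, field

# Hypothetical schema loosely following the sections proposed in
# "Model Cards for Model Reporting" (Mitchell et al., 2019).
@dataclass
class ModelCard:
    name: str
    version: str
    intended_use: str                  # primary use cases the model was built for
    out_of_scope_uses: list[str]       # uses the authors advise against
    factors: list[str]                 # groups/conditions performance may vary across
    metrics: dict[str, float]          # aggregate evaluation metrics
    subgroup_metrics: dict[str, dict[str, float]] = field(default_factory=dict)
    caveats: list[str] = field(default_factory=list)

# Illustrative values only -- not real measurements.
card = ModelCard(
    name="face-detector",
    version="1.0",
    intended_use="Detect face bounding boxes in consumer photos.",
    out_of_scope_uses=["identity verification", "emotion inference"],
    factors=["skin tone", "lighting", "pose", "age group"],
    metrics={"precision": 0.92, "recall": 0.89},
    subgroup_metrics={
        "lighter skin tones": {"recall": 0.93},
        "darker skin tones": {"recall": 0.81},
    },
    caveats=["Training data skews toward lighter skin tones."],
)
```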
Looking at the face detection example linked by @anssiko, it helps identify, for instance, that the sample data is very strongly skewed towards "lighter tone skins" (100K samples), whereas "darker tone skins" made up only a fifth of the sample. In any case, exposing the data sounds like a really useful first step; it would be great to hear about other similar projects, and whether there is any convergence in this space that could be triggered by wider developer adoption (which we can assume bringing ML to the Web should induce).
Knowing the limitations and potential bias of the training data is a great step in the right direction. Also, knowing what proxy data was used to fill gaps would be helpful. Some of the groups looking at this include:
Jutta
Hi,
This is an interesting topic; model transparency and explainability are not easy. Is there any thought on how one can challenge the output of an ML model? I get the impression that model cards tend to make the whole model transparent. Can they explain the decision made for a single data point (for instance, when someone wants to know why a particular decision was made in their own case)?
Thanks,
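Model cards describe a model's aggregate behaviour, so challenging an individual decision would likely need a separate local-explanation mechanism. One simple family of techniques is leave-one-feature-out attribution: perturb each feature of the single input and measure how much the score moves. A minimal sketch, with an entirely hypothetical toy model and feature names:

```python
# Leave-one-feature-out attribution sketch: for ONE input, score how much
# the model's output changes when each feature is replaced by a neutral
# baseline value. Everything here is illustrative.

def explain_instance(predict, instance, baseline):
    """predict: callable mapping a feature dict to a score.
    instance: the single data point being challenged.
    baseline: neutral values to substitute for each feature."""
    base_score = predict(instance)
    attributions = {}
    for feature in instance:
        perturbed = dict(instance)
        perturbed[feature] = baseline[feature]
        # Positive value: this feature pushed the score up for this case.
        attributions[feature] = base_score - predict(perturbed)
    return attributions

# Hypothetical toy model: a linear score over two features.
def toy_model(x):
    return 0.3 * x["income"] + 0.7 * x["age"]

print(explain_instance(
    toy_model,
    instance={"income": 0.9, "age": 0.2},
    baseline={"income": 0.5, "age": 0.5},
))
```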
In her talk, Jutta highlighted the risks for minorities and groups whose data are underrepresented in the data used for training models, and approaches to reduce that bias (e.g. the "lawnmower" approach).
@JohnRochfordUMMS's talk highlighted that privacy concerns make that phenomenon even stronger for people with disabilities, and pointed to tools that can help identify bias in training data.
Are there well-known metrics or metadata that a model provider can (and ideally should) attach to their models to help developers assess how much, and what kind of, bias they might be importing when they use a given model? Are there natural fora where discussions on such metadata are expected to happen?
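As a concrete example of the kind of metric such metadata could carry: per-subgroup accuracy plus the worst-case gap between subgroups is one common way to summarise a disaggregated evaluation. A minimal sketch; all labels, predictions, and group names are made up for illustration:

```python
# Disaggregated evaluation sketch -- the kind of per-subgroup metric a
# model card could attach. All data below is fabricated.

def subgroup_accuracy(labels, predictions, groups):
    """Return accuracy per subgroup and the max gap between subgroups."""
    totals, correct = {}, {}
    for y, pred, g in zip(labels, predictions, groups):
        totals[g] = totals.get(g, 0) + 1
        correct[g] = correct.get(g, 0) + (y == pred)
    accuracy = {g: correct[g] / totals[g] for g in totals}
    gap = max(accuracy.values()) - min(accuracy.values())
    return accuracy, gap

labels      = [1, 0, 1, 1, 0, 1, 0, 1]
predictions = [1, 0, 1, 0, 0, 0, 0, 0]
groups      = ["lighter", "lighter", "lighter", "lighter",
               "darker", "darker", "darker", "darker"]

per_group, gap = subgroup_accuracy(labels, predictions, groups)
print(per_group)  # {'lighter': 0.75, 'darker': 0.5}
print(gap)        # 0.25 -- a large gap flags possible subgroup bias
```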