Sharing NaturalBench! #1

Open
linzhiqiu opened this issue Dec 8, 2024 · 2 comments

@linzhiqiu

I am Zhiqiu Lin, a final-year PhD student at Carnegie Mellon University working with Prof. Deva Ramanan. We found your recent work very inspiring and insightful!

I wanted to share NaturalBench (NeurIPS'24 D&B), a collaborative project between CMU and the University of Washington, which might interest you:

NaturalBench (https://linzhiqiu.github.io/papers/naturalbench/) is a vision-centric benchmark that challenges vision-language models with pairs of simple questions about natural imagery. Prior VQA benchmarks (like MME and ScienceQA) can be solved by blind language models (e.g., GPT-3.5) that never see the image. NaturalBench is designed so that such shortcuts won't work. We evaluated 53 state-of-the-art models, and even top models like GPT-4o and Qwen2-VL fall 50%-70% short of human accuracy (90%+), revealing significant room for improvement.

We also found that current models show strong answer biases, such as favoring “Yes” over “No” regardless of the input. Correcting these biases can boost performance by 2-3x, even for GPT-4o, making NaturalBench a valuable testbed for future debiasing techniques.

Check out my Twitter post about it here: https://x.com/ZhiqiuLin/status/1848454555341885808.

🚀 Start using NaturalBench: https://github.com/Baiqi-Li/NaturalBench

Best,
Zhiqiu

@kaiwang960112

Hi Zhiqiu, thanks so much for your message! Wangbo is attending NeurIPS in Vancouver. Maybe you two can talk more at the conference. :)

@wangbo-zhao
Contributor

Thank you for your interest. I am already in Vancouver. Hope to talk with you about this at NeurIPS 😄.
