Q-Eval-100K: Evaluating Visual Quality and Alignment Level for Text-to-Vision Content

Zicheng Zhang¹, Tengchuan Kou¹, Shushi Wang¹, Chunyi Li¹, Wei Sun¹, Wei Wang²

Xiaoyu Li², Zongyu Wang², Xuezhi Cao², Xiongkuo Min¹, Xiaohong Liu¹, Guangtao Zhai¹

¹Shanghai Jiao Tong University, ²Meituan

Paper available on arXiv.

Dataset will be available at AGI-Eval.

Q-Eval serves as the dataset for the NTIRE 2025 XGC Track 2.

Motivation

Evaluating text-to-vision content hinges on two crucial aspects: visual quality and alignment. While significant progress has been made in developing objective models to assess these dimensions, the performance of such models relies heavily on the scale and quality of human annotations. In line with scaling laws, increasing the number of human-labeled instances improves the performance of evaluation models in a predictable way. Therefore, we introduce a comprehensive dataset designed to Evaluate Visual quality and Alignment Level for text-to-vision content (Q-EVAL-100K), featuring the largest collection of human-labeled Mean Opinion Scores (MOS) for these two aspects. The Q-EVAL-100K dataset covers both text-to-image and text-to-video models, with 960K human annotations focused specifically on visual quality and alignment for 100K instances (60K images and 40K videos). Leveraging this dataset with context prompts, we propose Q-Eval-Score, a unified model capable of evaluating both visual quality and alignment, with special improvements for handling long-text prompt alignment. Experimental results indicate that the proposed Q-Eval-Score achieves superior performance on both visual quality and alignment, with strong generalization capabilities across other benchmarks. These findings highlight the significant value of the Q-EVAL-100K dataset.

Visual Quality Performance

Text Alignment Performance

Dataset Access

The dataset will be available at AGI-Eval.

Model Release

To be updated ...

Citation

If you find our work useful, please cite our paper as:

@misc{zhang2025qeval100kevaluatingvisualquality,
      title={Q-Eval-100K: Evaluating Visual Quality and Alignment Level for Text-to-Vision Content}, 
      author={Zicheng Zhang and Tengchuan Kou and Shushi Wang and Chunyi Li and Wei Sun and Wei Wang and Xiaoyu Li and Zongyu Wang and Xuezhi Cao and Xiongkuo Min and Xiaohong Liu and Guangtao Zhai},
      year={2025},
      eprint={2503.02357},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.02357}, 
}
