A comprehensive evaluation framework for Large Language Models (LLMs) that assesses models along three key dimensions: general capabilities, safety, and robustness. The framework includes a diverse set of benchmarks and supports both API-based and local models, with distributed evaluation.