evals
Here are 36 public repositories matching this topic...
AI Observability & Evaluation
Updated Apr 4, 2025 - Jupyter Notebook
Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks, including OpenAI Agents SDK, CrewAI, Langchain, Autogen, AG2, and CamelAI.
Updated Apr 5, 2025 - Python
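A minimal usage sketch for this kind of monitoring SDK, assuming the package is `agentops` and follows the init/end-session pattern from its earlier documentation; names like `end_session` may have changed in newer releases:

```python
# Sketch only: assumes the `agentops` package with the init/end_session
# API from earlier SDK docs; newer releases may rename these calls.
import agentops
from openai import OpenAI

agentops.init(api_key="<AGENTOPS_API_KEY>")  # start a monitored session

client = OpenAI()  # OpenAI calls are auto-instrumented once init() runs
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.choices[0].message.content)

agentops.end_session("Success")  # flush the session: cost, tokens, outcome
```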
The easiest tool for fine-tuning LLMs, generating synthetic data, and collaborating on datasets.
Updated Apr 4, 2025 - Python
Laminar - an open-source, all-in-one platform for engineering AI products. Create a data flywheel for your AI app. Traces, Evals, Datasets, Labels. YC S24.
Updated Apr 4, 2025 - TypeScript
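A hypothetical tracing sketch, assuming the Python package is `lmnr` and exposes `Laminar.initialize` plus an `@observe` decorator as described in its docs:

```python
# Hypothetical sketch: assumes the `lmnr` package with Laminar.initialize
# and an @observe decorator that records each call as a trace span.
from lmnr import Laminar, observe

Laminar.initialize(project_api_key="<LMNR_PROJECT_API_KEY>")

@observe()  # calls to this function show up as traces in the dashboard
def answer(question: str) -> str:
    # stand-in for a real LLM call
    return f"echo: {question}"

print(answer("What is a data flywheel?"))
```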
🥤 RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with PostgreSQL or SQLite
Updated Apr 3, 2025 - Python
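A sketch of the typical flow, assuming names like `RAGLiteConfig`, `insert_document`, and `hybrid_search` from the project's README; treat these identifiers as unverified:

```python
# Unverified sketch based on the project's README: configure a SQLite
# backend, index a document, then run a hybrid (keyword + vector) search.
from pathlib import Path

from raglite import RAGLiteConfig, hybrid_search, insert_document

config = RAGLiteConfig(db_url="sqlite:///raglite.db")  # or a PostgreSQL URL

insert_document(Path("paper.pdf"), config=config)  # chunk, embed, and store

chunk_ids, scores = hybrid_search("How is intelligence measured?", num_results=5, config=config)
print(chunk_ids, scores)
```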
Test your LLM-powered apps with TypeScript. No API key required.
Updated Apr 4, 2025 - TypeScript
[NeurIPS 2024] Official code for HourVideo: 1-Hour Video Language Understanding
Updated Mar 7, 2025 - Jupyter Notebook
Vivaria is METR's tool for running evaluations and conducting agent elicitation research.
Updated Apr 4, 2025 - TypeScript
Evalica, your favourite evaluation toolkit
Updated Apr 2, 2025 - Python
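For pairwise-comparison evals, the toolkit computes rankings from head-to-head outcomes; a sketch assuming the `elo` function and `Winner` enum shown in its README:

```python
# Sketch assuming evalica's documented elo(xs, ys, winners) signature:
# each (x, y, winner) triple is one pairwise comparison between two items.
from evalica import Winner, elo

xs = ["pizza", "burger", "pizza"]
ys = ["burger", "sushi", "sushi"]
winners = [Winner.X, Winner.Y, Winner.Draw]

result = elo(xs, ys, winners)
print(result.scores)  # Elo rating per item; highest = strongest
```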
Benchmarking Large Language Models for FHIR
Updated Nov 29, 2024
An implementation of Anthropic's paper and essay, "A statistical approach to model evaluations"
Updated Feb 27, 2025 - Python
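The paper's central recommendation is to treat an eval score as a sample mean and report it with a standard error; a self-contained illustration of that idea (not this repository's API):

```python
# Illustration of the paper's core idea, not this repository's API:
# an eval score over n questions is a sample mean, so report it with a
# CLT-based standard error and a 95% confidence interval.
import math

def mean_and_sem(scores: list[float]) -> tuple[float, float]:
    """Mean score and its standard error, assuming i.i.d. questions."""
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)  # sample variance
    return mean, math.sqrt(var / n)

scores = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]  # per-question pass/fail
mean, sem = mean_and_sem(scores)
print(f"score = {mean:.2f} +/- {1.96 * sem:.2f} (95% CI)")
```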
Root Signals Python SDK
Updated Mar 31, 2025 - Python
MCP for Root Signals Evaluation Platform
Updated Apr 3, 2025 - Python