💻 Developer Nexus: Evaluating
lm-sys/FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
⭐ 39413 | 🍴 4778
mlflow/mlflow
An open-source developer platform for building AI agents and models with confidence, with end-to-end tracking, observability, and evaluation in one integrated platform.
⭐ 24424 | 🍴 5324
google/adk-python
An open-source, code-first Python toolkit for building, evaluating, and deploying sophisticated AI agents with flexibility and control.
⭐ 17980 | 🍴 2975
openai/evals
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
⭐ 17919 | 🍴 2901