Moozonian

💻 Developer Nexus: Evaluating

GitHub

lm-sys/FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

⭐ 39413 | 🍴 4778

mlflow/mlflow

An open-source developer platform for building AI agents and models, with end-to-end tracking, observability, and evaluation in one integrated platform.

⭐ 24424 | 🍴 5324

google/adk-python

An open-source, code-first Python toolkit for building, evaluating, and deploying sophisticated AI agents with flexibility and control.

⭐ 17980 | 🍴 2975

openai/evals

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

⭐ 17919 | 🍴 2901
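As a rough illustration of what frameworks like openai/evals automate, here is a minimal sketch of an exact-match evaluation loop. The stub model, sample prompts, and function names below are hypothetical placeholders for illustration only; they are not part of the openai/evals API, which a real run would use alongside an actual LLM.

```python
# Minimal sketch of an exact-match eval loop. The "model" here is a
# hypothetical stub; a real evaluation would call an actual LLM API.

def stub_model(prompt: str) -> str:
    # Hypothetical model: returns canned answers to arithmetic prompts.
    canned = {"2+2=": "4", "3*3=": "9", "10-7=": "3"}
    return canned.get(prompt, "unknown")

def exact_match_eval(model, samples):
    """Score a model on (prompt, ideal answer) pairs by exact match."""
    results = [model(prompt) == ideal for prompt, ideal in samples]
    return sum(results) / len(results)  # accuracy in [0, 1]

samples = [("2+2=", "4"), ("3*3=", "9"), ("10-7=", "2")]
accuracy = exact_match_eval(stub_model, samples)
print(f"accuracy: {accuracy:.2f}")  # 2 of 3 samples match exactly
```

Real eval frameworks layer dataset registries, model adapters, and richer graders (model-graded, fuzzy match) on top of this same loop.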