💻 Developer Nexus: Evaluation
GitHub
lm-sys/FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
⭐ 39414 | 🍴 4779
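FastChat's serving stack exposes an OpenAI-compatible REST API once the controller, a model worker, and the API server are running. A minimal sketch of querying it from Python, assuming a local API server on port 8000 and a Vicuna worker (both assumptions, not guarantees):

```python
# Minimal sketch: querying a model served by FastChat through its
# OpenAI-compatible REST API. Assumes the controller, a model worker
# (e.g. for lmsys/vicuna-7b-v1.5), and the API server are already
# running, and that the server listens on localhost:8000 (assumed).
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "vicuna-7b-v1.5",  # must match the served model's name
        "messages": [{"role": "user", "content": "What is Chatbot Arena?"}],
        "temperature": 0.7,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```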
GitHub
mlflow/mlflow
The open source developer platform to build AI agents and models with confidence. Enhance your AI applications with end-to-end tracking, observability, and evaluations, all in one integrated platform.
⭐ 24360 | 🍴 5310
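For flavor, here is a minimal sketch of MLflow's tracking API as it might appear in an evaluation workflow; the experiment name, parameters, and scores are placeholders:

```python
# Minimal sketch of MLflow's tracking API: log parameters and metrics
# for a run so they appear in the MLflow UI (`mlflow ui`). All values
# below are placeholders, not real results.
import mlflow

mlflow.set_experiment("eval-demo")  # creates the experiment if missing

with mlflow.start_run(run_name="baseline"):
    mlflow.log_param("model", "my-llm-v1")   # hypothetical model name
    mlflow.log_param("temperature", 0.0)
    mlflow.log_metric("exact_match", 0.82)   # placeholder score
    mlflow.log_metric("latency_ms", 143.0)   # placeholder latency
```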
GitHub
langfuse/langfuse
🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
⭐ 22151 | 🍴 2201
GitHub
google/adk-python
An open-source, code-first Python toolkit for building, evaluating, and deploying sophisticated AI agents with flexibility and control.
⭐ 17910 | 🍴 2955
StackOverflow
Evaluation order of lambda capture initializers
Answers: 1
StackOverflow
How to retrieve Stockfish evaluation score with NNUE by itself using the Stockfish CLI
Answers: 2
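The usual route here is Stockfish's non-UCI `eval` debug command, which prints the engine's static (NNUE-based) evaluation of the current position. A minimal Python sketch, assuming a `stockfish` binary on PATH; the exact `eval` output format differs between Stockfish versions, so the parsing is deliberately loose:

```python
# Minimal sketch: ask the Stockfish CLI for its static NNUE evaluation
# of a position via the `eval` command. Assumes `stockfish` is on PATH.
import subprocess

FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"

proc = subprocess.Popen(
    ["stockfish"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
)
out, _ = proc.communicate(f"position fen {FEN}\neval\nquit\n", timeout=30)

for line in out.splitlines():
    # e.g. "Final evaluation       +0.25 (white side)" in recent versions
    if "Final evaluation" in line:
        print(line.strip())
```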
StackOverflow
How to implement loglikelihood() for an MLX-based lm-evaluation-harness using mlx_lm?
Answers: 1StackOverflow
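For context, loglikelihood() in the lm-evaluation-harness scores a (context, continuation) pair, returning the continuation's total log-probability and whether it matches the greedy decode. A minimal sketch of that core computation on top of mlx_lm; the model id is hypothetical, and tokenizer edge cases (special tokens, whitespace at the context/continuation boundary) are glossed over:

```python
# Minimal sketch of the computation behind a loglikelihood() adapter
# backed by mlx_lm: score the continuation tokens given the context
# in a single forward pass.
import mlx.core as mx
from mlx_lm import load

model, tokenizer = load("mlx-community/Mistral-7B-v0.1-4bit")  # hypothetical id

def loglikelihood(context: str, continuation: str) -> tuple[float, bool]:
    ctx = tokenizer.encode(context)
    cont = tokenizer.encode(continuation)  # may need add_special_tokens=False
    tokens = mx.array(ctx + cont)[None]

    logits = model(tokens)  # shape: (1, seq_len, vocab)
    # Log-probabilities at the positions that predict the continuation.
    logprobs = logits - mx.logsumexp(logits, axis=-1, keepdims=True)
    start = len(ctx) - 1  # position whose output predicts cont[0]
    targets = mx.array(cont)
    step_lp = mx.take_along_axis(
        logprobs[0, start : start + len(cont)], targets[:, None], axis=-1
    )[:, 0]
    greedy = mx.argmax(logits[0, start : start + len(cont)], axis=-1)
    return step_lp.sum().item(), bool(mx.all(greedy == targets).item())
```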
