💻 Developer Nexus: Evaluation
GitHub
lm-sys/FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
⭐ 39414 | 🍴 4779
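FastChat's serving stack exposes an OpenAI-compatible REST API once the controller, a model worker, and the API server are running. A minimal sketch of querying it from Python, assuming a local API server on port 8000 and a Vicuna worker (both assumptions, not guarantees):

```python
# Minimal sketch: querying a model served by FastChat through its
# OpenAI-compatible REST API. Assumes the controller, a model worker
# (e.g. for lmsys/vicuna-7b-v1.5), and the API server are already
# running, and that the server listens on localhost:8000 (assumed).
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "vicuna-7b-v1.5",  # must match the served model's name
        "messages": [{"role": "user", "content": "What is Chatbot Arena?"}],
        "temperature": 0.7,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```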
GitHub
mlflow/mlflow
The open source developer platform to build AI agents and models with confidence. Enhance your AI applications with end-to-end tracking, observability, and evaluations, all in one integrated platform.
⭐ 24360 | 🍴 5310
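For flavor, here is a minimal sketch of MLflow's tracking API as it might appear in an evaluation workflow; the experiment name, parameters, and scores are placeholders:

```python
# Minimal sketch of MLflow's tracking API: log parameters and metrics
# for a run so they appear in the MLflow UI (`mlflow ui`). All values
# below are placeholders, not real results.
import mlflow

mlflow.set_experiment("eval-demo")  # creates the experiment if missing

with mlflow.start_run(run_name="baseline"):
    mlflow.log_param("model", "my-llm-v1")   # hypothetical model name
    mlflow.log_param("temperature", 0.0)
    mlflow.log_metric("exact_match", 0.82)   # placeholder score
    mlflow.log_metric("latency_ms", 143.0)   # placeholder latency
```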
GitHub
langfuse/langfuse
🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
⭐ 22151 | 🍴 2201
GitHub
google/adk-python
An open-source, code-first Python toolkit for building, evaluating, and deploying sophisticated AI agents with flexibility and control.
⭐ 17910 | 🍴 2955
StackOverflow
Evaluation order of lambda capture initializers
Answers: 1
StackOverflow
How to retrieve Stockfish evaluation score with NNUE by itself using the Stockfish CLI
Answers: 2
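The usual route here is Stockfish's non-UCI `eval` debug command, which prints the engine's static (NNUE-based) evaluation of the current position. A minimal Python sketch, assuming a `stockfish` binary on PATH; the exact `eval` output format differs between Stockfish versions, so the parsing is deliberately loose:

```python
# Minimal sketch: ask the Stockfish CLI for its static NNUE evaluation
# of a position via the `eval` command. Assumes `stockfish` is on PATH.
import subprocess

FEN = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"

proc = subprocess.Popen(
    ["stockfish"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
)
out, _ = proc.communicate(f"position fen {FEN}\neval\nquit\n", timeout=30)

for line in out.splitlines():
    # e.g. "Final evaluation       +0.25 (white side)" in recent versions
    if "Final evaluation" in line:
        print(line.strip())
```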
StackOverflow
How to implement loglikelihood() for an MLX-based lm-evaluation-harness using mlx_lm?
Answers: 1StackOverflow
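For context, loglikelihood() in the lm-evaluation-harness scores a (context, continuation) pair, returning the continuation's total log-probability and whether it matches the greedy decode. A minimal sketch of that core computation on top of mlx_lm; the model id is hypothetical, and tokenizer edge cases (special tokens, whitespace at the context/continuation boundary) are glossed over:

```python
# Minimal sketch of the computation behind a loglikelihood() adapter
# backed by mlx_lm: score the continuation tokens given the context
# in a single forward pass.
import mlx.core as mx
from mlx_lm import load

model, tokenizer = load("mlx-community/Mistral-7B-v0.1-4bit")  # hypothetical id

def loglikelihood(context: str, continuation: str) -> tuple[float, bool]:
    ctx = tokenizer.encode(context)
    cont = tokenizer.encode(continuation)  # may need add_special_tokens=False
    tokens = mx.array(ctx + cont)[None]

    logits = model(tokens)  # shape: (1, seq_len, vocab)
    # Log-probabilities at the positions that predict the continuation.
    logprobs = logits - mx.logsumexp(logits, axis=-1, keepdims=True)
    start = len(ctx) - 1  # position whose output predicts cont[0]
    targets = mx.array(cont)
    step_lp = mx.take_along_axis(
        logprobs[0, start : start + len(cont)], targets[:, None], axis=-1
    )[:, 0]
    greedy = mx.argmax(logits[0, start : start + len(cont)], axis=-1)
    return step_lp.sum().item(), bool(mx.all(greedy == targets).item())
```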
