💻 Developer Nexus: Evaluating
lm-sys/FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
⭐ 39413 | 🍴 4778
mlflow/mlflow
An open-source developer platform for building AI agents and models with confidence, with end-to-end tracking, observability, and evaluation in one integrated platform.
⭐ 24424 | 🍴 5324
google/adk-python
An open-source, code-first Python toolkit for building, evaluating, and deploying sophisticated AI agents with flexibility and control.
⭐ 17980 | 🍴 2975
openai/evals
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
⭐ 17919 | 🍴 2901