agentscope-ai
Organization
OpenJudge: A Unified Framework for Holistic Evaluation and Quality Rewards
Indexed Skills (2)
auto-arena
Automatically evaluate and compare multiple AI models or agents without pre-existing test data. Generates test queries from a task description, collects responses from all target endpoints, auto-generates evaluation rubrics, runs pairwise comparisons via a judge model, and produces win-rate rankings with reports and charts. Supports checkpoint resume, incremental endpoint addition, and judge model hot-swap. Use when the user asks to compare, benchmark, or rank multiple models or agents on a custom task, or run an arena-style evaluation.
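To make the pairwise-judging loop concrete, here is a minimal Python sketch of the arena flow: collect answers from every endpoint, compare each pair with a judge, and rank endpoints by win rate. This is an illustration of the described workflow, not the skill's actual API; the judge below is a trivial length heuristic standing in for the real judge model, and all names (arena, judge, endpoints) are hypothetical.

```python
# Sketch of the arena loop described above; all names are illustrative.
from itertools import combinations
from collections import defaultdict

def judge(query: str, answer_a: str, answer_b: str) -> str:
    """Stand-in for the judge model: returns 'a', 'b', or 'tie'.
    A trivial length heuristic replaces the real LLM call here."""
    if len(answer_a) == len(answer_b):
        return "tie"
    return "a" if len(answer_a) > len(answer_b) else "b"

def arena(queries, endpoints):
    """endpoints: mapping of endpoint name -> callable(query) -> answer."""
    wins = defaultdict(int)
    games = defaultdict(int)
    for q in queries:
        # Collect one response per endpoint for this query.
        answers = {name: fn(q) for name, fn in endpoints.items()}
        # Judge every pair of endpoints on the same query.
        for a, b in combinations(answers, 2):
            verdict = judge(q, answers[a], answers[b])
            games[a] += 1
            games[b] += 1
            if verdict == "a":
                wins[a] += 1
            elif verdict == "b":
                wins[b] += 1
    # Rank by win rate (ties count as losses for both sides here).
    ranking = sorted(endpoints, key=lambda n: wins[n] / games[n], reverse=True)
    return [(n, wins[n] / games[n]) for n in ranking]

if __name__ == "__main__":
    queries = ["Summarize the task.", "List three edge cases."]
    endpoints = {
        "model-a": lambda q: q.upper(),
        "model-b": lambda q: q + " (expanded with extra detail)",
    }
    for name, rate in arena(queries, endpoints):
        print(f"{name}: win rate {rate:.2f}")
```

The real skill adds the surrounding machinery this sketch omits: query and rubric generation from the task description, checkpointed state so a run can resume, and report and chart output.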
bib-verify
Verify a BibTeX file for hallucinated or fabricated references by cross-checking every entry against CrossRef, arXiv, and DBLP. Reports each reference as verified, suspect, or not found, with field-level mismatch details (title, authors, year, DOI). Use when the user wants to check a .bib file for fake citations, validate references in a paper, or audit bibliography entries for accuracy.
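The core cross-checking step can be sketched against one of the named sources. Below is a minimal Python example that looks a single entry's title up via the public CrossRef REST API (https://api.crossref.org/works) and flags a mismatch; it assumes a hand-rolled entry dict standing in for a parsed BibTeX record, and it is not the skill's implementation, which also queries arXiv and DBLP and compares authors, year, and DOI field by field.

```python
# Sketch of the CrossRef cross-check; the entry dict is a stand-in for a
# parsed BibTeX record, and only the title field is checked here.
import requests

def verify_title(entry: dict) -> str:
    """Return 'verified', 'suspect', or 'not found' for one entry."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": entry["title"], "rows": 1},
        timeout=10,
    )
    items = resp.json()["message"]["items"]
    if not items:
        return "not found"
    # CrossRef returns the title as a list of strings.
    found = (items[0].get("title") or [""])[0]
    # Case-insensitive exact match; the real check would also compare
    # authors, year, and DOI before classifying the entry.
    return "verified" if found.lower() == entry["title"].lower() else "suspect"

if __name__ == "__main__":
    entry = {"title": "Attention Is All You Need"}
    print(verify_title(entry))
```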
The bio shown is the top-scored skill's repo description, used as a fallback; real GitHub bios land in a future update.