langfuse-core-workflow-b

Execute Langfuse secondary workflow: Evaluation, scoring, and datasets. Use when implementing LLM evaluation, adding user feedback, or setting up automated quality scoring and experiment datasets. Trigger with phrases like "langfuse evaluation", "langfuse scoring", "rate llm outputs", "langfuse feedback", "langfuse datasets", "langfuse experiments".

AI & Automation 2,359 stars 334 forks Updated today MIT

Quality Score: 99/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# Langfuse Core Workflow B: Evaluation, Scoring & Datasets

## Overview

Implement LLM output evaluation using Langfuse scores (numeric, categorical, boolean), the experiment runner SDK for dataset-driven benchmarks, prompt management with versioned prompts, and LLM-as-a-Judge evaluation patterns.

## Prerequisites

- Langfuse SDK configured with API keys
- Traces already being collected (see `langfuse-core-workflow-a`)
- For v4+: `@langfuse/client` installed

## Instructions

### Step 1: Score Traces via SDK

Langfuse supports three score data types: **Numeric**, **Categorical**, and **Boolean**.

```typescript
import { LangfuseClient } from "@langfuse/client";

const langfuse = new LangfuseClient();

// Numeric score (e.g., 0-1 quality rating)
await langfuse.score.create({
  traceId: "trace-abc-123",
  name: "relevance",
  value: 0.92,
  dataType: "NUMERIC",
  comment: "Highly relevant answer with good context usage",
});

// Categorical score (e.g., pass/fail classification)
await langfuse.score.create({
  traceId: "trace-abc-123",
  observationId: "gen-xyz-456", // Optional: score a specific generation
  name: "quality-tier",
  value: "excellent",
  dataType: "CATEGORICAL",
});

// Boolean score (e.g., thumbs up/down)
await langfuse.score.create({
  traceId: "trace-abc-123",
  name: "user-approved",
  value: 1, // 1 = true, 0 = false
  dataType: "BOOLEAN",
  comment: "User clicked thumbs up",
});
```

### Step 2: User Feedback Collection

```typescript
// API endpoint for f...
```
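
The three score data types from Step 1 each expect a different kind of `value`, so it can help to check a payload locally before sending it. The sketch below is one possible way to do that. `buildScorePayload`, `ScoreInput`, and `ScorePayload` are hypothetical names introduced here for illustration and are not part of the Langfuse SDK. The field names mirror the Step 1 examples, and the acceptance of a native `boolean` (mapped to `1`/`0`) is an assumption based on the `1 = true, 0 = false` comment above.

```typescript
// Hypothetical helper (NOT part of the Langfuse SDK): validates a score
// payload locally so that value and dataType agree before it is sent.
type ScoreDataType = "NUMERIC" | "CATEGORICAL" | "BOOLEAN";

interface ScoreInput {
  traceId: string;
  observationId?: string; // optional, as in the Step 1 categorical example
  name: string;
  value: number | string | boolean;
  dataType: ScoreDataType;
  comment?: string;
}

interface ScorePayload extends Omit<ScoreInput, "value"> {
  value: number | string;
}

function buildScorePayload(input: ScoreInput): ScorePayload {
  const { value, dataType } = input;

  if (dataType === "NUMERIC") {
    if (typeof value !== "number" || !Number.isFinite(value)) {
      throw new TypeError("NUMERIC scores need a finite number value");
    }
    return { ...input, value };
  }

  if (dataType === "CATEGORICAL") {
    if (typeof value !== "string" || value.length === 0) {
      throw new TypeError("CATEGORICAL scores need a non-empty string value");
    }
    return { ...input, value };
  }

  // BOOLEAN: assumed to be encoded as 1 (true) or 0 (false), as in Step 1.
  if (typeof value === "boolean") {
    return { ...input, value: value ? 1 : 0 };
  }
  if (value === 0 || value === 1) {
    return { ...input, value };
  }
  throw new TypeError("BOOLEAN scores need true/false or 1/0");
}

// Usage: a thumbs-up event converted to the 1/0 encoding.
const thumbsUp = buildScorePayload({
  traceId: "trace-abc-123",
  name: "user-approved",
  value: true,
  dataType: "BOOLEAN",
});
console.log(thumbsUp.value); // 1
```

The returned object has the same shape as the arguments in the Step 1 calls, so under these assumptions it could be passed to `langfuse.score.create(...)` once validation succeeds.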

Details

Author: jeremylongshore
Repository: jeremylongshore/claude-code-plugins-plus-skills
Created: 8 months ago
Last Updated: today
Language: Python
License: MIT