langfuse-core-workflow-b

Featured

Execute Langfuse secondary workflow: Evaluation, scoring, and datasets. Use when implementing LLM evaluation, adding user feedback, or setting up automated quality scoring and experiment datasets. Trigger with phrases like "langfuse evaluation", "langfuse scoring", "rate llm outputs", "langfuse feedback", "langfuse datasets", "langfuse experiments".

AI & Automation 2,359 stars 334 forks Updated today MIT


Quality Score: 99/100


Skill Content

# Langfuse Core Workflow B: Evaluation, Scoring & Datasets

## Overview

Implement LLM output evaluation using Langfuse scores (numeric, categorical, boolean), the experiment runner SDK for dataset-driven benchmarks, prompt management with versioned prompts, and LLM-as-a-Judge evaluation patterns.

## Prerequisites

- Langfuse SDK configured with API keys
- Traces already being collected (see `langfuse-core-workflow-a`)
- For v4+: `@langfuse/client` installed

## Instructions

### Step 1: Score Traces via SDK

Langfuse supports three score data types: **Numeric**, **Categorical**, and **Boolean**.

```typescript
import { LangfuseClient } from "@langfuse/client";

const langfuse = new LangfuseClient();

// Numeric score (e.g., 0-1 quality rating)
await langfuse.score.create({
  traceId: "trace-abc-123",
  name: "relevance",
  value: 0.92,
  dataType: "NUMERIC",
  comment: "Highly relevant answer with good context usage",
});

// Categorical score (e.g., pass/fail classification)
await langfuse.score.create({
  traceId: "trace-abc-123",
  observationId: "gen-xyz-456", // Optional: score a specific generation
  name: "quality-tier",
  value: "excellent",
  dataType: "CATEGORICAL",
});

// Boolean score (e.g., thumbs up/down)
await langfuse.score.create({
  traceId: "trace-abc-123",
  name: "user-approved",
  value: 1, // 1 = true, 0 = false
  dataType: "BOOLEAN",
  comment: "User clicked thumbs up",
});
```

### Step 2: User Feedback Collection

```typescript
// API endpoint for f...
```
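
As a rough illustration of how a thumbs-up/down event might be shaped into the boolean score payload shown in Step 1, here is a minimal sketch. It is not the skill's own Step 2 code: the `Feedback` type and the `feedbackToScore` helper are hypothetical names introduced for this example and are not part of the Langfuse SDK. Only the payload field names (`traceId`, `name`, `value`, `dataType`, `comment`) and the `user-approved` score name are taken from the Step 1 snippet.

```typescript
// Hypothetical input shape for a feedback event (assumption, not a Langfuse type).
type Feedback = { traceId: string; thumbsUp: boolean; note?: string };

// Mirrors the fields of the boolean score payload used in Step 1.
type BooleanScore = {
  traceId: string;
  name: string;
  value: 0 | 1; // 1 = true, 0 = false
  dataType: "BOOLEAN";
  comment?: string;
};

// Hypothetical helper: converts a feedback event into a boolean score payload.
function feedbackToScore(feedback: Feedback): BooleanScore {
  if (feedback.traceId.trim() === "") {
    throw new Error("traceId is required to attach a score to a trace");
  }
  const score: BooleanScore = {
    traceId: feedback.traceId,
    name: "user-approved",
    value: feedback.thumbsUp ? 1 : 0,
    dataType: "BOOLEAN",
  };
  // Only attach a comment when the user actually left one.
  if (feedback.note !== undefined) {
    score.comment = feedback.note;
  }
  return score;
}

const exampleScore = feedbackToScore({
  traceId: "trace-abc-123",
  thumbsUp: true,
  note: "User clicked thumbs up",
});
console.log(exampleScore);
```

The resulting object has the same shape as the boolean payload passed to `langfuse.score.create(...)` in Step 1, so it could presumably be forwarded there. Keeping the mapping in a pure function makes it easy to unit-test without network access.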

Details

Author: jeremylongshore
Repository: jeremylongshore/claude-code-plugins-plus-skills
Created: 8 months ago
Last Updated: today
Language: Python
License: MIT

Integrates with

Anthropic · AI
