fabric-pyspark-perf-remediate

Diagnose and resolve Apache Spark performance issues in Microsoft Fabric notebooks and Spark Job Definitions. Use when PySpark jobs are slow, notebooks take too long, Spark stages are skewed, shuffles are excessive, out-of-memory errors occur, Delta Lake writes are slow, or Fabric capacity is throttled. Covers data skew, shuffle optimization, broadcast joins, partition tuning, VOrder, Optimized Write, resource profiles, autotune, native execution engine, small file compaction, and Spark UI interpretation. Keywords include slow notebook, OOM, spill, shuffle, skew, broadcast, repartition, coalesce, OPTIMIZE, VACUUM, Z-ORDER, checkpoint, cache, persist, executor memory, driver memory, spark.sql.shuffle.partitions, autoBroadcastJoinThreshold, maxPartitionBytes, Fabric capacity throttling, CU utilization.
PatrickGallucci/fabric-skills · ★ 13 · Data & Documents · score 81

Install: claude install-skill PatrickGallucci/fabric-skills

# Microsoft Fabric PySpark Performance Remediation

Systematic guide for diagnosing and resolving Apache Spark performance problems in Microsoft Fabric Data Engineering workloads, including notebooks, Spark Job Definitions, and pipeline activities.

## When to Use This Skill

Activate when encountering any of these scenarios:

- PySpark notebook cells take unexpectedly long to execute
- Spark Job Definitions exceed expected duration or fail with timeouts
- Out-of-memory (OOM) errors occur on the driver or executors
- Spark UI stage details show excessive shuffle read/write
- Data skew causes individual tasks to run much longer than their peers
- Delta Lake table writes are slow or produce many small files
- Fabric capacity utilization is high, or jobs are queued or throttled
- You need to choose between resource profiles (readHeavy vs writeHeavy)
- You are deciding whether to enable autotune, the native execution engine, or Optimized Write
- You are interpreting Spark UI metrics (stages, tasks, storage, SQL plan)

## Prerequisites

- Access to a Microsoft Fabric workspace with the Data Engineering/Science experience
- Fabric capacity (F2 or higher) with Spark compute enabled
- Familiarity with PySpark DataFrames and Spark SQL
- Access to the Spark UI via the Monitoring Hub or notebook session details

## Quick Diagnostic Workflow

Follow this triage sequence to identify the root cause:

1. **Check capacity status** - Is the Fabric capacity throttled or overloaded? See Monitoring Hub for queued jobs and CU utilization.
2. **Identi