clusterlog-review

Analyzes Windows Server Failover Cluster (WSFC) CLUSTER.LOG files for Always On Availability Group root-cause diagnosis. Use this skill when an availability group has gone offline, a failover occurred unexpectedly, or a node was evicted, and you need to identify the WSFC-level cause that SQL Server DMVs cannot see. Applies 30 checks (L1–L30) covering lease timeouts, health check failures, quorum loss, node eviction, network partition, RHS crashes, AG resource transitions, Cloud Witness, Azure Arc, and Contained AG.
vanterx/mssql-performance-skills · ★ 1 · Code & Development · score 77
Install: claude install-skill vanterx/mssql-performance-skills
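One of the file-wide checks this skill lists, log time gaps, can be illustrated with a short Python sketch. This is not the skill's own implementation. The line prefix format (`<pid>.<tid>::YYYY/MM/DD-HH:MM:SS.mmm`), the 60-second threshold, the `find_time_gaps` name, and the sample entries are all assumptions made for illustration.

```python
import re
from datetime import datetime, timedelta

# Assumed CLUSTER.LOG line prefix: "<pid>.<tid>::YYYY/MM/DD-HH:MM:SS.mmm LEVEL ..."
TIMESTAMP = re.compile(r"::(\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s")


def find_time_gaps(lines, threshold=timedelta(seconds=60)):
    """Return (previous, current) timestamp pairs whose gap exceeds threshold.

    The 60-second default is a hypothetical value, not one taken from the skill.
    """
    gaps = []
    previous = None
    for line in lines:
        match = TIMESTAMP.search(line)
        if match is None:
            continue  # skip lines that carry no timestamp
        current = datetime.strptime(match.group(1), "%Y/%m/%d-%H:%M:%S.%f")
        if previous is not None and current - previous > threshold:
            gaps.append((previous, current))
        previous = current
    return gaps


# Made-up sample entries; real log messages will differ.
sample = [
    "00001a2c.00002b3c::2024/01/15-14:30:00.000 INFO  [NM] sample entry",
    "00001a2c.00002b3c::2024/01/15-14:30:05.250 INFO  [NM] sample entry",
    "00001a2c.00002b3c::2024/01/15-14:32:10.500 WARN  [RES] sample entry",
]

for start, end in find_time_gaps(sample):
    print(f"gap of {end - start} between {start} and {end}")
```

A silence in the log does not identify a cause by itself. It only marks a window worth correlating with the other checks, such as lease timeouts or node eviction.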
# WSFC Cluster Log Review Skill

## Purpose

Analyze Windows Server Failover Cluster (WSFC) CLUSTER.LOG files to diagnose Always On Availability Group failures at the cluster level, the layer below SQL Server DMVs. Applies 30 checks (L1–L30) across five categories:

- **L1–L8** (file-wide patterns): lease timeouts, health check failures, RHS crashes, error bursts, repeated failover cycling, quorum loss, node eviction, log time gaps
- **L9–L17** (AG resource checks): offline transitions, SQL connectivity loss, forced failovers, long pending states, DLL init failures, API timeouts, cascade failures, primary role loss, replica disconnection
- **L18–L22** (network and node): partition/split-brain, NIC failure, heartbeat timeout, witness failure, node isolation
- **L23–L25** (configuration signals): VerboseLogging=0, SeparateMonitor absent, incomplete node coverage
- **L26–L30** (modern cluster features): Cloud Witness timeout, Azure Arc agent disconnect, Contained AG system database offline, cross-subnet probe failure, sp_server_diagnostics warning

## Input

Accept any of:

- **File path**: path to `CLUSTER.LOG` (e.g., `C:\Windows\Cluster\Reports\CLUSTER.LOG`)
- **Inline paste**: raw CLUSTER.LOG content pasted directly into chat
- **Natural language description**: describe symptoms ("the AG went offline at 14:32, SQL error log shows lease expiry")

For full analysis, the log should cover at least the 10 minutes before the incident and include entries from all c