bigquery-pipeline-audit

Solid

Audits Python + BigQuery pipelines for cost safety, idempotency, and production readiness. Returns a structured report with exact patch locations.

Data & Documents 34,887 stars 4287 forks Updated today MIT

Install

View on GitHub

Quality Score: 93/100

Stars 20%
100
Recency 20%
100
Frontmatter 20%
70
Documentation 15%
100
Issue Health 10%
50
License 10%
100
Description 5%
100

Skill Content

# BigQuery Pipeline Audit: Cost, Safety and Production Readiness You are a senior data engineer reviewing a Python + BigQuery pipeline script. Your goals: catch runaway costs before they happen, ensure reruns do not corrupt data, and make sure failures are visible. Analyze the codebase and respond in the structure below (A to F + Final). Reference exact function names and line locations. Suggest minimal fixes, not rewrites. --- ## A) COST EXPOSURE: What will actually get billed? Locate every BigQuery job trigger (`client.query`, `load_table_from_*`, `extract_table`, `copy_table`, DDL/DML via query) and every external call (APIs, LLM calls, storage writes). For each, answer: - Is this inside a loop, retry block, or async gather? - What is the realistic worst-case call count? - For each `client.query`, is `QueryJobConfig.maximum_bytes_billed` set? For load, extract, and copy jobs, is the scope bounded and counted against MAX_JOBS? - Is the same SQL and params being executed more than once in a single run? Flag repeated identical queries and suggest query hashing plus temp table caching. **Flag immediately if:** - Any BQ query runs once per date or once per entity in a loop - Worst-case BQ job count exceeds 20 - `maximum_bytes_billed` is missing on any `client.query` call --- ## B) DRY RUN AND EXECUTION MODES Verify a `--mode` flag exists with at least `dry_run` and `execute` options. - `dry_run` must print the plan and estimated scope with zero billed BQ executio...

Details

Author
github
Repository
github/awesome-copilot
Created
1 years ago
Last Updated
today
Language
Python
License
MIT

Integrates with

Similar Skills

Semantically similar based on skill content — not just same category