data-acquisition-browserlisted
Install: claude install-skill Pranjay-kumar/universal-data-acquisition-pipeline-skill
# Data Acquisition Browser
Act as the browser acquisition specialist. Use Patchright for warm-session capture when a normal browser must mint cookies or storage state before API endpoints are visible. Use Playwright for ordinary rendered-DOM fallback when structured HTTP probes are insufficient and no warm browser context is needed.
When the user asks for page loads only or "no API", use Patchright as a renderer and extract DOM/JSON-LD/meta/visible rows only. Do not harvest, replay, or recommend structured endpoints in that mode.
## Shared Core
Read from `../data-acquisition-core/references/`:
- `source-access.md`
- `playwright-rendered-dom.md`
- `probing.md`
- `compliance-boundaries.md`
- `output-contracts.md`
## Helpers
Use `scripts/patchright_cookie_endpoint_probe.mjs` for warm-session cookie/storage generation and endpoint discovery. Use `scripts/patchright_page_dom_probe.mjs` for Patchright page-load-only DOM extraction. Use `scripts/playwright_probe.mjs` only for ordinary public rendered-DOM fallback.
From the repo root:
```powershell
npm install
npx patchright install chrome
npm run probe:patchright -- "https://example.com/public-category" "outputs/example-patchright-endpoints.json"
```
The Patchright helper opens a persistent Chrome context, lets the page create ordinary browser-issued cookies/storage state, records JSON/API/XHR-looking requests and responses, saves local storage state under `auth/`, and writes a redacted endpoint report.
Page-load-only mod