download-webpage-as-pdf

Solid

Save a live webpage as a high-fidelity PDF that preserves the original layout AND every image (including lazy-loaded ones) using the agent-browser CLI. Use this whenever the user asks to "download this page as PDF", "save this article", "archive this URL", "fetch this page for reference", or otherwise wants a local PDF of a web page that looks like the browser version. Especially important on modern JS-heavy sites (engineering blogs, Next.js sites, anything with IntersectionObserver lazy loading) where naive `chrome --headless --print-to-pdf` or a bare `agent-browser pdf` produces blank rectangles or broken-image placeholders. Trigger this skill even when the user does not name the tool - any request to capture a webpage's full visual content as a PDF on disk should pull this in. For reader-mode/article-only output (no nav, no footer, no manual trimming) prefer percollate instead - see "When NOT to use this".

Web & Frontend 31 stars 2 forks Updated 3 days ago MIT

Install

View on GitHub

Quality Score: 85/100

Stars 20%

Recency 20%

100

Frontmatter 20%

Documentation 15%

100

Issue Health 10%

License 10%

100

Description 5%

100

Skill Content

# Download a webpage as a PDF (agent-browser recipe) The naive approaches fail on modern sites: - `chrome --headless --print-to-pdf` captures only the initial viewport's images. Anything below the fold renders as a blank rectangle. - `agent-browser pdf` immediately after `open` has the same problem - lazy-loaded images haven't decoded yet. - Scrolling via JS and then waiting a fixed time is also unreliable - you don't know when each image actually finished. The fix is one async script that strips lazy-load attributes, scrolls the page to trigger any IntersectionObserver-based loaders, and `await`s every `<img>` to decode. agent-browser's `eval` waits for the returned promise to resolve before exiting, so the subsequent `pdf` command sees a fully-loaded DOM. ## The recipe If multiple test/agent runs may share the host's agent-browser, isolate each invocation with `agent-browser --session <unique-name> ...` on every command in the pipeline. Single-user one-off captures can omit the flag and use the default session. Set `AGENT_BROWSER_HEADED=false` in the environment before running so the skill launches headless even when the host's `~/.agent-browser/config.json` defaults to `"headed": true`. This avoids popping a real Chrome window on the user's desktop while an agent is working in the background. Do NOT use the CLI's `--headed false` flag - in agent-browser 0.26.0 it parses but corrupts the session context (subsequent commands see an empty document). The env var is the s...

Details

Author: tenequm
Repository: tenequm/skills
Created: 8 months ago
Last Updated: 3 days ago
Language: Python
License: MIT

Integrates with

Next.js · Frontend

Similar Skills

Semantically similar based on skill content — not just same category

Web & Frontend Listed

web-browsing

Navigate, interact with, and read live web pages from the terminal by combining the agent-browser CLI (headless Chrome over CDP) with the defuddle CLI (clean Markdown extraction). Use whenever a task needs a real browser — opening a page, filling a form, clicking through a flow, logging in, taking a screenshot, testing a web app, scraping data — or whenever a URL needs to be read as clean Markdown instead of raw HTML, including JS-rendered SPAs and pages behind a login that a plain HTTP fetch cannot see. Triggers on "open a website", "read this page", "fetch this URL", "what does this page say", "fill out a form", "take a screenshot", "scrape this page", "log in to", "test this web app", 瀏覽器自動化, 開網頁, 讀這個網頁, 網頁截圖, 填表單, 抓網頁內容. Prefer this over WebFetch and over any other browser automation approach.

0 Updated today

chenwei791129

Web & Frontend Featured

agent-browser

Agent-browser usage guide. Read this before running any agent-browser commands. Covers the snapshot-and-ref workflow, navigating pages, interacting with elements (click, fill, type, select), extracting text and data, taking screenshots, managing tabs, handling forms and auth, waiting for content, running multiple browser sessions in parallel, and troubleshooting common failures. Use when the user asks to interact with a website, fill a form, click something, extract data, take a screenshot, log into a site, test a web app, or automate any browser task.

822 Updated 2 days ago

fcakyon

AI & Automation Listed

agent-browser

Automate a real browser with the agent-browser CLI — open pages, snapshot interactive elements, click/fill by ref, extract content. Use when the user wants to browse, open a page, visit a URL, navigate a site, click or fill a form element, scrape, take a screenshot of a page, test a web app, or says "agent-browser". Works as a general-purpose browser automation tool in any repo, even ones without a local browser skill installed.

2 Updated 5 days ago

lttr