Install: claude install-skill Fantasia1999/skills-zh
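# PDF Processing Guide
## Overview
This guide covers core PDF processing operations using Python libraries and command-line tools. For advanced features, JavaScript libraries, and detailed examples, see REFERENCE.md. If you need to fill out a PDF form, read FORMS.md and follow its instructions.
## Quick Start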
```python
from pypdf import PdfReader, PdfWriter

# Read a PDF
reader = PdfReader("document.pdf")
print(f"Pages: {len(reader.pages)}")

# Extract text from every page
text = ""
for page in reader.pages:
    text += page.extract_text()
```
## Python Libraries
### pypdf - Basic Operations
#### Merging PDFs
```python
from pypdf import PdfWriter, PdfReader

writer = PdfWriter()
for pdf_file in ["doc1.pdf", "doc2.pdf", "doc3.pdf"]:
    reader = PdfReader(pdf_file)
    # Append every page of the current file to the output
    for page in reader.pages:
        writer.add_page(page)

with open("merged.pdf", "wb") as output:
    writer.write(output)
```
#### Splitting a PDF
```python
from pypdf import PdfReader, PdfWriter

reader = PdfReader("input.pdf")
# Write each page to its own single-page file
for i, page in enumerate(reader.pages):
    writer = PdfWriter()
    writer.add_page(page)
    with open(f"page_{i+1}.pdf", "wb") as output:
        writer.write(output)
```
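Splitting into single pages is one option. A common variation is pulling out a chosen set of pages. The sketch below is an illustration rather than a pypdf feature: `parse_page_ranges` and `extract_pages` are hypothetical helper names, and the `"1-3,5"` spec syntax is an assumption made for this example. Only `PdfReader`, `PdfWriter`, `add_page`, and `write` come from pypdf.

```python
def parse_page_ranges(spec, page_count):
    """Turn a 1-based spec such as "1-3,5" into zero-based page indexes."""
    indexes = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end > page_count or start > end:
            raise ValueError(f"invalid page range: {part!r}")
        indexes.extend(range(start - 1, end))
    return indexes


def extract_pages(src_path, dst_path, spec):
    """Copy the pages named by `spec` from src_path into dst_path."""
    # Imported here so the range parser can be used without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src_path)
    writer = PdfWriter()
    for index in parse_page_ranges(spec, len(reader.pages)):
        writer.add_page(reader.pages[index])
    with open(dst_path, "wb") as output:
        writer.write(output)


print(parse_page_ranges("1-3,5", page_count=10))
```

A call such as `extract_pages("input.pdf", "subset.pdf", "1-3,5")` would then write pages 1, 2, 3, and 5 to a new file.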
#### Extracting Metadata
```python
from pypdf import PdfReader

reader = PdfReader("document.pdf")
meta = reader.metadata  # document information (title, author, ...)
print(f"Title: {meta.title}")
print(f"Author: {meta.author}")
print(f"Subject: {meta.subject}")
print(f"Creator: {meta.creator}")
```
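Not every PDF carries a document information dictionary. In that case `reader.metadata` can be `None`, and individual fields can be missing even when it is present. The sketch below shows one way to read the fields defensively. `metadata_summary` and `FIELDS` are hypothetical names introduced here, and `SimpleNamespace` stands in for a real metadata object so the example runs without a PDF file.

```python
from types import SimpleNamespace

FIELDS = ("title", "author", "subject", "creator")


def metadata_summary(meta):
    """Return the common metadata fields as a dict, using None for gaps.

    `meta` is expected to be reader.metadata from pypdf, which can be None
    when the PDF has no document information dictionary.
    """
    if meta is None:
        return {name: None for name in FIELDS}
    return {name: getattr(meta, name, None) for name in FIELDS}


# SimpleNamespace stands in for a real metadata object in this demo.
sample = SimpleNamespace(title="Report", author="A. Writer")
summary = metadata_summary(sample)
print(summary)
print(metadata_summary(None))
```

With a real file, the call would be `metadata_summary(PdfReader("document.pdf").metadata)`.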
#### Rotating Pages
```python
from pypdf import PdfReader, PdfWriter

reader = PdfReader("input.pdf")
writer = PdfWriter()

page = reader.pages[0]
page.rotate(90)  # Rotate 90 degrees clockwise
writer.add_page(page)

with open("rotated.pdf", "wb") as output:
    writer.write(output)
```
### pdfplumber - Text and Table Extraction
#### Extracting Text with Layout
```python
import pdfplumber
with pdfplumber.open("document.pdf") as pdf:
    for page in pdf.pages: