Web Automation Agent: A Complete Tutorial on Letting an Agent Operate the Browser
What can a web automation agent do?
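- Data collection: automatically scrape articles, products, rankings, and news
- Form filling: automatically complete sign-up forms, surveys, and registration details
- Automated testing: simulate a user clicking buttons, filling forms, and exercising features
- Page monitoring: check prices, stock levels, and announcement changes on a schedule
- Online interaction: automatic booking, ordering, and daily check-ins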
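The browser control tool: Playwright
- Speed: fast, with a native async API
- Stability: auto-waits for elements to be ready before acting, which avoids many timing errors
- API: concise and easy to write, with chainable locators
- Built in: screenshots, video recording, and multi-browser support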
pip install playwright
playwright install chromium
Playwright basics (runnable as-is)
Basic Playwright demo: open a page, take a screenshot, read the title
from playwright.async_api import async_playwright
import asyncio


async def demo():
    async with async_playwright() as p:
        # Launch the browser
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        # Open the page
        await page.goto("https://example.com")
        # Take a screenshot
        await page.screenshot(path="example.png")
        # Read the title
        title = await page.title()
        print("Page title:", title)
        await browser.close()


asyncio.run(demo())
Web agent core architecture: the browser controller
from playwright.async_api import async_playwright


class BrowserController:
    def __init__(self, headless=True):
        self.headless = headless
        self.playwright = None
        self.browser = None
        self.context = None
        self.page = None

    async def start(self):
        # Start Playwright without a context manager; .start() is required here
        self.playwright = await async_playwright().start()
        self.browser = await self.playwright.chromium.launch(headless=self.headless)
        self.context = await self.browser.new_context()
        self.page = await self.context.new_page()

    async def navigate(self, url):
        await self.page.goto(url, wait_until="networkidle")
        return await self.page.title()

    async def click(self, selector):
        await self.page.click(selector)
        await self.page.wait_for_load_state("networkidle")

    async def fill(self, selector, text):
        await self.page.fill(selector, text)

    async def get_text(self, selector):
        return await self.page.text_content(selector)

    async def extract_list(self, selector):
        # Extract the text of every element matching the selector
        elements = await self.page.query_selector_all(selector)
        return [await e.inner_text() for e in elements]

    async def screenshot(self, path):
        await self.page.screenshot(path=path)

    async def close(self):
        # Close the browser, then stop the Playwright driver process
        if self.browser:
            await self.browser.close()
        if self.playwright:
            await self.playwright.stop()
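One way to connect the controller to a planner is a small dispatcher that maps an action dictionary onto controller methods. The sketch below assumes the action shape {"action": ..., "params": [...]} used in the planner prompt of this article. The names execute_action and ALLOWED_ACTIONS are illustrative and not part of Playwright, and the controller is duck-typed, so any object with matching async methods works.

```python
import asyncio
from typing import Any

# Action name -> number of positional params expected. This table is an
# assumption that mirrors the operations listed in the planner prompt.
ALLOWED_ACTIONS = {
    "navigate": 1,
    "click": 1,
    "fill": 2,
    "wait": 1,
    "get_text": 1,
    "screenshot": 0,
    "done": 0,
}


async def execute_action(controller: Any, action: dict) -> Any:
    """Run one planner action against a BrowserController-like object."""
    name = action.get("action")
    params = list(action.get("params") or [])
    if name not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {name!r}")
    expected = ALLOWED_ACTIONS[name]
    if len(params) != expected:
        raise ValueError(f"{name} expects {expected} param(s), got {len(params)}")

    if name == "done":
        return None
    if name == "wait":
        # Planner-requested pause; asyncio.sleep does not block the event loop.
        await asyncio.sleep(float(params[0]))
        return None
    if name == "screenshot":
        # Assumed default file name, since screenshot() in the prompt has no path.
        return await controller.screenshot("step.png")
    # navigate / click / fill / get_text map one-to-one onto controller methods.
    method = getattr(controller, name)
    return await method(*params)
```

Rejecting unknown action names before calling getattr keeps a malformed model reply from invoking arbitrary attributes on the controller.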
Smart element location: helping the AI understand the page
class ElementFinder:
    def __init__(self, page):
        self.page = page

    # Locators are created synchronously, so these helpers need no await;
    # the actual waiting happens when an action is performed on the locator.
    def find_by_text(self, text):
        return self.page.get_by_text(text)

    def find_button(self, name):
        return self.page.get_by_role("button", name=name)

    def find_input(self, label):
        return self.page.get_by_role("textbox", name=label)

    def find_link(self, text):
        return self.page.get_by_role("link", name=text)

    async def page_info(self):
        # Return the page title and URL so the AI can judge where it is
        return {
            "title": await self.page.title(),
            "url": self.page.url
        }
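A title and URL alone rarely give a planner enough to choose a target, so one option is to also pass it a compact list of interactive elements. The following is a minimal sketch, assuming the element descriptions have already been collected (for example through role-based locators) as dictionaries with hypothetical "role" and "name" keys. format_page_for_prompt is an illustrative helper, not a Playwright API.

```python
def format_page_for_prompt(info, elements, max_items=20):
    """Render page info plus a de-duplicated element list as prompt text."""
    lines = [f"Title: {info.get('title', '')}", f"URL: {info.get('url', '')}"]
    seen = set()
    for el in elements:
        role = str(el.get("role", "")).strip()
        # Collapse runs of whitespace so multi-line labels stay on one line.
        name = " ".join(str(el.get("name", "")).split())
        key = (role, name)
        if not role or not name or key in seen:
            continue
        seen.add(key)
        if len(seen) > max_items:
            # Cap the list to keep the prompt within a token budget.
            lines.append("... (more elements omitted)")
            break
        lines.append(f"- {role}: {name}")
    return "\n".join(lines)
```

The result can be passed as the page description in the planner input. The cap of 20 items is an arbitrary default and worth tuning against the model's context size.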
Action planner: the AI brain decides the next step
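from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
import json

llm = ChatOpenAI(model="gpt-4", temperature=0)

# Literal braces in the JSON example are doubled so the template does not
# treat them as input variables.
planner = ChatPromptTemplate.from_messages([
    ("system", """You are a browser automation expert. Given the task, output the next action.
Supported operations:
- navigate(url)
- click(selector)
- fill(selector, text)
- wait(seconds)
- get_text(selector)
- screenshot()
- done()
Return strict JSON:
{{"action": "operation name", "params": ["argument"], "reason": "why"}}
"""),
    ("human", "Task: {task}\nCurrent page: {page}\nHistory: {history}")
]) | llm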
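The planner returns free text, so its reply should be parsed and validated before anything is executed. A minimal sketch of that step follows, assuming the reply text contains a single JSON object in the shape requested by the prompt. parse_action and VALID_ACTIONS are illustrative names, not part of LangChain.

```python
import json

# Operation names the prompt allows; anything else is rejected.
VALID_ACTIONS = {"navigate", "click", "fill", "wait", "get_text", "screenshot", "done"}


def parse_action(raw: str) -> dict:
    """Parse the planner's reply into a validated action dictionary."""
    text = raw.strip()
    # Models sometimes add prose or a code fence around the JSON, so keep
    # only the span from the first "{" to the last "}".
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in planner reply")
    try:
        data = json.loads(text[start:end + 1])
    except json.JSONDecodeError as exc:
        raise ValueError(f"planner reply is not valid JSON: {exc}") from exc

    action = data.get("action")
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    params = data.get("params", [])
    if not isinstance(params, list):
        raise ValueError("params must be a list")
    return {"action": action, "params": params, "reason": str(data.get("reason", ""))}
```

In use, the text to parse would typically be the content attribute of the message the chain returns. Raising ValueError on any malformed reply lets the calling loop decide whether to re-prompt or stop.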
Hands-on: a data collection agent (fixed, runnable version)
from playwright.async_api import async_playwright
import asyncio
from datetime import datetime


async def scrape_zhihu():
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto("https://www.zhihu.com")
        await page.wait_for_load_state("networkidle")

        articles = []
        # Selectors depend on the site's current markup and may need updating
        items = await page.query_selector_all("div.ContentItem")
        for item in items[:10]:
            title_el = await item.query_selector("a.ContentItem-title")
            author_el = await item.query_selector("a.UserLink-link")
            if not title_el:
                continue
            title = await title_el.inner_text()
            url = await title_el.get_attribute("href")
            author = await author_el.inner_text() if author_el else "Anonymous"
            articles.append({
                "title": title,
                "author": author,
                "url": url,
                "scraped_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S")
            })

        print("Scrape finished:")
        for i, art in enumerate(articles, 1):
            print(f"{i}. {art['title']} | {art['author']}")
        await browser.close()
        return articles


asyncio.run(scrape_zhihu())
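Scraped href values are often relative or protocol-relative, and the same item can appear more than once, so a small post-processing pass is useful before storing results. The sketch below uses only the standard library. normalize_articles and save_json are hypothetical helper names, and the url_key argument names whichever dictionary key holds the link in the scraped records.

```python
import json
from urllib.parse import urljoin


def normalize_articles(articles, base_url, url_key):
    """Resolve links against base_url and drop duplicates, keeping order."""
    seen = set()
    result = []
    for art in articles:
        href = art.get(url_key)
        if not href:
            continue  # skip records without a link
        absolute = urljoin(base_url, href)
        if absolute in seen:
            continue
        seen.add(absolute)
        # Copy the record so the caller's list is left unchanged.
        result.append({**art, url_key: absolute})
    return result


def save_json(articles, path):
    """Write records as UTF-8 JSON, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(articles, f, ensure_ascii=False, indent=2)
```

For example, the list returned by the scraper could be passed through normalize_articles with "https://www.zhihu.com" as the base URL and then written out with save_json.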
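Best practices (pitfalls to avoid)
- Don't use time.sleep(): Playwright waits automatically, and manual sleeps tend to make scripts more fragile. In async code, time.sleep() also blocks the event loop.
- Always add exception handling: the page may change or the network may drop, and the agent should be able to keep running.
- Control your speed: scraping too fast can get your IP blocked.
- Captchas need a human: when verification appears, pause the program and prompt the user to complete it manually.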
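The exception-handling and pacing advice above can be sketched with two small asyncio helpers. This is one possible approach, not a Playwright feature: retry_async and polite_pause are hypothetical names, and catching the broad Exception class is a simplification. In real code it is usually better to catch the specific error types the browser library raises.

```python
import asyncio
import random


async def retry_async(func, *args, attempts=3, base_delay=1.0, **kwargs):
    """Call an async function, retrying with exponential backoff on failure."""
    for attempt in range(1, attempts + 1):
        try:
            return await func(*args, **kwargs)
        except Exception as exc:
            if attempt == attempts:
                raise  # out of retries: surface the last error
            # Delay doubles each time: base_delay, 2x, 4x, ...
            delay = base_delay * 2 ** (attempt - 1)
            print(f"attempt {attempt} failed ({exc!r}); retrying in {delay:.2f}s")
            await asyncio.sleep(delay)


async def polite_pause(min_seconds=1.0, max_seconds=3.0):
    """Sleep a random interval between requests to avoid hammering a site."""
    await asyncio.sleep(random.uniform(min_seconds, max_seconds))
```

A navigation step could then be wrapped as retry_async(controller.navigate, url), with polite_pause() awaited between pages. Both helpers use asyncio.sleep, so other tasks keep running while they wait.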
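Summary
- Playwright = a powerful browser automation tool
- BrowserController = the agent's hands and feet
- ElementFinder = lets the AI understand the page
- Action planner = the AI brain that decides where to click
- End result: the AI operates the browser automatically to complete collection, form filling, monitoring, and interaction tasks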
Source: https://mp.weixin.qq.com/s/xumoNtTuWPw2EmDinRr0vA