CAPTCHA Bypass Research for OpenClaw Browser Automation
Date: 2026-05-06
Focus: DuckDuckGo CAPTCHA, Playwright-based browser automation (OpenClaw)
The Problem
Daily cron agents (Trend Analyst, VN Legal Eagle, etc.) use OpenClaw's browser tool (Playwright-based) for web research. DuckDuckGo occasionally serves CAPTCHA challenges to automated requests from the same IP, blocking search operations.
DDG's CAPTCHA is relatively simple — it's not Cloudflare Turnstile or reCAPTCHA v3. It triggers on:
- High request volume from a single IP
- Missing or suspicious browser fingerprints
- navigator.webdriver = true (headless detection)
- HeadlessChrome in User-Agent string
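As a quick self-check for the two fingerprint signals listed above, the sketch below inspects a navigator-like object. The function name fingerprintWarnings and the sample values are hypothetical. This is not DDG's actual detection logic, only a check for the signals this note names.

```javascript
// Hypothetical diagnostic: reports which of the two signals named in this
// note are visible on a navigator-like object ({ webdriver, userAgent }).
function fingerprintWarnings(nav) {
  const warnings = [];
  if (nav.webdriver === true) {
    warnings.push('navigator.webdriver is true');
  }
  if (/HeadlessChrome/.test(nav.userAgent || '')) {
    warnings.push('User-Agent contains HeadlessChrome');
  }
  return warnings;
}

// Sample values for illustration only.
const sample = {
  webdriver: true,
  userAgent:
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/124.0.0.0 Safari/537.36',
};
console.log(fingerprintWarnings(sample));
```

Inside a page, the same function could presumably be applied to the real navigator object, for example through browser act: evaluate.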
GitHub Repos & Tools
1. playwright-stealth (Python) — ⭐ Primary recommendation
- Repo: https://github.com/Granitosaurus/playwright-stealth
- PyPI: playwright-stealth v2.0.2+
- What it does: Patches navigator.webdriver, User-Agent, missing plugins, WebGL vendor, codec fingerprints, Chrome runtime objects
- Status: Actively maintained, modern context-manager API
- Applicability: OpenClaw uses Node.js Playwright, but the evasion concepts are universal. The JS evasion scripts could be injected via browser act: evaluate
2. playwright-extra + puppeteer-extra-plugin-stealth (Node.js)
- npm: playwright-extra v4.3.6
- What it does: Plugin system for Playwright that loads stealth evasion modules
- Status: Last published 3 years ago — maintenance is questionable
- Applicability: Could be integrated into OpenClaw's Playwright driver, but requires code changes to the gateway
3. SearXNG — DuckDuckGo CAPTCHA handling
- Issue: https://github.com/searxng/searxng/issues/3927
- Key insight: DDG CAPTCHA is triggered by server-side scraping, not browser fingerprinting alone. SearXNG engineers recommend:
  - Using https://html.duckduckgo.com/html/ (JS-disabled endpoint)
  - Using https://lite.duckduckgo.com/lite/ (ultra-lightweight)
  - Properly handling the vqd parameter (DDG's anti-scraping token)
  - Answering CAPTCHA once from the server IP, then cookies persist
4. Browserless.io — Cloud Browser API
- URL: https://www.browserless.io/
- What: Managed browser automation with built-in anti-bot bypass
- Cost: Paid service, not self-hosted
- Applicability: Overkill for our use case, but worth knowing about
Techniques Applicable to OpenClaw
✅ Profile Persistence (Most Effective for Our Case)
This is the #1 thing that works for daily cron agents.
OpenClaw already supports persistent browser profiles (bobbie, trend-scout, etc.). When a profile is used:
- Cookies persist across sessions (including DDG's "I'm not a robot" cookie)
- LocalStorage data carries over
- The browser builds a "trust history" with DDG
What to ensure:
- Each agent uses its own profile consistently
- Profiles aren't cleared between runs
- If CAPTCHA appears once and is solved, the cookie persists for future runs
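For reference, this is roughly what profile persistence looks like in raw Node.js Playwright. It is a minimal sketch, not OpenClaw's implementation: the helper names profileDirFor and openAgentProfile and the ./profiles layout are assumptions for illustration. chromium.launchPersistentContext is the Playwright call that keeps cookies and localStorage in a user-data directory between runs.

```javascript
// Hypothetical layout: one user-data directory per agent under a base dir.
// Restricting the name keeps one agent from pointing at another's profile
// through path tricks such as "../other".
function profileDirFor(agentName, baseDir = './profiles') {
  if (!/^[a-z0-9][a-z0-9_-]*$/i.test(agentName)) {
    throw new Error(`Unsafe profile name: ${agentName}`);
  }
  return `${baseDir}/${agentName}`;
}

// Opens a persistent Chromium context so cookies and localStorage are
// written to disk and reused on the next run. Playwright is imported
// lazily, so profileDirFor can be used without it installed.
async function openAgentProfile(agentName) {
  const { chromium } = await import('playwright');
  return chromium.launchPersistentContext(profileDirFor(agentName), {
    headless: true,
  });
}

// Usage (not executed here):
//   const context = await openAgentProfile('trend-scout');
//   const page = await context.newPage();
//   ...
//   await context.close(); // closing the context leaves the profile on disk
console.log(profileDirFor('trend-scout'));
```

Giving each agent its own directory matches the "own profile consistently" point: a cookie earned by one agent is never overwritten by another.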
✅ Use DDG Lite/HTML Endpoints
Instead of navigating to duckduckgo.com/?q=..., use:
- https://html.duckduckgo.com/html/?q=query (HTML-only, no JS, less fingerprinting)
- https://lite.duckduckgo.com/lite/?q=query (ultra-minimal, almost no anti-bot)
- These endpoints rarely trigger CAPTCHA because they're designed for low-capability clients
For agents that search DDG, prefer the web_search tool (which uses the DDG API directly) over navigating to DDG in the browser.
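When browser navigation is unavoidable, the endpoint URLs can be built with proper query encoding rather than by string concatenation. A minimal sketch follows. The helper name ddgSearchUrl is hypothetical, and only the q parameter from the URLs in this note is used.

```javascript
// The two low-JS endpoints named in this note.
const DDG_ENDPOINTS = {
  html: 'https://html.duckduckgo.com/html/',
  lite: 'https://lite.duckduckgo.com/lite/',
};

// Hypothetical helper: returns a search URL for the chosen endpoint variant.
function ddgSearchUrl(query, variant = 'lite') {
  if (!Object.hasOwn(DDG_ENDPOINTS, variant)) {
    throw new Error(`Unknown DDG endpoint variant: ${variant}`);
  }
  const url = new URL(DDG_ENDPOINTS[variant]);
  url.searchParams.set('q', query); // form-encodes spaces, '&', etc.
  return url.toString();
}

console.log(ddgSearchUrl('vietnam labor law'));
// https://lite.duckduckgo.com/lite/?q=vietnam+labor+law
```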
✅ Rate Limiting Between Requests
- Add delays between browser navigation to DDG (2-5 seconds minimum)
- Cron agents run once daily — the issue is likely burst requests within a single run
- If an agent does multiple searches, serialize them with delays
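The serialize-with-delays idea can be sketched as a small helper. The names sleep and searchSerially and the searchFn callback are hypothetical. The 2000 to 5000 ms defaults come from the range suggested in this section, and the random jitter is an added assumption to avoid a perfectly regular request rhythm.

```javascript
// Resolves after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs `searchFn` for each query one at a time, pausing a random
// minDelayMs..maxDelayMs between consecutive calls (no pause before the
// first call). Results are returned in query order.
async function searchSerially(
  queries,
  searchFn,
  { minDelayMs = 2000, maxDelayMs = 5000 } = {}
) {
  const results = [];
  for (let i = 0; i < queries.length; i++) {
    if (i > 0) {
      const jitter = Math.random() * (maxDelayMs - minDelayMs);
      await sleep(minDelayMs + jitter);
    }
    results.push(await searchFn(queries[i]));
  }
  return results;
}

// Usage (hypothetical `runSearch` that performs one DDG lookup):
//   const results = await searchSerially(['query one', 'query two'], runSearch);
```

Because each call is awaited before the next delay starts, two searches can never overlap, which addresses the burst pattern within a single run.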
✅ JavaScript Evasion Injection
Inject before page load via browser act: evaluate:
// Override navigator.webdriver
Object.defineProperty(navigator, 'webdriver', { get: () => false });
// Fix User-Agent (remove HeadlessChrome)
// This needs to be done at browser launch, not page-level
Note: OpenClaw's browser tool may already handle some of this. Check if the Playwright launch args include --disable-blink-features=AutomationControlled.
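One caveat on timing: if browser act: evaluate maps to Playwright's page.evaluate (an assumption), the override only affects the current document and runs after the page's own scripts may already have read the value. Playwright's addInitScript runs a script before any page script on every navigation. Whether OpenClaw exposes it is not confirmed here. The sketch below writes the evasion as a self-contained function that could be passed to addInitScript, and exercises it against a stand-in object. The name applyWebdriverEvasion is hypothetical.

```javascript
// Evasion body. Self-contained so it can be serialized and run inside the
// page; `nav` defaults to the page's own navigator when no argument is given.
function applyWebdriverEvasion(nav = navigator) {
  // Redefine the getter on the prototype, where browsers define it, so the
  // instance does not gain a tell-tale own property.
  const proto = Object.getPrototypeOf(nav);
  Object.defineProperty(proto, 'webdriver', {
    get: () => false,
    configurable: true,
  });
}

// Playwright usage (not executed here):
//   await context.addInitScript(applyWebdriverEvasion);

// Stand-in for a browser navigator, for demonstration outside a page.
const fakeProto = { get webdriver() { return true; } };
const fakeNavigator = Object.create(fakeProto);
applyWebdriverEvasion(fakeNavigator);
console.log(fakeNavigator.webdriver); // false
```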
⚠️ User-Agent Spoofing
- OpenClaw browser profiles support custom UA strings
- Set a realistic Chrome UA without the HeadlessChrome marker
- Match the Chromium version actually being used
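One way to satisfy both points at once is to derive the spoofed UA from the browser's real one, so the version stays in sync with the Chromium build. A minimal sketch follows. The function name normalizeUserAgent is hypothetical and the sample UA string is illustrative only.

```javascript
// Hypothetical helper: rewrites the headless product token to the regular
// one and leaves the version number untouched.
function normalizeUserAgent(ua) {
  return ua.replace(/HeadlessChrome\//g, 'Chrome/');
}

// Illustrative headless UA; a real one would be read from the running browser.
const headlessUa =
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/124.0.0.0 Safari/537.36';

console.log(normalizeUserAgent(headlessUa));
// Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36
```

The result would then be set as the profile's custom UA string. How that is configured in OpenClaw is not covered here.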
⚠️ Proxy Rotation
- Rotating residential proxies would solve IP-based CAPTCHA triggers
- Not currently configured in OpenClaw
- Cost: $5-50/month for residential proxy pools
- Recommendation: Only if CAPTCHA becomes a persistent blocker
❌ CAPTCHA Solving Services (Not Recommended)
- 2Captcha, CapSolver, etc. cost $1-3 per 1000 solves
- Adds latency (15-30 seconds per solve)
- Against DDG's ToS
- Our agents don't need this — rate limiting + profile persistence should suffice
Practical Recommendations for solo.engineer
Immediate (No code changes needed):
- Use web_search tool instead of browser for DDG searches: it hits DDG's backend API directly, no browser fingerprinting involved
- Use web_fetch for known URLs: skip the search entirely when you know where to go
- Keep persistent browser profiles: don't clear cookies between agent runs
Short-term (Minor config):
- Add delays in agent prompts — "Wait 3 seconds between DDG page loads"
- Use DDG lite/html endpoints when browser search is unavoidable: https://lite.duckduckgo.com/lite/?q=...
- Set realistic User-Agent on browser profiles
Medium-term (If CAPTCHA persists):
- Consider SearXNG self-hosted — acts as a meta-search proxy, handles DDG CAPTCHA internally
- Add Playwright launch flags: --disable-blink-features=AutomationControlled
- Inject stealth JS evasions before page loads
What NOT to do:
- Don't pay for CAPTCHA solving services — it's overkill for DDG
- Don't rotate IPs aggressively — DDG will flag the IP range
- Don't use headless-detection bypass tools that have been unmaintained for 3+ years
Key Takeaway
The web_search tool already bypasses DDG CAPTCHA entirely because it uses DDG's API backend, not the browser frontend. The CAPTCHA issue only affects agents that navigate to duckduckgo.com in the browser. The fix is simple: use web_search for searching, web_fetch for fetching known pages, and only use the browser when you need JS rendering or login state.
If browser-based DDG access is truly needed, use the lite endpoint (lite.duckduckgo.com) with persistent profiles and rate limiting. DDG's CAPTCHA is among the easiest to avoid.
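The routing rule in this takeaway can be written down as a small decision function, for example to embed in an agent's tool-selection logic. The function name chooseTool and its option names are hypothetical. The returned strings are the tool names used in this note.

```javascript
// Hypothetical router: picks the lightest tool that can do the job.
function chooseTool({ knownUrl = false, needsJsRendering = false, needsLogin = false } = {}) {
  // The browser is the only option when the page needs JS or a session.
  if (needsJsRendering || needsLogin) return 'browser';
  // A known URL can be fetched directly, with no search step.
  if (knownUrl) return 'web_fetch';
  // Everything else is a plain search.
  return 'web_search';
}

console.log(chooseTool());                          // web_search
console.log(chooseTool({ knownUrl: true }));         // web_fetch
console.log(chooseTool({ needsJsRendering: true })); // browser
```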
Sources: SearXNG issue #3927, Scrapfly stealth guide (2026-04-28), playwright-stealth PyPI, playwright-extra npm, Browserless docs