JavaScript rendering
Use Playwright to render single-page apps and sites whose content lives behind JS execution.
The default Fetcher is a fast async HTTP client. It works well for traditional sites, documentation, blogs, and most APIs. But if a site renders content client-side (React/Vue/Svelte SPAs, infinite-scroll pages, anything that only appears after JS runs), you need a real browser.
That's what the Playwright fetcher is for.
Enable JS rendering
Install the optional browser extra and download a browser binary:
```bash
pip install "yoink[browser]"
playwright install chromium
```

Then turn it on with --render-js (CLI) or render_js=True (Python):

```bash
yoink crawl https://spa-site.com --render-js
```

```python
config = CrawlConfig(
    render_js=True,
    browser_type="chromium",      # or firefox, webkit
    wait_strategy="networkidle",  # or load, domcontentloaded, commit
    headless=True,
)
```

How it works
When render_js=True, create_fetcher() returns a PlaywrightFetcher instead of the standard HTTP Fetcher. The crawler is otherwise unchanged — same scheduler, same rate limiter, same robots checker.
render_js=False → Fetcher
- aiohttp ClientSession
- 3 attempts, exponential backoff
- fast and lean; the default

render_js=True → PlaywrightFetcher
- launch browser (chromium / firefox / webkit)
- borrow context from pool
- page.goto(url) + wait_strategy
- optional wait_for_selector
- page.content() → html
- release context back to pool
If render_js=True but Playwright isn't installed, yoink emits a UserWarning and falls back to the HTTP Fetcher.
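The fallback rule can be sketched in plain Python. This is a minimal illustration, not yoink's actual create_fetcher(). The function name pick_fetcher, its string return values, and the use of importlib.util.find_spec to detect Playwright are all assumptions.

```python
import importlib.util
import warnings
from typing import Optional


def playwright_installed() -> bool:
    """Return True if the playwright package is importable."""
    return importlib.util.find_spec("playwright") is not None


def pick_fetcher(render_js: bool, available: Optional[bool] = None) -> str:
    """Hypothetical stand-in for create_fetcher(); returns a fetcher name.

    `available` overrides Playwright detection (handy for tests).
    """
    if not render_js:
        return "http"  # the default aiohttp-based Fetcher
    if available is None:
        available = playwright_installed()
    if available:
        return "playwright"
    # Documented behavior: warn, then use the HTTP path instead.
    warnings.warn(
        "render_js=True but playwright is not installed; "
        "falling back to the HTTP fetcher",
        UserWarning,
        stacklevel=2,
    )
    return "http"
```

The warning is the only signal that rendering was skipped, so it is worth turning warnings into errors (python -W error::UserWarning) in CI runs that depend on JS rendering.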
Wait strategies
Playwright's notion of "loaded" is different from a plain HTTP fetch. Pick the strategy that matches what you need:
| Name | Type | Default | Description |
|---|---|---|---|
| load | string | — | Wait for the load event. Equivalent to window.onload firing. |
| domcontentloaded | string | — | Wait for the DOM to parse. Doesn't wait for images, fonts, or stylesheets. |
| networkidle | string | default | Wait until there are no network connections for at least 500ms. Best for SPAs that fetch data after mount. |
| commit | string | — | Wait for navigation to commit (response headers received). Fastest, but content may not be ready. |
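Each strategy name in the table is a value that Playwright's own page.goto() accepts for its wait_until argument. The sketch below shows the render steps listed under "How it works" using Playwright's async API directly, plus the optional selector wait described next. It is not yoink's PlaywrightFetcher. render_page, check_wait_strategy, and WAIT_STRATEGIES are hypothetical names. The Playwright import is deferred so the validation helper works without a browser installed.

```python
from typing import Optional

# The four values Playwright's page.goto() accepts for wait_until.
WAIT_STRATEGIES = ("load", "domcontentloaded", "networkidle", "commit")


def check_wait_strategy(name: str) -> str:
    """Reject names Playwright would not understand, before launching anything."""
    if name not in WAIT_STRATEGIES:
        raise ValueError(
            f"unknown wait_strategy {name!r}; expected one of {WAIT_STRATEGIES}"
        )
    return name


async def render_page(
    url: str,
    wait_strategy: str = "networkidle",
    wait_selector: Optional[str] = None,
    headless: bool = True,
) -> str:
    """Hypothetical helper: load `url` in Chromium and return the rendered HTML."""
    from playwright.async_api import async_playwright  # needs the browser extra

    wait_until = check_wait_strategy(wait_strategy)
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=headless)
        try:
            context = await browser.new_context()
            page = await context.new_page()
            await page.goto(url, wait_until=wait_until)
            if wait_selector is not None:
                # Extra wait for content that appears after the strategy fires.
                await page.wait_for_selector(wait_selector)
            return await page.content()
        finally:
            await browser.close()  # also closes the context and page
```

Run it with asyncio.run(render_page("https://spa.com", wait_selector=".article-content")). The sketch launches a fresh browser per call for simplicity. Pooling exists to avoid exactly that cost.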
For sites that render content after networkidle (rare, but it happens), use a CSS selector to wait for a specific element:
```bash
yoink crawl https://spa.com --render-js --wait-selector ".article-content"
```

```python
config = CrawlConfig(
    render_js=True,
    wait_selector=".article-content",
)
```

Browser pooling
Launching a browser is expensive. yoink reuses a pool of browser contexts (isolated cookie/localStorage scopes within a single browser process):
```python
config = CrawlConfig(
    render_js=True,
    browser_pool_size=3,  # default
)
```

Workers borrow a context, render the page, and return it. Three contexts is a good default for max_concurrency=10: enough that workers rarely block on the pool, yet few enough that memory stays reasonable.
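The borrow/return cycle can be sketched with an asyncio.Queue that holds the contexts. This illustrates the pattern only and is not yoink's pool implementation. ContextPool and demo are hypothetical names. The pooled items here are plain strings, but in real use they would be Playwright BrowserContext objects from browser.new_context().

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator, Generic, Iterable, Tuple, TypeVar

T = TypeVar("T")


class ContextPool(Generic[T]):
    """Fixed-size pool: borrow() waits while every item is checked out."""

    def __init__(self, items: Iterable[T]) -> None:
        self._queue: "asyncio.Queue[T]" = asyncio.Queue()
        for item in items:
            self._queue.put_nowait(item)

    def available(self) -> int:
        return self._queue.qsize()

    @asynccontextmanager
    async def borrow(self) -> AsyncIterator[T]:
        item = await self._queue.get()  # blocks if the pool is empty
        try:
            yield item
        finally:
            self._queue.put_nowait(item)  # always returned, even on error


async def demo(workers: int = 10, contexts: int = 3) -> Tuple[int, int]:
    """Run `workers` fake renders against `contexts` stand-in contexts."""
    pool = ContextPool(f"ctx-{i}" for i in range(contexts))  # built inside the loop
    in_use = peak = 0

    async def worker() -> None:
        nonlocal in_use, peak
        async with pool.borrow():
            in_use += 1
            peak = max(peak, in_use)
            await asyncio.sleep(0.01)  # stand-in for goto + content
            in_use -= 1

    await asyncio.gather(*(worker() for _ in range(workers)))
    return peak, pool.available()
```

asyncio.run(demo()) returns (3, 3). Ten workers share three contexts, so no more than three renders are ever in flight at once, and every context is back in the pool at the end.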
Browser choice
| Browser | When to pick it |
|---|---|
| chromium | Default. Best site compatibility, fastest startup. |
| firefox | If you need to test against Firefox-specific behavior. |
| webkit | Closest approximation of Safari rendering. |
For data extraction, Chromium is almost always the right choice. The other engines exist for testing/cross-browser validation.
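In Playwright's Python API, the three engines are attributes of the Playwright object (p.chromium, p.firefox, p.webkit), and each has a launch() method. Here is a minimal sketch of mapping a browser_type string onto them. engine_for is a hypothetical helper, not part of yoink.

```python
from typing import Any

BROWSER_TYPES = ("chromium", "firefox", "webkit")


def engine_for(playwright: Any, browser_type: str = "chromium") -> Any:
    """Return p.chromium, p.firefox or p.webkit for a started Playwright object."""
    if browser_type not in BROWSER_TYPES:
        raise ValueError(
            f"browser_type must be one of {BROWSER_TYPES}, got {browser_type!r}"
        )
    return getattr(playwright, browser_type)


# Usage inside `async with async_playwright() as p:` (Playwright's async API):
#     browser = await engine_for(p, "firefox").launch(headless=True)
```

Each engine needs its own binary. If you pick firefox or webkit, also run playwright install firefox or playwright install webkit.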
Debugging
Run with a visible browser to watch what's happening:
```bash
yoink crawl https://spa.com --render-js --no-headless
```

For scripted runs that crash mysteriously, configure a screenshot directory:

```python
config = CrawlConfig(
    render_js=True,
    screenshot_dir="./debug-screenshots",
)
```

Each fetched page gets a PNG in that directory, named screenshot_<8-char-md5>.png (e.g., screenshot_a1b2c3d4.png), where the 8 chars are the first 8 hex digits of the MD5 of the URL. Eight hex digits are only 32 bits. Collisions are therefore rare on small crawls but become likely on large ones: by the birthday bound, a crawl of about 77,000 URLs has roughly a 50% chance of at least one shared name.
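This sketch shows the naming scheme and how collision odds grow with crawl size. It assumes the URL is hashed as UTF-8 bytes; which exact bytes yoink hashes is an assumption. screenshot_name and collision_probability are illustrative helpers, not yoink functions.

```python
import hashlib
import math


def screenshot_name(url: str) -> str:
    """screenshot_<first 8 hex digits of MD5(url)>.png (URL assumed UTF-8)."""
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return f"screenshot_{digest[:8]}.png"


def collision_probability(n_urls: int, bits: int = 32) -> float:
    """Birthday-bound approximation of P(at least two URLs share a name)."""
    space = 2 ** bits  # 8 hex digits = 32 bits
    return 1.0 - math.exp(-n_urls * (n_urls - 1) / (2 * space))
```

collision_probability(10_000) is about 0.012, and collision_probability(77_163) is about 0.5. If you need one file per URL on a large crawl, keep a URL-to-filename log alongside the screenshots.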
Cost & throughput
JS rendering is 10–50× slower than plain HTTP fetching. A page that takes 200ms over HTTP might take 3–8 seconds with Playwright (network + render + wait). Plan accordingly:
- Lower max_concurrency (try 5 instead of 20).
- Use wait_strategy="domcontentloaded" if you don't need post-mount data.
- Keep --render-js off for the parts of your crawl that don't need it. yoink doesn't (yet) auto-detect; that's a per-target decision.
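A back-of-envelope throughput estimate helps with that planning. It assumes throughput is capped by how many fetches are in flight, and it ignores the rate limiter and robots delays, which can only lower it. The optional pool_size cap reflects the assumption that each render holds one pooled context for its whole duration. pages_per_second is a hypothetical helper.

```python
from typing import Optional


def pages_per_second(
    max_concurrency: int, seconds_per_page: float, pool_size: Optional[int] = None
) -> float:
    """Upper-bound throughput: fetches in flight divided by time per page."""
    in_flight = max_concurrency if pool_size is None else min(max_concurrency, pool_size)
    return in_flight / seconds_per_page


http_rate = pages_per_second(20, 0.2)            # ~100 pages/s over plain HTTP
js_rate = pages_per_second(5, 5.0, pool_size=3)  # 0.6 pages/s with rendering
hours_for_10k = 10_000 / js_rate / 3600          # ~4.6 hours for 10,000 pages
```

With a pool of three contexts, raising max_concurrency above 3 does not add rendering throughput in this model. Either raise browser_pool_size or accept the lower rate.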
When NOT to use it
If curl https://site.com returns the content you want, you don't need a browser. The default Fetcher is faster, lighter, and far more reliable.
Try the HTTP fetcher first. Switch only when content is missing.
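You can script the curl test. Fetch the page with any plain HTTP client, then check whether the text you need is already in the raw HTML. Inline script bodies are skipped, because an SPA shell often carries its data there unrendered. content_in_raw_html is a hypothetical helper, a rough heuristic rather than a yoink feature.

```python
from html.parser import HTMLParser
from typing import List


class _TextCollector(HTMLParser):
    """Collects text nodes, skipping <script> and <style> bodies."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def content_in_raw_html(raw_html: str, expected_text: str) -> bool:
    """True if expected_text is in the page text without running any JS."""
    parser = _TextCollector()
    parser.feed(raw_html)
    parser.close()
    return expected_text in "".join(parser.chunks)
```

For example, pass it urllib.request.urlopen(url).read().decode() together with a phrase you expect on the page. A False result marks the page as a candidate for --render-js.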
See also
- CrawlConfig: full list of JS-related options.
- The Playwright fetcher source: src/yoink/playwright_fetcher.py.