# yoink

> The public data crawler. A fast, async Python web crawler purpose-built for AI-ready data extraction: clean text via trafilatura, JSON/JSONL/Parquet output, robots.txt compliance, optional JavaScript rendering, and resumable S3-backed checkpoints.

Yoink is a CLI plus a Python library. Most users only need the CLI: `yoink crawl --depth 2 --max-pages 200 -o crawl.jsonl`.

## Getting started

- [Introduction](https://yoink.goatsquadstudios.com/docs.md/introduction): What yoink is and why it exists
- [Installation](https://yoink.goatsquadstudios.com/docs.md/installation): Install yoink and its optional extras
- [Quickstart](https://yoink.goatsquadstudios.com/docs.md/quickstart): Yoink your first website in 30 seconds

## Concepts

- [Architecture](https://yoink.goatsquadstudios.com/docs.md/concepts/architecture): How the pieces fit together
- [Rate limiting](https://yoink.goatsquadstudios.com/docs.md/concepts/rate-limiting): Per-domain token bucket and crawl-delay
- [robots.txt](https://yoink.goatsquadstudios.com/docs.md/concepts/robots-txt): Compliance, parsing, and matching
- [JavaScript rendering](https://yoink.goatsquadstudios.com/docs.md/concepts/javascript-rendering): Crawling SPAs with Playwright
- [Checkpointing](https://yoink.goatsquadstudios.com/docs.md/concepts/checkpointing): Resumable crawls with local or S3 storage
- [URL filtering](https://yoink.goatsquadstudios.com/docs.md/concepts/url-filtering): Include, exclude, extension, and domain filters

## CLI

- [yoink crawl](https://yoink.goatsquadstudios.com/docs.md/cli/crawl): All flags for the primary command
- [yoink stats](https://yoink.goatsquadstudios.com/docs.md/cli/stats): Analyze a saved crawl
- [yoink version](https://yoink.goatsquadstudios.com/docs.md/cli/version): Show version info

## Python API

- [Crawler](https://yoink.goatsquadstudios.com/docs.md/api/crawler)
- [CrawlConfig](https://yoink.goatsquadstudios.com/docs.md/api/config)
- [Page](https://yoink.goatsquadstudios.com/docs.md/api/page)
- [CheckpointManager](https://yoink.goatsquadstudios.com/docs.md/api/checkpoint)
- [Filters](https://yoink.goatsquadstudios.com/docs.md/api/filters)
- [Storage backends](https://yoink.goatsquadstudios.com/docs.md/api/storage)
- [Stats](https://yoink.goatsquadstudios.com/docs.md/api/stats)
- [Writers](https://yoink.goatsquadstudios.com/docs.md/api/writers)

## Examples

- [Basic crawl](https://yoink.goatsquadstudios.com/docs.md/examples/basic)
- [AI training data](https://yoink.goatsquadstudios.com/docs.md/examples/ai-training)
- [Checkpoint & resume](https://yoink.goatsquadstudios.com/docs.md/examples/checkpoint-resume)
- [Lambda + S3 checkpoints](https://yoink.goatsquadstudios.com/docs.md/examples/lambda-s3)
- [Custom extraction](https://yoink.goatsquadstudios.com/docs.md/examples/custom-extraction)

## Reference

- [Output formats](https://yoink.goatsquadstudios.com/docs.md/reference/output-formats)
- [Configuration](https://yoink.goatsquadstudios.com/docs.md/reference/configuration)

## For AI assistants

- [Full docs concatenated as markdown](https://yoink.goatsquadstudios.com/llms-full.txt): single artifact for direct paste into an LLM context
- [Claude Code skill (SKILL.md)](https://yoink.goatsquadstudios.com/agents/yoink/SKILL.md): drop in `~/.claude/skills/yoink/`
- [Cursor rules (cursor.mdc)](https://yoink.goatsquadstudios.com/agents/yoink/cursor.mdc): drop in `.cursor/rules/yoink.mdc`
- [Windsurf rules](https://yoink.goatsquadstudios.com/agents/yoink/.windsurfrules): drop at project root as `.windsurfrules`
- [AGENTS.md (Codex / generic)](https://yoink.goatsquadstudios.com/agents/yoink/AGENTS.md): drop at project root