yoink stats

Analyze a saved crawl — page counts, depth distribution, top domains, content quality metrics.

yoink stats reads a crawl output file (JSON or JSONL) and prints a human-readable summary. The same statistics can be written to a CSV file with --export or printed as JSON with --json.

yoink stats FILE [OPTIONS]

Examples

# Human-readable summary
yoink stats crawl_output.jsonl
 
# Export to CSV for spreadsheet work
yoink stats crawl_output.jsonl --export stats.csv
 
# JSON output
yoink stats crawl_output.jsonl --json

Options

| Name | Type | Default | Description |
|------|------|---------|-------------|
| FILE* | argument | | Path to a .json or .jsonl file produced by yoink crawl. |
| --export, -e | PATH | | Write summary stats to a CSV file at the given path. |
| --json | FLAG | | Output full stats as JSON instead of the formatted text summary. |

What it computes

Aggregated across all pages in the file:

  • Total pages, total links, average links per page
  • Total text size and average text size (in bytes; the text summary displays KB)
  • Total HTML size if --save-html was used
  • Depth distribution — how many pages at each depth
  • Unique domains and top 10 domains by page count
  • Status code distribution
  • Content quality — share of pages with text, title, metadata
  • Text length stats — min / median / max characters
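
The metrics above can all be gathered in a single pass over the records. The following Python sketch is not yoink's implementation. It assumes each record is an object with the hypothetical fields `url`, `depth`, `text`, and `links` (the real output schema may differ), and it assumes text length stats skip pages with no text, as the non-zero minimum in the sample output suggests.

```python
import json
import statistics
from collections import Counter
from urllib.parse import urlparse


def load_jsonl(path):
    """Yield one dict per non-blank line of a JSONL file."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)


def summarize(records):
    """Aggregate a few crawl metrics from an iterable of page dicts.

    The field names (url, depth, text, links) are assumptions made for
    illustration; adjust them to the actual record schema.
    """
    records = list(records)
    total_pages = len(records)
    total_links = sum(len(r.get("links") or []) for r in records)

    # Text length in characters, ignoring pages with no text (assumption).
    lengths = [len(r["text"]) for r in records if r.get("text")]

    depths = Counter(r.get("depth", 0) for r in records)
    domains = Counter(urlparse(r.get("url", "")).netloc for r in records)

    return {
        "total_pages": total_pages,
        "total_links": total_links,
        "avg_links_per_page": round(total_links / total_pages, 2) if total_pages else 0.0,
        "pages_by_depth": dict(sorted(depths.items())),
        "unique_domains": len(domains),
        "top_domains": domains.most_common(10),
        "pages_with_text": len(lengths),
        "text_length_min": min(lengths) if lengths else 0,
        "text_length_median": statistics.median(lengths) if lengths else 0,
        "text_length_max": max(lengths) if lengths else 0,
    }
```

Under those assumptions, `summarize(load_jsonl("crawl_output.jsonl"))` would return a dictionary similar in shape to the JSON output, though not necessarily with the same values yoink reports.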

Sample output

============================================================
YOINK Crawl Statistics
============================================================

Total Pages: 87
Total Links: 1,243
Avg Links/Page: 14.29

Content Size:
  Total Text: 412.39 KB
  Avg Text/Page: 4.74 KB

Domains:
  Unique Domains: 1
  Top Domains:
    - docs.example.com: 87 pages

Depth Distribution:
  Depth 0:    1 #
  Depth 1:   24 ########################
  Depth 2:   62 ##############################################################

Content Quality:
  Pages with Text: 85 (97.7%)
  Pages with Title: 87 (100.0%)
  Pages with Metadata: 73 (83.9%)

Text Length:
  Min: 142 chars
  Median: 3,891 chars
  Max: 28,442 chars
============================================================
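
The depth histogram in the sample appears to draw one `#` per page, with the count right-aligned in a four-character column. A small Python sketch that reproduces that layout from a depth-to-count mapping follows; `depth_bars` is a hypothetical helper, and the unscaled one-mark-per-page rule is inferred from the sample rather than confirmed.

```python
def depth_bars(pages_by_depth):
    """Format a depth -> page count mapping as text histogram lines.

    Keys may be ints or numeric strings (as in JSON). One '#' is drawn
    per page, which is an assumption based on the sample output.
    """
    lines = []
    for depth in sorted(pages_by_depth, key=int):
        count = pages_by_depth[depth]
        lines.append(f"  Depth {depth}: {count:>4} {'#' * count}")
    return lines


for line in depth_bars({"0": 1, "1": 24, "2": 62}):
    print(line)
```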

JSON output (--json)

{
  "total_pages": 87,
  "total_links": 1243,
  "avg_links_per_page": 14.29,
  "total_text_size": 422291,
  "max_depth": 2,
  "pages_by_depth": { "0": 1, "1": 24, "2": 62 },
  "unique_domains": 1,
  "top_domains": [{ "domain": "docs.example.com", "count": 87 }],
  "status_codes": { "200": 87 },
  "pages_with_text": 85,
  "pages_with_title": 87,
  "pages_with_metadata": 73,
  "text_length_min": 142,
  "text_length_median": 3891,
  "text_length_max": 28442
}
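
Because the JSON output holds raw counts, the derived figures in the text summary (percentages, KB sizes) can be recomputed downstream. The Python sketch below assumes the keys shown in the sample above; `quality_shares` and `text_size_kb` are hypothetical helpers, not part of yoink.

```python
import json


def quality_shares(stats):
    """Return the percentage of pages with text, title, and metadata."""
    total = stats["total_pages"]
    if total == 0:
        return {}
    keys = ("pages_with_text", "pages_with_title", "pages_with_metadata")
    return {key: round(100 * stats[key] / total, 1) for key in keys}


def text_size_kb(stats):
    """Convert total_text_size from bytes to KB (1 KB = 1024 bytes)."""
    return round(stats["total_text_size"] / 1024, 2)


# A subset of the sample JSON output shown above.
sample = json.loads("""
{
  "total_pages": 87,
  "total_text_size": 422291,
  "pages_with_text": 85,
  "pages_with_title": 87,
  "pages_with_metadata": 73
}
""")

print(quality_shares(sample))
# {'pages_with_text': 97.7, 'pages_with_title': 100.0, 'pages_with_metadata': 83.9}
print(text_size_kb(sample))
# 412.39
```

These values match the "Content Quality" and "Total Text" lines in the sample text summary. In practice the input would be loaded from a saved file, for example one created by redirecting the command's output, assuming --json prints to standard output.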
