Skip to content
yomi

CLI reference

Every yomi command and flag.

yomi [command] [flags]

Six commands: read turns one page into Markdown, site reads a whole site into a folder, pack bundles a whole site into one SQLite database or ZIM archive, meta prints a page's metadata as JSON, links lists a page's outbound links, and serve previews a folder of Markdown. Run yomi <command> --help for the canonical, up-to-date list.

The read flags in the last section are shared: they apply to read, site, meta, and links, since each fetches and extracts a page before doing its job.

yomi read

yomi read <url> [flags]

Reads one page into Markdown, printing to stdout or writing to a file. Fetches the page, renders it in headless Chrome only when needed, extracts the main content, and converts it to GitHub-Flavored Markdown with a YAML front-matter header.

Flag Default Meaning
-o, --out stdout Write the Markdown to a file instead of stdout

Plus the shared read flags.

yomi site

yomi site <url> [flags]

Crawls a whole site breadth-first from the seed URL, reading each in-scope page into Markdown. The default output is a folder of .md files mirroring the URL paths, with a SUMMARY.md table of contents and a shared media/ folder. --single assembles one combined file instead.

Output

Flag Default Meaning
-o, --out host name Output folder, or file path with --single
-s, --single false Assemble one combined .md file with a table of contents and per-page sections

Scope

Flag Default Meaning
-p, --max-pages 0 Stop after N pages (0 = unlimited)
-d, --max-depth 0 Link-follow depth cap (0 = unlimited)
--subdomains false Treat subdomains of the seed host as in scope
--scope-prefix Only crawl pages whose path starts with this prefix
--exclude Path prefixes to skip (repeatable)
--no-robots false Ignore robots.txt

Concurrency

Flag Default Meaning
--workers 4 Concurrent page workers

Plus the shared read flags, applied to every page in the crawl.

yomi pack

yomi pack <url> [flags]

Crawls a whole site and bundles it into one file: a SQLite database of pages, links, and images (the default), or a ZIM offline archive you can open in Kiwix. The crawl is backed by the database, so it resumes where it left off and a later run only fetches what is new.

An explicit --format wins; otherwise the output extension picks the format, so -o site.zim builds a ZIM and -o site.db builds a database without a --format flag.

Output

Flag Default Meaning
--format sqlite Output format: sqlite or zim
-o, --out <host>.db or .zim Output file
--state the output with .db SQLite store path for a ZIM build (the resumable sidecar)

Resume and refresh

Flag Default Meaning
--refresh false Re-fetch every page, ignoring what is already stored
--max-age 0 Re-fetch a stored page older than this duration (e.g. 24h; 0 = never)

ZIM metadata

Flag Default Meaning
--title the home page title Archive title
--description Archive description
--language eng Archive language as an ISO 639-3 code
--date today (UTC) Archive date (YYYY-MM-DD)
--icon a built-in icon Path to a 48x48 PNG shown as the Kiwix library tile
--no-compress false Store every entry raw, with no compression

Scope and concurrency

pack takes the same scope, limit, worker, and robots flags as yomi site: -p, --max-pages, -d, --max-depth, --subdomains, --scope-prefix, --exclude, --workers, and --no-robots.

Plus the shared read flags, applied to every page in the crawl.

yomi meta

yomi meta <url> [flags]

Prints the page's metadata record as JSON, without the Markdown body. The record carries url, title, byline, site_name, excerpt, lang, published, fetched, word_count, reading_time (whole minutes), and rendered, followed by links and images as arrays of objects (each link is {text, url, internal}, each image {alt, url, local}). Only non-empty fields appear. Takes the shared read flags.

yomi links <url> [flags]

Lists the outbound links found in the page's article body, one URL per line.

Flag Default Meaning
--json false Emit the links as JSON instead of one URL per line

Plus the shared read flags.

yomi serve

yomi serve [dir] [flags]

Runs a local static file server over a folder of Markdown. With no dir, serves the current directory.

Flag Default Meaning
-a, --addr 127.0.0.1:8800 Address to listen on

Shared read flags

These apply to read, site, meta, and links.

Fetching and rendering

Flag Default Meaning
--render auto auto static-fetches first and renders in headless Chrome only when the page looks JavaScript-gated; on always renders; off never launches a browser
--scroll false Auto-scroll the page in render mode to trigger lazy loading
--timeout 30s Per-request timeout
--user-agent a desktop Chrome string User-Agent for page and asset fetches; the default is a real browser string so a page returns the HTML a person would see
--chrome Path to the Chrome/Chromium binary
--control-url Attach to a running Chrome DevTools endpoint

Extraction and output

Flag Default Meaning
--links inline Link style: inline ([text](url)) or reference (definitions at the bottom)
--no-front-matter false Omit the YAML front-matter header
--title-heading false Keep the title as an H1 at the top of the body
--wrap 0 Hard-wrap prose at column N (0 = no wrap)
-q, --quiet false Suppress progress output

Images

Flag Default Meaning
--images remote remote leaves image URLs absolute; download fetches images next to the output and rewrites to relative paths; inline embeds images as base64 data URIs
--max-image-mb 16 Skip images larger than this; leave them at their remote URL

Front-matter fields

read and site write a YAML front-matter header on each Markdown file. The fields appear in this fixed order, and only non-empty ones are written:

Field Meaning
title Page title
url Source URL
site Site name
byline Author
published Published date
fetched When yomi read the page
lang Content language
word_count Words in the extracted article
reading_time Estimated reading time, in whole minutes

String values are quoted; word_count and reading_time are written as plain numbers.