ebookm

ebookm is a Rust command-line tool that compiles a set of Substack posts and local HTML files into a single EPUB.

Current Scope

v0.1 supports:

  • YAML manifests
  • Public Substack post URLs
  • Local HTML files
  • Manifest-defined section order and TOC structure
  • Per-entry metadata and TOC overrides
  • Basic internal link rewriting between included entries
  • EPUB generation with bundled article assets

Build

cargo build

Run

Use the CLI through Cargo:

cargo run -p ebookm-cli -- <command>

Available commands:

  • build -m <manifest>
  • validate -m <manifest>
  • inspect <url-or-file>
  • init

Quick Start

This repository includes a runnable example manifest and local HTML fixture in examples/.

Validate the example manifest:

cargo run -p ebookm-cli -- validate -m examples/example-book.yaml

Build the example EPUB:

cargo run -p ebookm-cli -- build -m examples/example-book.yaml

The output EPUB will be written to:

examples/dist/example-book.epub

Inspect a local HTML file:

cargo run -p ebookm-cli -- inspect examples/articles/intro.html

Generate a starter manifest:

cargo run -p ebookm-cli -- init

Validate The Output EPUB

There are two useful levels of validation.

Quick XML/XHTML validation for generated chapter files:

unzip -p path/to/book.epub OEBPS/text/chapter.xhtml | xmllint --noout -

If you want to check every generated XHTML file in the EPUB:

mkdir -p /tmp/ebookm-check
unzip -o path/to/book.epub -d /tmp/ebookm-check
find /tmp/ebookm-check/OEBPS -name '*.xhtml' -print -exec xmllint --noout {} \;

Full EPUB validation:

Use epubcheck, which validates the EPUB package itself, including metadata, navigation files, manifest/spine consistency, and XHTML correctness.

epubcheck path/to/book.epub

Practical guidance:

  • Use xmllint when you want to quickly confirm that generated XHTML is well-formed XML.
  • Use epubcheck when you want proper EPUB-level validation before distributing the file.
  • If an EPUB reader only shows part of a chapter, malformed XHTML is a common cause, so xmllint on the generated chapter files is a good first check.

Manifest Reference

The top-level manifest keys are:

  • book: EPUB metadata
  • output: output path and optional cover image
  • defaults: shared normalization and metadata defaults
  • sections: ordered TOC and reading-order groups
  • entries: source definitions and per-entry overrides
  • link_rules: cross-link rewriting behavior

Minimal example:

book:
  title: "My Book"
  author: "Editor"
  language: "en"
  identifier: "urn:uuid:my-book"

output:
  path: "dist/my-book.epub"

sections:
  - id: "part-1"
    title: "Part 1"
    entries:
      - "essay"

entries:
  essay:
    source:
      kind: "html"
      path: "articles/essay.html"

link_rules:
  mode: "auto"

Top-Level Fields

  • book: EPUB metadata block. Required.
  • output: Output configuration block. Required.
  • defaults: Shared defaults applied across entries. Optional.
  • sections: Ordered list of sections. Required in practice for a useful build.
  • entries: Map of entry IDs to entry definitions. Required in practice.
  • link_rules: Global link rewriting policy. Optional.

book

  • book.title: Required string. Book title.
  • book.author: Optional string. Book-level author.
  • book.language: Optional string. Defaults to en.
  • book.identifier: Optional string. Defaults to a generated urn:uuid:....
  • book.description: Optional string. Written into the EPUB package metadata.

Example:

book:
  title: "Collected Essays"
  author: "Jane Doe"
  language: "en"
  identifier: "urn:uuid:collected-essays"
  description: "A single-volume EPUB generated by ebookm"

output

  • output.path: Required string. Output EPUB path. Resolved relative to the manifest file.
  • output.cover_image: Optional string. Path to a cover image, resolved relative to the manifest file.

Example:

output:
  path: "dist/book.epub"
  cover_image: "assets/cover.jpg"
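
As a rough illustration of how a manifest-relative path might be resolved, the sketch below joins the configured value onto the manifest's parent directory. The helper name resolve_from_manifest and the pass-through handling of absolute paths are assumptions made for this example, not taken from the ebookm source.

```rust
use std::path::{Path, PathBuf};

/// Resolve `raw` against the directory that contains the manifest file.
/// Absolute paths are returned unchanged (an assumption of this sketch).
fn resolve_from_manifest(manifest_path: &Path, raw: &str) -> PathBuf {
    let candidate = Path::new(raw);
    if candidate.is_absolute() {
        return candidate.to_path_buf();
    }
    // A bare file name such as "book.yaml" has an empty parent, and joining
    // onto an empty path leaves the relative path as it is.
    let base = manifest_path.parent().unwrap_or_else(|| Path::new(""));
    base.join(candidate)
}
```

Under these assumptions, a manifest at examples/example-book.yaml with path set to dist/example-book.epub resolves to examples/dist/example-book.epub, which matches the output location given in Quick Start.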

defaults

  • defaults.fetch_images: Optional boolean. Defaults to true. When enabled, image assets referenced from article HTML are fetched and bundled into the EPUB.
  • defaults.normalize_substack_embeds: Optional boolean. Defaults to true. Currently removes iframe embeds during normalization.
  • defaults.metadata: Optional metadata override block applied after extracted source metadata and before per-entry overrides.
  • defaults.processing: Optional shared article-processing and chapter-header defaults.

Example:

defaults:
  fetch_images: true
  normalize_substack_embeds: true
  processing:
    include_author: true
    include_date: true
    include_source_url: true
    skip_first_paragraphs: 0
  metadata:
    author: "Editorial Team"

defaults.processing and entries.<id>.processing

These fields control chapter-header rendering and article trimming.

For defaults.processing, all fields are concrete values with defaults. For entries.<id>.processing, the same fields are optional overrides.

  • include_author: Boolean. Defaults to true. Controls whether the extracted or overridden author name is shown at the start of the chapter.
  • include_date: Boolean. Defaults to true. Controls whether the extracted or overridden publication date is shown at the start of the chapter.
  • include_source_url: Boolean. Defaults to true. Controls whether the canonical article URL is shown at the start of the chapter.
  • skip_first_paragraphs: Integer. Defaults to 0. Removes the first n paragraph elements from the extracted article body before EPUB generation.

Example:

defaults:
  processing:
    include_author: true
    include_date: false
    include_source_url: false
    skip_first_paragraphs: 0
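
The relationship between defaults.processing and per-entry overrides can be pictured as a field-by-field fallback. The following is a minimal sketch; the type names Processing and ProcessingOverride and the function resolve_processing are hypothetical and may not match ebookm's internal types.

```rust
/// Fully resolved processing settings, mirroring `defaults.processing`.
#[derive(Debug, Clone, PartialEq)]
struct Processing {
    include_author: bool,
    include_date: bool,
    include_source_url: bool,
    skip_first_paragraphs: usize,
}

impl Default for Processing {
    // The documented defaults: all header fields shown, nothing trimmed.
    fn default() -> Self {
        Processing {
            include_author: true,
            include_date: true,
            include_source_url: true,
            skip_first_paragraphs: 0,
        }
    }
}

/// Per-entry overrides: every field is optional.
#[derive(Debug, Clone, Default)]
struct ProcessingOverride {
    include_author: Option<bool>,
    include_date: Option<bool>,
    include_source_url: Option<bool>,
    skip_first_paragraphs: Option<usize>,
}

/// Use the entry's value where it is set, otherwise fall back to the defaults.
fn resolve_processing(defaults: &Processing, entry: &ProcessingOverride) -> Processing {
    Processing {
        include_author: entry.include_author.unwrap_or(defaults.include_author),
        include_date: entry.include_date.unwrap_or(defaults.include_date),
        include_source_url: entry.include_source_url.unwrap_or(defaults.include_source_url),
        skip_first_paragraphs: entry
            .skip_first_paragraphs
            .unwrap_or(defaults.skip_first_paragraphs),
    }
}
```

Keeping the override fields as Option values makes "not set" distinct from "set to false" or "set to 0", which is what allows an entry to turn a default back on or off explicitly.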

defaults.metadata and entries.<id>.metadata

These fields use the same shape:

  • author: Optional string.
  • published: Optional date in YYYY-MM-DD format.
  • subtitle: Optional string. Accepted by the parser but not yet emitted into the EPUB output.
  • summary: Optional string. Accepted by the parser but not yet emitted into the EPUB output.
  • tags: Optional list of strings. Accepted by the parser but not yet used in link or EPUB output logic.

Example:

metadata:
  author: "Jane Doe"
  published: "2025-01-10"
  subtitle: "Notebook entry"
  summary: "A short summary"
  tags: ["essay", "history"]
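
The layering order (extracted source metadata, then defaults.metadata, then entries.<id>.metadata) can be sketched as two overlays. This assumes that a later layer replaces a field only when it actually sets one; the Metadata struct here is a simplified, hypothetical stand-in that tracks only author and published.

```rust
/// Simplified metadata block; dates are kept as YYYY-MM-DD strings here.
#[derive(Debug, Clone, Default, PartialEq)]
struct Metadata {
    author: Option<String>,
    published: Option<String>,
}

/// Values set in `over` win; unset fields keep the value from `base`.
fn overlay(base: Metadata, over: &Metadata) -> Metadata {
    Metadata {
        author: over.author.clone().or(base.author),
        published: over.published.clone().or(base.published),
    }
}

/// Apply the layers in order: extracted, then defaults, then per-entry.
fn effective_metadata(extracted: Metadata, defaults: &Metadata, entry: &Metadata) -> Metadata {
    overlay(overlay(extracted, defaults), entry)
}
```

With this reading, a defaults.metadata author of "Editorial Team" would replace the author extracted from the source, while an entry that sets only published would change the date and leave that author in place.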

sections

sections is an ordered list. Section order controls reading order and TOC grouping.

Each section supports:

  • id: Required string. Stable section identifier.
  • title: Required string. Section title shown in the TOC.
  • entries: Optional list of entry IDs in reading order. In practice, list at least one entry per section.

Example:

sections:
  - id: "part-1"
    title: "Part 1"
    entries:
      - "opening-post"
      - "notes"

entries

entries is a map from entry ID to entry definition. Entry IDs are referenced from sections and link rules.

Each entry supports:

  • source: Required source definition block.
  • title: Optional string. Overrides the extracted article title.
  • metadata: Optional metadata override block.
  • toc: Optional TOC override block.
  • links: Optional per-entry link-policy block.
  • processing: Optional per-entry processing override block.

Example:

entries:
  opening-post:
    source:
      kind: "substack"
      url: "https://example.substack.com/p/opening-post"
    title: "Opening Post"
    metadata:
      published: "2025-01-10"
    processing:
      include_source_url: false
      skip_first_paragraphs: 1
    toc:
      title: "Introduction"
    links:
      mode: "explicit"
      allow_to: ["notes"]

entries.<id>.source

Two source kinds are supported:

  • kind: "substack": Use a public Substack article URL. Fields: url required string.
  • kind: "html": Use a local HTML file. Fields: path required string, resolved relative to the manifest file.

Examples:

source:
  kind: "substack"
  url: "https://example.substack.com/p/my-post"

source:
  kind: "html"
  path: "articles/local-post.html"

You can mix Substack URLs and local HTML files in the same manifest:

entries:
  remote-post:
    source:
      kind: "substack"
      url: "https://example.substack.com/p/remote-post"

  local-post:
    source:
      kind: "html"
      path: "articles/local-post.html"

entries.<id>.toc

  • title: Optional string. Overrides the chapter label used in the TOC.
  • hidden: Optional boolean. Defaults to false. When true, the entry is omitted from the TOC.

Example:

toc:
  title: "Appendix"
  hidden: false

entries.<id>.links

  • mode: Optional string. One of: auto, explicit, none. If omitted, the global link_rules.mode is used.
  • allow_to: Optional list of entry IDs. If set, rewritten internal links are limited to these targets.
  • block_to: Optional list of entry IDs. These targets are excluded from rewriting.
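
One way to read these three fields together is as a small filter over rewrite targets. The sketch below is illustrative only: the type and function names are hypothetical, and giving block_to precedence over allow_to when an ID appears in both is an assumption, since the behavior for that case is not documented here.

```rust
/// Link rewriting modes; `Off` stands for the manifest value "none".
#[derive(Debug, Clone, Copy, PartialEq)]
enum LinkMode {
    Auto,
    Explicit,
    Off,
}

/// Per-entry link policy as it might look after parsing.
struct EntryLinks<'a> {
    mode: Option<LinkMode>,
    allow_to: Option<Vec<&'a str>>,
    block_to: Vec<&'a str>,
}

/// The entry's own mode wins; otherwise the global mode applies.
fn effective_mode(global: LinkMode, entry: &EntryLinks) -> LinkMode {
    entry.mode.unwrap_or(global)
}

/// Decide whether links to `target` may be rewritten for this entry.
fn target_permitted(entry: &EntryLinks, target: &str) -> bool {
    // Assumption: a blocked target stays blocked even if it is also allowed.
    if entry.block_to.iter().any(|id| *id == target) {
        return false;
    }
    match &entry.allow_to {
        // With an allow list, only listed targets qualify.
        Some(allowed) => allowed.iter().any(|id| *id == target),
        // Without one, every non-blocked target qualifies.
        None => true,
    }
}
```

Under this reading, an entry with allow_to set to intro and appendix would have links to any other entry left untouched, even if that entry is part of the book.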

Example:

links:
  mode: "explicit"
  allow_to: ["intro", "appendix"]
  block_to: ["draft-notes"]

link_rules

  • link_rules.mode: Optional string. One of: auto, explicit, none. Defaults to auto.
  • link_rules.rewrite_external_substack_links: Optional boolean. Defaults to true. Accepted by the manifest parser, but not currently used to change behavior in v0.1.
  • link_rules.preserve_other_external_links: Optional boolean. Defaults to true. Accepted by the manifest parser, but not currently used to change behavior in v0.1.
  • link_rules.rules: Optional list of explicit link rules.

Example:

link_rules:
  mode: "explicit"
  rewrite_external_substack_links: true
  preserve_other_external_links: true
  rules:
    - from: ["notes"]
      to: ["intro"]
      match_mode: "canonical-url"

Each rule supports:

  • from: Required list of selectors describing where the rule applies.
  • to: Required list of selectors describing eligible targets.
  • match_mode: Optional string. One of: canonical-url, source-url, disabled. Defaults to canonical-url.

Supported selectors in from and to:

  • *: Match all entries.
  • <entry-id>: Match one entry by ID.
  • section:<section-id>: Match all entries referenced by that section.

Example:

link_rules:
  mode: "explicit"
  rules:
    - from: ["section:essays"]
      to: ["section:essays"]
      match_mode: "canonical-url"
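
The selector forms can be summarized in a small matching function. This is a sketch under assumptions: selector_matches and the sections map (section ID to the entry IDs it lists) are hypothetical names, and an unknown section ID is treated as matching nothing.

```rust
use std::collections::HashMap;

/// Returns true when `selector` matches `entry_id`.
/// `sections` maps each section ID to the entry IDs that section references.
fn selector_matches(
    selector: &str,
    entry_id: &str,
    sections: &HashMap<&str, Vec<&str>>,
) -> bool {
    // "*" matches every entry.
    if selector == "*" {
        return true;
    }
    // "section:<section-id>" matches entries referenced by that section.
    if let Some(section_id) = selector.strip_prefix("section:") {
        return sections
            .get(section_id)
            .map_or(false, |entries| entries.iter().any(|id| *id == entry_id));
    }
    // Anything else is compared as a plain entry ID.
    selector == entry_id
}
```

A rule would then apply to a link when at least one of its from selectors matches the linking entry and at least one of its to selectors matches the target entry.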

Notes

  • Output paths are resolved relative to the manifest file location.
  • Local HTML paths are also resolved relative to the manifest file location.
  • sections and entries are deserialized with empty defaults, but validate and build expect them to be meaningfully populated.
  • subtitle, summary, tags, rewrite_external_substack_links, and preserve_other_external_links are accepted by the manifest parser today but do not yet change the generated EPUB or link rewriting in v0.1.
  • For Substack sources, v0.1 assumes public posts. Subscriber-only/session-based fetching is not implemented.