# ebookm

`ebookm` is a Rust command-line tool that compiles a set of Substack posts and local HTML files into a single EPUB.

## Current Scope

`v0.1` supports:

- YAML manifests
- Public Substack post URLs
- Local HTML files
- Manifest-defined section order and TOC structure
- Per-entry metadata and TOC overrides
- Basic internal link rewriting between included entries
- EPUB generation with bundled article assets

## Build

```bash
cargo build
```

## Run

Use the CLI through Cargo:

```bash
cargo run -p ebookm-cli -- <command>
```

Available commands:

- `build -m <manifest>`
- `validate -m <manifest>`
- `inspect <url-or-file>`
- `init`

## Quick Start

This repository includes a runnable example manifest and local HTML fixture in `examples/`.

Validate the example manifest:

```bash
cargo run -p ebookm-cli -- validate -m examples/example-book.yaml
```

Build the example EPUB:

```bash
cargo run -p ebookm-cli -- build -m examples/example-book.yaml
```

The output EPUB will be written to:

```text
examples/dist/example-book.epub
```

Inspect a local HTML file:

```bash
cargo run -p ebookm-cli -- inspect examples/articles/intro.html
```

Generate a starter manifest:

```bash
cargo run -p ebookm-cli -- init
```

## Validate The Output EPUB

There are two useful levels of validation.

Quick XML/XHTML validation for generated chapter files:

```bash
unzip -p path/to/book.epub OEBPS/text/chapter.xhtml | xmllint --noout -
```

If you want to check every generated XHTML file in the EPUB:

```bash
mkdir -p /tmp/ebookm-check
unzip -o path/to/book.epub -d /tmp/ebookm-check
find /tmp/ebookm-check/OEBPS -name '*.xhtml' -print -exec xmllint --noout {} \;
```

Full EPUB validation:

Use `epubcheck`, which validates the EPUB package itself, including metadata, navigation files, manifest/spine consistency, and XHTML correctness.

```bash
epubcheck path/to/book.epub
```

Practical guidance:

- Use `xmllint` when you want to quickly confirm that generated XHTML is well-formed XML.
- Use `epubcheck` when you want proper EPUB-level validation before distributing the file.
- If an EPUB reader only shows part of a chapter, malformed XHTML is a common cause, so `xmllint` on the generated chapter files is a good first check.

## Manifest Reference

The top-level manifest keys are:

- `book`: EPUB metadata
- `output`: output path and optional cover image
- `defaults`: shared normalization and metadata defaults
- `sections`: ordered TOC and reading-order groups
- `entries`: source definitions and per-entry overrides
- `link_rules`: cross-link rewriting behavior

Minimal example:

```yaml
book:
  title: "My Book"
  author: "Editor"
  language: "en"
  identifier: "urn:uuid:my-book"

output:
  path: "dist/my-book.epub"

sections:
  - id: "part-1"
    title: "Part 1"
    entries:
      - "essay"

entries:
  essay:
    source:
      kind: "html"
      path: "articles/essay.html"

link_rules:
  mode: "auto"
```

### Top-Level Fields

- `book`:
  EPUB metadata block. Required.
- `output`:
  Output configuration block. Required.
- `defaults`:
  Shared defaults applied across entries. Optional.
- `sections`:
  Ordered list of sections. Required in practice for a useful build.
- `entries`:
  Map of entry IDs to entry definitions. Required in practice.
- `link_rules`:
  Global link rewriting policy. Optional.

### `book`

- `book.title`:
  Required string. Book title.
- `book.author`:
  Optional string. Book-level author.
- `book.language`:
  Optional string. Defaults to `en`.
- `book.identifier`:
  Optional string. Defaults to a generated `urn:uuid:...`.
- `book.description`:
  Optional string. Written into the EPUB package metadata.

Example:

```yaml
book:
  title: "Collected Essays"
  author: "Jane Doe"
  language: "en"
  identifier: "urn:uuid:collected-essays"
  description: "A single-volume EPUB generated by ebookm"
```

### `output`

- `output.path`:
  Required string. Output EPUB path. Resolved relative to the manifest file.
- `output.cover_image`:
  Optional string. Path to a cover image, resolved relative to the manifest file.

Example:

```yaml
output:
  path: "dist/book.epub"
  cover_image: "assets/cover.jpg"
```
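
The manifest-relative rule above can be pictured with a minimal Rust sketch. `resolve_from_manifest` is a hypothetical helper written for illustration, not ebookm's actual API, and passing absolute paths through unchanged is an assumption (it is simply how `Path::join` behaves).

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: resolve `value` (for example `output.path` or
/// `output.cover_image`) against the directory that contains the manifest.
fn resolve_from_manifest(manifest_path: &Path, value: &str) -> PathBuf {
    // For a bare file name such as `book.yaml`, `parent()` is the empty path,
    // so the result stays relative to the current directory.
    let base = manifest_path.parent().unwrap_or_else(|| Path::new(""));
    // `Path::join` returns `value` unchanged when it is absolute
    // (assumed to be acceptable here; ebookm may handle this differently).
    base.join(value)
}
```

Under these assumptions, a manifest at `examples/book.yaml` with `path: "dist/book.epub"` would resolve to `examples/dist/book.epub`, regardless of the directory the CLI is run from.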

### `defaults`

- `defaults.fetch_images`:
  Optional boolean. Defaults to `true`. When enabled, image assets referenced from article HTML are fetched and bundled into the EPUB.
- `defaults.normalize_substack_embeds`:
  Optional boolean. Defaults to `true`. Currently removes iframe embeds during normalization.
- `defaults.metadata`:
  Optional metadata override block applied after extracted source metadata and before per-entry overrides.
- `defaults.processing`:
  Optional shared article-processing and chapter-header defaults.

Example:

```yaml
defaults:
  fetch_images: true
  normalize_substack_embeds: true
  processing:
    include_author: true
    include_date: true
    include_source_url: true
    skip_first_paragraphs: 0
  metadata:
    author: "Editorial Team"
```

### `defaults.processing` and `entries.<id>.processing`

These fields control chapter-header rendering and article trimming.

For `defaults.processing`, all fields are concrete values with defaults.
For `entries.<id>.processing`, the same fields are optional overrides.

- `include_author`:
  Boolean. Defaults to `true`. Controls whether the extracted or overridden author name is shown at the start of the chapter.
- `include_date`:
  Boolean. Defaults to `true`. Controls whether the extracted or overridden publication date is shown at the start of the chapter.
- `include_source_url`:
  Boolean. Defaults to `true`. Controls whether the canonical article URL is shown at the start of the chapter.
- `skip_first_paragraphs`:
  Integer. Defaults to `0`. Removes the first `n` paragraph elements from the extracted article body before EPUB generation.

Example:

```yaml
defaults:
  processing:
    include_author: true
    include_date: false
    include_source_url: false
    skip_first_paragraphs: 0
```
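
One way to picture how per-entry overrides combine with the shared defaults, and what `skip_first_paragraphs` does, is the Rust sketch below. All type and function names are hypothetical and do not come from ebookm's source. The `Block` model of an article body is a deliberate simplification, and treating "first `n` paragraph elements" as "the first `n` paragraphs in document order, leaving other elements in place" is an assumption.

```rust
/// Concrete settings, shaped like `defaults.processing`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Processing {
    include_author: bool,
    include_date: bool,
    include_source_url: bool,
    skip_first_paragraphs: usize,
}

impl Default for Processing {
    fn default() -> Self {
        // Documented defaults: every header field shown, nothing skipped.
        Processing {
            include_author: true,
            include_date: true,
            include_source_url: true,
            skip_first_paragraphs: 0,
        }
    }
}

/// Optional overrides, shaped like `entries.<id>.processing`.
#[derive(Debug, Clone, Copy, Default)]
struct ProcessingOverride {
    include_author: Option<bool>,
    include_date: Option<bool>,
    include_source_url: Option<bool>,
    skip_first_paragraphs: Option<usize>,
}

/// A field set on the entry wins; otherwise the shared default applies.
fn resolve(defaults: Processing, entry: ProcessingOverride) -> Processing {
    Processing {
        include_author: entry.include_author.unwrap_or(defaults.include_author),
        include_date: entry.include_date.unwrap_or(defaults.include_date),
        include_source_url: entry
            .include_source_url
            .unwrap_or(defaults.include_source_url),
        skip_first_paragraphs: entry
            .skip_first_paragraphs
            .unwrap_or(defaults.skip_first_paragraphs),
    }
}

/// Simplified article body: paragraphs versus everything else.
#[derive(Debug, Clone, PartialEq)]
enum Block {
    Paragraph(String),
    Other(String),
}

/// Drop the first `n` paragraphs and keep all other blocks in order.
fn skip_first_paragraphs(body: Vec<Block>, n: usize) -> Vec<Block> {
    let mut remaining = n;
    body.into_iter()
        .filter(|block| {
            if remaining > 0 && matches!(block, Block::Paragraph(_)) {
                remaining -= 1;
                false
            } else {
                true
            }
        })
        .collect()
}
```

With this model, an entry that sets only `include_source_url: false` and `skip_first_paragraphs: 1` still inherits `include_author` and `include_date` from the defaults.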

### `defaults.metadata` and `entries.<id>.metadata`

These fields use the same shape:

- `author`:
  Optional string.
- `published`:
  Optional date in `YYYY-MM-DD` format.
- `subtitle`:
  Optional string. Accepted by the parser but not yet emitted into the EPUB output.
- `summary`:
  Optional string. Accepted by the parser but not yet emitted into the EPUB output.
- `tags`:
  Optional list of strings. Accepted by the parser but not yet used in link or EPUB output logic.

Example:

```yaml
metadata:
  author: "Jane Doe"
  published: "2025-01-10"
  subtitle: "Notebook entry"
  summary: "A short summary"
  tags: ["essay", "history"]
```
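
The documented order (extracted source metadata first, then `defaults.metadata`, then `entries.<id>.metadata`) can be sketched as a layered merge. The Rust below is an illustration only: the names are hypothetical, only two fields are modeled, and merging field by field (rather than replacing the whole block) is an assumption.

```rust
/// Two of the metadata fields, kept as plain strings for brevity.
#[derive(Debug, Clone, Default, PartialEq)]
struct Metadata {
    author: Option<String>,
    published: Option<String>, // `YYYY-MM-DD`
}

/// Layer `over` on top of `base`: a field set in `over` wins.
fn layer(base: Metadata, over: Metadata) -> Metadata {
    Metadata {
        author: over.author.or(base.author),
        published: over.published.or(base.published),
    }
}

/// Extracted metadata, then shared defaults, then per-entry overrides.
fn effective_metadata(extracted: Metadata, defaults: Metadata, entry: Metadata) -> Metadata {
    layer(layer(extracted, defaults), entry)
}
```

In this sketch, a `defaults.metadata.author` value replaces the author extracted from the source, and a per-entry `published` value replaces both the extracted and the default date.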

### `sections`

`sections` is an ordered list. Section order controls reading order and TOC grouping.

Each section supports:

- `id`:
  Required string. Stable section identifier.
- `title`:
  Required string. Section title shown in the TOC.
- `entries`:
  Optional list of entry IDs in reading order. Usually should not be empty.

Example:

```yaml
sections:
  - id: "part-1"
    title: "Part 1"
    entries:
      - "opening-post"
      - "notes"
```
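
As a rough picture of how section order turns into reading order, the Rust sketch below flattens sections into one list of entry IDs and reports IDs that no entry defines. The names are hypothetical, and whether `validate` performs exactly this check is an assumption.

```rust
use std::collections::HashSet;

/// Shaped like one item of `sections`.
struct Section {
    id: String,
    title: String,
    entries: Vec<String>,
}

/// Flatten sections, in manifest order, into a single reading order.
fn reading_order(sections: &[Section]) -> Vec<&str> {
    sections
        .iter()
        .flat_map(|section| section.entries.iter().map(String::as_str))
        .collect()
}

/// Entry IDs referenced by a section but missing from the `entries` map.
fn unknown_entries<'a>(sections: &'a [Section], known: &HashSet<String>) -> Vec<&'a str> {
    reading_order(sections)
        .into_iter()
        .filter(|id| !known.contains(*id))
        .collect()
}
```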

### `entries`

`entries` is a map from entry ID to entry definition. Entry IDs are referenced from `sections` and link rules.

Each entry supports:

- `source`:
  Required source definition block.
- `title`:
  Optional string. Overrides the extracted article title.
- `metadata`:
  Optional metadata override block.
- `toc`:
  Optional TOC override block.
- `links`:
  Optional per-entry link-policy block.
- `processing`:
  Optional per-entry processing override block.

Example:

```yaml
entries:
  opening-post:
    source:
      kind: "substack"
      url: "https://example.substack.com/p/opening-post"
    title: "Opening Post"
    metadata:
      published: "2025-01-10"
    processing:
      include_source_url: false
      skip_first_paragraphs: 1
    toc:
      title: "Introduction"
    links:
      mode: "explicit"
      allow_to: ["notes"]
```

### `entries.<id>.source`

Two source kinds are supported:

- `kind: "substack"`:
  Use a public Substack article URL.
  Fields:
  `url` required string.
- `kind: "html"`:
  Use a local HTML file.
  Fields:
  `path` required string, resolved relative to the manifest file.

Examples:

```yaml
source:
  kind: "substack"
  url: "https://example.substack.com/p/my-post"
```

```yaml
source:
  kind: "html"
  path: "articles/local-post.html"
```

You can mix Substack URLs and local HTML files in the same manifest:

```yaml
entries:
  remote-post:
    source:
      kind: "substack"
      url: "https://example.substack.com/p/remote-post"

  local-post:
    source:
      kind: "html"
      path: "articles/local-post.html"
```

### `entries.<id>.toc`

- `title`:
  Optional string. Overrides the chapter label used in the TOC.
- `hidden`:
  Optional boolean. Defaults to `false`. When `true`, the entry is omitted from the TOC.

Example:

```yaml
toc:
  title: "Appendix"
  hidden: false
```
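
The two TOC fields can be read as a small selection rule, sketched below in Rust. The names are hypothetical, and the fallback chain (`toc.title`, then the entry-level `title`, then the extracted title) is an assumption; the text above only states that `toc.title` overrides the TOC label and that hidden entries are omitted.

```rust
/// Titles available for one chapter, plus its TOC visibility.
struct ChapterTitles<'a> {
    extracted_title: &'a str,
    entry_title: Option<&'a str>, // `entries.<id>.title`
    toc_title: Option<&'a str>,   // `entries.<id>.toc.title`
    toc_hidden: bool,             // `entries.<id>.toc.hidden`
}

/// Labels shown in the TOC, in reading order, skipping hidden chapters.
fn toc_labels<'a>(chapters: &[ChapterTitles<'a>]) -> Vec<&'a str> {
    chapters
        .iter()
        .filter(|chapter| !chapter.toc_hidden)
        .map(|chapter| {
            chapter
                .toc_title
                .or(chapter.entry_title)
                .unwrap_or(chapter.extracted_title)
        })
        .collect()
}
```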

### `entries.<id>.links`

- `mode`:
  Optional string. One of:
  `auto`, `explicit`, `none`.
  If omitted, the global `link_rules.mode` is used.
- `allow_to`:
  Optional list of entry IDs. If set, rewritten internal links are limited to these targets.
- `block_to`:
  Optional list of entry IDs. These targets are excluded from rewriting.

Example:

```yaml
links:
  mode: "explicit"
  allow_to: ["intro", "appendix"]
  block_to: ["draft-notes"]
```
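
A rough Rust sketch of how these three fields might interact is shown below. It is a guess at the semantics, not ebookm's implementation: all names are hypothetical, `block_to` taking precedence over `allow_to` is an assumption, and the difference between `auto` and `explicit` (which depends on `link_rules.rules`) is not modeled.

```rust
/// `Off` stands for `mode: "none"`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum LinkMode {
    Auto,
    Explicit,
    Off,
}

/// Shaped like `entries.<id>.links`.
struct EntryLinks {
    mode: Option<LinkMode>,
    allow_to: Option<Vec<String>>,
    block_to: Vec<String>,
}

/// The per-entry mode wins; otherwise fall back to `link_rules.mode`.
fn effective_mode(global: LinkMode, entry: &EntryLinks) -> LinkMode {
    entry.mode.unwrap_or(global)
}

/// Whether a link from this entry to `target` is eligible for rewriting.
fn may_rewrite(global: LinkMode, entry: &EntryLinks, target: &str) -> bool {
    if effective_mode(global, entry) == LinkMode::Off {
        return false;
    }
    // Assumption: an explicit block always wins over an allow-list.
    if entry.block_to.iter().any(|id| id.as_str() == target) {
        return false;
    }
    match &entry.allow_to {
        Some(allowed) => allowed.iter().any(|id| id.as_str() == target),
        None => true, // no allow-list: every non-blocked target is eligible
    }
}
```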

### `link_rules`

- `link_rules.mode`:
  Optional string. One of:
  `auto`, `explicit`, `none`.
  Defaults to `auto`.
- `link_rules.rewrite_external_substack_links`:
  Optional boolean. Defaults to `true`. Accepted by the manifest parser, but not currently used to change behavior in `v0.1`.
- `link_rules.preserve_other_external_links`:
  Optional boolean. Defaults to `true`. Accepted by the manifest parser, but not currently used to change behavior in `v0.1`.
- `link_rules.rules`:
  Optional list of explicit link rules.

Example:

```yaml
link_rules:
  mode: "explicit"
  rewrite_external_substack_links: true
  preserve_other_external_links: true
  rules:
    - from: ["notes"]
      to: ["intro"]
      match_mode: "canonical-url"
```

### `link_rules.rules[]`

Each rule supports:

- `from`:
  Required list of selectors describing where the rule applies.
- `to`:
  Required list of selectors describing eligible targets.
- `match_mode`:
  Optional string. One of:
  `canonical-url`, `source-url`, `disabled`.
  Defaults to `canonical-url`.

Supported selectors in `from` and `to`:

- `*`:
  Match all entries.
- `<entry-id>`:
  Match one entry by ID.
- `section:<section-id>`:
  Match all entries referenced by that section.

Example:

```yaml
link_rules:
  mode: "explicit"
  rules:
    - from: ["section:essays"]
      to: ["section:essays"]
      match_mode: "canonical-url"
```
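
The three selector forms can be sketched as a small matching function. The Rust below is illustrative only: the function names and the `sections` map (section ID to the entry IDs it lists) are hypothetical, and `match_mode` is left out entirely.

```rust
use std::collections::HashMap;

/// Does `selector` match `entry_id`?
/// `sections` maps each section ID to the entry IDs that section references.
fn selector_matches(
    selector: &str,
    entry_id: &str,
    sections: &HashMap<String, Vec<String>>,
) -> bool {
    if selector == "*" {
        return true; // `*` matches all entries
    }
    if let Some(section_id) = selector.strip_prefix("section:") {
        // `section:<section-id>` matches every entry listed by that section.
        return sections
            .get(section_id)
            .map_or(false, |ids| ids.iter().any(|id| id.as_str() == entry_id));
    }
    // Anything else is treated as a literal entry ID.
    selector == entry_id
}

/// A rule covers a link when some `from` selector matches the source entry
/// and some `to` selector matches the target entry.
fn rule_applies(
    from: &[&str],
    to: &[&str],
    source: &str,
    target: &str,
    sections: &HashMap<String, Vec<String>>,
) -> bool {
    from.iter().any(|s| selector_matches(s, source, sections))
        && to.iter().any(|s| selector_matches(s, target, sections))
}
```

In this reading, the rule in the example above covers links between any two entries listed under the `essays` section and nothing else.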

## Notes

- Output paths are resolved relative to the manifest file location.
- Local HTML paths are also resolved relative to the manifest file location.
- `sections` and `entries` are deserialized with empty defaults, but `validate` and `build` expect them to be meaningfully populated.
- `subtitle`, `summary`, `tags`, `rewrite_external_substack_links`, and `preserve_other_external_links` are accepted today but only partially wired into runtime behavior.
- For Substack sources, `v0.1` assumes public posts. Subscriber-only/session-based fetching is not implemented.