Compare commits
2 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| | 461234e76e | | |
| | 5de27503c5 | | |
@@ -0,0 +1,165 @@
# AGENTS.md

## Project Summary

`ebookm` is a Rust workspace for compiling a set of Substack posts and local HTML files into a single EPUB.

Current workspace layout:

- `ebookm-core`
  Core library: manifest parsing, source loading, extraction, normalization, TOC/link logic, EPUB generation.
- `ebookm-cli`
  Thin CLI wrapper around `ebookm-core`.

Primary user workflow:

```bash
cargo run -p ebookm-cli -- build -m <manifest>
```

## Key Files

- `Cargo.toml`
  Workspace manifest.
- `ebookm-core/src/manifest.rs`
  YAML manifest schema and defaults.
- `ebookm-core/src/source.rs`
  Source loading for Substack URLs and local HTML files.
- `ebookm-core/src/extract.rs`
  Metadata/body extraction, including Substack-specific selectors.
- `ebookm-core/src/normalize.rs`
  HTML cleanup, local/remote image bundling, link rewriting, XHTML-safe output conversion.
- `ebookm-core/src/pipeline.rs`
  Main build orchestration and chapter generation.
- `ebookm-core/src/epub.rs`
  EPUB packaging, `nav.xhtml` and `toc.ncx` generation.
- `ebookm-core/src/template.rs`
  Starter manifest template used by `ebookm init`.
- `README.md`
  User-facing docs and manifest reference.

## Current Manifest Semantics

Top-level manifest keys:

- `book`
- `output`
- `defaults`
- `sections`
- `entries`
- `link_rules`

Supported source kinds:

- `substack`
  Public Substack post URL.
- `html`
  Local HTML file path, resolved relative to the manifest.
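
To make "resolved relative to the manifest" concrete, here is a minimal sketch using only the standard library. The function name `resolve_source_path` is hypothetical and is not taken from `ebookm-core`; the actual implementation in `source.rs` may differ.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: resolve an `html` source path against the directory
/// that contains the manifest file. Not the actual `ebookm-core` API.
fn resolve_source_path(manifest_path: &Path, source_path: &str) -> PathBuf {
    // `parent()` is `Some("")` for a bare file name such as "book.yaml",
    // so the join below still yields a path relative to the current directory.
    let base = manifest_path.parent().unwrap_or_else(|| Path::new(""));
    // `join` returns `source_path` unchanged when it is absolute.
    base.join(source_path)
}
```

Under this assumption, a manifest at `ageofpeace/ageofpeace.yaml` with `path: "introduction.html"` would load `ageofpeace/introduction.html`.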

Important processing options:

- `defaults.processing.include_author`
- `defaults.processing.include_date`
- `defaults.processing.include_source_url`
- `defaults.processing.skip_first_paragraphs`
- per-entry overrides under `entries.<id>.processing`

Current defaults:

- `include_author: true`
- `include_date: true`
- `include_source_url: true`
- `skip_first_paragraphs: 0`
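
One plausible way to layer per-entry overrides on top of the defaults is sketched below. The type names `Processing` and `ProcessingOverride` and the function `resolve_processing` are illustrative assumptions, not the schema types in `manifest.rs`; only the field names and default values come from the lists above.

```rust
/// Fully resolved processing options (field names follow the manifest keys).
#[derive(Debug, Clone, PartialEq)]
struct Processing {
    include_author: bool,
    include_date: bool,
    include_source_url: bool,
    skip_first_paragraphs: usize,
}

impl Default for Processing {
    // Mirrors the "Current defaults" list.
    fn default() -> Self {
        Processing {
            include_author: true,
            include_date: true,
            include_source_url: true,
            skip_first_paragraphs: 0,
        }
    }
}

/// Hypothetical per-entry override: `None` means "inherit from defaults".
#[derive(Debug, Clone, Default)]
struct ProcessingOverride {
    include_author: Option<bool>,
    include_date: Option<bool>,
    include_source_url: Option<bool>,
    skip_first_paragraphs: Option<usize>,
}

/// Apply an entry's overrides field by field on top of the defaults.
fn resolve_processing(defaults: &Processing, entry: &ProcessingOverride) -> Processing {
    Processing {
        include_author: entry.include_author.unwrap_or(defaults.include_author),
        include_date: entry.include_date.unwrap_or(defaults.include_date),
        include_source_url: entry
            .include_source_url
            .unwrap_or(defaults.include_source_url),
        skip_first_paragraphs: entry
            .skip_first_paragraphs
            .unwrap_or(defaults.skip_first_paragraphs),
    }
}
```

With this shape, an entry that sets only `skip_first_paragraphs: 1` keeps the three `include_*` defaults and changes nothing else.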

## Current EPUB Behavior

- Section structure is emitted into both `nav.xhtml` and `toc.ncx`.
- Chapter header content is configurable:
  author, date, and canonical URL can each be independently shown/hidden.
- Local HTML images are bundled when `fetch_images: true`.
- Local image paths are resolved relative to the HTML file, not the manifest.
- Remote images from Substack pages are also bundled when `fetch_images: true`.
- Generated chapter XHTML is post-processed to self-close HTML void tags like `img`, `hr`, and `br` for EPUB/XML compatibility.
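
The void-tag post-processing step can be illustrated with a small string scan. This is a sketch under stated assumptions, not the code in `normalize.rs`: the function name `self_close_void_tags` is hypothetical, the tag list covers only the three tags named above, and attribute values are assumed not to contain a literal `>`.

```rust
/// Void tags handled by this sketch (only those named in the text above).
const VOID_TAGS: [&str; 3] = ["img", "hr", "br"];

/// Hypothetical helper: rewrite `<br>` as `<br />` and so on, so the output
/// parses as XML. Assumes no `>` appears inside attribute values.
fn self_close_void_tags(html: &str) -> String {
    let mut out = String::with_capacity(html.len() + 16);
    let mut rest = html;
    while let Some(start) = rest.find('<') {
        // Copy the text before the tag unchanged.
        out.push_str(&rest[..start]);
        let after = &rest[start..];
        let end = match after.find('>') {
            Some(end) => end,
            None => {
                // Unterminated tag: copy the remainder verbatim and stop.
                out.push_str(after);
                return out;
            }
        };
        // Tag text without the closing '>', for example `<img src="a.png"`.
        let tag = &after[..end];
        // The tag name is the run of ASCII alphanumerics right after '<'.
        let name_len = tag[1..]
            .chars()
            .take_while(|c| c.is_ascii_alphanumeric())
            .count();
        let name = &tag[1..1 + name_len];
        let is_void = VOID_TAGS.iter().any(|t| t.eq_ignore_ascii_case(name));
        if is_void && !tag.trim_end().ends_with('/') {
            out.push_str(tag.trim_end());
            out.push_str(" />");
        } else {
            // Non-void tags and already self-closed tags pass through as-is.
            out.push_str(tag);
            out.push('>');
        }
        rest = &after[end + 1..];
    }
    out.push_str(rest);
    out
}
```

A real implementation would more likely operate on a parsed DOM; the scan here only shows the transformation that the `xmllint` check later in this file is meant to verify.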

## Known Implementation Boundaries

- Substack handling is tuned to current public page structure, especially:
  `.available-content .body.markup`
  `.post-title`
  JSON-LD `datePublished`
- Subscriber-only/authenticated Substack content is not implemented.
- CSS background images are not bundled.
- Manifest fields like `subtitle`, `summary`, and `tags` are parsed but only partially used.
- `rewrite_external_substack_links` and `preserve_other_external_links` exist in the manifest schema but are not deeply wired into behavior yet.

## Validation and Debugging

Run tests:

```bash
cargo test
```

Build a manifest:

```bash
cargo run -p ebookm-cli -- build -m ageofpeace/ageofpeace.yaml
```

Inspect extracted source metadata:

```bash
cargo run -p ebookm-cli -- inspect <url-or-file>
```

Validate generated XHTML quickly:

```bash
unzip -p path/to/book.epub OEBPS/text/chapter.xhtml | xmllint --noout -
```

Validate the full EPUB package:

```bash
epubcheck path/to/book.epub
```

Useful inspection commands:

```bash
unzip -l path/to/book.epub
unzip -p path/to/book.epub OEBPS/nav.xhtml
unzip -p path/to/book.epub OEBPS/toc.ncx
unzip -p path/to/book.epub OEBPS/text/<entry>.xhtml
```

## Existing Real Example

The repository contains a real working manifest:

- `ageofpeace/ageofpeace.yaml`

Related local content/assets:

- `ageofpeace/introduction.html`
- `ageofpeace/johngu.jpg`
- `ageofpeace/age_of_peace_cover.jpg`

This is the best regression case for:

- mixed local HTML + Substack sources
- cover image handling
- local image bundling
- section TOC nesting
- chapter-header processing options

## Guidance For Future Agents

- Preserve manifest backward compatibility unless there is a strong reason not to.
- If TOC behavior changes, verify both `nav.xhtml` and `toc.ncx`.
- If HTML normalization changes, verify generated XHTML with `xmllint`.
- If image handling changes, test both:
  local HTML image references
  remote Substack image references
- Prefer extending `ebookm-core` behavior and keeping `ebookm-cli` thin.
- Update `README.md` whenever user-facing manifest fields or behavior change.
@@ -43,7 +43,7 @@ entries:
   intro:
     source:
       kind: "html"
-      path: "ageofpeace/introduction.html"
+      path: "introduction.html"
   contested_island:
     source:
       kind: "substack"
@@ -70,6 +70,21 @@ entries:
       url: "https://ageofpeace.substack.com/p/biridana"
     toc:
       title: "Biridana"
+  in_the_east:
+    source:
+      kind: "substack"
+      url: "https://ageofpeace.substack.com/p/in-the-east"
+    toc:
+      title: "In the East"
+  finale:
+    source:
+      kind: "substack"
+      url: "https://ageofpeace.substack.com/p/finale"
+    toc:
+      title: "Finale"
+    processing:
+      skip_first_paragraphs: 1
+


 link_rules: