For the Harvesters · Data & APIs
Every machine-readable surface this archive exposes, in one place. All of it free, no API key, no rate limit, no terms wall — with worked examples for harvesting, citing, embedding, and subscribing.
Chapter I
Endpoints
Ten canonical surfaces. Every one is a static file generated at deploy time — fetch them as fast as you'd like, cache them as long as you'd like.
§ 1
OAI-PMH 2.0
The library-grade harvest endpoint. Suitable for DPLA, the Internet Archive, university catalog harvesters, and any digital-humanities tool that speaks OAI.
Identify: /api/oai-pmh/identify.xml
Verbs supported: Identify, ListMetadataFormats, ListIdentifiers, ListRecords, GetRecord. Format: oai_dc (Dublin Core). Repository identifier namespace: oai:filthylittleatheist.com:conway:vol-N:slug.
Because the responder is static, parameter handling (?from=, ?until=, resumptionToken) is not honored. Harvesters should pull the full ListRecords in a single pass.
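Since the full ListRecords dump arrives in one response, parsing it needs nothing beyond the standard library. A minimal sketch with `xml.etree` — the sample response below is illustrative, not a real record from the endpoint:

```python
import xml.etree.ElementTree as ET

DC_TITLE = '{http://purl.org/dc/elements/1.1/}title'

def titles_from_list_records(xml_text: str) -> list:
    """Pull every dc:title out of an OAI-PMH ListRecords response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter(DC_TITLE)]

# Illustrative response; a real harvest would fetch
# /api/oai-pmh/list-records.xml and pass the body in.
sample = """<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>The Gods</dc:title>
          <dc:identifier>oai:filthylittleatheist.com:conway:vol-1:the-gods</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(titles_from_list_records(sample))  # ['The Gods']
```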
§ 2
JSON: every work
One JSON document containing every work in the corpus, with title, slug, year, volume, category, URL, word count, and a link to the plain-text export. Stable shape: new works append, existing entries never change shape without a major-version bump.
curl https://filthylittleatheist.com/api/works.json | jq '.works[] | select(.year == 1872) | .title'
§ 3
JSON: per-work downloads
For each of the 177 works, the byte sizes and URLs of every available export (TXT, JSON, BibTeX, RIS). Plus the per-volume bundles and the corpus-wide ZIP/MD bundles.
# Iterate every per-work TXT in one pass
curl -s https://filthylittleatheist.com/api/downloads.json \
| jq -r '.works[].downloads.txt' \
| xargs -I {} curl -s -O "https://filthylittleatheist.com{}"
§ 4
JSON: work revision history
/api/work-history.json · per-work feeds at /api/work-history/<slug>.json
Per-work Git revision feed: every commit that touched a given transcription, with date, summary, author, and short SHA. Pair with the human-readable page at /works/<slug>/history/ to pin a citation to a specific revision.
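A sketch of picking the newest revision out of a per-work feed, for pinning a citation. The exact JSON field names are not documented above, so the shape here (`commits`, `sha`, `date`, `summary`, `author`) is an assumption:

```python
# Assumed per-work feed shape; field names are illustrative.
history = {
    "slug": "the-gods",
    "commits": [
        {"sha": "a1b2c3d", "date": "2026-04-12", "author": "jonajinga",
         "summary": "Fix OCR error in paragraph 14"},
        {"sha": "9f8e7d6", "date": "2025-11-02", "author": "jonajinga",
         "summary": "Initial transcription"},
    ],
}

def latest_revision(feed: dict) -> str:
    """Return the short SHA of the newest commit (ISO dates sort lexically)."""
    newest = max(feed["commits"], key=lambda c: c["date"])
    return newest["sha"]

print(latest_revision(history))  # a1b2c3d
```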
§ 5
JSON: concordance
/api/concordance.json · per-work feeds at /api/concordance/<slug>.json
A phrase-level entity index across the corpus: 17 named entities (Talmage, Beecher, Lincoln, Whitman, and the rest) cross-referenced to every work that mentions them.
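The index maps entities to works; inverting it answers "which entities appear in this work?". The field names and slugs below are assumptions for illustration, not the documented shape of /api/concordance.json:

```python
# Assumed shape: each entity lists the slugs of works that mention it.
concordance = {
    "entities": [
        {"name": "Lincoln", "works": ["the-gods", "a-lay-sermon"]},
        {"name": "Beecher", "works": ["a-lay-sermon"]},
    ]
}

def entities_in_work(data: dict, slug: str) -> list:
    """Invert the entity -> works index into work -> entities."""
    return [e["name"] for e in data["entities"] if slug in e["works"]]

print(entities_in_work(concordance, "a-lay-sermon"))  # ['Lincoln', 'Beecher']
```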
§ 6
JSON: glossary
Every glossary entry in machine-readable form: term, short definition, type (term / figure / event), aliases, slug.
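Because each entry carries aliases, a lookup should check both the term and its aliases. A sketch assuming the fields listed above; the sample entries and definitions are placeholders:

```python
# Sample entries mirroring the documented fields (term, definition,
# type, aliases, slug); contents are illustrative.
glossary = [
    {"term": "agnosticism", "type": "term", "aliases": ["agnostic"],
     "definition": "...", "slug": "agnosticism"},
    {"term": "Henry Ward Beecher", "type": "figure", "aliases": ["Beecher"],
     "definition": "...", "slug": "beecher"},
]

def lookup(entries, word):
    """Resolve a word against terms and aliases, case-insensitively."""
    w = word.lower()
    for e in entries:
        if w == e["term"].lower() or w in (a.lower() for a in e["aliases"]):
            return e
    return None

print(lookup(glossary, "Beecher")["slug"])  # beecher
```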
§ 7
This day in Paine's life
/api/this-day.json (JSON Feed 1.1) · /this-day/feed.xml (Atom)
Every dated event from Paine's life, indexed by month-and-day so a daily-feed reader can serve "today in Paine" automatically.
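A "today in Paine" lookup only needs to match month and day against each item's date. JSON Feed 1.1 guarantees `items` with `date_published` (RFC 3339); how this feed indexes beyond that is an assumption, and the sample items are illustrative:

```python
from datetime import date

# Minimal illustrative feed; the real one is /api/this-day.json.
feed = {
    "version": "https://jsonfeed.org/version/1.1",
    "items": [
        {"title": "Paine sails for America", "date_published": "1774-10-01T00:00:00Z"},
        {"title": "Common Sense published", "date_published": "1776-01-10T00:00:00Z"},
    ],
}

def events_for(feed: dict, month: int, day: int) -> list:
    """Return titles of events matching the given month and day, any year."""
    out = []
    for item in feed["items"]:
        d = date.fromisoformat(item["date_published"][:10])
        if (d.month, d.day) == (month, day):
            out.append(item["title"])
    return out

print(events_for(feed, 1, 10))  # ['Common Sense published']
```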
§ 8
Atom feeds
/feed.xml — site-wide Atom (newest essays + announcements).
/works/feed.xml — works-only Atom (newly-transcribed works).
/this-day/feed.xml — Atom variant of the this-day feed.
§ 9
Sitemaps + robots
/sitemap.xml — XML sitemap, every page on the site, with <lastmod>, <changefreq>, <priority>.
/sitemap/ — human-readable sitemap.
/robots.txt — points crawlers at the sitemap, no disallows.
§ 10
Search index — Pagefind
The full-text search index is built at deploy time and served from /pagefind/ on this same domain. To embed search in your own page, fetch /pagefind/pagefind.js directly; the API is documented at pagefind.app. No third-party search service is involved at any point.
Chapter II
Recipes
Worked examples — what to actually run when you want to do the common scholarly things with this corpus.
§ 11
Harvest the corpus
Three options, in increasing order of effort.
A. The corpus ZIP (one file)
Single archive containing all 177 works × 4 formats (TXT / JSON / BibTeX / RIS), plus a single concatenated Markdown bundle and a README. About 9 MB.
curl -O https://filthylittleatheist.com/api/downloads/conway-complete.zip
unzip conway-complete.zip
B. OAI-PMH ListRecords (one round-trip)
Single Dublin Core dump of every work, suitable for DSpace, Omeka, or any OAI-aware repository.
curl -o records.xml https://filthylittleatheist.com/api/oai-pmh/list-records.xml
C. Source repository
Clone the public repo for the canonical Markdown source plus full Git history.
git clone https://github.com/jonajinga/filthy-little-atheist.git
cd filthy-little-atheist/src/works
ls *.md # one file per work, with YAML front-matter
§ 12
Cite a specific revision
For citations that need to pin to an exact text version (e.g. a footnote in a journal article), append the short commit hash from the work's history page:
Paine, Thomas. "The Gods." The Works of Thomas Paine
(Conway Edition, 1900–1902). filthylittleatheist.com/works/the-gods/
[revision a1b2c3d, 2026-04-12].
The hash resolves at https://github.com/jonajinga/filthy-little-atheist/commit/<hash>. Per-work history is at /works/<slug>/history/.
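Footnote tooling can assemble the pinned form mechanically. A small sketch following the citation style above; the title, slug, hash, and date passed in are placeholders:

```python
def pinned_citation(title: str, slug: str, sha: str, checked: str) -> str:
    """Compose a citation pinned to a specific revision, in the style above."""
    return (f'Paine, Thomas. "{title}." The Works of Thomas Paine '
            f'(Conway Edition, 1900–1902). '
            f'filthylittleatheist.com/works/{slug}/ [revision {sha}, {checked}].')

print(pinned_citation("The Gods", "the-gods", "a1b2c3d", "2026-04-12"))
```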
§ 13
Import into Zotero / EndNote / Mendeley
Every work ships an RIS export at /works/<slug>/downloads/<slug>.ris. Drag the file into Zotero, or use Zotero's "Import…" menu.
# Bulk-download every RIS file, then import the folder into Zotero:
curl -s https://filthylittleatheist.com/api/downloads.json \
| jq -r '.works[].downloads.ris' \
| xargs -I {} curl -O "https://filthylittleatheist.com{}"
For BibTeX users, swap .ris for .bib. The corpus-wide BibTeX file lives inside conway-complete.zip at the root.
§ 14
Subscribe to "this day in Paine"
Two formats. Pick whichever your reader prefers.
- Atom (RSS-style): https://filthylittleatheist.com/this-day/feed.xml. Add to Feedly, Inoreader, NetNewsWire, Feedbin, or any modern reader.
- JSON Feed 1.1: https://filthylittleatheist.com/api/this-day.json. For readers that prefer JSON over XML.
The reader will surface the day's entry — a dated event from Paine's life — automatically.
§ 15
Embed full-text search
Pagefind's search index is at /pagefind/pagefind.js. Drop it into any page you want to search the corpus from:
<script type="module">
  const pagefind = await import('https://filthylittleatheist.com/pagefind/pagefind.js');
  const search = await pagefind.search('agnosticism');
  for (const r of search.results) {
    const data = await r.data();
    console.log(data.url, data.meta.title);
  }
</script>
The index is fully static; no third-party search service involved. Pagefind's API reference: pagefind.app/docs/api/.
§ 16
Stylometric / corpus-linguistic work
The corpus is plain text in /api/works.json with word counts pre-computed. For a stylometric analysis (authorship, period drift, vocabulary frequency), you can pull every body in one pass:
import json, urllib.request

manifest = json.load(urllib.request.urlopen(
    'https://filthylittleatheist.com/api/downloads.json'))

texts = []
for w in manifest['works']:
    url = 'https://filthylittleatheist.com' + w['downloads']['txt']
    texts.append((w['slug'], urllib.request.urlopen(url).read().decode()))

# texts is now a list of (slug, body) ready for sklearn / spaCy / nltk
Chapter III
Identifiers
How the archive names things — for citation, harvesting, and durability across host migrations.
§ 17
Per-work URLs
Each work ships at predictable paths:
/works/<slug>/ — reading view
/works/<slug>/history/ — revision history
/works/<slug>/downloads/<slug>.txt — plain text
/works/<slug>/downloads/<slug>.json — structured JSON
/works/<slug>/downloads/<slug>.bib — BibTeX
/works/<slug>/downloads/<slug>.ris — RIS for Zotero / EndNote / Mendeley
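Because the paths are predictable, every per-work URL can be derived from the slug alone. A minimal sketch following the patterns above:

```python
BASE = "https://filthylittleatheist.com"

def work_urls(slug: str) -> dict:
    """Build every predictable per-work URL from a slug."""
    dl = f"{BASE}/works/{slug}/downloads/{slug}"
    return {
        "reading": f"{BASE}/works/{slug}/",
        "history": f"{BASE}/works/{slug}/history/",
        "txt": f"{dl}.txt",
        "json": f"{dl}.json",
        "bibtex": f"{dl}.bib",
        "ris": f"{dl}.ris",
    }

print(work_urls("the-gods")["ris"])
# https://filthylittleatheist.com/works/the-gods/downloads/the-gods.ris
```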
§ 18
Stable identifiers
Every work carries a stable identifier in the conway:vol-N:slug namespace — for example conway:vol-3:about-the-holy-bible. Emitted in JSON-LD identifier, in the OAI-PMH record, and in the BibTeX export. Identifiers survive URL restructuring, host migration, and forks of the archive. Slugs are frozen and never repurposed.
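Splitting an identifier back into volume and slug is a one-regex job. A sketch assuming slugs stay lowercase alphanumerics and hyphens, as in the examples above:

```python
import re

def parse_identifier(ident: str):
    """Split a conway:vol-N:slug identifier into (volume, slug)."""
    m = re.fullmatch(r'conway:vol-(\d+):([a-z0-9-]+)', ident)
    if m is None:
        raise ValueError(f"not a conway identifier: {ident}")
    return int(m.group(1)), m.group(2)

print(parse_identifier("conway:vol-3:about-the-holy-bible"))
# (3, 'about-the-holy-bible')
```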
§ 19
Structured data (JSON-LD)
Every page emits a single JSON-LD @graph covering: WebSite, Organization, Person (Paine, with Wikidata Q47484, VIAF 39377920, LCNAF n79071358 identifiers), Book (the Conway Edition with twelve typed Volume parts), and a per-page WebPage entity. Per-page-type schemas (CreativeWork, Quotation, Event, Place, FAQPage, BreadcrumbList, CollectionPage, DataCatalog, LearningResource, ImageGallery, VideoGallery) link back via stable @id IRIs.
§ 20
Authority files
Chapter IV
Working with the data
Terms, etiquette, and how to file a bug or request a new endpoint.
§ 21
Terms & rate limits
None. Every endpoint is a static file. Fetch as fast as you'd like, cache as long as you'd like. Paine's text is public domain (Public Domain Mark 1.0); the editorial commentary is CC BY-SA 4.0; the site code is MIT. Full breakdown at /license/.
§ 22
User-Agent etiquette
Long-running harvests should use a descriptive User-Agent string with a contact link, so I can route an eventual bug back to you:
User-Agent: ProjectName/1.0 (+https://example.org/about)
§ 23
Caching + freshness
Every API file is built at deploy time and served from Cloudflare's CDN. Cache-Control: public, max-age=300 on the JSON endpoints — so subsequent fetches within five minutes return instantly from edge cache. The site rebuilds on every push to main; expect data to lag the canonical source by no more than a minute or two after a commit lands.
§ 24
Bug reports + new endpoints
Open an issue at github.com/jonajinga/filthy-little-atheist/issues or write to me. New endpoints land if there's a clear scholarly use case.