
For the Harvesters · Data & APIs

Every machine-readable surface this archive exposes, in one place. All of it free, no API key, no rate limit, no terms wall — with worked examples for harvesting, citing, embedding, and subscribing.

  • Endpoints: 9
  • Formats: JSON · XML · Atom
  • Auth: none
  • Rate limit: none
  • License: public domain + CC BY-SA
  • OAI-PMH: 2.0

Chapter I

Endpoints

Nine canonical surfaces. Every one is a static file generated at deploy time — fetch them as fast as you'd like, cache them as long as you'd like.

§ 1

OAI-PMH 2.0

The library-grade harvest endpoint. Suitable for DPLA, the Internet Archive, university catalog harvesters, and any digital-humanities tool that speaks OAI.

Identify: /api/oai-pmh/identify.xml

Verbs supported: Identify, ListMetadataFormats, ListIdentifiers, ListRecords, GetRecord. Format: oai_dc (Dublin Core). Repository identifier namespace: oai:filthylittleatheist.com:conway:vol-N:slug.

Because the responder is static, query parameters (from=, until=, resumptionToken) are not honored. Harvesters should pull the full ListRecords output in a single pass.
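Since the whole ListRecords response arrives in one document, a harvester can parse it with nothing but the standard library. A minimal Python sketch; the sample record below is illustrative, not real output from the endpoint:

```python
# Minimal sketch: parse a static OAI-PMH ListRecords response with the
# standard library. The sample record is illustrative, not real data.
import xml.etree.ElementTree as ET

NS = {
    'oai': 'http://www.openarchives.org/OAI/2.0/',
    'dc': 'http://purl.org/dc/elements/1.1/',
}

sample = """<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:filthylittleatheist.com:conway:vol-1:example-work</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example Work</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

root = ET.fromstring(sample)
records = [
    (r.find('.//oai:identifier', NS).text, r.find('.//dc:title', NS).text)
    for r in root.findall('.//oai:record', NS)
]
print(records)
```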

§ 2

JSON: every work

/api/works.json

One JSON document containing every work in the corpus, with title, slug, year, volume, category, URL, word count, and a link to the plain-text export. Stable shape: new works append, existing entries never change shape without a major-version bump.

curl https://filthylittleatheist.com/api/works.json | jq '.works[] | select(.year == 1872) | .title'

§ 3

JSON: per-work downloads

/api/downloads.json

For each of the 177 works, the byte sizes and URLs of every available export (TXT, JSON, BibTeX, RIS). Plus the per-volume bundles and the corpus-wide ZIP/MD bundles.

# Iterate every per-work TXT in one pass
curl -s https://filthylittleatheist.com/api/downloads.json \
  | jq -r '.works[].downloads.txt' \
  | xargs -I {} curl -s -O "https://filthylittleatheist.com{}"

§ 4

JSON: work revision history

/api/work-history.json · per-work feeds at /api/work-history/<slug>.json

Per-work Git revision feed: every commit that touched a given transcription, with date, summary, author, and short SHA. Pair with the human-readable page at /works/<slug>/history/ to pin a citation to a specific revision.
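A pinned citation can be assembled mechanically from the newest commit in the feed. In this sketch the field names (commits, sha, date, summary, author) are assumptions inferred from the description above, not a documented schema:

```python
# Sketch: build a revision-pinned citation suffix from a per-work
# history feed. Field names are assumed, not a documented schema.
import json

feed = json.loads("""{
  "slug": "example-work",
  "commits": [
    {"sha": "a1b2c3d", "date": "2026-04-12",
     "summary": "Fix OCR typo", "author": "jona"}
  ]
}""")

latest = feed["commits"][0]
citation_suffix = f"[revision {latest['sha']}, {latest['date']}]"
print(citation_suffix)
```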

§ 5

JSON: concordance

/api/concordance.json · per-work feeds at /api/concordance/<slug>.json

A phrase-level entity index across the corpus: 17 named entities (Talmage, Beecher, Lincoln, Whitman, and the rest) cross-referenced to every work that mentions them.

§ 6

JSON: glossary

/api/glossary.json

Every glossary entry in machine-readable form: term, short definition, type (term / figure / event), aliases, slug.
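The per-entry fields above suggest a straightforward grouping pass. A sketch, assuming a top-level "entries" array (the wrapper key is my assumption; the per-entry fields come from the description):

```python
# Sketch: group glossary entries by their type field. The top-level
# "entries" key is an assumption; per-entry fields match the docs above.
import json
from collections import defaultdict

glossary = json.loads("""{
  "entries": [
    {"term": "Deism", "type": "term", "slug": "deism", "aliases": []},
    {"term": "Lincoln", "type": "figure", "slug": "lincoln",
     "aliases": ["Abraham Lincoln"]}
  ]
}""")

by_type = defaultdict(list)
for entry in glossary["entries"]:
    by_type[entry["type"]].append(entry["term"])
print(dict(by_type))
```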

§ 7

This day in Paine's life

/api/this-day.json (JSON Feed 1.1) · /this-day/feed.xml (Atom)

Every dated event from Paine's life, indexed by month-and-day so a daily-feed reader can serve "today in Paine" automatically.
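A daily-feed consumer only needs to match on month and day. This sketch follows the JSON Feed 1.1 item shape (items with date_published); how this archive encodes its month-and-day index is my assumption, so treat the matching rule as illustrative:

```python
# Sketch: look up "today in Paine" events by month-and-day from a
# JSON Feed 1.1 document. The matching rule is an assumption.
import json

feed = json.loads("""{
  "version": "https://jsonfeed.org/version/1.1",
  "title": "This day in Paine's life",
  "items": [
    {"id": "1737-01-29", "title": "Thomas Paine is born in Thetford",
     "date_published": "1737-01-29T00:00:00Z"}
  ]
}""")

def events_on(month_day, items):
    """month_day is 'MM-DD'; match against the date_published slice."""
    return [i for i in items if i["date_published"][5:10] == month_day]

print(events_on("01-29", feed["items"]))
```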

§ 8

Atom feeds

§ 9

Sitemaps + robots

  • /sitemap.xml — XML sitemap, every page on the site, with <lastmod>, <changefreq>, <priority>.
  • /sitemap/ — human-readable sitemap.
  • /robots.txt — points crawlers at the sitemap, no disallows.
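A freshness check against the sitemap needs only the standard library. A sketch with an illustrative sample document (the element names follow the sitemaps.org 0.9 schema):

```python
# Sketch: map each page URL to its <lastmod> value from the XML
# sitemap. The sample document is illustrative.
import xml.etree.ElementTree as ET

NS = {'sm': 'http://www.sitemaps.org/schemas/sitemap/0.9'}

sample = """<?xml version="1.0"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://filthylittleatheist.com/works/example-work/</loc>
    <lastmod>2026-04-12</lastmod>
  </url>
</urlset>"""

root = ET.fromstring(sample)
pages = {
    u.find('sm:loc', NS).text: u.find('sm:lastmod', NS).text
    for u in root.findall('sm:url', NS)
}
print(pages)
```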

Chapter II

Recipes

Worked examples: what to run for the common scholarly tasks against this corpus.

§ 11

Harvest the corpus

Three options, in increasing order of effort.

A. The corpus ZIP (one file)

Single archive containing all 177 works × 4 formats (TXT / JSON / BibTeX / RIS), plus a single concatenated Markdown bundle and a README. About 9 MB.

curl -O https://filthylittleatheist.com/api/downloads/conway-complete.zip
unzip conway-complete.zip

B. OAI-PMH ListRecords (one round-trip)

Single Dublin Core dump of every work, suitable for DSpace, Omeka, or any OAI-aware repository.

curl -o records.xml https://filthylittleatheist.com/api/oai-pmh/list-records.xml

C. Source repository

Clone the public repo for the canonical Markdown source plus full Git history.

git clone https://github.com/jonajinga/filthy-little-atheist.git
cd filthy-little-atheist/src/works
ls *.md   # one file per work, with YAML front-matter

§ 12

Cite a specific revision

For citations that need to pin to an exact text version (e.g. a footnote in a journal article), append the short commit hash from the work's history page:

Paine, Thomas. "The Gods." The Works of Thomas Paine
(Conway Edition, 1900–1902). filthylittleatheist.com/works/the-gods/
[revision a1b2c3d, 2026-04-12].

The hash resolves at https://github.com/jonajinga/filthy-little-atheist/commit/<hash>. Per-work history is at /works/<slug>/history/.

§ 13

Import into Zotero / EndNote / Mendeley

Every work ships an RIS export at /works/<slug>/downloads/<slug>.ris. Drag the file into Zotero, or use Zotero's "Import…" menu.

# Download every RIS file, then bulk-import the folder into Zotero:
curl -s https://filthylittleatheist.com/api/downloads.json \
  | jq -r '.works[].downloads.ris' \
  | xargs -I {} curl -O "https://filthylittleatheist.com{}"

For BibTeX users, swap .ris for .bib in the paths above. The corpus-wide BibTeX file lives at the root of conway-complete.zip.
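The swap is a one-line rewrite if you are scripting it, using "the-gods" from the citation example above as the sample slug:

```shell
# Rewrite a .ris download path into its .bib counterpart; the same
# downloads.json loop then works unchanged for BibTeX.
echo '/works/the-gods/downloads/the-gods.ris' | sed 's/\.ris$/.bib/'
```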

§ 14

Subscribe to "this day in Paine"

Two formats: JSON Feed 1.1 at /api/this-day.json and Atom at /this-day/feed.xml. Pick whichever your reader prefers.

The reader will surface the day's entry — a dated event from Paine's life — automatically.

§ 15

Embed full-text search

Pagefind's search module is at /pagefind/pagefind.js. Import it from any page you want to search the corpus from:

<script type="module">
  const pagefind = await import('https://filthylittleatheist.com/pagefind/pagefind.js');
  const search = await pagefind.search('agnosticism');
  for (const r of search.results) {
    const data = await r.data();
    console.log(data.url, data.meta.title);
  }
</script>

The index is fully static; no third-party search service involved. Pagefind's API reference: pagefind.app/docs/api/.

§ 16

Stylometric / corpus-linguistic work

The manifest at /api/works.json pre-computes word counts and links each work to its plain-text export. For a stylometric analysis (authorship, period drift, vocabulary frequency), you can pull every body in one pass:

import json, urllib.request
manifest = json.load(urllib.request.urlopen(
  'https://filthylittleatheist.com/api/downloads.json'))
texts = []
for w in manifest['works']:
  url = 'https://filthylittleatheist.com' + w['downloads']['txt']
  texts.append((w['slug'], urllib.request.urlopen(url).read().decode()))
# texts is now a list of (slug, body) ready for sklearn / spaCy / nltk

Chapter III

Identifiers

How the archive names things — for citation, harvesting, and durability across host migrations.

§ 17

Per-work URLs

Each work ships at predictable paths:

  • /works/<slug>/ — reading view
  • /works/<slug>/history/ — revision history
  • /works/<slug>/downloads/<slug>.txt — plain text
  • /works/<slug>/downloads/<slug>.json — structured JSON
  • /works/<slug>/downloads/<slug>.bib — BibTeX
  • /works/<slug>/downloads/<slug>.ris — RIS for Zotero / EndNote / Mendeley

§ 18

Stable identifiers

Every work carries a stable identifier in the conway:vol-N:slug namespace — for example conway:vol-3:about-the-holy-bible. Emitted in JSON-LD identifier, in the OAI-PMH record, and in the BibTeX export. Identifiers survive URL restructuring, host migration, and forks of the archive. Slugs are frozen and never repurposed.
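Harvesters that want to file works by volume can split the identifier mechanically. A sketch, with the grammar taken from the conway:vol-N:slug example above:

```python
# Sketch: split a conway:vol-N:slug identifier into its parts.
# The grammar follows the example identifier in the text.
import re

def parse_identifier(ident):
    m = re.fullmatch(r'conway:vol-(\d+):([a-z0-9-]+)', ident)
    if m is None:
        raise ValueError(f'not a conway identifier: {ident!r}')
    return {'volume': int(m.group(1)), 'slug': m.group(2)}

print(parse_identifier('conway:vol-3:about-the-holy-bible'))
```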

§ 19

Structured data (JSON-LD)

Every page emits a single JSON-LD @graph covering: WebSite, Organization, Person (Paine, with Wikidata Q47484, VIAF 39377920, LCNAF n79071358 identifiers), Book (the Conway Edition with twelve typed Volume parts), and a per-page WebPage entity. Per-page-type schemas (CreativeWork, Quotation, Event, Place, FAQPage, BreadcrumbList, CollectionPage, DataCatalog, LearningResource, ImageGallery, VideoGallery) link back via stable @id IRIs.
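Consumers can pull that @graph out of any page with the standard library's HTMLParser; no HTML parsing dependency needed. The sample HTML below is illustrative, not a real page from the site:

```python
# Sketch: extract JSON-LD blocks from a page with html.parser.
# The sample HTML is illustrative, not a real page.
import json
from html.parser import HTMLParser

class JSONLDExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == 'script' and ('type', 'application/ld+json') in attrs:
            self.in_jsonld = True

    def handle_endtag(self, tag):
        if tag == 'script':
            self.in_jsonld = False

    def handle_data(self, data):
        if self.in_jsonld:
            self.blocks.append(json.loads(data))

sample = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@graph": [{"@type": "WebSite"}]}
</script>
</head><body></body></html>"""

parser = JSONLDExtractor()
parser.feed(sample)
types = [node['@type'] for node in parser.blocks[0]['@graph']]
print(types)
```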

§ 20

Authority files

Chapter IV

Working with the data

Terms, etiquette, and how to file a bug or request a new endpoint.

§ 21

Terms & rate limits

None. Every endpoint is a static file. Fetch as fast as you'd like, cache as long as you'd like. Paine's text is public domain (Public Domain Mark 1.0); the editorial commentary is CC BY-SA 4.0; the site code is MIT. Full breakdown at /license/.

§ 22

User-Agent etiquette

Long-running harvests should send a descriptive User-Agent string with a contact link, so I can reach you if something breaks:

User-Agent: ProjectName/1.0 (+https://example.org/about)

§ 23

Caching + freshness

Every API file is built at deploy time and served from Cloudflare's CDN. Cache-Control: public, max-age=300 on the JSON endpoints — so subsequent fetches within five minutes return instantly from edge cache. The site rebuilds on every push to main; expect data to lag the canonical source by no more than a minute or two after a commit lands.

§ 24

Bug reports + new endpoints

Open an issue at github.com/jonajinga/filthy-little-atheist/issues or write to me. New endpoints land if there's a clear scholarly use case.
