
For the Harvesters · Data & APIs

Every machine-readable surface this archive exposes, in one place. All of it free, no API key, no rate limit, no terms wall — with worked examples for harvesting, citing, embedding, and subscribing.

  • Endpoints: 9
  • Formats: JSON · XML · Atom
  • Auth: none
  • Rate limit: none
  • License: public domain + CC BY-SA
  • OAI-PMH: 2.0

Chapter I

Endpoints

Nine canonical surfaces. Every one is a static file generated at deploy time — fetch them as fast as you'd like, cache them as long as you'd like.

§ 1

OAI-PMH 2.0

The library-grade harvest endpoint. Suitable for DPLA, the Internet Archive, university catalog harvesters, and any digital-humanities tool that speaks OAI.

Identify: /api/oai-pmh/identify.xml

Verbs supported: Identify, ListMetadataFormats, ListIdentifiers, ListRecords, GetRecord. Format: oai_dc (Dublin Core). Repository identifier namespace: oai:filthylittleatheist.com:conway:vol-N:slug.

Because the responder is static, query parameters (from=, until=, resumptionToken) are not honored. Harvesters should pull the full ListRecords output in a single pass.
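Since the whole ListRecords response arrives in one document, a harvester can parse it with nothing but the standard library. A minimal Python sketch; the sample record below is illustrative, not real output from the endpoint:

```python
# Minimal sketch: parse a static OAI-PMH ListRecords response with the
# standard library. The sample record is illustrative, not real data.
import xml.etree.ElementTree as ET

NS = {
    'oai': 'http://www.openarchives.org/OAI/2.0/',
    'dc': 'http://purl.org/dc/elements/1.1/',
}

sample = """<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:filthylittleatheist.com:conway:vol-1:example-work</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example Work</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

root = ET.fromstring(sample)
records = [
    (r.find('.//oai:identifier', NS).text, r.find('.//dc:title', NS).text)
    for r in root.findall('.//oai:record', NS)
]
print(records)
```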

§ 2

JSON: every work

/api/works.json

One JSON document containing every work in the corpus, with title, slug, year, volume, category, URL, word count, and a link to the plain-text export. Stable shape: new works append, existing entries never change shape without a major-version bump.

curl https://filthylittleatheist.com/api/works.json | jq '.works[] | select(.year == 1872) | .title'

§ 3

JSON: per-work downloads

/api/downloads.json

For each of the 177 works, the byte sizes and URLs of every available export (TXT, JSON, BibTeX, RIS). Plus the per-volume bundles and the corpus-wide ZIP/MD bundles.

# Iterate every per-work TXT in one pass
curl -s https://filthylittleatheist.com/api/downloads.json \
  | jq -r '.works[].downloads.txt' \
  | xargs -I {} curl -s -O "https://filthylittleatheist.com{}"

§ 4

JSON: work revision history

/api/work-history.json · per-work feeds at /api/work-history/<slug>.json

Per-work Git revision feed: every commit that touched a given transcription, with date, summary, author, and short SHA. Pair with the human-readable page at /works/<slug>/history/ to pin a citation to a specific revision.
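A pinned citation can be assembled mechanically from the newest commit in the feed. In this sketch the field names (commits, sha, date, summary, author) are assumptions inferred from the description above, not a documented schema:

```python
# Sketch: build a revision-pinned citation suffix from a per-work
# history feed. Field names are assumed, not a documented schema.
import json

feed = json.loads("""{
  "slug": "example-work",
  "commits": [
    {"sha": "a1b2c3d", "date": "2026-04-12",
     "summary": "Fix OCR typo", "author": "jona"}
  ]
}""")

latest = feed["commits"][0]
citation_suffix = f"[revision {latest['sha']}, {latest['date']}]"
print(citation_suffix)
```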

§ 5

JSON: concordance

/api/concordance.json · per-work feeds at /api/concordance/<slug>.json

A phrase-level entity index across the corpus: 17 named entities (Talmage, Beecher, Lincoln, Whitman, and the rest) cross-referenced to every work that mentions them.

§ 6

JSON: glossary

/api/glossary.json

Every glossary entry in machine-readable form: term, short definition, type (term / figure / event), aliases, slug.
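The per-entry fields above suggest a straightforward grouping pass. A sketch, assuming a top-level "entries" array (the wrapper key is my assumption; the per-entry fields come from the description):

```python
# Sketch: group glossary entries by their type field. The top-level
# "entries" key is an assumption; per-entry fields match the docs above.
import json
from collections import defaultdict

glossary = json.loads("""{
  "entries": [
    {"term": "Deism", "type": "term", "slug": "deism", "aliases": []},
    {"term": "Lincoln", "type": "figure", "slug": "lincoln",
     "aliases": ["Abraham Lincoln"]}
  ]
}""")

by_type = defaultdict(list)
for entry in glossary["entries"]:
    by_type[entry["type"]].append(entry["term"])
print(dict(by_type))
```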

§ 7

This day in Paine's life

/api/this-day.json (JSON Feed 1.1) · /this-day/feed.xml (Atom)

Every dated event from Paine's life, indexed by month-and-day so a daily-feed reader can serve "today in Paine" automatically.
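A daily-feed consumer only needs to match on month and day. This sketch follows the JSON Feed 1.1 item shape (items with date_published); how this archive encodes its month-and-day index is my assumption, so treat the matching rule as illustrative:

```python
# Sketch: look up "today in Paine" events by month-and-day from a
# JSON Feed 1.1 document. The matching rule is an assumption.
import json

feed = json.loads("""{
  "version": "https://jsonfeed.org/version/1.1",
  "title": "This day in Paine's life",
  "items": [
    {"id": "1737-01-29", "title": "Thomas Paine is born in Thetford",
     "date_published": "1737-01-29T00:00:00Z"}
  ]
}""")

def events_on(month_day, items):
    """month_day is 'MM-DD'; match against the date_published slice."""
    return [i for i in items if i["date_published"][5:10] == month_day]

print(events_on("01-29", feed["items"]))
```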

§ 8

Atom feeds

§ 9

Sitemaps + robots

  • /sitemap.xml — XML sitemap, every page on the site, with <lastmod>, <changefreq>, <priority>.
  • /sitemap/ — human-readable sitemap.
  • /robots.txt — points crawlers at the sitemap, no disallows.
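A freshness check against the sitemap needs only the standard library. A sketch with an illustrative sample document (the element names follow the sitemaps.org 0.9 schema):

```python
# Sketch: map each page URL to its <lastmod> value from the XML
# sitemap. The sample document is illustrative.
import xml.etree.ElementTree as ET

NS = {'sm': 'http://www.sitemaps.org/schemas/sitemap/0.9'}

sample = """<?xml version="1.0"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://filthylittleatheist.com/works/example-work/</loc>
    <lastmod>2026-04-12</lastmod>
  </url>
</urlset>"""

root = ET.fromstring(sample)
pages = {
    u.find('sm:loc', NS).text: u.find('sm:lastmod', NS).text
    for u in root.findall('sm:url', NS)
}
print(pages)
```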

Chapter II

Recipes

Worked examples: what to run for the common scholarly tasks against this corpus.

§ 11

Harvest the corpus

Three options, in increasing order of effort.

A. The corpus ZIP (one file)

Single archive containing all 177 works × 4 formats (TXT / JSON / BibTeX / RIS), plus a single concatenated Markdown bundle and a README. About 9 MB.

curl -O https://filthylittleatheist.com/api/downloads/conway-complete.zip
unzip conway-complete.zip

B. OAI-PMH ListRecords (one round-trip)

Single Dublin Core dump of every work, suitable for DSpace, Omeka, or any OAI-aware repository.

curl -o records.xml https://filthylittleatheist.com/api/oai-pmh/list-records.xml

C. Source repository

Clone the public repo for the canonical Markdown source plus full Git history.

git clone https://github.com/jonajinga/filthy-little-atheist.git
cd filthy-little-atheist/src/works
ls *.md   # one file per work, with YAML front-matter

§ 12

Cite a specific revision

For citations that need to pin to an exact text version (e.g. a footnote in a journal article), append the short commit hash from the work's history page:

Paine, Thomas. "The Gods." The Works of Thomas Paine
(Conway Edition, 1900–1902). filthylittleatheist.com/works/the-gods/
[revision a1b2c3d, 2026-04-12].

The hash resolves at https://github.com/jonajinga/filthy-little-atheist/commit/<hash>. Per-work history is at /works/<slug>/history/.

§ 13

Import into Zotero / EndNote / Mendeley

Every work ships an RIS export at /works/<slug>/downloads/<slug>.ris. Drag the file into Zotero, or use Zotero's "Import…" menu.

# Download every RIS file, then bulk-import the folder into Zotero:
curl -s https://filthylittleatheist.com/api/downloads.json \
  | jq -r '.works[].downloads.ris' \
  | xargs -I {} curl -O "https://filthylittleatheist.com{}"

For BibTeX users, swap .ris for .bib in the paths above. The corpus-wide BibTeX file lives at the root of conway-complete.zip.
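The swap is a one-line rewrite if you are scripting it, using "the-gods" from the citation example above as the sample slug:

```shell
# Rewrite a .ris download path into its .bib counterpart; the same
# downloads.json loop then works unchanged for BibTeX.
echo '/works/the-gods/downloads/the-gods.ris' | sed 's/\.ris$/.bib/'
```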

§ 14

Subscribe to "this day in Paine"

Two formats: JSON Feed 1.1 at /api/this-day.json and Atom at /this-day/feed.xml. Pick whichever your reader prefers.

The reader will surface the day's entry — a dated event from Paine's life — automatically.

§ 15

Embed full-text search

Pagefind's search module is at /pagefind/pagefind.js. Import it from any page you want to search the corpus from:

<script type="module">
  const pagefind = await import('https://filthylittleatheist.com/pagefind/pagefind.js');
  const search = await pagefind.search('agnosticism');
  for (const r of search.results) {
    const data = await r.data();
    console.log(data.url, data.meta.title);
  }
</script>

The index is fully static; no third-party search service involved. Pagefind's API reference: pagefind.app/docs/api/.

§ 16

Stylometric / corpus-linguistic work

The manifest at /api/works.json pre-computes word counts and links each work to its plain-text export. For a stylometric analysis (authorship, period drift, vocabulary frequency), you can pull every body in one pass:

import json, urllib.request
manifest = json.load(urllib.request.urlopen(
  'https://filthylittleatheist.com/api/downloads.json'))
texts = []
for w in manifest['works']:
  url = 'https://filthylittleatheist.com' + w['downloads']['txt']
  texts.append((w['slug'], urllib.request.urlopen(url).read().decode()))
# texts is now a list of (slug, body) ready for sklearn / spaCy / nltk

Chapter III

Identifiers

How the archive names things — for citation, harvesting, and durability across host migrations.

§ 17

Per-work URLs

Each work ships at predictable paths:

  • /works/<slug>/ — reading view
  • /works/<slug>/history/ — revision history
  • /works/<slug>/downloads/<slug>.txt — plain text
  • /works/<slug>/downloads/<slug>.json — structured JSON
  • /works/<slug>/downloads/<slug>.bib — BibTeX
  • /works/<slug>/downloads/<slug>.ris — RIS for Zotero / EndNote / Mendeley

§ 18

Stable identifiers

Every work carries a stable identifier in the conway:vol-N:slug namespace — for example conway:vol-3:about-the-holy-bible. Emitted in JSON-LD identifier, in the OAI-PMH record, and in the BibTeX export. Identifiers survive URL restructuring, host migration, and forks of the archive. Slugs are frozen and never repurposed.
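Harvesters that want to file works by volume can split the identifier mechanically. A sketch, with the grammar taken from the conway:vol-N:slug example above:

```python
# Sketch: split a conway:vol-N:slug identifier into its parts.
# The grammar follows the example identifier in the text.
import re

def parse_identifier(ident):
    m = re.fullmatch(r'conway:vol-(\d+):([a-z0-9-]+)', ident)
    if m is None:
        raise ValueError(f'not a conway identifier: {ident!r}')
    return {'volume': int(m.group(1)), 'slug': m.group(2)}

print(parse_identifier('conway:vol-3:about-the-holy-bible'))
```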

§ 19

Structured data (JSON-LD)

Every page emits a single JSON-LD @graph covering: WebSite, Organization, Person (Paine, with Wikidata Q47484, VIAF 39377920, LCNAF n79071358 identifiers), Book (the Conway Edition with twelve typed Volume parts), and a per-page WebPage entity. Per-page-type schemas (CreativeWork, Quotation, Event, Place, FAQPage, BreadcrumbList, CollectionPage, DataCatalog, LearningResource, ImageGallery, VideoGallery) link back via stable @id IRIs.
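Consumers can pull that @graph out of any page with the standard library's HTMLParser; no HTML parsing dependency needed. The sample HTML below is illustrative, not a real page from the site:

```python
# Sketch: extract JSON-LD blocks from a page with html.parser.
# The sample HTML is illustrative, not a real page.
import json
from html.parser import HTMLParser

class JSONLDExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == 'script' and ('type', 'application/ld+json') in attrs:
            self.in_jsonld = True

    def handle_endtag(self, tag):
        if tag == 'script':
            self.in_jsonld = False

    def handle_data(self, data):
        if self.in_jsonld:
            self.blocks.append(json.loads(data))

sample = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@graph": [{"@type": "WebSite"}]}
</script>
</head><body></body></html>"""

parser = JSONLDExtractor()
parser.feed(sample)
types = [node['@type'] for node in parser.blocks[0]['@graph']]
print(types)
```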

§ 20

Authority files

Chapter IV

Working with the data

Terms, etiquette, and how to file a bug or request a new endpoint.

§ 21

Terms & rate limits

None. Every endpoint is a static file. Fetch as fast as you'd like, cache as long as you'd like. Paine's text is public domain (Public Domain Mark 1.0); the editorial commentary is CC BY-SA 4.0; the site code is MIT. Full breakdown at /license/.

§ 22

User-Agent etiquette

Long-running harvests should send a descriptive User-Agent string with a contact link, so I can reach you if something breaks:

User-Agent: ProjectName/1.0 (+https://example.org/about)

§ 23

Caching + freshness

Every API file is built at deploy time and served from Cloudflare's CDN. Cache-Control: public, max-age=300 on the JSON endpoints — so subsequent fetches within five minutes return instantly from edge cache. The site rebuilds on every push to main; expect data to lag the canonical source by no more than a minute or two after a commit lands.

§ 24

Bug reports + new endpoints

Open an issue at github.com/jonajinga/filthy-little-atheist/issues or write to me. New endpoints land if there's a clear scholarly use case.
