# Scrape 1 page (default)
venv/bin/python python/scraper.py

# Scrape multiple pages
venv/bin/python python/scraper.py -n 10

# Scrape all remaining pages (-n set high enough to cover every unchecked URL)
venv/bin/python python/scraper.py -n 9999

This scrapes the unchecked URLs listed in scraping.md, saves each page as JSON in scraped_data/, and adds the resulting JSON files to processed.md as pending.
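
As a rough illustration of that bookkeeping, the sketch below pulls unchecked URLs out of a checklist and builds the matching pending entry. It assumes `scraping.md` uses Markdown task-list lines of the form `- [ ] <url>` and that `processed.md` entries are `- [ ] <file>`; `unchecked_urls`, `pending_entry`, and the example URLs are hypothetical and are not taken from `python/scraper.py`.

```python
import re

# Assumed line format: "- [ ] https://example.com/page" (not yet scraped)
# or "- [x] https://example.com/page" (already scraped).
UNCHECKED = re.compile(r"^\s*- \[ \]\s+(\S+)")


def unchecked_urls(checklist_text: str, limit: int = 1) -> list[str]:
    """Return up to `limit` URLs whose checkbox is still empty."""
    urls: list[str] = []
    for line in checklist_text.splitlines():
        if len(urls) >= limit:
            break
        match = UNCHECKED.match(line)
        if match:
            urls.append(match.group(1))
    return urls


def pending_entry(json_filename: str) -> str:
    """Format a pending line for processed.md (assumed format)."""
    return f"- [ ] {json_filename}"


sample = (
    "- [x] https://example.com/a\n"
    "- [ ] https://example.com/b\n"
    "- [ ] https://example.com/c\n"
)
print(unchecked_urls(sample, limit=1))
```

The `limit` parameter mirrors the role of `-n` in the commands above: it caps how many unchecked URLs are picked up in one run.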

2. Process scraped data into docs.md

Use this prompt with Claude:

Process pending scraped JSON files into docs.md ONE FILE AT A TIME.

LOOP: Repeat these steps until all files are processed:

1. Read `processed.md` to find the FIRST pending JSON file (marked with `- [ ]`)
2. If no pending files remain, stop
3. Read ONLY that one file from `scraped_data/`
4. Extract game entities (buildings, goods, production chains, etc.)
5. Translate German names to English:
   - Use the game's official English names where known
   - Common translations:
     - Latium = Latium (Roman region)
     - Albion = Albion (Celtic region)
     - Liberti = Liberti (Tier 1 Roman)
     - Plebejer = Plebeians (Tier 2 Roman)
     - Equites = Equites (Tier 3 Roman)
     - Patrizier = Patricians (Tier 4 Roman)
     - Wanderer = Waders (Tier 1 Celtic)
     - Schmiede = Smiths (Tier 2 Celtic)
     - Älteste = Elders (Tier 3 Celtic)
     - Mercatoren = Mercators (Tier 4 Celtic)
     - Edelmänner = Nobles (Tier 5 Celtic)
     - Building/Good names: translate to English equivalents
6. Format data according to the schemas defined in docs.md
7. Add new entities or update existing ones in docs.md
8. Mark this JSON file as processed in `processed.md` (change `- [ ]` to `- [x]`)
9. REPEAT from step 1

Focus on extracting:
- Buildings: name, category, region, build costs, maintenance, workforce, cycle time, inputs/outputs, area effects, requirements
- Goods: name, category, produced by, consumed by
- Production chains: steps, ratios, cycle times

Keep entries concise. Mark unknown values as "Unknown" rather than guessing.
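
Outside the prompt itself, the checkbox bookkeeping in steps 1 and 8 can be sketched in a few lines of Python, for example to verify by hand what the loop did. This is a minimal sketch that assumes each `processed.md` entry is a single line of the form `- [ ] <file name>`; `first_pending`, `mark_processed`, and the file names in the example are hypothetical.

```python
import re
from typing import Optional

# A pending entry: "- [ ] some_file.json" (assumed layout of processed.md).
PENDING = re.compile(r"^- \[ \]\s+(.+?)\s*$")


def first_pending(processed_text: str) -> Optional[str]:
    """Return the name on the first `- [ ]` line, or None if nothing is pending."""
    for line in processed_text.splitlines():
        match = PENDING.match(line)
        if match:
            return match.group(1)
    return None


def mark_processed(processed_text: str, name: str) -> str:
    """Flip `- [ ]` to `- [x]` on the first pending line that names `name`."""
    lines = processed_text.splitlines()
    for i, line in enumerate(lines):
        match = PENDING.match(line)
        if match and match.group(1) == name:
            lines[i] = line.replace("- [ ]", "- [x]", 1)
            break
    return "\n".join(lines) + "\n"


text = "- [x] page_001.json\n- [ ] page_002.json\n- [ ] page_003.json\n"
name = first_pending(text)
if name is not None:
    text = mark_processed(text, name)
print(text)
```

Only one entry changes per call, which matches the one-file-at-a-time rule in the prompt.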

File Structure

scraping.md - URLs to scrape (checkboxes track progress)
processed.md - JSON files pending/processed into docs.md
scraped_data/ - Raw scraped JSON files
docs.md - Final structured documentation
python/scraper.py - Web scraper script

Data Flow

anno.land pages
      ↓ (scraper.py)
scraped_data/*.json
      ↓ (Claude)
docs.md
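
To see how far each stage of this flow has progressed, the checkboxes in `scraping.md` and `processed.md` can be tallied. The sketch below is an assumption-laden helper, not part of the project: it only counts lines that start with `- [ ]` or `- [x]`, and `checkbox_counts` is a hypothetical name.

```python
import re

# Matches a task-list line and captures the checkbox content (" " or "x").
CHECKBOX = re.compile(r"^\s*- \[( |x)\]", re.IGNORECASE)


def checkbox_counts(text: str) -> tuple[int, int]:
    """Return (done, pending) counts of task-list lines in `text`."""
    done = 0
    pending = 0
    for line in text.splitlines():
        match = CHECKBOX.match(line)
        if not match:
            continue  # Skip headings, notes, and blank lines.
        if match.group(1) == " ":
            pending += 1
        else:
            done += 1
    return done, pending


example = "- [x] a\n- [ ] b\n- [X] c\nnotes\n"
done, pending = checkbox_counts(example)
print(f"{done} done, {pending} pending")
```

Run against the contents of `scraping.md`, the counts describe the scraping stage; run against `processed.md`, they describe the processing stage.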