adding data

2025-12-30 14:50:25 +01:00
parent 30a2a13d20
commit df2b557585
4 changed files with 841 additions and 78 deletions
--- a/README.md
+++ b/README.md
@@ -1,7 +1,81 @@
-# Anno 117: Pax Romana documentation
+# Anno 117: Pax Romana Documentation

-## Run scraper
+Structured API-like documentation of all game elements in Anno 117: Pax Romana.
+
+## Workflow
+
+### 1. Scrape pages

 ```bash
-source venv/bin/activate && python python/scraper.py
+# Scrape 1 page (default)
+venv/bin/python python/scraper.py
+
+# Scrape multiple pages
+venv/bin/python python/scraper.py -n 10
+
+# Scrape all remaining pages
+venv/bin/python python/scraper.py -n 9999
+```
+
+The scraper reads unchecked URLs from `scraping.md`, saves each page as JSON in `scraped_data/`, and adds the new files to `processed.md` as pending.
+
+### 2. Process scraped data into docs.md
+
+Use this prompt with Claude:
+
+---
+
+**Prompt for Claude:**
+
+```
+Process the pending scraped JSON files into docs.md.
+
+1. Read `processed.md` to find pending JSON files (marked with `- [ ]`)
+2. For each pending file, read it from `scraped_data/`
+3. Extract game entities (buildings, goods, production chains, etc.)
+4. Translate German names to English:
+   - Use the game's official English names where known
+   - Common translations:
+     - Latium = Latium (Roman region)
+     - Albion = Albion (Celtic region)
+     - Liberti = Liberti (Tier 1 Roman)
+     - Plebejer = Plebeians (Tier 2 Roman)
+     - Equites = Equites (Tier 3 Roman)
+     - Patrizier = Patricians (Tier 4 Roman)
+     - Wanderer = Waders (Tier 1 Celtic)
+     - Schmiede = Smiths (Tier 2 Celtic)
+     - Älteste = Elders (Tier 3 Celtic)
+     - Mercatoren = Mercators (Tier 4 Celtic)
+     - Edelmänner = Nobles (Tier 5 Celtic)
+     - Building/Good names: translate to English equivalents
+5. Format data according to the schemas defined in docs.md
+6. Add new entities or update existing ones in docs.md
+7. Mark the JSON file as processed in `processed.md` by changing `- [ ]` to `- [x]`
+
+Focus on extracting:
+- Buildings: name, category, region, build costs, maintenance, workforce, cycle time, inputs/outputs, area effects, requirements
+- Goods: name, category, produced by, consumed by
+- Production chains: steps, ratios, cycle times
+
+Keep entries concise. Mark unknown values as "Unknown" rather than guessing.
+```
+
+---
+
+## File Structure
+
+- `scraping.md` - URLs to scrape (checkboxes track progress)
+- `processed.md` - JSON files pending/processed into docs.md
+- `scraped_data/` - Raw scraped JSON files
+- `docs.md` - Final structured documentation
+- `python/scraper.py` - Web scraper script
+
+## Data Flow
+
+```
+anno.land pages
+      ↓ (scraper.py)
+scraped_data/*.json
+      ↓ (Claude)
+docs.md
 ```
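
Note on the checkbox tracking described in the README: both `scraping.md` and `processed.md` use `- [ ]` / `- [x]` entries to track progress. The sketch below shows one way that bookkeeping could be done in Python. It is not taken from `python/scraper.py`; the helper names `pending_items` and `mark_done` are hypothetical, and it assumes one entry per line in the form `- [ ] <name>`.

```python
import re

# Matches a markdown task-list line such as "- [ ] b.json" or "- [x] a.json".
# Assumption: one entry per line; the real file layout may differ.
CHECKBOX = re.compile(r"^- \[( |x)\] (.+)$")


def pending_items(text: str) -> list[str]:
    """Return the labels of all unchecked "- [ ]" entries, in file order."""
    items = []
    for line in text.splitlines():
        match = CHECKBOX.match(line.strip())
        if match and match.group(1) == " ":
            items.append(match.group(2).strip())
    return items


def mark_done(text: str, item: str) -> str:
    """Return the text with the unchecked entry for `item` changed to "- [x]"."""
    out = []
    for line in text.splitlines():
        match = CHECKBOX.match(line.strip())
        if match and match.group(1) == " " and match.group(2).strip() == item:
            # Flip only the checkbox; leave the rest of the line untouched.
            line = line.replace("- [ ]", "- [x]", 1)
        out.append(line)
    return "\n".join(out) + "\n"
```

Both helpers work on strings rather than files, so the same logic could serve either tracking file: read the file, call `pending_items` to pick the next entries, then write back the result of `mark_done` once an entry has been handled.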