adding data

2025-12-30 14:50:25 +01:00
parent 30a2a13d20
commit df2b557585
4 changed files with 841 additions and 78 deletions


@@ -1,7 +1,81 @@
# Anno 117: Pax Romana Documentation
Structured API-like documentation of all game elements in Anno 117: Pax Romana.
## Workflow
### 1. Scrape pages
```bash
# Activate the virtualenv, then run the scraper (same as the default call below)
source venv/bin/activate && python python/scraper.py
# Scrape 1 page (default)
venv/bin/python python/scraper.py
# Scrape multiple pages
venv/bin/python python/scraper.py -n 10
# Scrape all remaining pages
venv/bin/python python/scraper.py -n 9999
```
The scraper reads the unchecked URLs from `scraping.md`, saves each scraped page as JSON in `scraped_data/`, and lists the new JSON files in `processed.md` as pending (`- [ ]`).
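The selection step can be sketched roughly as follows. This is not the code in `python/scraper.py`; it is a minimal illustration that assumes `scraping.md` uses Markdown task-list lines of the form `- [ ] <url>`, and the function name `unchecked_urls` and the `example.invalid` URLs are hypothetical.

```python
import re

# Assumed line format in scraping.md: "- [ ] https://example.invalid/page"
# (already scraped entries would use "- [x]"). The real file may differ.
UNCHECKED = re.compile(r"^\s*-\s\[\s\]\s+(\S+)")


def unchecked_urls(markdown: str, limit: int = 1) -> list[str]:
    """Return up to `limit` URLs whose checkbox is still empty."""
    urls = []
    for line in markdown.splitlines():
        match = UNCHECKED.match(line)
        if match:
            urls.append(match.group(1))
        if len(urls) >= limit:
            break
    return urls


sample = """\
- [x] https://example.invalid/a
- [ ] https://example.invalid/b
- [ ] https://example.invalid/c
"""
print(unchecked_urls(sample, limit=1))  # ['https://example.invalid/b']
```

Here `limit` plays the role of the `-n` option shown above: a large value simply returns every remaining unchecked URL.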
### 2. Process scraped data into docs.md
Use this prompt with Claude:
---
**Prompt for Claude:**
```
Process the pending scraped JSON files into docs.md.
1. Read `processed.md` to find pending JSON files (marked with `- [ ]`)
2. For each pending file, read it from `scraped_data/`
3. Extract game entities (buildings, goods, production chains, etc.)
4. Translate German names to English:
   - Use the game's official English names where known
   - Common translations:
     - Latium = Latium (Roman region)
     - Albion = Albion (Celtic region)
     - Liberti = Liberti (Tier 1 Roman)
     - Plebejer = Plebeians (Tier 2 Roman)
     - Equites = Equites (Tier 3 Roman)
     - Patrizier = Patricians (Tier 4 Roman)
     - Wanderer = Waders (Tier 1 Celtic)
     - Schmiede = Smiths (Tier 2 Celtic)
     - Älteste = Elders (Tier 3 Celtic)
     - Mercatoren = Mercators (Tier 4 Celtic)
     - Edelmänner = Nobles (Tier 5 Celtic)
   - Building/Good names: translate to English equivalents
5. Format data according to the schemas defined in docs.md
6. Add new entities or update existing ones in docs.md
7. Mark the JSON file as processed in `processed.md` by changing `- [ ]` to `- [x]`
Focus on extracting:
- Buildings: name, category, region, build costs, maintenance, workforce, cycle time, inputs/outputs, area effects, requirements
- Goods: name, category, produced by, consumed by
- Production chains: steps, ratios, cycle times
Keep entries concise. Mark unknown values as "Unknown" rather than guessing.
```
---
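Steps 1 and 7 of the prompt (finding pending files and ticking them off) could also be checked or done by hand with a small script. The sketch below assumes each entry in `processed.md` has the form `- [ ] <file>.json`; the helper names and the sample file names are hypothetical.

```python
import re

# Assumed entry format in processed.md: "- [ ] some_page.json"
PENDING = re.compile(r"^- \[ \] (?P<name>\S+\.json)\s*$")


def pending_files(markdown: str) -> list[str]:
    """List JSON file names still marked as pending."""
    names = []
    for line in markdown.splitlines():
        match = PENDING.match(line)
        if match:
            names.append(match.group("name"))
    return names


def mark_processed(markdown: str, name: str) -> str:
    """Flip "- [ ]" to "- [x]" for exactly the given file name."""
    lines = []
    for line in markdown.splitlines():
        match = PENDING.match(line)
        if match and match.group("name") == name:
            line = f"- [x] {name}"
        lines.append(line)
    return "\n".join(lines) + "\n"


checklist = "- [x] latium.json\n- [ ] albion.json\n- [ ] liberti.json\n"
print(pending_files(checklist))                 # ['albion.json', 'liberti.json']
updated = mark_processed(checklist, "albion.json")
print(pending_files(updated))                   # ['liberti.json']
```

Matching on the exact file name, rather than replacing the first `- [ ]` found, keeps one file's checkbox from being flipped when a different file was processed.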
## File Structure
- `scraping.md` - URLs to scrape (checkboxes track progress)
- `processed.md` - JSON files pending/processed into docs.md
- `scraped_data/` - Raw scraped JSON files
- `docs.md` - Final structured documentation
- `python/scraper.py` - Web scraper script
## Data Flow
```
anno.land pages
↓ (scraper.py)
scraped_data/*.json
↓ (Claude)
docs.md
```
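Because both stages of this flow are tracked with checkboxes, progress can be summarised per stage. The following is a minimal sketch under the assumption that `scraping.md` and `processed.md` both use Markdown task-list items; the `progress` function and the sample contents are illustrative only.

```python
import re

# Assumption: both checklist files use Markdown task-list items
# ("- [ ] ..." for open entries, "- [x] ..." for finished ones).
CHECKBOX = re.compile(r"^\s*- \[(?P<mark>[ xX])\] ")


def progress(markdown: str) -> tuple[int, int]:
    """Return (done, total) for the checkboxes in one checklist file."""
    done = total = 0
    for line in markdown.splitlines():
        match = CHECKBOX.match(line)
        if not match:
            continue
        total += 1
        if match.group("mark") != " ":
            done += 1
    return done, total


scraping = "- [x] https://example.invalid/a\n- [ ] https://example.invalid/b\n"
processed = "- [ ] a.json\n"
print("scraped:", progress(scraping))      # scraped: (1, 2)
print("processed:", progress(processed))   # processed: (0, 1)
```

In practice the two strings would come from the files themselves, for example via `pathlib.Path("scraping.md").read_text(encoding="utf-8")`.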