2025-12-30 18:22:26 +01:00
parent adb6f6ca2d
commit cd12f64a60
25 changed files with 1204 additions and 40 deletions

@@ -6,18 +6,28 @@ Structured API-like documentation of all game elements in Anno 117: Pax Romana.
### 1. Scrape pages
**From anno.land:**
```bash
# Scrape 1 page (default)
venv/bin/python python/scraper.py
venv/bin/python python/scraper_anno_world.py
# Scrape multiple pages
venv/bin/python python/scraper.py -n 10
venv/bin/python python/scraper_anno_world.py -n 10
# Scrape all remaining pages
venv/bin/python python/scraper.py -n 9999
venv/bin/python python/scraper_anno_world.py -n 9999
```
This scrapes unchecked URLs from `scraping.md`, saves JSON to `scraped_data/`, and adds them to `processed.md` as pending.
**From IGN wiki:**
```bash
# Scrape 1 page (default)
venv/bin/python python/scraper_ign.py
# Scrape multiple pages
venv/bin/python python/scraper_ign.py -n 10
```
This scrapes unchecked URLs from `scraping.md` (anno.land) or `scraping_ign.md` (IGN), saves JSON to `scraped_data/`, and adds them to `processed.md` as pending.
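As a rough illustration of the checkbox tracking described above, the sketch below picks unchecked URLs from a checklist and ticks them off. It is not taken from `scraper_anno_world.py` or `scraper_ign.py`: the `- [ ] <url>` line format and the helper names `pick_unchecked` and `mark_checked` are assumptions made for this example, and `limit` merely plays the role of the `-n` option.

```python
import re

# Assumption: each checklist entry looks like "- [ ] <url>" when unchecked
# and "- [x] <url>" once scraped. The real files may use another layout.
UNCHECKED = re.compile(r"^- \[ \] (\S+)\s*$")


def pick_unchecked(markdown_text: str, limit: int = 1) -> list[str]:
    """Return up to `limit` URLs whose checkbox is still empty, in file order."""
    urls: list[str] = []
    if limit <= 0:
        return urls
    for line in markdown_text.splitlines():
        match = UNCHECKED.match(line)
        if match:
            urls.append(match.group(1))
            if len(urls) == limit:
                break
    return urls


def mark_checked(markdown_text: str, url: str) -> str:
    """Tick the checkbox on the first unchecked line that holds exactly `url`."""
    lines = markdown_text.splitlines(keepends=True)
    for index, line in enumerate(lines):
        match = UNCHECKED.match(line)
        if match and match.group(1) == url:
            lines[index] = line.replace("[ ]", "[x]", 1)
            break
    return "".join(lines)
```

Matching whole lines rather than substrings keeps a URL that is a prefix of another URL from being ticked by mistake.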
### 2. Process scraped data into docs/
@@ -56,18 +66,26 @@ Replace X with desired total (e.g., 20).
## File Structure
- `scraping.md` - URLs to scrape (checkboxes track progress)
- `scraping.md` - anno.land URLs to scrape (checkboxes track progress)
- `scraping_ign.md` - IGN wiki URLs to scrape (checkboxes track progress)
- `processed.md` - JSON files pending/processed into docs/
- `scraped_data/` - Raw scraped JSON files
- `docs/` - Structured documentation (see CLAUDE.md for structure)
- `python/scraper.py` - Web scraper script
- `python/scraper_anno_world.py` - anno.land web scraper
- `python/scraper_ign.py` - IGN wiki web scraper
## Data Flow
```
anno.land pages
(scraper.py)
scraped_data/*.json
(Claude sub-agents)
docs/**/*.md
    anno.land pages         IGN wiki pages
(scraper_anno_world.py)    (scraper_ign.py)
          └──────────┬─────────────┘
            scraped_data/*.json
            (Claude sub-agents)
               docs/**/*.md
```
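The hand-off between the scrapers and the processing step could look roughly like the sketch below: a scraped page is written to `scraped_data/` and recorded in `processed.md` as pending. The helper name `save_page`, the `slug` file naming, the payload fields, and the `- [ ] <path>` format for a pending entry are all assumptions for illustration, not the repository's actual code.

```python
import json
from pathlib import Path


def save_page(root: Path, slug: str, payload: dict) -> Path:
    """Write one scraped page as JSON and record it as pending.

    Hypothetical helper: `slug` and the pending-line format are assumed.
    """
    out_dir = root / "scraped_data"
    out_dir.mkdir(parents=True, exist_ok=True)

    out_file = out_dir / f"{slug}.json"
    out_file.write_text(
        json.dumps(payload, ensure_ascii=False, indent=2),
        encoding="utf-8",
    )

    # Append an unchecked entry so the processing step can find the file.
    with (root / "processed.md").open("a", encoding="utf-8") as handle:
        handle.write(f"- [ ] scraped_data/{slug}.json\n")

    return out_file
```

Because both scrapers write into the same `scraped_data/` directory and the same `processed.md`, a shared helper along these lines would keep the downstream step independent of the source site.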