# Anno 117: Pax Romana Documentation
Structured API-like documentation of all game elements in Anno 117: Pax Romana.
## Workflow
### 1. Scrape pages
```bash
# Scrape 1 page (default)
venv/bin/python python/scraper.py
# Scrape multiple pages
venv/bin/python python/scraper.py -n 10
# Scrape all remaining pages
venv/bin/python python/scraper.py -n 9999
```
This scrapes unchecked URLs from `scraping.md`, saves each page as JSON in `scraped_data/`, and adds the resulting JSON files to `processed.md` as pending.
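As a rough illustration of the checkbox tracking, the sketch below picks the next unchecked URLs from a Markdown task list and ticks one off afterwards. The `- [ ] <url>` line format and the helper names `next_unchecked` and `mark_checked` are assumptions made for this example; the actual `scraper.py` may track progress differently.

```python
import re

# Assumed line format: "- [ ] https://..." (unchecked), "- [x] https://..." (checked).
UNCHECKED = re.compile(r"^- \[ \] (\S+)\s*$")


def next_unchecked(text: str, limit: int = 1) -> list[str]:
    """Return up to `limit` URLs whose checkbox is still unchecked."""
    urls: list[str] = []
    for line in text.splitlines():
        if len(urls) >= limit:
            break
        match = UNCHECKED.match(line)
        if match:
            urls.append(match.group(1))
    return urls


def mark_checked(text: str, url: str) -> str:
    """Tick the checkbox on the first unchecked line that holds exactly `url`."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        match = UNCHECKED.match(line)
        if match and match.group(1) == url:
            lines[i] = f"- [x] {url}"
            break
    # Preserve a trailing newline if the input had one.
    return "\n".join(lines) + ("\n" if text.endswith("\n") else "")
```

Under these assumptions, `-n 10` would correspond to calling `next_unchecked(text, limit=10)` and marking each URL once its JSON has been written.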
### 2. Process scraped data into docs/
```bash
# Process 1 file (default)
venv/bin/python python/process.py
# Process 5 files
venv/bin/python python/process.py -n 5
# Process all remaining files
venv/bin/python python/process.py -n 9999
# Process in parallel (e.g., 4 workers processing 10 files each)
for i in {1..4}; do venv/bin/python python/process.py -n 10 & done; wait
```
The script uses file locking to safely run in parallel. Each invocation:
1. Claims pending JSON files from `processed.md`
2. Calls Claude to parse them into the `docs/` folder structure
3. Marks them as completed
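
The claim step above could be sketched as follows. This is a minimal illustration, not the code in `process.py`: it assumes `processed.md` lists one file per line with a `- [ ] ` prefix for pending entries, uses a hypothetical `- [~] ` marker for claimed entries, and relies on `fcntl.flock`, which is available on Unix-like systems only.

```python
import fcntl
from pathlib import Path

PENDING = "- [ ] "  # assumed marker for a pending entry
CLAIMED = "- [~] "  # hypothetical marker for an entry claimed by a worker


def claim_pending(tracker: Path, limit: int = 1) -> list[str]:
    """Claim up to `limit` pending entries while holding an exclusive lock."""
    claimed: list[str] = []
    with open(tracker, "r+", encoding="utf-8") as fh:
        # Blocks until no other process holds the lock on this file.
        fcntl.flock(fh, fcntl.LOCK_EX)
        try:
            lines = fh.read().splitlines()
            for i, line in enumerate(lines):
                if len(claimed) >= limit:
                    break
                if line.startswith(PENDING):
                    name = line[len(PENDING):].strip()
                    lines[i] = CLAIMED + name
                    claimed.append(name)
            # Rewrite the tracker in place before releasing the lock.
            fh.seek(0)
            fh.write("\n".join(lines) + "\n")
            fh.truncate()
            fh.flush()
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)
    return claimed
```

Because the read, update, and write all happen while the lock is held, two workers started by the `for` loop above cannot claim the same file. Marking entries as completed would follow the same lock-then-rewrite pattern.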
---
## File Structure
- `scraping.md` - URLs to scrape (checkboxes track progress)
- `processed.md` - Scraped JSON files, tracked as pending or processed into docs/
- `scraped_data/` - Raw scraped JSON files
- `docs/` - Structured documentation (see CLAUDE.md for structure)
- `python/scraper.py` - Web scraper script
- `python/process.py` - Processes pending JSON files into docs/ (one file per run by default)
## Data Flow
```
anno.land pages
↓ (scraper.py)
scraped_data/*.json
↓ (process.py → Claude)
docs/**/*.md
```