# Anno 117: Pax Romana Documentation

Structured API-like documentation of all game elements in Anno 117: Pax Romana.

## Workflow

### 1. Scrape pages

```bash
# Scrape 1 page (default)
venv/bin/python python/scraper.py

# Scrape multiple pages
venv/bin/python python/scraper.py -n 10

# Scrape all remaining pages
venv/bin/python python/scraper.py -n 9999
```

This scrapes the unchecked URLs listed in `scraping.md`, saves each page as JSON in `scraped_data/`, and adds the resulting files to `processed.md` as pending.
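
To illustrate how checkbox-based progress tracking of this kind can work, here is a minimal Python sketch. It is not taken from `scraper.py`: the line format (`- [ ] URL`), the function names `unchecked_urls` and `mark_checked`, and the `example.com` URLs are all assumptions made for illustration.

```python
import re

# Assumed line format: "- [ ] <url>" (not yet scraped) or "- [x] <url>" (scraped).
CHECKBOX = re.compile(r"^\s*[-*] \[( |x|X)\] (\S+)")


def unchecked_urls(markdown_text, limit=1):
    """Return up to `limit` URLs whose checkbox is still unchecked."""
    urls = []
    for line in markdown_text.splitlines():
        if len(urls) >= limit:
            break
        match = CHECKBOX.match(line)
        if match and match.group(1) == " ":
            urls.append(match.group(2))
    return urls


def mark_checked(markdown_text, url):
    """Tick the checkbox on the line that lists `url`."""
    updated = []
    for line in markdown_text.splitlines():
        match = CHECKBOX.match(line)
        if match and match.group(2) == url:
            line = line.replace("[ ]", "[x]", 1)
        updated.append(line)
    return "\n".join(updated)


# Hypothetical tracking content, for demonstration only.
sample = "\n".join([
    "- [x] https://example.com/a",
    "- [ ] https://example.com/b",
    "- [ ] https://example.com/c",
])
print(unchecked_urls(sample, limit=5))
```

The `limit` parameter plays the role that `-n` plays on the command line: the default picks one entry, and a large value picks everything that is still unchecked.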

### 2. Process scraped data into docs/

```bash
# Process 1 file (default)
venv/bin/python python/process.py

# Process 5 files
venv/bin/python python/process.py -n 5

# Process all remaining files
venv/bin/python python/process.py -n 9999

# Process in parallel (e.g., 4 workers processing 10 files each)
for i in {1..4}; do venv/bin/python python/process.py -n 10 & done; wait
```

The script uses file locking, so several instances can safely run in parallel. Each invocation:
1. Claims pending JSON files from `processed.md`
2. Calls Claude to parse them into the `docs/` folder structure
3. Marks them as completed
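
The claim step is what makes parallel runs safe, so a rough sketch of it may help. The following is not the code in `process.py`; it is one way such a step could be written with an exclusive `fcntl.flock` lock (Unix only). The markers `- [ ] ` and `- [~] `, the function name `claim_pending`, and the one-entry-per-line layout of the tracking file are assumptions.

```python
import fcntl

PENDING = "- [ ] "  # assumed marker for a file waiting to be processed
CLAIMED = "- [~] "  # assumed marker for a file claimed by a worker


def claim_pending(tracking_path, limit=1):
    """Claim up to `limit` pending entries under an exclusive lock.

    Returns the names of the claimed entries. Because the read, the
    update, and the write all happen while the lock is held, two
    workers cannot claim the same entry.
    """
    claimed = []
    with open(tracking_path, "r+", encoding="utf-8") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            lines = handle.read().splitlines()
            for index, line in enumerate(lines):
                if len(claimed) >= limit:
                    break
                if line.startswith(PENDING):
                    name = line[len(PENDING):].strip()
                    claimed.append(name)
                    lines[index] = CLAIMED + name
            # Rewrite the file in place with the updated markers.
            handle.seek(0)
            handle.write("\n".join(lines) + "\n")
            handle.truncate()
            handle.flush()
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)
    return claimed
```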

---

## File Structure

- `scraping.md` - URLs to scrape (checkboxes track progress)
- `processed.md` - JSON files pending/processed into docs/
- `scraped_data/` - Raw scraped JSON files
- `docs/` - Structured documentation (see CLAUDE.md for structure)
- `python/scraper.py` - Web scraper script
- `python/process.py` - Processes pending JSON files into docs/ (one per run by default)

## Data Flow

```
anno.land pages
    ↓ (scraper.py)
scraped_data/*.json
    ↓ (process.py → Claude)
docs/**/*.md
```