# Anno 117: Pax Romana Documentation

Structured API-like documentation of all game elements in Anno 117: Pax Romana.

## Workflow

### 1. Scrape pages

```bash
# Scrape 1 page (default)
venv/bin/python python/scraper.py

# Scrape multiple pages
venv/bin/python python/scraper.py -n 10

# Scrape all remaining pages
venv/bin/python python/scraper.py -n 9999
```

This scrapes unchecked URLs from `scraping.md`, saves the results as JSON in `scraped_data/`, and adds the new files to `processed.md` as pending.
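
How the scraper might pick its next URLs can be sketched as follows. This is not taken from `scraper.py`: it assumes `scraping.md` uses task-list lines such as `- [ ] <url>` and `- [x] <url>`, and the helper names `next_unchecked` and `mark_checked` are hypothetical.

```python
import re

# Assumed line format: "- [ ] <url>" (pending) or "- [x] <url>" (done).
CHECKBOX = re.compile(r"^(\s*[-*] \[)([ xX])(\] )(\S+)\s*$")


def next_unchecked(text: str, limit: int = 1) -> list[str]:
    """Return up to `limit` URLs whose checkbox is still empty."""
    urls: list[str] = []
    for line in text.splitlines():
        if len(urls) >= limit:
            break
        m = CHECKBOX.match(line)
        if m and m.group(2) == " ":
            urls.append(m.group(4))
    return urls


def mark_checked(text: str, url: str) -> str:
    """Tick the checkbox on the line listing `url`; other lines stay unchanged."""
    out = []
    for line in text.splitlines():
        m = CHECKBOX.match(line)
        if m and m.group(2) == " " and m.group(4) == url:
            line = f"{m.group(1)}x{m.group(3)}{m.group(4)}"
        out.append(line)
    return "\n".join(out) + "\n"


# Placeholder URLs, not real entries from scraping.md.
sample = (
    "- [x] https://example.com/a\n"
    "- [ ] https://example.com/b\n"
    "- [ ] https://example.com/c\n"
)
pending = next_unchecked(sample, limit=1)   # analogous to the default "-n 1" run
updated = mark_checked(sample, pending[0])
```

Under these assumptions, `-n 10` would correspond to `limit=10`, and a large value such as `9999` simply exhausts the list.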

### 2. Process scraped data into docs/

```bash
# Process one file (can run multiple in parallel)
venv/bin/python python/process.py

# Process multiple in parallel (e.g., 4 at once)
for i in {1..4}; do venv/bin/python python/process.py & done; wait

# Process all remaining files (4 parallel workers)
# Each loop stops once process.py exits with a non-zero status
while venv/bin/python python/process.py; do :; done &
while venv/bin/python python/process.py; do :; done &
while venv/bin/python python/process.py; do :; done &
while venv/bin/python python/process.py; do :; done &
wait
```

The script uses file locking to safely run in parallel. Each invocation:
1. Claims one pending JSON file from `processed.md`
2. Calls Claude to parse it into the `docs/` folder structure
3. Marks it as completed
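
The claim and complete steps above can be sketched with an advisory lock. This is an illustration, not the code in `process.py`: the source does not say which locking mechanism or ledger format is used, so `fcntl.flock` (POSIX only), the `- [ ]` / `- [~]` / `- [x]` markers, and the function names are all assumptions.

```python
import fcntl
import os
import tempfile

# Hypothetical markers for pending, claimed, and completed entries.
PENDING, CLAIMED, DONE = "- [ ] ", "- [~] ", "- [x] "


def _rewrite(path, transform):
    """Run `transform(lines)` on the file's lines while holding an exclusive lock."""
    with open(path, "r+", encoding="utf-8") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # blocks while another process holds the lock
        try:
            lines = fh.read().splitlines()
            result = transform(lines)   # may modify `lines` in place
            fh.seek(0)
            fh.write("".join(line + "\n" for line in lines))
            fh.truncate()
            fh.flush()                  # make sure data is written before unlocking
            return result
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)


def claim_next(path):
    """Mark the first pending entry as claimed and return its name, or None."""
    def transform(lines):
        for i, line in enumerate(lines):
            if line.startswith(PENDING):
                name = line[len(PENDING):]
                lines[i] = CLAIMED + name
                return name
        return None
    return _rewrite(path, transform)


def mark_done(path, name):
    """Flip a claimed entry to completed; return False if it was not found."""
    def transform(lines):
        for i, line in enumerate(lines):
            if line == CLAIMED + name:
                lines[i] = DONE + name
                return True
        return False
    return _rewrite(path, transform)


# Small demonstration on a temporary ledger file.
with tempfile.TemporaryDirectory() as tmp:
    ledger = os.path.join(tmp, "processed.md")
    with open(ledger, "w", encoding="utf-8") as fh:
        fh.write("- [x] a.json\n- [ ] b.json\n- [ ] c.json\n")
    first = claim_next(ledger)    # a worker claims b.json
    second = claim_next(ledger)   # a second worker gets c.json, never b.json again
    mark_done(ledger, first)
    with open(ledger, encoding="utf-8") as fh:
        final_state = fh.read()
```

Because the read, the marker change, and the write all happen under one exclusive lock, two workers started at the same time cannot claim the same entry. A worker that finds nothing to claim could exit with a non-zero status, which is what lets a `while ...; do :; done` loop terminate.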

---

## File Structure

- `scraping.md` - URLs to scrape (checkboxes track progress)
- `processed.md` - JSON files pending/processed into docs/
- `scraped_data/` - Raw scraped JSON files
- `docs/` - Structured documentation (see CLAUDE.md for structure)
- `python/scraper.py` - Web scraper script
- `python/process.py` - Process one JSON file into docs/

## Data Flow

```
anno.land pages
↓ (scraper.py)
scraped_data/*.json
↓ (process.py → Claude)
docs/**/*.md
```