2025-12-30 16:19:50 +01:00
parent f3a1108f9e
commit bdfff54f82
115 changed files with 123028 additions and 126 deletions

@@ -21,24 +21,36 @@ This scrapes unchecked URLs from `scraping.md`, saves JSON to `scraped_data/`, a
 ### 2. Process scraped data into docs/
-```bash
-# Process 1 file (default)
-venv/bin/python python/process.py
+Use this prompt with Claude Code to process pending JSON files into documentation:
-# Process 5 files
-venv/bin/python python/process.py -n 5
+```
+Process 20 pending JSON files from processed.md into docs/.
-# Process all remaining files
-venv/bin/python python/process.py -n 9999
+OUTER LOOP (repeat until X files total are done):
+1. Read processed.md, find next 5 pending files (marked `- [ ]`)
+2. Mark each in-progress: `- [ ]` to `- [~]`
-# Process in parallel (e.g., 4 workers processing 10 files each)
-for i in {1..4}; do venv/bin/python python/process.py -n 10 & done; wait
+INNER LOOP (process batch of 5 in parallel):
+3. Spawn 5 Task sub-agents in ONE message (subagent_type="general-purpose", run_in_background=true), one per file:
+"Process scraped_data/{filename} into docs/:
+- Read the JSON
+- Translate: Liberti, Plebejer=Plebeians, Equites, Patrizier=Patricians, Wanderer=Waders, Schmiede=Smiths, Älteste=Elders, Mercatoren=Mercators, Edelmänner=Nobles
+- Target: anno-117-buildings_* → docs/buildings/, anno-117-goods_* → docs/goods/, anno-117-specialists* → docs/specialists/, anno-117-skills_* → docs/skills/
+- Merge if exists, create if not (use existing docs/ as format examples)
+- Update category _index.md if needed
+- Mark done: `- [~] {filename}` to `- [x] {filename}` in processed.md"
+4. Use TaskOutput to wait for all 5 agents to complete
+END INNER LOOP
+5. Count completed files, continue OUTER LOOP until X total done
+END OUTER LOOP
 ```
-The script uses file locking to safely run in parallel. Each invocation:
-1. Claims pending JSON files from `processed.md`
-2. Calls Claude to parse them into the `docs/` folder structure
-3. Marks them as completed
+Replace X with desired total (e.g., 20).
 ---
@@ -49,7 +61,6 @@ The script uses file locking to safely run in parallel. Each invocation:
 - `scraped_data/` - Raw scraped JSON files
 - `docs/` - Structured documentation (see CLAUDE.md for structure)
 - `python/scraper.py` - Web scraper script
-- `python/process.py` - Process one JSON file into docs/
 ## Data Flow
@@ -57,6 +68,6 @@ The script uses file locking to safely run in parallel. Each invocation:
 anno.land pages
 ↓ (scraper.py)
 scraped_data/*.json
-↓ (process_one.py → Claude)
+↓ (Claude sub-agents)
 docs/**/*.md
 ```
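
For reference, the bookkeeping described in this diff (claiming `- [ ]` entries as `- [~]`, routing each file to its `docs/` subfolder, then marking `- [x]`) can be sketched in Python. This is a minimal sketch under stated assumptions, not the contents of `python/process.py`, which this commit does not show. All function names here are hypothetical, and using `fcntl.flock` (Unix-only) is only one plausible way to implement the file locking that the removed README text mentions.

```python
from typing import List, Optional, Tuple

# Checkbox states used in processed.md, as described in the prompt.
PENDING, IN_PROGRESS, DONE = "- [ ]", "- [~]", "- [x]"

# Filename prefix -> docs/ subfolder, copied from the "Target:" line of the prompt.
TARGETS = {
    "anno-117-buildings_": "docs/buildings",
    "anno-117-goods_": "docs/goods",
    "anno-117-specialists": "docs/specialists",
    "anno-117-skills_": "docs/skills",
}


def claim_pending(text: str, limit: int) -> Tuple[str, List[str]]:
    """Flip up to `limit` pending entries to in-progress.

    Returns the updated text and the claimed filenames, in file order.
    """
    claimed: List[str] = []
    out: List[str] = []
    for line in text.splitlines(keepends=True):
        if len(claimed) < limit and line.startswith(PENDING + " "):
            claimed.append(line[len(PENDING):].strip())
            line = IN_PROGRESS + line[len(PENDING):]
        out.append(line)
    return "".join(out), claimed


def mark_done(text: str, name: str) -> str:
    """Flip one in-progress entry to done; other lines are left untouched."""
    wanted = f"{IN_PROGRESS} {name}"
    return "".join(
        f"{DONE} {name}\n" if line.strip() == wanted else line
        for line in text.splitlines(keepends=True)
    )


def target_dir(filename: str) -> Optional[str]:
    """Return the docs/ subfolder for a scraped file, or None if no prefix matches."""
    for prefix, folder in TARGETS.items():
        if filename.startswith(prefix):
            return folder
    return None


def claim_from_file(path: str, limit: int) -> List[str]:
    """Claim entries in place while holding an exclusive lock (assumed mechanism)."""
    import fcntl  # Unix-only; an assumption, not confirmed by the commit

    with open(path, "r+", encoding="utf-8") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # released when the file is closed
        new_text, claimed = claim_pending(fh.read(), limit)
        fh.seek(0)
        fh.write(new_text)
        fh.truncate()
    return claimed
```

Keeping `claim_pending` and `mark_done` as pure string functions makes the state transitions easy to check in isolation; only `claim_from_file` touches the disk and the lock.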