update
41
README.md
@@ -21,24 +21,36 @@ This scrapes unchecked URLs from `scraping.md`, saves JSON to `scraped_data/`, a
 ### 2. Process scraped data into docs/

-```bash
-# Process 1 file (default)
-venv/bin/python python/process.py
+Use this prompt with Claude Code to process pending JSON files into documentation:

-# Process 5 files
-venv/bin/python python/process.py -n 5
+```
+Process 20 pending JSON files from processed.md into docs/.

-# Process all remaining files
-venv/bin/python python/process.py -n 9999
+OUTER LOOP (repeat until X files total are done):
+1. Read processed.md, find next 5 pending files (marked `- [ ]`)
+2. Mark each in-progress: `- [ ]` to `- [~]`

-# Process in parallel (e.g., 4 workers processing 10 files each)
-for i in {1..4}; do venv/bin/python python/process.py -n 10 & done; wait
+INNER LOOP (process batch of 5 in parallel):
+3. Spawn 5 Task sub-agents in ONE message (subagent_type="general-purpose", run_in_background=true), one per file:
+
+"Process scraped_data/{filename} into docs/:
+- Read the JSON
+- Translate: Liberti, Plebejer=Plebeians, Equites, Patrizier=Patricians, Wanderer=Waders, Schmiede=Smiths, Älteste=Elders, Mercatoren=Mercators, Edelmänner=Nobles
+- Target: anno-117-buildings_* → docs/buildings/, anno-117-goods_* → docs/goods/, anno-117-specialists* → docs/specialists/, anno-117-skills_* → docs/skills/
+- Merge if exists, create if not (use existing docs/ as format examples)
+- Update category _index.md if needed
+- Mark done: `- [~] {filename}` to `- [x] {filename}` in processed.md"
+
+4. Use TaskOutput to wait for all 5 agents to complete
+
+END INNER LOOP
+
+5. Count completed files, continue OUTER LOOP until X total done
+
+END OUTER LOOP
 ```

-The script uses file locking to safely run in parallel. Each invocation:
-1. Claims pending JSON files from `processed.md`
-2. Calls Claude to parse them into the `docs/` folder structure
-3. Marks them as completed
+Replace X with desired total (e.g., 20).

 ---
@@ -49,7 +61,6 @@ The script uses file locking to safely run in parallel. Each invocation:
 - `scraped_data/` - Raw scraped JSON files
 - `docs/` - Structured documentation (see CLAUDE.md for structure)
 - `python/scraper.py` - Web scraper script
-- `python/process.py` - Process one JSON file into docs/

 ## Data Flow
@@ -57,6 +68,6 @@ The script uses file locking to safely run in parallel. Each invocation:
 anno.land pages
 ↓ (scraper.py)
 scraped_data/*.json
-↓ (process_one.py → Claude)
+↓ (Claude sub-agents)
 docs/**/*.md
 ```
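
The `Target:` line in the new prompt routes each scraped file to a docs folder by filename prefix. A minimal Python sketch of that mapping follows, assuming the prefixes are matched against the base filename. The function name `target_dir` and the sample filenames are hypothetical and not part of the repository.

```python
from pathlib import PurePosixPath
from typing import Optional

# Prefix-to-directory table copied from the "Target:" line of the prompt.
# Note that the specialists prefix has no trailing underscore in the prompt.
TARGETS = [
    ("anno-117-buildings_", "docs/buildings/"),
    ("anno-117-goods_", "docs/goods/"),
    ("anno-117-specialists", "docs/specialists/"),
    ("anno-117-skills_", "docs/skills/"),
]


def target_dir(filename: str) -> Optional[str]:
    """Return the docs/ subfolder for a scraped JSON file, or None if no prefix matches."""
    name = PurePosixPath(filename).name  # drop any leading scraped_data/ path
    for prefix, directory in TARGETS:
        if name.startswith(prefix):
            return directory
    return None


if __name__ == "__main__":
    # Hypothetical filenames, used only to exercise the mapping.
    for sample in ("scraped_data/anno-117-goods_example.json", "unrelated.json"):
        print(sample, "->", target_dir(sample))
```

Returning `None` for an unmatched prefix leaves it to the caller to decide whether to skip the file or flag it, since the prompt does not say what should happen in that case.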
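
The prompt tracks progress through checkbox states in `processed.md`: `- [ ]` for pending, `- [~]` for in progress, and `- [x]` for done. The sketch below shows those transitions on plain text, assuming one `- [ ] {filename}` entry per line. `claim_pending` and `mark_done` are illustrative names rather than repository code, and the sketch does no file locking, unlike the removed `process.py` as the old README described it.

```python
from typing import List, Tuple

PENDING = "- [ ] "
IN_PROGRESS = "- [~] "
DONE = "- [x] "


def claim_pending(text: str, limit: int = 5) -> Tuple[str, List[str]]:
    """Mark up to `limit` pending entries as in progress; return new text and claimed names."""
    lines = text.split("\n")
    claimed: List[str] = []
    for i, line in enumerate(lines):
        if len(claimed) >= limit:
            break
        if line.startswith(PENDING):
            name = line[len(PENDING):]
            lines[i] = IN_PROGRESS + name
            claimed.append(name)
    return "\n".join(lines), claimed


def mark_done(text: str, filename: str) -> str:
    """Flip the first exact `- [~] {filename}` line to `- [x] {filename}`."""
    lines = text.split("\n")
    for i, line in enumerate(lines):
        if line == IN_PROGRESS + filename:
            lines[i] = DONE + filename
            break
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical checklist content.
    checklist = "- [x] a.json\n- [ ] b.json\n- [ ] c.json\n"
    checklist, batch = claim_pending(checklist, limit=1)
    for name in batch:
        checklist = mark_done(checklist, name)
    print(checklist, end="")
```

Matching whole lines in `mark_done` avoids flipping an entry whose filename merely starts with the one being completed.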