update
23
README.md
@@ -22,24 +22,23 @@ This scrapes unchecked URLs from `scraping.md`, saves JSON to `scraped_data/`, a
 ### 2. Process scraped data into docs/
 
 ```bash
-# Process one file (can run multiple in parallel)
+# Process 1 file (default)
 venv/bin/python python/process.py
 
-# Process multiple in parallel (e.g., 4 at once)
-for i in {1..4}; do venv/bin/python python/process.py & done; wait
+# Process 5 files
+venv/bin/python python/process.py -n 5
 
-# Process all remaining files (4 parallel workers)
-while venv/bin/python python/process.py; do :; done &
-while venv/bin/python python/process.py; do :; done &
-while venv/bin/python python/process.py; do :; done &
-while venv/bin/python python/process.py; do :; done &
-wait
+# Process all remaining files
+venv/bin/python python/process.py -n 9999
+
+# Process in parallel (e.g., 4 workers processing 10 files each)
+for i in {1..4}; do venv/bin/python python/process.py -n 10 & done; wait
 ```
 
 The script uses file locking to safely run in parallel. Each invocation:
-1. Claims one pending JSON file from `processed.md`
-2. Calls Claude to parse it into the `docs/` folder structure
-3. Marks it as completed
+1. Claims pending JSON files from `processed.md`
+2. Calls Claude to parse them into the `docs/` folder structure
+3. Marks them as completed
 
 ---
 
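Reviewer note: the claim and complete steps described in the README text could be sketched roughly as follows. This is not the code in `python/process.py`, which this diff does not show. It assumes `processed.md` is a checklist whose lines look like `- [ ] name.json`, and the `- [~] ` "claimed" marker as well as the names `claim_pending` and `mark_completed` are hypothetical. It uses `fcntl.flock`, so it applies to Unix-like systems only.

```python
import fcntl
from pathlib import Path

# Hypothetical line markers; the real processed.md format is not shown in this diff.
PENDING, CLAIMED, DONE = "- [ ] ", "- [~] ", "- [x] "


def _rewrite_locked(tracker: Path, transform) -> None:
    """Apply transform(lines) -> lines to the tracker under an exclusive lock."""
    with open(tracker, "r+", encoding="utf-8") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # blocks until other workers release the lock
        try:
            lines = fh.read().splitlines()
            new_lines = transform(lines)
            fh.seek(0)
            fh.write("".join(line + "\n" for line in new_lines))
            fh.truncate()
            fh.flush()  # make the update visible before unlocking
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)


def claim_pending(tracker: Path, n: int = 1) -> list[str]:
    """Mark up to n pending entries as claimed and return their names."""
    claimed: list[str] = []

    def transform(lines):
        out = []
        for line in lines:
            if line.startswith(PENDING) and len(claimed) < n:
                name = line[len(PENDING):]
                claimed.append(name)
                line = CLAIMED + name
            out.append(line)
        return out

    _rewrite_locked(tracker, transform)
    return claimed


def mark_completed(tracker: Path, names: list[str]) -> None:
    """Flip the given claimed entries to done."""
    targets = {CLAIMED + name for name in names}

    def transform(lines):
        return [DONE + line[len(CLAIMED):] if line in targets else line
                for line in lines]

    _rewrite_locked(tracker, transform)
```

Because each read-modify-write happens while the lock is held, two workers started by the `for i in {1..4}` loop should never claim the same entry; the slow parsing step would run between `claim_pending` and `mark_completed`, outside the lock.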