2025-12-30 18:22:26 +01:00
parent adb6f6ca2d
commit cd12f64a60
25 changed files with 1204 additions and 40 deletions

@@ -7,7 +7,11 @@
"Bash(source:*)",
"Bash(pip install:*)",
"Bash(python python/scraper.py:*)",
"Bash(chmod:*)"
"Bash(chmod:*)",
"Bash(ls:*)",
"Bash(echo ---FILE:$f---)",
"Bash(done)",
"WebFetch(domain:www.ign.com)"
]
}
}

@@ -6,18 +6,28 @@ Structured API-like documentation of all game elements in Anno 117: Pax Romana.
### 1. Scrape pages
**From anno.land:**
```bash
# Scrape 1 page (default)
venv/bin/python python/scraper.py
venv/bin/python python/scraper_anno_world.py
# Scrape multiple pages
venv/bin/python python/scraper.py -n 10
venv/bin/python python/scraper_anno_world.py -n 10
# Scrape all remaining pages
venv/bin/python python/scraper.py -n 9999
venv/bin/python python/scraper_anno_world.py -n 9999
```
This scrapes unchecked URLs from `scraping.md`, saves JSON to `scraped_data/`, and adds them to `processed.md` as pending.
**From IGN wiki:**
```bash
# Scrape 1 page (default)
venv/bin/python python/scraper_ign.py
# Scrape multiple pages
venv/bin/python python/scraper_ign.py -n 10
```
This scrapes unchecked URLs from `scraping.md` (anno.land) or `scraping_ign.md` (IGN), saves JSON to `scraped_data/`, and adds them to `processed.md` as pending.
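A minimal sketch of how `-n` could pick its work from either checklist. It assumes the scrapers simply take the first N unchecked entries in file order (the scrapers' main loop is not part of this diff). `next_batch` is a hypothetical helper whose pattern mirrors the one `read_scraping_md()` uses in `python/scraper_ign.py`, and the sample page names are placeholders.

```python
import re

# Same shape as the unchecked-entry pattern in read_scraping_md(): "- [ ] <url>"
UNCHECKED = re.compile(r"^- \[ \] (https?://\S+)", re.MULTILINE)

# Placeholder checklist content; the page names are made up.
SAMPLE = (
    "## Pages to Scrape\n"
    "- [x] https://www.ign.com/wikis/anno-117-pax-romana\n"
    "- [ ] https://www.ign.com/wikis/anno-117-pax-romana/Example_Page_A\n"
    "- [ ] https://www.ign.com/wikis/anno-117-pax-romana/Example_Page_B\n"
)


def next_batch(checklist: str, n: int = 1) -> list[str]:
    """Return the first n unchecked URLs, in file order (hypothetical helper)."""
    return UNCHECKED.findall(checklist)[:n]


if __name__ == "__main__":
    print(next_batch(SAMPLE))              # only Example_Page_A (default n=1)
    print(len(next_batch(SAMPLE, 9999)))   # 2: a large n takes everything left
```

In this sketch, slicing past the end of the list is harmless, which is why a large value such as `-n 9999` behaves like "all remaining".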
### 2. Process scraped data into docs/
@@ -56,18 +66,26 @@ Replace X with desired total (e.g., 20).
## File Structure
- `scraping.md` - URLs to scrape (checkboxes track progress)
- `scraping.md` - anno.land URLs to scrape (checkboxes track progress)
- `scraping_ign.md` - IGN wiki URLs to scrape (checkboxes track progress)
- `processed.md` - JSON files pending/processed into docs/
- `scraped_data/` - Raw scraped JSON files
- `docs/` - Structured documentation (see CLAUDE.md for structure)
- `python/scraper.py` - Web scraper script
- `python/scraper_anno_world.py` - anno.land web scraper
- `python/scraper_ign.py` - IGN wiki web scraper
## Data Flow
```
anno.land pages
(scraper.py)
scraped_data/*.json
(Claude sub-agents)
docs/**/*.md
anno.land pages IGN wiki pages
(scraper_anno_world.py) (scraper_ign.py)
└──────────┬─────────────┘
scraped_data/*.json
(Claude sub-agents)
docs/**/*.md
```
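Before a file in `scraped_data/` is handed to a sub-agent, a quick shape check can catch failed or empty scrapes. A minimal sketch: `REQUIRED_KEYS` is a subset of the keys `scrape_page()` returns in `python/scraper_ign.py`; that the anno.land JSON files carry the same keys is an assumption, and `missing_keys` / `check_scraped_dir` are hypothetical helpers.

```python
import json
from pathlib import Path

# Subset of the keys returned by scrape_page() in python/scraper_ign.py.
# Treating these four as the required minimum is an assumption.
REQUIRED_KEYS = {"url", "title", "text_content", "tables"}


def missing_keys(record: dict) -> set[str]:
    """Return the required keys that are absent from one scraped record."""
    return REQUIRED_KEYS.difference(record)


def check_scraped_dir(directory: Path) -> dict[str, set[str]]:
    """Map each JSON file name in `directory` to its missing keys (problem files only)."""
    problems = {}
    for path in sorted(directory.glob("*.json")):
        record = json.loads(path.read_text(encoding="utf-8"))
        missing = missing_keys(record)
        if missing:
            problems[path.name] = missing
    return problems


if __name__ == "__main__":
    for name, missing in check_scraped_dir(Path("scraped_data")).items():
        print(f"{name}: missing {sorted(missing)}")
```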

@@ -0,0 +1,43 @@
# Fashionist
**Entity Type:** Specialist
## Properties
| Field | Value |
|-------|-------|
| Rarity | Common |
| Category | Civic |
| Slot Type | Specialist |
| Scope | Residences in range |
## Effects
| Effect Type | Target | Value | Scope |
|------------|--------|-------|-------|
| Prestige from Tunics | Residence | +1 | Radius-based (if supplied) |
## Improves Buildings
- Residence (Albion)
- Residence (Latium)
## Related Goods
- Tunics
## Source
Latium traders
## Stack Rules
Unknown
## Usage Patterns
Place in a Governor Villa or Officium to boost Prestige from Tunics by +1 for all residences within the building's area of influence. Only effective when residences are supplied with Tunics.
## Common Pitfalls
Effect is conditional on Tunics supply; residences not receiving Tunics will not benefit from the Prestige bonus. Ensure Tunics production chain is established before relying on this specialist.
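The supply condition amounts to a filter over the residences in range. A minimal sketch; the `Residence` record is hypothetical, and counting +1 per supplied residence from a single Fashionist is an assumption (Stack Rules are unknown, so multiple Fashionists are not modelled).

```python
from dataclasses import dataclass


@dataclass
class Residence:
    """Hypothetical residence record used only for this illustration."""
    name: str
    in_range: bool    # inside the Governor Villa / Officium area of influence
    has_tunics: bool  # currently supplied with Tunics


def fashionist_prestige_bonus(residences: list[Residence]) -> int:
    """Total extra Prestige: +1 per residence that is both in range and supplied."""
    return sum(1 for r in residences if r.in_range and r.has_tunics)


if __name__ == "__main__":
    district = [
        Residence("A", in_range=True, has_tunics=True),
        Residence("B", in_range=True, has_tunics=False),  # no Tunics, no bonus
        Residence("C", in_range=False, has_tunics=True),  # out of range, no bonus
    ]
    print(fashionist_prestige_bonus(district))  # 1
```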

@@ -0,0 +1,43 @@
# Laconic Swordsmith
**Entity Type:** Specialist
## Properties
| Field | Value |
|-------|-------|
| Rarity | Rare |
| Category | Military |
| Slot Type | Specialist |
| Scope | Weapons Chain in range |
## Effects
| Effect Type | Target | Value | Scope |
|------------|--------|-------|-------|
| Productivity | Weapons Chain | +20% | Radius-based |
## Improves Buildings
- Iron Mine (Albion)
- Iron Mine (Latium)
- Furnace (Albion)
- Furnace (Latium)
- Weaponsmith (Albion)
- Weaponsmith (Latium)
## Source
All traders
## Stack Rules
Unknown
## Usage Patterns
Place in a Governor Villa or Officium that covers Iron Mines, Furnaces, and Weaponsmiths to boost the entire weapons production chain productivity by 20%. Ideal for military-focused layouts where weapons production is a priority.
## Common Pitfalls
Only affects buildings within the radius of the building where the specialist is placed. Ensure all parts of the weapons chain (Iron Mine, Furnace, Weaponsmith) are within range to maximize the benefit.
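What +20% means per building can be sketched as below, assuming productivity bonuses are added to a 100% baseline and scale output linearly. `boosted_output` is a hypothetical helper and the base rates are placeholders, not game data.

```python
def boosted_output(base_per_minute: float, bonus_percent: float, in_range: bool) -> float:
    """Output per minute after a productivity bonus (assumed linear 100% + bonus model)."""
    if not in_range:
        return base_per_minute  # buildings outside the area of influence get nothing
    return base_per_minute * (100 + bonus_percent) / 100


if __name__ == "__main__":
    # Placeholder base outputs (not taken from the game).
    chain = {
        "Iron Mine": (1.0, True),
        "Furnace": (1.0, True),
        "Weaponsmith": (0.5, False),  # outside the villa's radius
    }
    for building, (base, in_range) in chain.items():
        print(building, boosted_output(base, 20, in_range))
```

The out-of-range Weaponsmith keeps its base rate, which is the pitfall above expressed in numbers.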

@@ -0,0 +1,49 @@
# Lamellar Armour Master
**Entity Type:** Specialist
## Properties
| Field | Value |
|-------|-------|
| Rarity | Rare |
| Category | Military |
| Slot Type | Specialist |
| Scope | Armour Chain in range |
## Effects
| Effect Type | Target | Value | Scope |
|------------|--------|-------|-------|
| Productivity | Armour Chain | +20% | Radius-based |
## Improves Buildings
- Iron Mine (Albion)
- Iron Mine (Latium)
- Tannery (Albion)
- Tannery (Latium)
- Woodcutter (Albion)
- Furnace (Albion)
- Furnace (Latium)
- Armourer (Albion)
- Armourer (Latium)
- Salt Ponds (Latium)
- Pig Farm (Albion)
- Pig Farm (Latium)
## Source
All traders
## Stack Rules
Unknown
## Usage Patterns
Place in a Governor Villa or Officium to boost productivity of all buildings in the armour production chain by 20%. Affects the entire supply chain from raw materials (iron ore, leather, wood, salt, pigs) through to final armour production.
## Common Pitfalls
Only affects buildings within the radius of the building where the specialist is placed. Since it targets the full armour chain, optimal placement should cover as many chain buildings as possible.
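Since the bonus depends on placement, it can help to check how many chain buildings a candidate Governor Villa or Officium site would cover. A rough sketch that approximates the area of influence as a circle on tile coordinates; the real area shape, the radius and the coordinates below are all assumptions.

```python
import math


def covered(center: tuple[float, float], radius: float,
            buildings: dict[str, tuple[float, float]]) -> list[str]:
    """Names of buildings within `radius` of `center` (circle approximation, boundary inclusive)."""
    cx, cy = center
    return [name for name, (x, y) in buildings.items()
            if math.hypot(x - cx, y - cy) <= radius]


if __name__ == "__main__":
    # Hypothetical tile coordinates for part of an armour chain.
    armour_chain = {
        "Iron Mine": (3, 4),
        "Furnace": (6, 0),
        "Tannery": (0, 12),
        "Armourer": (1, 1),
    }
    inside = covered((0, 0), radius=10, buildings=armour_chain)
    print(f"{len(inside)}/{len(armour_chain)} covered: {inside}")
    # 3/4 covered: ['Iron Mine', 'Furnace', 'Armourer']
```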

@@ -0,0 +1,40 @@
# Lygos Exakion, Architectus Eparchus
**Entity Type:** Specialist
## Properties
| Field | Value |
|-------|-------|
| Rarity | Epic |
| Category | Military |
| Slot Type | Specialist |
| Scope | Gates in range |
## Effects
| Effect Type | Target | Value | Scope |
|------------|--------|-------|-------|
| Ranged Offence | Gates | +1 | Radius-based |
| Attack Speed | Gates | +25% | Radius-based |
| Hitpoints | Gates | +40% | Radius-based |
## Improves Buildings
- Gates
## Source
All traders
## Stack Rules
Unknown
## Usage Patterns
Slot into a Governor Villa or Officium to significantly enhance gate defensive capabilities within range. The combination of increased ranged offence, faster attack speed, and substantially higher hitpoints makes gates much more formidable defensive structures.
## Common Pitfalls
Only affects gates within the area of influence; ensure gates are positioned within range of the Governor Villa or Officium containing this specialist.

@@ -0,0 +1,41 @@
# Marix, Who Knows The Merrows
**Entity Type:** Specialist
## Properties
| Field | Value |
|-------|-------|
| Rarity | Epic |
| Category | Research |
| Slot Type | Specialist |
| Scope | Mirrors Chain in range |
## Effects
| Effect Type | Target | Value | Scope |
|------------|--------|-------|-------|
| Productivity | Mirrors Chain | +35% | Radius-based |
## Improves Buildings
- Shell Gatherer (Albion)
- Narcissium (Albion)
- Silver Mine (Albion)
- Silver Forge (Albion)
## Source
Corvinus
## Stack Rules
Unknown
## Usage Patterns
Place in a Governor Villa or Officium whose area of influence covers the Albion Mirrors production chain (Shell Gatherer, Narcissium, Silver Mine, Silver Forge) to boost the productivity of every chain building in range by +35%.
## Common Pitfalls
Only affects Mirrors chain buildings; ensure the specialist is placed in a building whose area of influence covers the target production facilities.

@@ -0,0 +1,44 @@
# Maxima Cottius, Upholsterer of Atlas
**Entity Type:** Specialist
## Properties
| Field | Value |
|-------|-------|
| Rarity | Epic |
| Category | Culture |
| Slot Type | Specialist |
| Scope | Loungers Chain in range |
## Effects
| Effect Type | Target | Value | Scope |
|------------|--------|-------|-------|
| Productivity | Loungers Chain | +35% | Radius-based |
## Improves Buildings
- Dye Works (Latium)
- Chair Maker (Latium)
- Cushion Stuffer (Latium)
- Sandarac Nursery (Latium)
- Sheep Farm (Albion)
- Sheep Farm (Latium)
- Snailery (Latium)
## Source
Latium Traders
## Stack Rules
Unknown
## Usage Patterns
Place in a Governor Villa or Officium to boost productivity of all Loungers production chain buildings within the area of influence by +35%. This specialist affects the entire chain from raw materials (Sheep Farms, Snaileries, Sandarac Nursery) through processing (Dye Works, Cushion Stuffer) to finished products (Chair Maker).
## Common Pitfalls
The +35% productivity bonus only applies to buildings within the specialist's area of influence. Ensure all relevant Loungers chain buildings are positioned within range of the Governor Villa or Officium where this specialist is placed. Buildings outside the radius receive no benefit.

@@ -0,0 +1,41 @@
# Mellarius
**Entity Type:** Specialist
## Properties
| Field | Value |
|-------|-------|
| Rarity | Common |
| Category | Research |
| Slot Type | Specialist |
| Scope | Wax Tablets Chain in range |
## Effects
| Effect Type | Target | Value | Scope |
|------------|--------|-------|-------|
| Productivity | Wax Tablets Chain | +10% | Radius-based |
## Improves Buildings
- Apiary (Albion)
- Apiary (Latium)
- Sandarac Nursery (Latium)
- Tabulus (Latium)
## Source
Latium traders
## Stack Rules
Unknown
## Usage Patterns
Place in a Governor Villa or Officium that covers your Wax Tablets production chain buildings. All Apiaries, Sandarac Nurseries, and Tabulus buildings within range receive a +10% productivity bonus.
## Common Pitfalls
Only affects buildings in the Wax Tablets production chain; buildings outside the area of influence receive no benefit. Position carefully to maximize coverage of all chain buildings.

@@ -0,0 +1,38 @@
# Mercurial Marketeer
**Entity Type:** Specialist
## Properties
| Field | Value |
|-------|-------|
| Rarity | Rare |
| Category | Economic |
| Slot Type | Specialist |
| Scope | Mercator Residence in range |
## Effects
| Effect Type | Target | Value | Scope |
|------------|--------|-------|-------|
| Workforce from residents | Mercator Residence | +15% | Radius-based |
## Improves Buildings
- Residence (Albion)
## Source
Albion traders
## Stack Rules
Unknown
## Usage Patterns
Place in a Governor Villa or Officium covering Mercator residences to increase the workforce each Mercator residence in range provides by +15%. Ideal for maximizing workforce from high-tier Albion population.
## Common Pitfalls
Only affects Mercator residences; placing near lower-tier residences (Waders, Smiths, Elders) provides no benefit. Ensure the building's radius covers Mercator housing districts.

@@ -0,0 +1,42 @@
# Nero Vulius Fama, Pastoralist Pastor
**Entity Type:** Specialist
## Properties
| Field | Value |
|-------|-------|
| Rarity | Epic |
| Category | Economic |
| Slot Type | Specialist |
| Scope | Snailery, Sheep Farm, Pig Farm in Latium in range |
## Effects
| Effect Type | Target | Value | Scope |
|------------|--------|-------|-------|
| Workforce needed | Snailery, Sheep Farm, Pig Farm | -35% | Radius-based |
| Upkeep cost | Snailery, Sheep Farm, Pig Farm | -80% | Radius-based |
| Change of workforce type | Snailery, Sheep Farm, Pig Farm | Equites instead of Libertus | Radius-based |
## Improves Buildings
- Sheep Farm (Latium)
- Snailery (Latium)
- Pig Farm (Latium)
## Source
Latium traders
## Stack Rules
Unknown
## Usage Patterns
Place in a Governor Villa or Officium covering animal husbandry operations in Latium. The combination of -35% workforce and -80% upkeep makes this specialist extremely cost-effective for livestock farms. Because the affected farms draw Equites instead of Libertus, their Libertus workforce is freed for other buildings, which can help balance population tier requirements.
## Common Pitfalls
Only affects buildings in Latium; ensure your Sheep Farms, Snaileries, and Pig Farms are within the radius. The workforce type change to Equites may require you to have sufficient Equites population available.
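A worked example of these effects on one farm. The percentages come from the Effects table above; the base workforce and upkeep are placeholders, and rounding workforce up to whole workers is an assumption, not a documented rule. `nero_effect` is a hypothetical helper.

```python
import math


def nero_effect(base_workforce: int, base_upkeep: int) -> dict:
    """Apply the listed effects to one Latium Snailery, Sheep Farm or Pig Farm.

    Integer percentage arithmetic avoids float drift before rounding.
    Rounding workforce up (math.ceil) is an assumption for illustration.
    """
    return {
        "workforce": math.ceil(base_workforce * (100 - 35) / 100),  # -35% workforce
        "workforce_type": "Equites",                                # instead of Libertus
        "upkeep": base_upkeep * (100 - 80) / 100,                   # -80% upkeep
    }


if __name__ == "__main__":
    # Placeholder base values, not taken from the game.
    print(nero_effect(base_workforce=20, base_upkeep=50))
    # {'workforce': 13, 'workforce_type': 'Equites', 'upkeep': 10.0}
```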

@@ -0,0 +1,38 @@
# Netter (Netzangler)
**Entity Type:** Specialist
## Properties
| Field | Value |
|-------|-------|
| Rarity | Common |
| Category | Economic |
| Slot Type | Specialist |
| Scope | Sardines Chain in range |
## Effects
| Effect Type | Target | Value | Scope |
|------------|--------|-------|-------|
| Productivity | Sardines Chain | +10% | Radius-based |
## Improves Buildings
- Fishing Hut (Latium)
## Source
Latium Traders
## Stack Rules
Unknown
## Usage Patterns
Slot into a Governor Villa or Officium covering Sardines Chain buildings to boost productivity by 10%.
## Common Pitfalls
Only affects Sardines Chain buildings (Fishing Hut) within the area of influence.

@@ -0,0 +1,45 @@
# Nomadic Herder
**Entity Type:** Specialist
## Properties
| Field | Value |
|-------|-------|
| Rarity | Rare |
| Category | Nature |
| Slot Type | Specialist |
| Scope | Livestock Farms in range |
## Effects
| Effect Type | Target | Value | Scope |
|------------|--------|-------|-------|
| Workforce needed | Livestock Farms | -33% | Radius-based |
| Upkeep cost | Livestock Farms | -66% | Radius-based |
## Improves Buildings
- Ox Farm (Albion)
- Horse Breeder (Albion)
- Horse Breeder (Latium)
- Sheep Farm (Albion)
- Sheep Farm (Latium)
- Pig Farm (Albion)
- Pig Farm (Latium)
## Source
All traders
## Stack Rules
Unknown
## Usage Patterns
Place in a Governor Villa or Officium covering livestock farms to significantly reduce both workforce requirements (-33%) and upkeep costs (-66%). Because it affects livestock farms in both Albion and Latium, this specialist is valuable for large-scale animal husbandry in either region.
## Common Pitfalls
Only affects Livestock Farm category buildings (Ox Farm, Horse Breeders, Sheep Farms, Pig Farms) within the area of influence. Does not affect other agricultural buildings like crop farms or fishing operations.

@@ -0,0 +1,39 @@
# Obtorquist
**Entity Type:** Specialist
## Properties
| Field | Value |
|-------|-------|
| Rarity | Common |
| Category | Civic |
| Slot Type | Specialist |
| Scope | Torcs Chain in range |
## Effects
| Effect Type | Target | Value | Scope |
|------------|--------|-------|-------|
| Productivity | Torcs Chain buildings | +10% | Radius-based |
## Improves Buildings
- Wire-Twister (Albion)
- Copper Mine (Albion)
## Source
Albion traders
## Stack Rules
Unknown
## Usage Patterns
Place in a Governor Villa or Officium to boost productivity of Torcs Chain buildings (Wire-Twister, Copper Mine) by +10% within the area of influence.
## Common Pitfalls
Only affects Torcs Chain production buildings within range; ensure proper placement near Wire-Twister and Copper Mine clusters in Albion.

@@ -0,0 +1,46 @@
# Optio Principalis Nico, Capturer of Motion (Optio Principalis Nico, Einfänger von Bewegungen)
**Entity Type:** Specialist
## Properties
| Field | Value |
|-------|-------|
| Rarity | Legendary |
| Category | Military |
| Slot Type | Specialist |
| Scope | Towers and Stone Gates in range |
## Effects
| Effect Type | Target | Value | Scope |
|------------|--------|-------|-------|
| Attack Range | Towers and Stone Gates | +10% | Radius-based |
| Ranged Offence | Towers and Stone Gates | +1 | Radius-based |
| Attack Speed | Towers and Stone Gates | +35% | Radius-based |
| Hitpoints | Towers and Stone Gates | +25% | Radius-based |
## Boost Unlock
Generate at least 800 Happiness.
## Improves Buildings
- Towers
- Stone Gates
## Source
All traders
## Stack Rules
Unknown
## Usage Patterns
Slot into a Governor Villa or Officium to buff all Towers and Stone Gates within its area of influence. The +35% attack speed combined with +10% attack range makes defensive structures significantly more effective at eliminating threats before they reach your walls. The +25% hitpoints bonus also makes these structures more durable.
## Common Pitfalls
If the area of influence does not reach any Towers or Stone Gates, the specialist provides no benefit. Plan your defensive perimeter layout carefully so that the Governor Villa or Officium coverage overlaps your key defensive structures. The boost unlock requires generating at least 800 Happiness, so ensure your population's happiness is high enough before expecting the full benefits.

@@ -0,0 +1,45 @@
# Palaephaesta, Dextrous Demiurgus
**Entity Type:** Specialist
## Properties
| Field | Value |
|-------|-------|
| Rarity | Epic |
| Category | Culture |
| Slot Type | Specialist |
| Scope | Artisanal Studios in range |
## Effects
| Effect Type | Target | Value | Scope |
|------------|--------|-------|-------|
| Workforce needed | Artisanal Studios | -40% | Radius-based |
| Upkeep cost | Artisanal Studios | -80% | Radius-based |
## Improves Buildings
- Beaver Hatter (Albion)
- Wire-Twister (Albion)
- Fibularium (Albion)
- Glassblower (Latium)
- Jeweller (Latium)
- Narcissium (Albion)
- Pileus Felter (Latium)
## Source
All traders
## Stack Rules
Unknown
## Usage Patterns
Place in a Governor Villa or Officium covering Artisanal Studios to significantly reduce workforce requirements by 40% and upkeep costs by 80%. This specialist is excellent for optimizing luxury goods production chains, particularly useful in late-game economies where upkeep costs become a major expense.
## Common Pitfalls
Only affects Artisanal Studios within the radius of the building where the specialist is placed. Buildings outside the area of influence will not receive the workforce and upkeep reductions.

@@ -0,0 +1,38 @@
# Plebeian Praepositus
**Entity Type:** Specialist
## Properties
| Field | Value |
|-------|-------|
| Rarity | Rare |
| Category | Economic |
| Slot Type | Specialist |
| Scope | Plebeian Residences in range |
## Effects
| Effect Type | Target | Value | Scope |
|------------|--------|-------|-------|
| Workforce from residents | Plebeian Residence | +15% | Radius-based |
## Improves Buildings
- Residence (Latium)
## Source
Latium traders
## Stack Rules
Unknown
## Usage Patterns
Slot into a Governor Villa or Officium covering Plebeian residences to boost workforce output from those residences by +15%. Ideal for densely packed residential areas where you want to maximize workforce without expanding housing footprint.
## Common Pitfalls
Only affects Plebeian residences within the area of influence; other residence types (Patrician, etc.) are not boosted. If the area of influence covers no Plebeian residences, the specialist provides no benefit.
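A quick estimate of the district-wide gain, assuming the +15% multiplies the workforce of each Plebeian residence in range and leaves every other residence unchanged. `district_workforce`, its tuple layout and the 10-workforce-per-residence figure are placeholders.

```python
def district_workforce(residences: list[tuple[str, bool, float]],
                       bonus_percent: float = 15) -> float:
    """Sum workforce over (tier, in_range, base_workforce) tuples, boosting in-range Plebeians only."""
    total = 0.0
    for tier, in_range, base in residences:
        if tier == "Plebeian" and in_range:
            total += base * (100 + bonus_percent) / 100
        else:
            total += base  # out of range or another tier: no bonus
    return total


if __name__ == "__main__":
    # Placeholder values: 10 workforce per residence.
    homes = [("Plebeian", True, 10)] * 4 + [("Plebeian", False, 10), ("Patrician", True, 10)]
    print(district_workforce(homes))  # 66.0
```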

@@ -0,0 +1,39 @@
# Practicing Healer
**Entity Type:** Specialist
## Properties
| Field | Value |
|-------|-------|
| Rarity | Common |
| Category | Nature |
| Slot Type | Specialist |
| Scope | Medici in range |
## Effects
| Effect Type | Target | Value | Scope |
|------------|--------|-------|-------|
| Available patrols | Medici | +1 | Radius-based |
## Improves Buildings
- Medici (Albion)
- Medici (Latium)
## Source
All traders
## Stack Rules
Unknown
## Usage Patterns
Place in a Governor Villa or Officium covering Medici buildings to give each Medici in range +1 available patrol. This allows the Medici to serve more residences within its range.
## Common Pitfalls
Only affects Medici within the building's radius; misplaced coverage means no benefit. Ensure the specialist is slotted in a building whose area of influence covers the target Medici.

@@ -0,0 +1,42 @@
# Pragmatic Aegis
**Entity Type:** Specialist
## Properties
| Field | Value |
|-------|-------|
| Rarity | Rare |
| Category | Civic |
| Slot Type | Specialist |
| Scope | Clan Shields Chain in range |
## Effects
| Effect Type | Target | Value | Scope |
|------------|--------|-------|-------|
| Productivity | Clan Shields Chain | +20% | Radius-based |
## Improves Buildings
- Bronze Smelter (Albion)
- Weld Crop (Albion)
- Copper Mine (Albion)
- Shieldbeater (Albion)
- Tin Mine (Albion)
## Source
Manx
## Stack Rules
Unknown
## Usage Patterns
Place in a Governor Villa or Officium to boost productivity of Clan Shields production chain buildings within range by +20%. Covers the entire chain from raw materials (Copper Mine, Tin Mine) through processing (Bronze Smelter, Weld Crop) to final production (Shieldbeater).
## Common Pitfalls
Only affects Albion region buildings; ensure all Clan Shields chain buildings are within the area of influence for maximum benefit.

@@ -0,0 +1,40 @@
# Prefect of Oxen
**Entity Type:** Specialist
## Properties
| Field | Value |
|-------|-------|
| Rarity | Rare |
| Category | Economic |
| Slot Type | Specialist |
| Scope | Roast Beef Chain in range |
## Effects
| Effect Type | Target | Value | Scope |
|------------|--------|-------|-------|
| Productivity | Roast Beef Chain | +20% | Radius-based |
## Improves Buildings
- Earth Oven (Albion)
- Ox Farm (Albion)
- Saltwort Picker (Albion)
## Source
Valeria
## Stack Rules
Unknown
## Usage Patterns
Slot into a Governor Villa or Officium to boost productivity of Roast Beef production chain buildings by +20% within the area of influence. Ideal for Albion settlements with concentrated Roast Beef production.
## Common Pitfalls
Only affects buildings within the radius of influence; ensure all three building types (Earth Oven, Ox Farm, Saltwort Picker) are within range to maximize the productivity bonus across the entire production chain.

@@ -0,0 +1,43 @@
# Prime Mantler
**Entity Type:** Specialist
## Properties
| Field | Value |
|-------|-------|
| Rarity | Rare |
| Category | Nature |
| Slot Type | Specialist |
| Scope | Cloaks Chain in range |
## Effects
| Effect Type | Target | Value | Scope |
|------------|--------|-------|-------|
| Productivity | Cloaks Chain | +20% | Radius-based |
## Improves Buildings
- Birrus Stitcher (Albion)
- Weld Crop (Albion)
- Greenhands (Albion)
- Copper Mine (Albion)
- Sheep Farm (Albion)
- Sheep Farm (Latium)
## Source
All traders
## Stack Rules
Unknown
## Usage Patterns
Place in a Governor Villa or Officium to boost productivity of all Cloaks production chain buildings within range by +20%. Covers the entire chain from raw materials (Sheep Farm, Copper Mine) through processing (Greenhands, Weld Crop) to final production (Birrus Stitcher).
## Common Pitfalls
Only affects buildings within the area of influence; ensure all Cloaks chain buildings are clustered within range of the placement building. Does not affect buildings outside the chain even if they share similar resource types.

@@ -0,0 +1,38 @@
# Shell Collector
**Entity Type:** Specialist
## Properties
| Field | Value |
|-------|-------|
| Rarity | Common |
| Category | Culture |
| Slot Type | Specialist |
| Scope | Cockles Chain in range |
## Effects
| Effect Type | Target | Value | Scope |
|------------|--------|-------|-------|
| Productivity | Cockles Chain | +10% | Radius-based |
## Improves Buildings
- Cockle Picker (Albion)
## Source
Albion traders
## Stack Rules
Unknown
## Usage Patterns
Place in a Governor Villa or Officium covering Cockle Picker buildings to boost their productivity by +10%. Best positioned to cover multiple Cockle Pickers to maximize the productivity bonus.
## Common Pitfalls
Only affects Cockle Picker buildings within the building's area of influence; ensure proper placement to cover your cockles production chain.

@@ -103,35 +103,35 @@ When Claude processes a JSON file and adds its data to docs.md, mark it here.
- [x] en_anno-117-specialists_segler.json
- [x] en_anno-117-specialists_schweinefluesterer.json
- [x] en_anno-117-specialists_schlammsucherin.json
- [ ] en_anno-117-specialists_sceilg-eremitscher-reduktionist.json
- [x] en_anno-117-specialists_sceilg-eremitscher-reduktionist.json
- [x] en_anno-117-specialists_saeumer.json
- [x] en_anno-117-specialists_saturnsischer-schmelzer.json
- [ ] en_anno-117-specialists_rhea-proserpina-dis-mater.json
- [x] en_anno-117-specialists_rhea-proserpina-dis-mater.json
- [x] en_anno-117-specialists_renaturierer.json
- [ ] en_anno-117-specialists_rachsuechtige-villica.json
- [x] en_anno-117-specialists_rachsuechtige-villica.json
- [x] en_anno-117-specialists_publius-aelius-adrian-balustrischer-baumeister.json
- [x] en_anno-117-specialists_princeps-von-porphyr.json
- [ ] en_anno-117-specialists_primus-tremellius-valens-vom-tuermenden-elfenbein.json
- [ ] en_anno-117-specialists_praktizierender-heiler.json
- [ ] en_anno-117-specialists_pragmatische-aegis.json
- [ ] en_anno-117-specialists_praefekt-von-ochsen.json
- [ ] en_anno-117-specialists_plebejer-praepositus.json
- [ ] en_anno-117-specialists_palaephaesta-geschickte-damiurgus.json
- [ ] en_anno-117-specialists_optio-principalis-nico-einfaenger-von-bewegungen.json
- [ ] en_anno-117-specialists_obtorquist.json
- [ ] en_anno-117-specialists_nomadischer-hirte.json
- [ ] en_anno-117-specialists_netzangler.json
- [ ] en_anno-117-specialists_nero-vulius-fama-viehhalter-pastor.json
- [ ] en_anno-117-specialists_muschelsammlerin.json
- [ ] en_anno-117-specialists_munterer-markthaendler.json
- [ ] en_anno-117-specialists_modemacher.json
- [ ] en_anno-117-specialists_mellarius.json
- [ ] en_anno-117-specialists_maxima-cottius-polsterin-von-atlas.json
- [ ] en_anno-117-specialists_marix-die-die-meerleute-kennt.json
- [ ] en_anno-117-specialists_mantelprofi.json
- [ ] en_anno-117-specialists_lygos-exakion-architectus-eparchus.json
- [ ] en_anno-117-specialists_lamellen-ruestmeister.json
- [ ] en_anno-117-specialists_lakonische-schwertschmiedin.json
- [x] en_anno-117-specialists_primus-tremellius-valens-vom-tuermenden-elfenbein.json
- [x] en_anno-117-specialists_praktizierender-heiler.json
- [x] en_anno-117-specialists_pragmatische-aegis.json
- [x] en_anno-117-specialists_praefekt-von-ochsen.json
- [x] en_anno-117-specialists_plebejer-praepositus.json
- [x] en_anno-117-specialists_palaephaesta-geschickte-damiurgus.json
- [x] en_anno-117-specialists_optio-principalis-nico-einfaenger-von-bewegungen.json
- [x] en_anno-117-specialists_obtorquist.json
- [x] en_anno-117-specialists_nomadischer-hirte.json
- [x] en_anno-117-specialists_netzangler.json
- [x] en_anno-117-specialists_nero-vulius-fama-viehhalter-pastor.json
- [x] en_anno-117-specialists_muschelsammlerin.json
- [x] en_anno-117-specialists_munterer-markthaendler.json
- [x] en_anno-117-specialists_modemacher.json
- [x] en_anno-117-specialists_mellarius.json
- [x] en_anno-117-specialists_maxima-cottius-polsterin-von-atlas.json
- [x] en_anno-117-specialists_marix-die-die-meerleute-kennt.json
- [x] en_anno-117-specialists_mantelprofi.json
- [x] en_anno-117-specialists_lygos-exakion-architectus-eparchus.json
- [x] en_anno-117-specialists_lamellen-ruestmeister.json
- [x] en_anno-117-specialists_lakonische-schwertschmiedin.json
- [ ] en_anno-117-specialists_kunsthandwerklicher-schmelzprofi.json
- [ ] en_anno-117-specialists_konstruktivistin.json
- [ ] en_anno-117-specialists_kodextreuer-aufseher.json
@@ -987,17 +987,17 @@ When Claude processes a JSON file and adds its data to docs.md, mark it here.
- [x] anno-117-specialists_rhea-proserpina-dis-mater.json
- [x] anno-117-specialists_renaturierer.json
- [x] anno-117-specialists_rachsuechtige-villica.json
- [ ] anno-117-specialists_publius-aelius-adrian-balustrischer-baumeister.json
- [x] anno-117-specialists_publius-aelius-adrian-balustrischer-baumeister.json
- [ ] anno-117-specialists_princeps-von-porphyr.json
- [ ] anno-117-specialists_primus-tremellius-valens-vom-tuermenden-elfenbein.json
- [ ] anno-117-specialists_praktizierender-heiler.json
- [ ] anno-117-specialists_pragmatische-aegis.json
- [ ] anno-117-specialists_praefekt-von-ochsen.json
- [ ] anno-117-specialists_plebejer-praepositus.json
- [x] anno-117-specialists_plebejer-praepositus.json
- [ ] anno-117-specialists_palaephaesta-geschickte-damiurgus.json
- [ ] anno-117-specialists_optio-principalis-nico-einfaenger-von-bewegungen.json
- [ ] anno-117-specialists_obtorquist.json
- [ ] anno-117-specialists_nomadischer-hirte.json
- [x] anno-117-specialists_nomadischer-hirte.json
- [ ] anno-117-specialists_netzangler.json
- [ ] anno-117-specialists_nero-vulius-fama-viehhalter-pastor.json
- [ ] anno-117-specialists_muschelsammlerin.json
@@ -1008,7 +1008,7 @@ When Claude processes a JSON file and adds its data to docs.md, mark it here.
- [ ] anno-117-specialists_marix-die-die-meerleute-kennt.json
- [ ] anno-117-specialists_mantelprofi.json
- [ ] anno-117-specialists_lygos-exakion-architectus-eparchus.json
- [ ] anno-117-specialists_lamellen-ruestmeister.json
- [x] anno-117-specialists_lamellen-ruestmeister.json
- [ ] anno-117-specialists_lakonische-schwertschmiedin.json
- [ ] anno-117-specialists_kunsthandwerklicher-schmelzprofi.json
- [ ] anno-117-specialists_konstruktivistin.json

python/scraper_ign.py Executable file

@@ -0,0 +1,308 @@
#!/usr/bin/env python3
"""
Anno 117 IGN Wiki Scraper
Scrapes pages from IGN's Anno 117 wiki, extracts game data,
discovers new pages, and updates the scraping list.
"""
import argparse
import re
import json
import time
from pathlib import Path
from urllib.parse import urljoin, urlparse
import requests
from bs4 import BeautifulSoup
# Project paths
PROJECT_ROOT = Path(__file__).parent.parent
SCRAPING_MD = PROJECT_ROOT / "scraping_ign.md"
PROCESSED_MD = PROJECT_ROOT / "processed.md"
OUTPUT_DIR = PROJECT_ROOT / "scraped_data"
# IGN wiki base URL
IGN_WIKI_BASE = "https://www.ign.com/wikis/anno-117-pax-romana"


def read_scraping_md() -> tuple[list[str], list[str], str]:
    """
    Read scraping_ign.md and return (unchecked_urls, checked_urls, full_content).
    Creates the file with initial URLs if it doesn't exist.
    """
    if not SCRAPING_MD.exists():
        # Create initial scraping_ign.md with the main wiki page
        initial_content = f"""# IGN Anno 117 Scraping URLs
## Pages to Scrape
- [ ] {IGN_WIKI_BASE}
"""
        SCRAPING_MD.write_text(initial_content, encoding="utf-8")
        print(f"Created {SCRAPING_MD} with initial URL")
    content = SCRAPING_MD.read_text(encoding="utf-8")
    unchecked = re.findall(r"^- \[ \] (https?://[^\s]+)", content, re.MULTILINE)
    checked = re.findall(r"^- \[x\] (https?://[^\s]+)", content, re.MULTILINE)
    return unchecked, checked, content


def mark_url_as_done(url: str) -> None:
    """Mark a URL as scraped in scraping_ign.md."""
    content = SCRAPING_MD.read_text(encoding="utf-8")
    # Use regex to match the exact URL (followed by newline or end of string)
    escaped_url = re.escape(url)
    pattern = rf"^- \[ \] {escaped_url}$"
    replacement = f"- [x] {url}"
    content = re.sub(pattern, replacement, content, flags=re.MULTILINE)
    SCRAPING_MD.write_text(content, encoding="utf-8")
    print(f"Marked as done: {url}")


def add_new_urls(new_urls: list[str]) -> None:
    """Add newly discovered URLs to scraping_ign.md if not already present."""
    unchecked, checked, content = read_scraping_md()
    existing = set(unchecked + checked)
    urls_to_add = [url for url in new_urls if url not in existing]
    if urls_to_add:
        # Append new URLs at the end
        additions = "\n".join(f"- [ ] {url}" for url in urls_to_add)
        content = content.rstrip() + "\n" + additions + "\n"
        SCRAPING_MD.write_text(content, encoding="utf-8")
        print(f"Added {len(urls_to_add)} new URLs to scraping_ign.md")
        for url in urls_to_add:
            print(f" + {url}")


def extract_links(soup: BeautifulSoup, base_url: str) -> list[str]:
    """Extract relevant IGN wiki links from the page."""
    links = []
    for a in soup.find_all("a", href=True):
        href = a["href"]
        full_url = urljoin(base_url, href)
        parsed = urlparse(full_url)
        # Only keep links from IGN's Anno 117 wiki
        if parsed.netloc == "www.ign.com" and "/wikis/anno-117-pax-romana" in parsed.path:
            # Clean the URL (remove fragments and query params for deduplication)
            clean_url = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
            # Remove trailing slash for consistency
            clean_url = clean_url.rstrip("/")
            if clean_url not in links and clean_url != IGN_WIKI_BASE.rstrip("/"):
                links.append(clean_url)
    return links


def scrape_page(url: str) -> dict:
    """
    Scrape a single page and extract relevant content.
    Returns a dict with the scraped data.
    """
    print(f"Scraping: {url}")
    headers = {
        "User-Agent": "Anno117DocBot/1.0 (Personal use, one-time scrape for AI documentation; Contact: jivanrij@gmail.com)",
        "X-Bot-Purpose": "Creating documentation for personal AI agent use (non-commercial). Each page is fetched only once.",
        "From": "jivanrij@gmail.com",
    }
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    # Extract page title
    title = soup.find("title")
    title_text = title.get_text(strip=True) if title else "Unknown"

    # IGN wiki content is typically in article or wiki-specific containers
    main_content = (
        soup.find("div", class_="wiki-page-content")
        or soup.find("article", class_="article")
        or soup.find("div", class_="content-page")
        or soup.find("main")
        or soup.find("article")
        or soup.find("div", class_="content")
    )

    # Extract all text content
    if main_content:
        text_content = main_content.get_text(separator="\n", strip=True)
    else:
        # Fallback to body
        body = soup.find("body")
        text_content = body.get_text(separator="\n", strip=True) if body else ""

    # Extract tables (common for game data)
    tables = []
    for table in soup.find_all("table"):
        table_data = []
        for row in table.find_all("tr"):
            cells = [cell.get_text(strip=True) for cell in row.find_all(["td", "th"])]
            if cells:
                table_data.append(cells)
        if table_data:
            tables.append(table_data)

    # Extract images (for item/building icons)
    images = []
    for img in soup.find_all("img"):
        src = img.get("src", "")
        alt = img.get("alt", "")
        if src and any(keyword in src.lower() for keyword in ["icon", "item", "building", "good", "anno"]):
            images.append({"src": urljoin(url, src), "alt": alt})

    # Extract discovered links
    discovered_links = extract_links(soup, url)

    # Extract all images (not just filtered ones)
    all_images = []
    for img in soup.find_all("img"):
        src = img.get("src", "")
        alt = img.get("alt", "")
        if src:
            all_images.append({"src": urljoin(url, src), "alt": alt})

    return {
        "url": url,
        "source": "ign",
        "title": title_text,
        "text_content": text_content,
        "tables": tables,
        "images": images,
        "all_images": all_images,
        "discovered_links": discovered_links,
        "full_html_length": len(response.text)
    }


def save_scraped_data(data: dict, url: str) -> Path:
    """Save scraped data to a JSON file."""
    OUTPUT_DIR.mkdir(exist_ok=True)
    # Create filename from URL with ign prefix, e.g.
    # /wikis/anno-117-pax-romana/<page> -> ign_anno-117-pax-romana_<page>.json
    parsed = urlparse(url)
    path_parts = [p for p in parsed.path.split("/") if p]
    # Remove 'wikis' from path parts if present
    path_parts = [p for p in path_parts if p != "wikis"]
    filename = ("ign_" + "_".join(path_parts)) if path_parts else "ign_index"
    filename = re.sub(r"[^\w\-]", "_", filename)
    output_file = OUTPUT_DIR / f"{filename}.json"
    with open(output_file, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)
    print(f"Saved to: {output_file}")
    return output_file


def add_pending_file(json_filename: str) -> None:
    """Add a JSON file to the pending section of processed.md."""
    content = PROCESSED_MD.read_text(encoding="utf-8")
    # Skip files that are already listed
    if json_filename in content:
        return
    # Insert the entry directly under the (first) pending section heading
    pending_marker = "## Pending Files"
    if pending_marker not in content:
        print(f"Warning: '{pending_marker}' not found in processed.md; "
              f"{json_filename} was not added")
        return
    content = content.replace(
        pending_marker,
        f"{pending_marker}\n- [ ] {json_filename}",
        1,
    )
    PROCESSED_MD.write_text(content, encoding="utf-8")
    print(f"Added to processed.md pending: {json_filename}")


def scrape_one(url: str) -> bool:
    """Scrape a single URL. Returns True on success, False on failure."""
    try:
        # Scrape the page
        data = scrape_page(url)
        # Save the data
        output_file = save_scraped_data(data, url)
        # Track in processed.md as pending
        add_pending_file(output_file.name)

        # Print summary
        print("\n--- Summary ---")
        print(f"Title: {data['title']}")
        print(f"Content length: {len(data['text_content'])} chars")
        print(f"Tables found: {len(data['tables'])}")
        print(f"Images found: {len(data['images'])}")
        print(f"Links discovered: {len(data['discovered_links'])}")

        # Add discovered links to scraping_ign.md
        if data["discovered_links"]:
            add_new_urls(data["discovered_links"])
        # Mark this URL as done
        mark_url_as_done(url)
        print(f"\nSuccessfully scraped: {url}")
        return True
    except (requests.RequestException, OSError) as e:
        # HTTP/network errors and file I/O errors skip this URL (it stays
        # unchecked in scraping_ign.md) instead of aborting the whole batch
        print(f"Error scraping {url}: {e}")
        return False


def main():
    """Main entry point."""
    parser = argparse.ArgumentParser(description="Scrape Anno 117 pages from IGN wiki")
    parser.add_argument(
        "-n", "--count",
        type=int,
        default=1,
        help="Number of URLs to scrape (default: 1)"
    )
    args = parser.parse_args()
    # A zero or negative count is rejected: a negative value would otherwise
    # slice from the end of the URL list below
    if args.count < 1:
        parser.error("--count must be at least 1")

    unchecked, checked, _ = read_scraping_md()
    if not unchecked:
        print("No unchecked URLs found in scraping_ign.md")
        return

    print(f"Found {len(unchecked)} unchecked URLs")
    print(f"Already scraped: {len(checked)} URLs")
    print(f"Will scrape: {min(args.count, len(unchecked))} URLs\n")

    scraped = 0
    failed = 0
    # URLs discovered during this run are appended to scraping_ign.md and
    # picked up by a later run, not by this one
    urls_to_scrape = unchecked[:args.count]
    total = len(urls_to_scrape)
    for i, url in enumerate(urls_to_scrape):
        print(f"[{i + 1}/{total}] ", end="")
        if scrape_one(url):
            scraped += 1
        else:
            failed += 1
        print()
        # Be nice to the server: wait 3 seconds between requests
        if i < total - 1:
            print("Waiting 3 seconds...")
            time.sleep(3)
    print(f"Done! Scraped: {scraped}, Failed: {failed}")


if __name__ == "__main__":
    main()