# World Spider Catalog Bot
> **Project:** Spider taxonomy search bot with Discord integration and web interface
> **Stack:** Python (Discord.py), TypeScript (React 19 + Bun/Hono), MongoDB, SQLite
> **Architecture:** Multi-service async system with bidirectional sync
---
## Quick Reference
| Component | Technology | Entry Point | Port |
| ----------- | --------------- | -------------------------- | ----------- |
| Discord Bot | Python 3.11+ | `main.py:19` | 8080 (sync) |
| Web API | Bun + Hono | `web/server/src/server.ts` | 8000 |
| Web UI | React 19 + Vite | `web/client/src/App.tsx` | 3000 |
| Cache Layer | MongoDB 8 | - | 27017 |
| User Data | SQLite | `database/manager.py:108` | - |
---
## System Architecture
### Data Flow Overview
```mermaid
sequenceDiagram
autonumber
actor User as User (Discord/Web)
participant CL as Commands Layer
participant CQ as Command Queue
participant SH as Search Handler
participant CSV as CSV in-memory lookup
participant AM as API Manager
participant API as External APIs
participant MDB as MongoDB Cache
participant EC as Embed Creator
User->>CL: Initiate Request
CL->>CQ: Add to Queue (rate limiting)
CQ->>SH: Process Request
SH->>CSV: Query Species Data
CSV-->>SH: Return Results
SH->>AM: Request Enrichment Data
alt Cached and still fresh
    AM->>MDB: Retrieve Cached Data
    MDB-->>AM: Return Cached Response
else Cache miss or refreshAfter passed
    AM->>API: Call WSC, iNat, GBIF, CITES
    API-->>AM: Return Data
    AM->>MDB: Cache Response (24h TTL)
end
AM-->>SH: Return Enriched Data
SH->>EC: Format Data
EC-->>User: Deliver Rich Embed / Response
```
### Service Dependencies
- **wsc-bot** (Python) → MongoDB (cache), SQLite (user data), External APIs
- **wsc-api** (Bun) → MongoDB (species index, cache)
- **wsc-client** (React) → wsc-api (REST), Better-Auth (OAuth)
---
## Core Files & Key Line Numbers
### Bot Initialization
#### `main.py` - Entry Point
- **Line 19-30**: Main function - validates tokens, checks CSV
- **Line 31-33**: Discord intents configuration
- **Line 35-37**: Bot instantiation with `!` prefix
- **Line 39**: Command registration
- **Line 42**: Bot run with error handling
#### `core/bot.py` - WSCBot Class
- **Line 83-94**: WSCBot class definition & ownership check
- **Line 86-89**: Command queue & HTTP session initialization
- **Line 91-94**: Custom owner check (supports additional admin IDs)
##### Setup Hook (`setup_hook` - Line 96-147)
- **Line 100-107**: aiohttp session creation (50 conn pool, 20s timeout)
- **Line 108**: Database initialization
- **Line 110-113**: Sync server startup (port 8080)
- **Line 115-120**: MongoDB cache initialization
- **Line 122-125**: CSV data loading
- **Line 130**: Daily update task creation
- **Line 133**: Species of the Day task creation
- **Line 137**: Cache warming task
- **Line 143-147**: Slash command sync
##### Background Tasks
**Daily CSV Update** (`_daily_update_task` - Line 149-184)
- **Line 160**: Check if update needed today
- **Line 161-162**: Perform update
- **Line 164-170**: Reload CSV on success
- **Line 172**: Sync species index to MongoDB
- **Line 179**: Wait 24 hours between checks
**Species of the Day** (`_species_of_the_day_task` - Line 211-327)
- **Line 226-234**: Get configs due for posting at current hour
- **Line 242-258**: Resolve Discord channel
- **Line 261-263**: Select species for config
- **Line 266-270**: Create SOTD embed
- **Line 272-286**: Add favorite button view
- **Line 288-295**: Post to channel
- **Line 318-322**: Calculate sleep until next hour
**MongoDB Species Index Sync** (`_sync_species_index_if_needed` - Line 186-209)
- **Line 190**: Check if sync needed
- **Line 192**: Perform sync from CSV
- **Line 194-201**: Log statistics (upserts, deletes, aliases)
##### Message Handlers
**Temperature Conversion** (`on_message` - Line 408-470)
- **Line 416-425**: Pattern `!25c` → Celsius/Fahrenheit conversion
- **Line 428-432**: Multi-measurement handler `!m 2m x 3m`
- **Line 442-451**: Feet+inches pattern `!5'8"`
- **Line 462-469**: Single measurement `!10ft`
**Measurement Conversion** (`_handle_measurement_conversion` - Line 473-537)
- **Line 478-489**: Normalize input (×, *, ', ", ft+in combos)
- **Line 490**: Split by 'x' or spaces
- **Line 498**: Unit conversion table
- **Line 501-521**: Parse each measurement part
- **Line 524-528**: Format output for all units
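The normalize → split → parse → format pipeline above can be sketched as follows. The unit table, regex, and output format are assumptions for illustration (only five units, number and unit written together); the real table at Line 498 and the feet+inches handling are more complete.

```python
import re

# Hypothetical conversion table: unit -> metres.
TO_METRES = {"mm": 0.001, "cm": 0.01, "m": 1.0, "in": 0.0254, "ft": 0.3048}
PART_PATTERN = re.compile(r"^(\d+(?:\.\d+)?)(mm|cm|m|in|ft)$")


def normalize(text: str) -> str:
    # Treat '×' and '*' as dimension separators, just like 'x'.
    return text.lower().replace("×", "x").replace("*", "x")


def convert_measurements(text: str) -> list[str]:
    """Convert e.g. '2m x 3m' into one line per part, expressed in cm and ft."""
    results = []
    # Split on 'x' (with optional surrounding spaces) or on plain whitespace.
    for part in re.split(r"\s*x\s*|\s+", normalize(text).strip()):
        match = PART_PATTERN.match(part)
        if not match:
            raise ValueError(f"Unrecognised measurement: {part!r}")
        metres = float(match.group(1)) * TO_METRES[match.group(2)]
        results.append(f"{part} = {metres * 100:.1f} cm = {metres / 0.3048:.2f} ft")
    return results
```

Converting every part to a common base unit (metres) first keeps the table linear in the number of units instead of needing a pairwise conversion matrix.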
##### Command Queue (`CommandQueue` class - Line 39-81)
- **Line 42-44**: Queue and active command tracking
- **Line 46-57**: `execute_command` - adds to guild queue
- **Line 59-80**: `_process_queue` - sequential execution with 0.1s delays
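A per-guild queue with sequential execution and a 0.1 s pause can be sketched with `asyncio`. The class and method names are illustrative, not the actual `CommandQueue` API; in particular, handing each command's result back through a future is an assumption that may differ from the bot's implementation.

```python
import asyncio
from collections import defaultdict
from typing import Any, Awaitable, Callable


class GuildCommandQueue:
    """Run commands one at a time per guild, pausing briefly between them."""

    def __init__(self, delay: float = 0.1) -> None:
        self.delay = delay
        self.queues: defaultdict[int, asyncio.Queue] = defaultdict(asyncio.Queue)
        self.workers: dict[int, asyncio.Task] = {}

    async def execute(self, guild_id: int, func: Callable[[], Awaitable[Any]]) -> Any:
        """Enqueue a command for this guild and wait for its result."""
        future = asyncio.get_running_loop().create_future()
        self.queues[guild_id].put_nowait((func, future))
        # Start a worker for this guild only if none is currently running.
        worker = self.workers.get(guild_id)
        if worker is None or worker.done():
            self.workers[guild_id] = asyncio.create_task(self._process(guild_id))
        return await future

    async def _process(self, guild_id: int) -> None:
        queue = self.queues[guild_id]
        while not queue.empty():
            func, future = queue.get_nowait()
            try:
                result = await func()
            except Exception as exc:  # hand the error back to the waiting caller
                if not future.done():
                    future.set_exception(exc)
            else:
                if not future.done():
                    future.set_result(result)
            await asyncio.sleep(self.delay)  # small gap between commands
```

Because each guild has its own queue and worker, a burst of commands in one server is serialized without delaying commands in other servers.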
---
### Configuration & Environment
#### `core/config.py` - Configuration Management
- **Line 11-14**: Core API tokens (Discord, WSC, Species+, API Ninjas)
- **Line 17-21**: Web app base URL configuration
- **Line 24-26**: File paths (CSV, DB, logs)
- **Line 29-34**: MongoDB configuration
- **Line 36**: Log level from environment
- **Line 39-58**: Additional owner IDs parsing
- **Line 61-87**: Logging setup with fallbacks
- **Line 93-101**: Moltly sync configuration
- **Line 104**: Web API URL for outbound sync
---
### API Integration Layer
#### `api/manager.py` - External API Management
- **Line 1-18**: Imports & cache configuration
- **Line 27-32**: Weather cache TTL & versioning
- **Line 39-70**: Location code expansion (ISO 3166-1, ISO 3166-2, US states)
##### Key Functions (see file for full implementation)
**Location Normalization**
- **`_expand_location_code`** (Line 73 onward): ISO code → full name
  - Handles ISO 3166-1 (US, MX, ES)
  - Handles ISO 3166-2 subdivisions (US-TX, MX-OAX)
  - US state codes (TX → Texas, United States)
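The three cases can be sketched as a lookup chain. The tables hold only a few hypothetical entries (the real mappings live at Line 39-70), and giving ISO 3166-1 country codes priority over bare US state codes is an assumption.

```python
# Hypothetical, heavily truncated lookup tables.
COUNTRIES = {"US": "United States", "MX": "Mexico", "ES": "Spain"}
SUBDIVISIONS = {"MX-OAX": "Oaxaca"}
US_STATES = {"TX": "Texas", "CA": "California"}


def expand_location_code(code: str) -> str:
    """Expand an ISO 3166-1/3166-2 or US state code to a readable name."""
    code = code.strip().upper()
    if "-" in code:  # ISO 3166-2, e.g. "US-TX" or "MX-OAX"
        country, sub = code.split("-", 1)
        country_name = COUNTRIES.get(country, country)
        if country == "US" and sub in US_STATES:
            return f"{US_STATES[sub]}, {country_name}"
        return f"{SUBDIVISIONS.get(code, sub)}, {country_name}"
    if code in COUNTRIES:  # ISO 3166-1 alpha-2
        return COUNTRIES[code]
    if code in US_STATES:  # bare US state code
        return f"{US_STATES[code]}, United States"
    return code  # unknown codes are returned unchanged
```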
**Species Data APIs** (`APIManager` class)
- **`get_wsc_data`**: World Spider Catalog API lookup
- **`find_inaturalist_taxon`**: iNaturalist taxon ID search
- **`get_inat_photos_for_species`**: Photo retrieval
- **`get_inat_observations`**: Observation records with location
**GBIF Integration**
- **`get_gbif_data`**: Species occurrence data
- **`get_coordinates_from_code`**: Location code → coordinates
**Weather APIs**
- **`get_weather_data`**: Historical & current weather (30-year archive)
- **`get_weather_snapshot`**: Point-in-time weather
- Caching: 3-day soft TTL for recent, permanent for species-level
**CITES/Species+**
- **`get_speciesplus_cites_status`**: CITES appendix & legislation
---
### Data Management Layer
#### `utils/data.py` - In-Memory CSV Management
- **Line 41-55**: Global cache & indices
  - `species_data_cache`: All CSV rows
  - `_idx_by_lsid`: Quick LSID lookup
  - `_idx_by_genus`: Species grouped by genus
  - `_set_genera`, `_set_families`: Fast existence checks
  - `_count_by_family`, `_count_by_family_genus`: Taxon counts
  - `_canonical_species`: Normalized species names
  - `species_aliases`: User-defined aliases
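Rebuilding these indices from the loaded rows can be sketched as a single pass. Column names follow the CSV header listed under Data Structures; the function name, the lowercased keys, and the returned dict shape are assumptions rather than the `utils/data.py` globals themselves.

```python
from collections import Counter, defaultdict


def build_indices(rows: list[dict[str, str]]) -> dict:
    """Build lookup structures similar to the utils/data.py globals from CSV rows."""
    by_lsid: dict[str, dict[str, str]] = {}
    by_genus: defaultdict[str, list[dict[str, str]]] = defaultdict(list)
    count_by_family: Counter[str] = Counter()
    count_by_family_genus: Counter[tuple[str, str]] = Counter()
    canonical: set[str] = set()
    for row in rows:
        genus, species, family = row["Genus"], row["Species"], row["Family"]
        if row.get("LSID"):
            by_lsid[row["LSID"]] = row
        by_genus[genus.lower()].append(row)
        count_by_family[family.lower()] += 1
        count_by_family_genus[(family.lower(), genus.lower())] += 1
        canonical.add(f"{genus} {species}".lower())
    return {
        "by_lsid": by_lsid,
        "by_genus": dict(by_genus),
        "genera": set(by_genus),  # fast existence checks
        "families": set(count_by_family),
        "count_by_family": count_by_family,
        "count_by_family_genus": count_by_family_genus,
        "canonical": canonical,
    }
```

Paying for one pass at load time turns genus, family, and LSID lookups into dictionary or set accesses instead of scans over the full CSV.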
##### Alias Management
- **Line 54-63**: Alias file paths & fallbacks
- **Line 66-85**: `_resolve_alias_source` - copies seed file if needed
- **Line 88 onward**: `load_species_aliases` - loads JSON aliases
##### DataManager Class Functions
- **`load_csv_data`**: Reads CSV, rebuilds all indices
- **`get_csv_header`**: Returns column names
- **`find_species_by_name`**: Exact or fuzzy match
- **`find_species_by_lsid`**: Direct LSID lookup
- **`find_species_by_genus`**: All species in genus
- **`find_genus_by_name`**: Genus-level lookup
- **`find_family_by_name`**: Family-level lookup
##### FuzzySearchManager Class
- **`search`**: Fuzzy matching via rapidfuzz or difflib fallback
  - Returns sorted results with match scores
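The rapidfuzz-with-difflib-fallback pattern can be sketched as below. The `WRatio` scorer, the 0-100 score scale, and the default threshold of 70 are assumptions; the real `FuzzySearchManager.search` may use different scorers and cut-offs.

```python
import difflib

try:
    from rapidfuzz import fuzz, process, utils
except ImportError:  # rapidfuzz not installed: fall back to difflib
    process = None


def fuzzy_search(query: str, choices: list[str], limit: int = 5,
                 threshold: float = 70.0) -> list[tuple[str, float]]:
    """Return (choice, score) pairs scored 0-100, best match first."""
    if process is not None:
        matches = process.extract(query, choices, scorer=fuzz.WRatio,
                                  processor=utils.default_process, limit=limit)
        results = [(choice, float(score)) for choice, score, _index in matches]
    else:
        # SequenceMatcher.ratio() is 0-1, so scale it to match rapidfuzz.
        results = [
            (choice, difflib.SequenceMatcher(None, query.lower(), choice.lower()).ratio() * 100)
            for choice in choices
        ]
        results.sort(key=lambda pair: pair[1], reverse=True)
        results = results[:limit]
    return [pair for pair in results if pair[1] >= threshold]
```

Normalizing both backends to a 0-100 scale lets the same threshold apply whichever one is installed.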
##### ConversionUtils Class
- **`celsius_to_fahrenheit`**, **`fahrenheit_to_celsius`**
- **`read_file_content`**: Reads article/help files
- **`list_articles`**: Enumerates available articles
---
### Database Layer
#### `database/manager.py` - SQLite Operations
- **Line 108**: `init_database` - creates tables & schema
##### Database Schema
**Favorites Table**
```sql
CREATE TABLE favorites (
id INTEGER PRIMARY KEY,
user_id INTEGER NOT NULL,
species_canonical TEXT NOT NULL,
species_lsid TEXT,
added_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```
**Molt Logs Table**
```sql
CREATE TABLE molt_logs (
id INTEGER PRIMARY KEY,
user_id INTEGER NOT NULL,
species TEXT NOT NULL,
specimen_name TEXT,
date TIMESTAMP NOT NULL,
stage TEXT, -- 'pre', 'molt', 'post'
notes TEXT,
species_lsid TEXT,
added_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```
**SOTD Configs Table**
```sql
CREATE TABLE sotd_configs (
config_id INTEGER PRIMARY KEY AUTOINCREMENT,
guild_id INTEGER NOT NULL,
channel_id INTEGER NOT NULL,
name TEXT,
priority_families TEXT, -- comma-separated
min_photos INTEGER DEFAULT 3,
history_days INTEGER DEFAULT 90,
posting_hour INTEGER DEFAULT 8,
enabled BOOLEAN DEFAULT 1
);
```
**Blacklist Table**
```sql
CREATE TABLE blacklist (
id INTEGER PRIMARY KEY,
entity_id INTEGER NOT NULL,
entity_type TEXT, -- 'user' or 'guild'
reason TEXT,
added_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```
##### Key Methods
**User Collections**
- **`add_favorite_async`**: Add to favorites
- **`get_user_favorites_async`**: Retrieve favorites
- **`remove_favorite_async`**: Remove favorite
- **`get_wishlist_async`**: User wishlist
- **`add_molt_log_async`**: Log molt event
- **`get_molt_history_async`**: Molt records
- **`add_note_async`**: Personal note
- **`get_user_notes_async`**: Retrieve notes
**SOTD Management**
- **`get_sotd_configs_async`**: All configs for guild
- **`create_sotd_config_async`**: Create new config
- **`update_sotd_config_async`**: Update settings
- **`get_recently_featured_species_for_config_async`**: Featured history
**Security**
- **`add_to_blacklist_async`**: Blacklist user/guild
- **`is_blacklisted_async`**: Check blacklist
- **`remove_from_blacklist_async`**: Remove from blacklist
---
### MongoDB Cache Layer
#### `utils/cache.py` - API Response Caching
- **`APICache` class**: Persistent API response cache
##### Collections
**`api_cache`**
```json
{
"_id": ObjectId,
"key": "unique_cache_key",
"value": {...},
"cachedAt": ISODate,
"refreshAfter": ISODate, // null = never refresh
"lastCheckedAt": ISODate,
"value_hash": "sha256"
}
```
**`api_calls`** (permanent archive)
```json
{
"ts": ISODate,
"method": "GET",
"url": "https://...",
"urlHash": "sha256",
"request": {"headers": {}, "params": {}},
"response_status": 200,
"response": {"json": {}, "size": 1234},
"durationMs": 123,
"cacheKey": "...",
"source": "wsc_api"
}
```
##### Key Methods
- **`init`**: Initialize MongoDB connection
- **`get`**: Returns cached value or None if expired
- **`set`**: Stores value with optional TTL
- **`delete`**: Removes cache entry
- **`clear`**: Clears all cache
##### Soft TTL Semantics
- Data never automatically deleted
- `get()` returns None if `refreshAfter` is past
- `set()` only updates if value changed
- Permanent historical archive
---
### Species Indexing
#### `utils/species_indexer.py` - MongoDB Sync
- **`SpeciesIndexer` class**: CSV → MongoDB sync
##### Collections
**`species_index`**
```json
{
"_id": "lsid_or_canonical_lower",
"lsid": "urn:lsid:...",
"canonical": "Genus species",
"canonical_lower": "genus species",
"genus": "Genus",
"genus_lower": "genus",
"species": "species",
"species_lower": "species",
"family": "Family",
"family_lower": "family",
"author": "Author, Year",
"year": 2020,
"distribution": "Location1, Location2",
"aliases": ["alias1", "alias2"],
"tokens": ["genus", "species", "family"],
"search_blob": "searchable fulltext",
"synced_at": ISODate,
"row": {...} // full CSV row
}
```
**`species_alias_index`**
```json
{
"_id": "alias_lower",
"alias": "Common Name",
"alias_lower": "common name",
"canonical": "Genus species",
"lsid": "urn:lsid:...",
"synced_at": ISODate
}
```
##### Key Methods
- **`sync_from_csv`**: Bulk upserts CSV rows (500 batch size)
  - Creates indexes on canonical_lower, genus_lower, family_lower, tokens
  - Returns stats (upserts, deletions, alias counts)
- **`should_sync`**: Checks if stats indicate changes needed
- Cleanup: Deletes old entries not in current sync
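Shaping rows into `species_index` documents and grouping them into 500-document batches can be sketched without a database driver. Only a subset of the documented fields is built, and the helper names are hypothetical. With pymongo, each `(filter, update)` pair would become `UpdateOne(filter, update, upsert=True)` passed to `bulk_write`; `_id` is kept out of `$set` because it is immutable.

```python
from typing import Iterator

BATCH_SIZE = 500


def build_index_doc(row: dict[str, str]) -> dict:
    """Shape one CSV row like a species_index document (subset of fields)."""
    genus, species, family = row["Genus"], row["Species"], row["Family"]
    canonical = f"{genus} {species}"
    return {
        "_id": row.get("LSID") or canonical.lower(),  # LSID, else lowercased canonical
        "lsid": row.get("LSID") or None,
        "canonical": canonical,
        "canonical_lower": canonical.lower(),
        "genus_lower": genus.lower(),
        "family_lower": family.lower(),
        # De-duplicated, order-preserving search tokens.
        "tokens": list(dict.fromkeys([genus.lower(), species.lower(), family.lower()])),
        "row": row,
    }


def batched_upserts(rows: list[dict[str, str]],
                    size: int = BATCH_SIZE) -> Iterator[list[tuple[dict, dict]]]:
    """Yield lists of (filter, update) pairs, at most `size` pairs per batch."""
    batch: list[tuple[dict, dict]] = []
    for row in rows:
        doc = build_index_doc(row)
        fields = {k: v for k, v in doc.items() if k != "_id"}
        batch.append(({"_id": doc["_id"]}, {"$set": fields}))
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch
```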
---
### CSV Update System
#### `utils/csv_updater.py` - Daily Updates
- **`CSVUpdater` class**: Manages CSV updates & changelogs
##### Key Methods
**Update Management**
- **`get_latest_csv_url`**: Fetches the latest CSV URL from WSC (falls back up to 7 days)
- **`perform_update`**: Downloads, diffs, and stores the new CSV
- **`should_update_today`**: Checks whether an update was already attempted today
- **`mark_update_attempted`**: Tracks update state
**Changelog Generation**
- **`generate_changelog`**: Diffs two CSV files
  - Categories: added, removed, modified, synonymy_changes
  - Human-readable summary
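The changelog categories can be sketched as a keyed diff of two row sets. Keying on `LSID` and treating any change to the `Status` column as a synonymy change are assumptions; the real `generate_changelog` may compare files differently and detect synonymy from other fields.

```python
def diff_species(old_rows: list[dict[str, str]], new_rows: list[dict[str, str]],
                 key: str = "LSID") -> dict:
    """Compare two CSV snapshots keyed by LSID."""
    old = {r[key]: r for r in old_rows}
    new = {r[key]: r for r in new_rows}
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    modified: dict[str, dict[str, tuple]] = {}
    for k in sorted(old.keys() & new.keys()):
        # Record (old, new) for every column whose value differs.
        changes = {
            field: (old[k].get(field), new[k].get(field))
            for field in old[k].keys() | new[k].keys()
            if old[k].get(field) != new[k].get(field)
        }
        if changes:
            modified[k] = changes
    # Hypothetical rule: a changed Status column counts as a synonymy change.
    synonymy = [k for k, ch in modified.items() if "Status" in ch]
    summary = (f"{len(added)} added, {len(removed)} removed, "
               f"{len(modified)} modified, {len(synonymy)} synonymy changes")
    return {"added": added, "removed": removed, "modified": modified,
            "synonymy_changes": synonymy, "summary": summary}
```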
**Data Management**
- **`get_all_changelogs`**: Lists all changelogs
- **`get_changelog_by_id`**: Retrieves specific changelog
- **`get_changelog_range`**: Multi-changelog retrieval
##### File Structure
```
data/
├── changelogs/          # JSON summaries
│   └── changelog_20250115.json
├── diffs/               # Detailed JSON diffs
│   └── diff_2025-01-15_12-34-56.json
└── last_update.json     # Tracking metadata
```
---
### Species of the Day
#### `utils/sotd_manager.py` - Automated Species Featuring
- **`SpeciesOfTheDayManager` class**: SOTD selection logic
##### Key Methods
**Species Selection**
- **`select_species_of_the_day_for_config`**: Per-config selection
  - Priority families (Theraphosidae, Salticidae, Lycosidae)
  - History tracking (30-180 days)
  - Minimum photo count (1-10)
  - Weighted random selection
**Configuration**
- **`get_configs_due_for_posting_async`**: Configs due at current hour
- **`_config_log_tag`**: Logging helper
##### Implementation Details
- **Family Filtering**: Wildcard ('*', 'all', empty) = all families
- **Recently Featured**: Excludes species posted within history_days
- **Photo Validation**: Checks iNaturalist for min_photos
- **Fallback**: If no priority family species, tries all families
---
### Sync Server
#### `core/sync.py` - Bidirectional Synchronization
- **Purpose**: Webhook integration with Moltly & web API
##### Inbound Sync (aiohttp server on port 8080)
- **POST `/sync/molt`**: Receive molt events from Moltly
- **POST `/sync/notes`**: Receive note sync events
- **POST `/sync/dm`**: Forward messages to bot admins
##### Outbound Notifications
**To Moltly**
- **`notify_moltly_of_molt`**: POST molt event
- **`notify_moltly_molt_update`**: Molt edit notification
- **`notify_moltly_molt_delete`**: Molt deletion
- **`notify_moltly_note_delete`**: Note deletion
**To Web API**
- **`notify_web_of_favorite_add`**: Sync favorites
- **`notify_web_of_favorite_remove`**: Remove favorites
- **`notify_web_of_history`**: Update search history
##### Security
- All endpoints require the shared secret `WSCA_SYNC_SECRET` in a request header
- `start_sync_server` function initializes aiohttp app
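The shared-secret check can be sketched as a pure function. The header name `X-Sync-Secret` is hypothetical (the document names only the secret, not the header), and a plain dict is case-sensitive whereas aiohttp's `request.headers` is not.

```python
import hmac


def is_authorized(headers: dict[str, str], expected_secret: str,
                  header_name: str = "X-Sync-Secret") -> bool:
    """Constant-time comparison of the shared secret sent by a sync client."""
    provided = headers.get(header_name, "")
    if not expected_secret or not provided:
        return False  # never accept requests when no secret is configured
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(provided.encode(), expected_secret.encode())
```

In an aiohttp handler this would be called as `is_authorized(request.headers, secret)`, raising `web.HTTPUnauthorized()` when it returns False.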
---
### Embed Creator
#### `embeds/creator.py` - Discord Embed Builders
- **`EmbedCreator` class**: Consistent embed styling
##### Base Embeds
- **`create_base_embed`**: Template for all embeds
- **`create_error_embed`**: Red embeds (errors)
- **`create_success_embed`**: Green embeds (success)
- **`create_processing_embed`**: Orange embeds (loading)
##### Species-Specific
- **`create_species_embed`**: Full species card
  - Taxonomy (genus, species, family, author)
  - LSID and distribution
  - iNaturalist photos & observation count
  - Weather at species location
  - Observation locations on map
  - Similar species
##### Special Content
- **`create_cites_embed`**: CITES/Species+ status
- **`create_weather_embed`**: Weather details
- **`create_species_of_the_day_embed`**: SOTD card with facts
- **`create_image_embed`**: Image gallery
- **`create_similar_species_embed`**: Comparison embeds
- **`create_changelog_embed`**: CSV update summaries
##### Utilities
- **`_clean_reference_text`**: HTML → text formatting
- **`_paginate_fields`**: Splits long field lists
---
### Command Handlers
#### `commands/handlers.py` - Core Search Logic
- **`SearchHandler` class**: Species/genus/family queries
##### Methods
- **`search_species`**: Species lookup with fuzzy matching
- **`search_genus`**: Genus-level info
- **`search_family`**: Family-level stats
- **`search_updates`**: CSV updates search
##### Features
- Fuzzy matching (configurable threshold)
- API enrichment (WSC, iNat, GBIF, CITES)
- Image fetching & composition
- Weather data integration
- Similar species recommendations
- Observation mapping
##### Discord UI Views
- **`FavoriteButtonView`**: Toggle favorite button
- **`ImageView`**: Image gallery navigation
- **`BulkSpeciesView`**: Multi-species selection
- **`SimpleListPaginatorView`**: Generic list pagination
- **`ArticlesHandler`**: Article browsing
##### Helper Functions
- **`build_web_app_url`**: Creates shareable web link
- **`create_web_link_view`**: Wraps URL in button view
---
### Prefix Commands
#### `commands/bot_commands.py` - `!` Commands
**Search Commands**
- `!search [species|genus|family|updates] <query>` - Search spider data
- `!fuzzy <query>` - Fuzzy search
- `!random [species|genus|family] [filter]` - Random spider
**Media Commands**
- `!image <Genus species>` - Fetch photos
- `!recent <Genus species>` - iNaturalist observations
- `!map <Genus species>` - Distribution map
**Data Commands**
- `!weather <location>` - Weather lookup
- `!c` / `!convert` - Temperature conversion
- `!m` / `!measure <values>` - Unit conversion
- `!stats [families|genera|user]` - Statistics
**User Collections**
- `!fav [add|list|remove]` - Favorites
- `!wishlist [add|list|remove]` - Wishlist
- `!molt [add|list|remove]` - Molt logging
- `!notes [list|view|share|alias|search]` - Personal notes
**Admin/Meta**
- `!cl [list|<number>]` - Changelog
- `!updatecsv` - Force CSV update (owner)
- `!syncspecies` - Sync to MongoDB (owner)
- `!article [category]` - Browse articles
---
### Slash Commands
#### `commands/slash_commands.py` - `/` Commands
**Mirrors of Prefix Commands**
- `/species <query>` - Species search
- `/genus <query>` - Genus search
- `/family <query>` - Family search
- `/image <species>` - Image lookup
- `/recent <species>` - Observations
- `/map <species>` - Distribution map
- `/weather <location>` - Weather
- `/updates <query>` - CSV updates
- `/random [type] [filter]` - Random spider
- `/stats [type]` - Statistics
- `/fav [action] [species]` - Favorites
- `/wishlist [action] [species]` - Wishlist
- `/molt [action] [...]` - Molt tracking
- `/notes [action] [...]` - Notes
- `/sotd [action]` - Species of the Day management
- `/settings feature_channel` - Guild settings
---
### Changelog Handler
#### `commands/changelog.py` - Changelog UI
- **`ChangelogHandler` class**: Changelog display
##### Methods
- **`list_changelogs`**: Paginated changelog list
- **`show_changelog`**: Specific changelog details
- **`show_changelog_range`**: Multi-changelog comparison
##### Features
- Pagination for large lists
- Statistics formatting (new species, synonyms)
- Markdown for Discord
---
## Web Stack
[Spiders Archive](https://spiders.invert.info)
### Web Server
#### `web/server/src/server.ts` - Bun + Hono API
- **Framework**: Hono (TypeScript web framework), running on Bun
- **Database**: MongoDB
##### Key Endpoints
**Species Search**
- `POST /api/search` - Full-text species search
  - Modes: species, genus, family, distribution, location, doi
  - Fuzzy matching & alias resolution
  - Paginated results
**Type Hints (Autocomplete)**
- `GET /api/hints?mode=<mode>&q=<query>`
  - Modes: distribution, location, reference, doi
**User Data Sync**
- `GET /api/user/profile` - Discord OAuth profile
- `GET /api/user/favorites` - User favorites
- `POST /api/user/favorites` - Add favorite
- `DELETE /api/user/favorites/<lsid>` - Remove favorite
- `GET /api/user/history` - Search history
- `POST /api/user/history` - Log search
**Species Details**
- `GET /api/species/<query>` - Full species data
  - Enriched with WSC, iNat, GBIF, weather
  - Taxonomy, images, observations, similar species
**Changelog**
- `GET /api/changelogs` - List changelogs
- `GET /api/changelogs/<id>` - Single changelog
- `GET /api/changelogs/compare` - Compare two
**Authentication**
- OAuth via Better-Auth with Discord provider
- JWT in the Authorization header
- Session management & refresh
---
### Web Client
#### `web/client/src/App.tsx` - React 19 SPA
- **Framework**: React 19 + Vite + TypeScript
- **Maps**: Leaflet.js
- **Auth**: Better-Auth (Discord OAuth)
##### Key Components
**Search Interface** (`App.tsx`)
- Search bar with autocomplete
- Advanced filters (genus, family, distribution, location)
- Paginated results or single detail view
- Species detail card:
  - Taxonomy & LSID
  - Distribution map (Leaflet)
  - iNaturalist photos & observations
  - Weather at observation locations
  - Similar species
  - External resource links
**User Features** (`UserCollectionsContext.tsx`)
- **Favorites**: Save species
- **Wishlist**: Species to find/keep
- **Molt History**: Log molts
- **Pin Items**: Quick access
- **Search History**: Past searches
**Pages**
- **Species Detail**: Full info with media
- **Changelog List** (`ChangelogListPage.tsx`): Browse updates
- **Single Changelog** (`ChangelogPage.tsx`): Detailed view
- **Changelog Compare** (`ChangelogComparePage.tsx`): Side-by-side
- **Stats Page** (`StatsPage.tsx`): User achievements
- **Quick Access Panel** (`QuickAccessPanel.tsx`): Pinned items
**UI Components**
- **UserMenu** (`UserMenu.tsx`): Profile dropdown
- **ThemeContext** (`ThemeContext.tsx`): Light/dark theme
- **EnrichedSpeciesCard** (`EnrichedSpeciesCard.tsx`): Enhanced species card
- **ErrorBoundary**: Error handling
**Authentication** (`AuthContext.tsx`)
- Discord OAuth integration
- Profile fetching
- Token refresh
##### Styling
- **App.css**: Comprehensive responsive design (93 KB)
- **variables.css**: CSS custom properties
- **index.css**: Base styles (4.9 KB)
- **StatsPage.css**: Stats styling (8.2 KB)
---
## Development & Deployment
### Local Development
**Python Bot**
```bash
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
python main.py
```
**Web Server**
```bash
cd web/server
bun install
bun run dev # Hot reload
```
**Web Client**
```bash
cd web/client
npm install
npm run dev # Vite dev server
```
### Docker Deployment
**Services** (`docker-compose.yml`)
1. **mongo** - MongoDB 8 with auth
2. **wsc-bot** - Python Discord bot
3. **wsc-api** - Bun/Hono web server
**Commands**
```bash
docker compose up -d # Start all
docker compose down # Stop
docker compose logs -f # Follow logs
make build # Rebuild
make restart # Restart bot
```
---
## Security & Performance
### Rate Limiting
- **Command Queue** (`core/bot.py:39`): Per-guild queuing, 0.1s delays
- **API Caching**: 24h TTL, reduces external load
- **MongoDB Archive**: Permanent API call history
### Blacklisting
- User/guild blacklist in SQLite
- `check_blacklist()` decorator on all commands
- Owner can manage via admin commands
### Authentication
- **Discord OAuth**: Better-Auth integration
- **Session Tokens**: MongoDB storage
- **Webhook Security**: `WSCA_SYNC_SECRET` header validation
### Performance Optimizations
- **In-Memory CSV**: Fast lookups with indices
- **Rapidfuzz**: C-based fuzzy matching
- **MongoDB Indexing**: Indexes on `canonical_lower`, `genus_lower`, `family_lower`, and `tokens`; `_id` (LSID or lowercased canonical name) is unique
- **Async Throughout**: aiohttp, motor, aiosqlite
- **Batch Operations**: 500-doc batches for MongoDB
---
## Data Structures
### CSV Structure (`species.csv`)
```
LSID, Genus, Species, Subspecies, Family, Author, Year, Distribution, Status, Original_combination, Reference, Notes
```
### MongoDB Collections
- **`api_cache`**: API response cache with soft TTL
- **`api_calls`**: Permanent API call archive
- **`species_index`**: Full species data from CSV
- **`species_alias_index`**: User-defined aliases
- **`user_favorites`**: User favorite species
- **`user_wishlists`**: User wishlists
- **`user_molt_logs`**: Molt records
- **`user_notes`**: Personal notes
- **`user_history`**: Search history
- **`discord_users`**: OAuth user profiles
### SQLite Tables
- **`favorites`**: User favorites
- **`wishlists`**: User wishlists
- **`molt_logs`**: Molt tracking
- **`notes`**: Personal notes
- **`sotd_configs`**: Species of the Day configs
- **`blacklist`**: User/guild blacklist
- **`species_observations`**: Observation counts
- **`user_stats`**: User statistics
---
## External APIs
### World Spider Catalog (WSC)
- **Endpoint**: `https://wsc.nmbe.ch/api/`
- **Auth**: WSC_API_KEY
- **Data**: Species taxonomy, references, LSID
### WSC Updates
- **Endpoint**: `https://wsc.nmbe.ch/api/updates`
- **Auth**: WSC_API_KEY
- **Data**: LSIDs of new or changed taxa. If the date parameter is omitted, the response covers the last six months (the default)
### iNaturalist
- **Endpoint**: `https://api.inaturalist.org/v1/`
- **Auth**: None (public API)
- **Data**: Photos, observations, taxon IDs
### GBIF
- **Endpoint**: `https://api.gbif.org/v1/`
- **Auth**: None
- **Data**: Species occurrences, distribution
### Species+ (CITES)
- **Endpoint**: `https://api.speciesplus.net/api/v1/`
- **Auth**: SPECIESPLUS_TOKEN
- **Data**: CITES appendix, legislation
### Open-Meteo
- **Endpoint**: `https://archive-api.open-meteo.com/v1/`
- **Auth**: None
- **Data**: Historical weather (30-year archive)
### API Ninjas
- **Endpoint**: `https://api.api-ninjas.com/v1/`
- **Auth**: API_NINJAS_KEY
- **Data**: Fun facts
---
## Environment Variables
**Essential**
```env
DISCORD_TOKEN=<bot_token>
WSC_API_KEY=<api_key>
```
**Optional**
```env
MONGO_URL=mongodb://mongo:27017
MONGO_DB=wsc_bot
CACHE_TTL_HOURS=24
WEB_APP_BASE_URL=https://spiders.invert.info
WEB_API_URL=http://wsc-api:8000
SPECIESPLUS_TOKEN=<token>
API_NINJAS_KEY=<key>
MOLTLY_SYNC_URL=https://moltly.xyz/api/sync/wsca
MOLTLY_SYNC_SECRET=<secret>
WSCA_SYNC_SECRET=<secret>
WSCA_SYNC_PORT=8080
BOT_ADMIN_IDS=
LOG_LEVEL=INFO
```
---
## Testing & Quality
### Linting
```bash
# Python
ruff check .
ruff format .
# TypeScript (Server)
cd web/server && bun run lint
# TypeScript (Client)
cd web/client && npm run lint
```
### Type Checking
```bash
cd web/server && bun run check
```
---
## Key Architectural Decisions
### Modular Design
- Separation of concerns: commands, APIs, data, embeds
- Clear responsibility boundaries
- Easy to test individual components
### Async-First
- All I/O operations non-blocking
- Many concurrent users can be served without one request blocking another
- Long-running tasks don't block the event loop or freeze the bot
### Multi-Tier Caching
- **In-memory CSV**: Instant lookups
- **MongoDB soft TTL**: Persistent API cache
- **API call archive**: Debugging & analytics
### Distributed Sync
- Webhook-based Moltly integration
- Bidirectional molt/note updates
- User data synced between Discord & web
### Flexible Search
- CSV exact/fuzzy matching
- MongoDB full-text for web
- User-defined alias system
### Automated SOTD
- Per-guild configurations
- Smart selection (priority families, min photos)
- Educational content integration
---
## Project File Organization
| Responsibility | Files |
|---|---|
| **Configuration** | `core/config.py`, `.env` |
| **Bot Lifecycle** | `main.py`, `core/bot.py` |
| **Commands** | `commands/bot_commands.py`, `commands/slash_commands.py`, `commands/handlers.py`, `commands/changelog.py` |
| **Data Layer** | `database/manager.py`, `utils/data.py`, `utils/csv_updater.py` |
| **Caching** | `utils/cache.py`, `utils/species_indexer.py` |
| **API Integration** | `api/manager.py` |
| **Display** | `embeds/creator.py` |
| **Automation** | `utils/sotd_manager.py`, `utils/facts_manager.py` |
| **Synchronization** | `core/sync.py` |
| **Web Frontend** | `web/client/src/App.tsx`, `web/client/src/*.tsx` |
| **Web Backend** | `web/server/src/server.ts` |
| **Scripts** | `scripts/*.py` |
---
## Common Tasks
### Force CSV Update
```bash
# Via Discord (owner only)
!updatecsv
# Via script
python -m scripts.sync_species_index
```
### Sync Species Index
```bash
# Via Discord (owner only)
!syncspecies
# Via script
python -m scripts.sync_species_index
```
### Backup User Data
```bash
bash scripts/backup_bot_data.sh
```
### View Logs
```bash
# Docker
docker compose logs -f wsc-bot
# Local
tail -f bot_logs/bot.log
```
---
## Feature Highlights
### Species Search
- 50,000+ spider species from WSC
- Fuzzy matching with rapidfuzz
- Alias support for common names
- Rich embeds with images, weather, observations
### Species of the Day
- Automated daily posting
- Per-guild configuration
- Priority family filtering
- Educational content integration
### User Collections
- Favorites, wishlists, molt logs
- Synced between Discord & web
- Search history tracking
- Personal notes with sharing
### CSV Update System
- Daily automatic updates
- Detailed changelogs
- Diff generation
- MongoDB index sync
### Weather Integration
- 30-year historical archive
- Location-based weather
- Species habitat weather
- Observation weather snapshots
### CITES Status
- Species+ integration
- Appendix listings
- Legislation tracking
- Conservation status
---
*Last Updated: 2026-01-31*
*Documentation Version: 1.0*