Monitoring & Observability
Prometheus
Cosmictron exposes a Prometheus scrape endpoint at :9090/metrics (configurable).
Key metrics
| Metric | Type | Description |
|---|---|---|
| cosmictron_reducer_calls_total | Counter | Total reducer invocations, labeled by module, reducer, and status |
| cosmictron_reducer_duration_seconds | Histogram | Reducer execution latency |
| cosmictron_reducer_fuel_used | Histogram | Fuel consumed per reducer call |
| cosmictron_wal_writes_total | Counter | WAL write operations |
| cosmictron_wal_bytes_written_total | Counter | Bytes written to WAL |
| cosmictron_wal_fsync_duration_seconds | Histogram | fsync latency |
| cosmictron_subscription_count | Gauge | Active WebSocket subscriptions |
| cosmictron_subscription_delta_rate | Gauge | Rows/s delivered to subscribers |
| cosmictron_storage_pages_total | Gauge | Total pages in table store |
| cosmictron_storage_bytes_used | Gauge | Bytes used in data directory |
| cosmictron_auth_requests_total | Counter | Auth requests by type and status |
| cosmictron_connections_active | Gauge | Active WebSocket connections |
| cosmictron_compliance_signatures_total | Counter | Events signed (if signing enabled) |
| cosmictron_compliance_tsa_requests_total | Counter | TSA timestamp requests (if enabled) |
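
For quick checks outside Prometheus, the metrics endpoint can be read directly, since it serves the plain-text exposition format. The following is a minimal sketch, not an official client: the host name and the reducer name `send` are placeholders, the sample values are made up, and escaped characters inside label values are left as-is.

```python
import re
from urllib.request import urlopen

# Hypothetical host; only the default port and path come from the docs above.
METRICS_URL = "http://cosmictron-host:9090/metrics"

# metric_name{label="value",...} value [timestamp]
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Parse exposition text into a list of (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if not match:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def total(samples, name, **wanted):
    """Sum all samples of one metric whose labels match the given values."""
    return sum(
        value
        for metric, labels, value in samples
        if metric == name and all(labels.get(k) == v for k, v in wanted.items())
    )


def fetch_samples(url=METRICS_URL):
    """Fetch and parse the live endpoint (requires a reachable server)."""
    with urlopen(url, timeout=5) as resp:
        return parse_samples(resp.read().decode("utf-8"))


# Made-up sample payload so the sketch runs without a server.
SAMPLE = """\
# TYPE cosmictron_reducer_calls_total counter
cosmictron_reducer_calls_total{module="my-agent",reducer="send",status="ok"} 120
cosmictron_reducer_calls_total{module="my-agent",reducer="send",status="error"} 3
# TYPE cosmictron_connections_active gauge
cosmictron_connections_active 47
"""

if __name__ == "__main__":
    parsed = parse_samples(SAMPLE)
    print(total(parsed, "cosmictron_reducer_calls_total", status="error"))
```

Against a running instance, `fetch_samples()` would replace `parse_samples(SAMPLE)`.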
Prometheus scrape config

    scrape_configs:
      - job_name: cosmictron
        static_configs:
          - targets: ['cosmictron-host:9090']
        scrape_interval: 15s

OpenTelemetry
Cosmictron emits traces for all reducer calls, queries, and subscription events via OTLP.

    [telemetry]
    otlp_endpoint = "http://otel-collector:4317"

Trace attributes:
- cosmictron.reducer: reducer name
- cosmictron.module: module name
- cosmictron.identity: caller identity hash
- cosmictron.tx_id: transaction ID
- db.system=cosmictron
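
To show how these attributes might be used downstream, here is a small sketch that counts reducer spans per module and reducer. The span records are hypothetical plain dictionaries (for example, decoded from a collector's JSON export); the span names, the identity and transaction values, and the assumption that non-reducer spans omit `cosmictron.reducer` are illustrative only.

```python
from collections import Counter

# Hypothetical span records carrying the attribute keys listed above.
SPANS = [
    {"name": "reducer", "attributes": {
        "db.system": "cosmictron", "cosmictron.module": "my-agent",
        "cosmictron.reducer": "send", "cosmictron.identity": "ab12",
        "cosmictron.tx_id": "1001"}},
    {"name": "reducer", "attributes": {
        "db.system": "cosmictron", "cosmictron.module": "my-agent",
        "cosmictron.reducer": "send", "cosmictron.identity": "cd34",
        "cosmictron.tx_id": "1002"}},
    # Assumed: a query span has no reducer attribute.
    {"name": "query", "attributes": {
        "db.system": "cosmictron", "cosmictron.module": "my-agent"}},
]


def calls_per_reducer(spans):
    """Count spans per (module, reducer), ignoring spans from other systems."""
    counts = Counter()
    for span in spans:
        attrs = span.get("attributes", {})
        if attrs.get("db.system") != "cosmictron":
            continue
        module = attrs.get("cosmictron.module")
        reducer = attrs.get("cosmictron.reducer")
        if module is None or reducer is None:
            continue  # not a reducer span
        counts[(module, reducer)] += 1
    return counts


if __name__ == "__main__":
    for (module, reducer), n in calls_per_reducer(SPANS).items():
        print(module, reducer, n)
```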
Logging
Log format: JSON (production) or pretty (development).

    [telemetry]
    log_level = "info"    # trace | debug | info | warn | error
    log_format = "json"   # "json" | "pretty"

Environment override:

    COSMICTRON_LOG=debug,cosmictron_wal=trace

Standard Rust env-filter format is supported for fine-grained control per module.
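
To make the directive syntax concrete, the sketch below splits a filter string into a default level plus per-target overrides and then resolves the level for a given target. It is a simplified model of env-filter matching (plain target prefixes only, no span or field filters), not Cosmictron's actual parser, and the target names used in the demo are hypothetical.

```python
LEVELS = ("trace", "debug", "info", "warn", "error")


def parse_filter(spec, default="info"):
    """Split 'debug,cosmictron_wal=trace' into (default_level, {target: level})."""
    per_target = {}
    for directive in spec.split(","):
        directive = directive.strip()
        if not directive:
            continue
        if "=" in directive:
            target, level = directive.split("=", 1)
            per_target[target.strip()] = level.strip().lower()
        elif directive.lower() in LEVELS:
            default = directive.lower()  # a bare level sets the global default
        else:
            per_target[directive] = "trace"  # a bare target enables everything for it
    return default, per_target


def level_for(target, default, per_target):
    """Resolve the level for a target; the longest matching prefix wins."""
    best = None
    for prefix in per_target:
        if target.startswith(prefix) and (best is None or len(prefix) > len(best)):
            best = prefix
    return per_target[best] if best is not None else default


if __name__ == "__main__":
    default, overrides = parse_filter("debug,cosmictron_wal=trace")
    # Hypothetical target names, for illustration only.
    print(level_for("cosmictron_wal::segment", default, overrides))
    print(level_for("cosmictron_auth", default, overrides))
```

With the example value above, everything logs at debug except targets under cosmictron_wal, which log at trace.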
Health endpoints
| Endpoint | Description |
|---|---|
| GET /v1/health | Basic liveness: returns 200 OK if the process is running |
| GET /v1/health/ready | Readiness: returns 200 OK if WAL is initialized and storage is healthy |
| GET /v1/health/detailed | Full diagnostics: WAL status, module health, subscription count |
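
A deployment script or orchestrator probe could combine the first two endpoints roughly as follows. This is a sketch under assumptions: the base URL (including its port) is a placeholder that the documentation above does not specify, and the three state names are invented for illustration.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

# Hypothetical base URL; substitute the address of your Cosmictron API.
BASE_URL = "http://cosmictron-host:8080"


def http_status(url, timeout=2.0):
    """Return the HTTP status code, or None if the server cannot be reached."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:  # non-2xx responses still carry a status code
        return err.code
    except (URLError, OSError):  # connection refused, DNS failure, timeout
        return None


def probe(base_url, get_status=http_status):
    """Classify an instance as 'ready', 'live-not-ready', or 'down'."""
    if get_status(base_url + "/v1/health") != 200:
        return "down"
    if get_status(base_url + "/v1/health/ready") != 200:
        return "live-not-ready"  # process is up, WAL or storage not yet healthy
    return "ready"


if __name__ == "__main__":
    print(probe(BASE_URL))
```

Passing `get_status` as a parameter keeps the classification logic testable without a running server.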
Example response from /v1/health/detailed:

    {
      "status": "healthy",
      "wal": { "status": "ok", "pending_segments": 0 },
      "storage": { "status": "ok", "pages": 120450, "bytes_used": 985040896 },
      "modules": [
        { "name": "my-agent", "status": "active", "reducers": 5 }
      ],
      "subscriptions": { "active": 47 },
      "uptime_secs": 86400
    }

Alerting recommendations
| Alert | Condition | Severity |
|---|---|---|
| High reducer error rate | rate(cosmictron_reducer_calls_total{status="error"}[5m]) > 0.05 | Warning |
| WAL fsync latency | histogram_quantile(0.99, sum by (le) (rate(cosmictron_wal_fsync_duration_seconds_bucket[5m]))) > 0.5 | Warning |
| Storage near full | cosmictron_storage_bytes_used / disk_total > 0.85 | Critical |
| No liveness | up{job="cosmictron"} == 0 | Critical |
| TSA failures | rate(cosmictron_compliance_tsa_requests_total{status="error"}[15m]) > 0 | Warning |
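
The arithmetic behind two of these conditions can be sketched in a few lines. The functions below are simplified stand-ins for PromQL's `rate()` and `histogram_quantile()` (no extrapolation, a single series, resets handled naively), and all sample values are made up; they are meant to illustrate what the thresholds compare against, not to replace Prometheus.

```python
import math


def per_second_rate(first, last):
    """Average per-second increase between two (timestamp_s, counter) samples."""
    (t0, v0), (t1, v1) = first, last
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    delta = v1 - v0
    if delta < 0:  # counter reset: assume it restarted from zero
        delta = v1
    return delta / (t1 - t0)


def histogram_quantile(q, buckets):
    """Estimate a quantile from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by upper bound and end with math.inf, mirroring
    the `le` label of a Prometheus histogram.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper, count in buckets:
        if count >= rank:
            if math.isinf(upper):
                return lower_bound  # fall back to the highest finite bound
            # Linear interpolation inside the bucket that contains the rank.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper - lower_bound) * fraction
        lower_bound, lower_count = upper, count
    return math.nan


if __name__ == "__main__":
    # Made-up samples: 30 reducer errors over 300 seconds.
    error_rate = per_second_rate((0, 100), (300, 130))
    print("reducer errors/s:", error_rate, "alert:", error_rate > 0.05)

    # Made-up fsync latency buckets (seconds, cumulative counts).
    fsync = [(0.1, 90), (0.5, 98), (1.0, 100), (math.inf, 100)]
    p99 = histogram_quantile(0.99, fsync)
    print("fsync p99 (s):", p99, "alert:", p99 > 0.5)
```

Note that the reducer condition compares an absolute rate (errors per second) against 0.05, not a fraction of all calls.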