Sentinel — Alerting and Monitoring Service
Status: Implemented — sentinel/ repo active with FastAPI server, MCP mount, and alerting domain objects (AlertEvent, Notification, Sensor, Subscriber) migrated from rest's userspace.
Package: sentinel (Python pkg server/)
Depends on: platform.axonis-core, platform.service-contract
Milestone: P2 (after axonis-core is published)
Repo-local spec. Cross-cutting service mechanics (auth, /health + /service-info, Helm chart, CI/CD, uv/hatchling packaging, ruff) are not covered here; they follow platform.service-contract and its Cross-Cutting Requirements. This spec covers what makes the alerting service unique.
Who & why
Trigger: As an operator monitoring federated sites, I want a dedicated alerting service that ingests alert events from any source, evaluates thresholds, and routes notifications, so that the alert lifecycle (acknowledge → resolve → escalate) is managed in one place with domain-specific tools instead of generic CRUD.
Current pain. Alerting objects (AlertEvent, Notification, Sensor, Subscriber) lived in the alerts Elasticsearch index and were reached only through fedai-rest's generic userspace CRUD. There was no alert-specific workflow (no acknowledge/resolve/escalate), no threshold evaluation, and no uniform way to record where an alert came from — so cross-source alert handling and routing had to be reimplemented per caller.
Job to be done. Own the full alert lifecycle across heterogeneous sources (sensors, rules engines, ML models, humans, external systems) under one source-identity model, with first-class workflow tools and notification routing.
Out of scope. Raw data ingestion (Conduit, component.conduit.service); device writes back to sensors (Relay, platform.relay); and all cross-cutting service scaffolding (auth middleware, health/service-info contract, Helm, CI/CD, packaging) which platform.service-contract already mandates for every service.
Purpose
Sentinel is the alerting and monitoring microservice. It owns all alert lifecycle management: event ingestion, threshold evaluation, notification dispatch, sensor management, and subscriber routing.
Previously, alerting objects (AlertEvent, Notification, Sensor, Subscriber) were stored in the alerts Elasticsearch index and accessed via fedai-rest's generic userspace CRUD. Sentinel extracts these into a dedicated service with domain-specific tools for alert management workflows.
Scope: generic alerting, sensors as one source type
Sentinel is a generic alerting service. Alerts may originate from many kinds of sources, of which sensors are one common case. Other valid sources include:
- Rules engines (transaction-monitoring rule trips)
- ML models emitting an anomaly score
- Scheduled audit jobs finding violations
- Humans manually escalating events
- External systems posting alerts via the API
Every AlertEvent carries a source identity consisting of:
- source_type: one of sensor | rule | model | human | external
- source_uid: the UID of the specific origin within that type (e.g. the sensor's UID when source_type=sensor)
Source-specific resources (e.g. alerts://sensors/{sensor_id}/history) filter alerts where source_type matches the resource scheme and source_uid matches the parameter. Other source types may add their own resources without changing the core alert model.
Sensor objects remain the registry of configured sensor sources (their thresholds, sites, types). Non-sensor sources are not registered in the Sensor index — they identify themselves only at alert-creation time via source_type + source_uid.
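A minimal sketch of the source-identity model described above, assuming alerts are plain dicts carrying source_type and source_uid. The names SourceIdentity, alerts_for_source, and sensor_history are hypothetical and only illustrate how a source-specific resource such as alerts://sensors/{sensor_id}/history could filter on the pair.

```python
from dataclasses import dataclass
from typing import Iterable, List

# The five origin kinds listed in this spec.
SOURCE_TYPES = {"sensor", "rule", "model", "human", "external"}


@dataclass(frozen=True)
class SourceIdentity:
    """Hypothetical value object for the (source_type, source_uid) pair."""
    source_type: str
    source_uid: str

    def __post_init__(self):
        if self.source_type not in SOURCE_TYPES:
            raise ValueError(f"unknown source_type: {self.source_type!r}")


def alerts_for_source(alerts: Iterable[dict], source: SourceIdentity) -> List[dict]:
    """Keep only alerts whose origin matches both parts of the identity."""
    return [
        a for a in alerts
        if a.get("source_type") == source.source_type
        and a.get("source_uid") == source.source_uid
    ]


def sensor_history(alerts: Iterable[dict], sensor_id: str) -> List[dict]:
    """Assumed shape of the sensor history resource: scheme fixes source_type."""
    return alerts_for_source(alerts, SourceIdentity("sensor", sensor_id))
```

Under this sketch, another source type would add its own resource by fixing a different source_type, leaving the alert model unchanged.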
Domain Objects (from axonis-core userspace)
| Object | Schema constant | ES index | Purpose |
|---|---|---|---|
| AlertEvent | Schema.ALERT_EVENT | alerts | Triggered alert with severity, site, status; identifies its origin via source_type + source_uid |
| Notification | Schema.NOTIFICATION | alerts | Dispatched notification record |
| Sensor | Schema.SENSOR | alerts | One kind of alert source: registers a configured sensor (threshold, type, site). Non-sensor sources self-identify at alert-create time and are not registered here. |
| Subscriber | Schema.SUBSCRIBER | alerts | Alert routing target (user, channel, webhook) |
| AlertSubscriber | Schema.ALERT_SUBSCRIBER | alerts | Subscription binding (source → subscriber) |
| AlertThreshold | Schema.THRESHOLD | alerts | Configurable threshold definition |
Code Structure
server/
__init__.py
__main__.py # Starlette: /agentspace, /api/v1, /health, /service-info
api/
__init__.py
routes.py # FastAPI REST endpoints
schema/
alerting_objects.yml # OpenAPI component schemas
mcp/
__init__.py
server.py # FastMCP tools + resources
commands.py # Command layer (shared by REST + MCP)
Port and API Version
Port: 8005
API version: /api/v1
MCP Tools (18)
CRUD (12)
| Tool | Purpose |
|---|---|
| alert_list(summary, status, severity, site_id) | List alerts with optional filters |
| alert_get(uid) | Get alert by UID |
| alert_create(body) | Create alert event |
| alert_update(uid, body) | Update alert (e.g., change status) |
| sensor_list(summary, sensor_type) | List sensors with optional type filter |
| sensor_get(uid) | Get sensor by UID |
| sensor_create(body) | Create sensor definition |
| sensor_update(uid, body) | Update sensor |
| subscriber_list(summary) | List subscribers |
| subscriber_get(uid) | Get subscriber by UID |
| subscriber_create(body) | Create subscriber |
| notification_list(summary, alert_id) | List notifications, optionally by alert |
Workflow (4)
| Tool | Purpose |
|---|---|
| alert_acknowledge(uid, acknowledged_by, note) | Acknowledge an alert; sets status to ACKNOWLEDGED |
| alert_resolve(uid, resolved_by, resolution) | Resolve an alert; sets status to RESOLVED |
| alert_escalate(uid, escalate_to, reason) | Escalate an alert; creates notification to escalation target |
| alert_evaluate(sensor_id, value) | Evaluate a value against a sensor's threshold; returns whether alert should trigger |
Introspection (2)
| Tool | Purpose |
|---|---|
| alert_summary(time_range, group_by) | Summary of alerts by severity/site/status over a time range |
| sensor_status(sensor_id) | Current state of a sensor: last triggered, alert count, health |
MCP Resources (2)
| URI | Purpose |
|---|---|
| alerts://active | List of currently active (unresolved) alerts |
| alerts://sensors/{sensor_id}/history | Recent alert history for a sensor |
REST Endpoints
CRUD
| Method | Path | Maps to MCP tool |
|---|---|---|
| GET | /api/v1/alert | alert_list |
| GET | /api/v1/alert/{uid} | alert_get |
| POST | /api/v1/alert | alert_create |
| POST | /api/v1/alert/{uid} | alert_update |
| GET | /api/v1/sensor | sensor_list |
| GET | /api/v1/sensor/{uid} | sensor_get |
| POST | /api/v1/sensor | sensor_create |
| POST | /api/v1/sensor/{uid} | sensor_update |
| GET | /api/v1/subscriber | subscriber_list |
| GET | /api/v1/subscriber/{uid} | subscriber_get |
| POST | /api/v1/subscriber | subscriber_create |
| GET | /api/v1/notification | notification_list |
Workflow
| Method | Path | Maps to MCP tool |
|---|---|---|
| POST | /api/v1/alert/{uid}/acknowledge | alert_acknowledge |
| POST | /api/v1/alert/{uid}/resolve | alert_resolve |
| POST | /api/v1/alert/{uid}/escalate | alert_escalate |
| POST | /api/v1/sensor/{uid}/evaluate | alert_evaluate |
Introspection
| Method | Path | Maps to MCP tool |
|---|---|---|
| GET | /api/v1/alert/summary | alert_summary |
| GET | /api/v1/sensor/{uid}/status | sensor_status |
Service Info
{
"name": "sentinel",
"version": "1.0.0",
"description": "Alerting and monitoring — event lifecycle, sensors, notifications",
"mcp_path": "/agentspace",
"health_path": "/health",
"api_path": "/api/v1",
"tools_count": 18,
"resources_count": 2,
"capabilities": ["alert", "sensor", "subscriber", "notification", "threshold"]
}
Command Layer
# server/mcp/commands.py
from axonis.userspace.alerting import AlertEvent, AlertThreshold, Subscriber, Notification
from axonis.userspace.intelligence import Memory

memory = Memory()


def acknowledge_alert(uid, acknowledged_by, note=""):
    # Read the current record so the caller gets the merged, post-update view.
    alert = AlertEvent().read(uid=uid)
    update = {"status": "ACKNOWLEDGED", "acknowledged_by": acknowledged_by, "note": note}
    AlertEvent().update(update, uid)
    # Record the acknowledgment as a memory entry (see Memory Namespace).
    memory.create({
        "content": f"Acknowledged by {acknowledged_by}: {note}",
        "memory_type": "alert_ack",
        "source_conversation_id": uid,
    })
    return {**alert, **update}


def evaluate_threshold(threshold_id, value):
    # Pure evaluation: computes the result but never creates an alert.
    threshold = AlertThreshold().read(uid=threshold_id)
    operator = threshold.get("operator", "gt")
    limit = threshold.get("value", 0)
    # An unrecognized operator matches no branch, so triggered is False.
    triggered = (
        (operator == "gt" and value > limit) or
        (operator == "lt" and value < limit) or
        (operator == "eq" and value == limit) or
        (operator == "gte" and value >= limit) or
        (operator == "lte" and value <= limit)
    )
    return {"threshold_id": threshold_id, "value": value, "threshold": limit,
            "operator": operator, "triggered": triggered}
Alert Filters
The alert_list tool supports the following filters:
- sensor_type: filter by sensor type (only meaningful when source_type=sensor)
- severity: filter by severity level
- site_id: filter by site/location
- status: filter by alert status (ACTIVE, ACKNOWLEDGED, RESOLVED, ESCALATED)
- source_type: filter by origin kind (sensor | rule | model | human | external)
- source_uid: filter by specific origin UID
Filters are passed as query parameters in REST and as tool arguments in MCP. The first four are pushed down to ES via Schema.ALERT_FILTERS; source_type and source_uid are applied in Python until axonis-core's ALERT_FILTERS set is extended (tracked separately).
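The split between pushed-down and Python-side filters could be sketched as follows. This is an illustration only: PUSHDOWN_FILTERS is assumed to mirror what Schema.ALERT_FILTERS contains, and split_filters / apply_residual are hypothetical helper names, not part of axonis-core.

```python
# Assumed to mirror Schema.ALERT_FILTERS (the four filters pushed down to ES).
PUSHDOWN_FILTERS = ("sensor_type", "severity", "site_id", "status")
# Applied in Python until ALERT_FILTERS is extended.
PYTHON_FILTERS = ("source_type", "source_uid")


def split_filters(params):
    """Separate request parameters into ES-side and Python-side filters.

    Parameters that are None (not supplied) or not recognized are dropped.
    """
    pushdown = {k: v for k, v in params.items()
                if k in PUSHDOWN_FILTERS and v is not None}
    residual = {k: v for k, v in params.items()
                if k in PYTHON_FILTERS and v is not None}
    return pushdown, residual


def apply_residual(hits, residual):
    """Filter ES hits in Python on the fields ES did not filter."""
    return [h for h in hits
            if all(h.get(k) == v for k, v in residual.items())]
```

One consequence of filtering after the ES query is that page sizes and totals reflect only the pushed-down filters, which is worth keeping in mind until the filter set is extended.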
Alert Lifecycle Pipeline
Beyond the request-scoped CRUD/workflow tools above, Sentinel owns an end-to-end evaluation pipeline that turns raw source readings into notifications and signals. The stages are source-agnostic; sensors are the canonical case but the same flow applies to any source_type.
Trigger flow
Source reading (sensor poll, rule trip, model score, external POST)
|
| 1. Normalize to standard reading schema, persist
v
Threshold evaluation
| 1. Resolve effective threshold (per-source override, else default)
| 2. Evaluate reading vs threshold (alert_evaluate semantics)
| 3. Check cooldown (see #cooldown) — skip if within min report interval
| 4. If exceeded AND not in cooldown:
| a. Query matching subscribers
| b. Filter by min_severity and quiet hours
| c. Dispatch notifications (per channel)
| d. Write AlertEvent (status=triggered) — also acts as cooldown marker
| e. Write Notification record(s)
| f. Emit Signal to cortex via signal_create (see #signal-integration)
v
Subscriber notified + ADI shows Signal in Monitor
Clear flow
When readings return below threshold, the evaluator transitions the matching AlertEvent (sets cleared_at and status) and emits a resolved Signal to cortex (signal_create). Per #invariants, the AlertEvent is status-transitioned, never deleted.
Threshold Configuration
Thresholds resolve from two layers, override taking precedence:
- Defaults: base threshold definitions per source/sensor type (platform-level configuration). New source types are onboarded by adding a section here; no code change required.
- Per-source overrides: stored as AlertThreshold records (Schema.THRESHOLD, alerts index) keyed by source/site; take precedence over defaults.
Resolution order: per-source override checked first, then the type default.
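The two-layer resolution can be sketched as a small pure function. The shapes here are assumptions: defaults is taken to be a dict keyed by (sensor_type, threshold_name), overrides a list of AlertThreshold-like dicts with the fields shown in the AlertThreshold schema, and a record with enabled set to false is assumed to be ignored. resolve_threshold is a hypothetical name.

```python
def resolve_threshold(defaults, overrides, sensor_type, threshold_name, site_id):
    """Return the effective threshold: per-source override first, else default."""
    # Copy so the shared defaults are never mutated.
    effective = dict(defaults[(sensor_type, threshold_name)])
    effective["origin"] = "default"
    for rec in overrides:
        matches = (
            rec.get("enabled", True)  # assumption: disabled overrides are skipped
            and rec.get("site_id") == site_id
            and rec.get("sensor_type") == sensor_type
            and rec.get("threshold_name") == threshold_name
        )
        if not matches:
            continue
        # Only fields the override actually sets replace the default values.
        if rec.get("override_value") is not None:
            effective["value"] = rec["override_value"]
        if rec.get("min_report_interval_sec") is not None:
            effective["min_report_interval_sec"] = rec["min_report_interval_sec"]
        effective["origin"] = "override"
        break
    return effective
```

The origin key is added purely for illustration, so a caller can tell which layer supplied the result.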
Evaluation modes
A threshold evaluates in one of two modes:
- Fixed value: the threshold's value field holds the comparison number (used by evaluate_threshold / alert_evaluate).
- Compare field: compare_field references another field in the source reading; the reading's value is compared against that field rather than a constant.
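A hedged sketch of how the two modes might share one evaluator. It assumes a threshold dict with operator plus either value or compare_field, and a reading dict of field names to numbers; the function name evaluate and the reading field top_of_bank_ft are hypothetical. The operator names follow the ones used by evaluate_threshold.

```python
import operator as op

# Same operator vocabulary as evaluate_threshold in the command layer.
COMPARATORS = {"gt": op.gt, "lt": op.lt, "eq": op.eq, "gte": op.ge, "lte": op.le}


def evaluate(threshold, reading, field):
    """Compare reading[field] against a constant or against another field."""
    compare_field = threshold.get("compare_field")
    if compare_field:
        # Compare-field mode: the limit is taken from the reading itself.
        limit = reading[compare_field]
    else:
        # Fixed-value mode: the limit is the threshold's constant.
        limit = threshold["value"]
    compare = COMPARATORS[threshold.get("operator", "gt")]
    value = reading[field]
    return {"value": value, "threshold": limit, "triggered": compare(value, limit)}
```

Unlike the chained boolean in evaluate_threshold, this version raises KeyError on an unknown operator instead of silently returning a non-triggered result; which behavior is preferable is a design choice this spec does not settle.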
Severity levels
| Level | Priority | Description |
|---|---|---|
| warning | Low | Initial threshold breach |
| high | Medium | Elevated concern, rapid changes |
| critical | High | Severe threshold breach |
| extreme | Highest | Emergency condition |
Cooldown Logic
A minimum report interval prevents notification storms when a reading oscillates near a threshold boundary. Cooldown is state-derived from the alerts index (no external store): before dispatching, query for the most recent AlertEvent matching (source_type, source_uid, threshold_name) with triggered_at >= now - interval (size=1, sorted triggered_at desc).
- Found → within cooldown → skip (no notification, no new AlertEvent).
- Not found → dispatch; the written AlertEvent becomes the cooldown marker for the next evaluation.
The interval is sourced from the effective threshold's min_report_interval_sec (per-source override first, then default — same resolution order as #threshold-config).
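The cooldown lookup described above could be expressed as an Elasticsearch query body along these lines. This is a sketch that uses only standard query DSL constructs (bool/filter, term, range, sort, size); it assumes source_type, source_uid, and threshold_name are mapped as keyword fields and triggered_at as a date. cooldown_query and in_cooldown are hypothetical names.

```python
from datetime import datetime, timedelta, timezone


def cooldown_query(source_type, source_uid, threshold_name, interval_sec, now):
    """Build the query for the most recent matching AlertEvent in the window."""
    since = (now - timedelta(seconds=interval_sec)).isoformat()
    return {
        "size": 1,
        "sort": [{"triggered_at": {"order": "desc"}}],
        "query": {"bool": {"filter": [
            {"term": {"source_type": source_type}},
            {"term": {"source_uid": source_uid}},
            {"term": {"threshold_name": threshold_name}},
            {"range": {"triggered_at": {"gte": since}}},
        ]}},
    }


def in_cooldown(hits):
    """Any hit means an AlertEvent was written inside the interval: skip."""
    return len(hits) > 0


# Example: a 900-second interval evaluated at 14:45 UTC looks back to 14:30.
now = datetime(2026, 3, 1, 14, 45, tzinfo=timezone.utc)
query = cooldown_query("sensor", "hcfcd_001", "minor_flood", 900, now)
```

Because the marker is the AlertEvent itself, a failed write after dispatch would leave no marker, so the next evaluation would dispatch again.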
Subscriber Routing
Subscribers carry one or more subscriptions and delivery preferences that the pipeline filters against at dispatch time:
- subscriptions[]: each binds a source_type/sensor_type, a list of site_ids ("*" = all), a min_severity floor, and an active flag. A subscriber is matched when an active subscription's type and site match the alert and the alert severity ≥ min_severity.
- notification_channels: ordered channels (e.g. sms); each produces a Notification record.
- quiet_hours: optional {enabled, start, end} window in the subscriber's timezone; alerts falling inside the window are suppressed for that subscriber.
Each dispatched channel produces a Notification record (Schema.NOTIFICATION, immutable per #invariants) capturing channel, destination, provider, provider message id, status, and timestamps.
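A minimal sketch of the dispatch-time filter, under stated assumptions: severity ordering follows the Severity levels table (warning < high < critical < extreme), subscriptions are matched on sensor_type only (a simplification of the source_type/sensor_type binding), and the caller has already converted the current time into the subscriber's timezone. All function names are hypothetical.

```python
from datetime import time

# Ordering taken from the Severity levels table, lowest to highest.
SEVERITY_RANK = {"warning": 0, "high": 1, "critical": 2, "extreme": 3}


def subscription_matches(sub, alert):
    """True when an active subscription covers the alert's type, site, severity."""
    if not sub.get("active", True):
        return False
    if sub.get("sensor_type") != alert.get("sensor_type"):
        return False
    site_ids = sub.get("site_ids", [])
    if "*" not in site_ids and alert.get("site_id") not in site_ids:
        return False
    return SEVERITY_RANK[alert["severity"]] >= SEVERITY_RANK[sub["min_severity"]]


def in_quiet_hours(quiet_hours, local_time):
    """True when local_time falls in the window; handles windows past midnight."""
    if not quiet_hours or not quiet_hours.get("enabled"):
        return False
    start = time.fromisoformat(quiet_hours["start"])
    end = time.fromisoformat(quiet_hours["end"])
    if start <= end:
        return start <= local_time < end
    # Window wraps midnight, e.g. 22:00 to 06:00.
    return local_time >= start or local_time < end


def should_notify(subscriber, alert, local_time):
    """Combine the active flag, quiet hours, and subscription matching."""
    if not subscriber.get("active", True):
        return False
    if in_quiet_hours(subscriber.get("quiet_hours"), local_time):
        return False
    return any(subscription_matches(s, alert)
               for s in subscriber.get("subscriptions", []))
```

Treating the end of the window as exclusive is a choice made for this sketch; the spec does not say whether an alert at exactly the end time is suppressed.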
Signal Integration (ADI)
Alerting and the ADI signal surface serve different audiences and are intentionally distinct:
| Concern | Alerting | Signal (ADI) |
|---|---|---|
| Audience | Subscribers (phone/channel targets) | People with roles (accountability packs) |
| Action | Notify immediately | Investigate, decide, attest over hours/days |
| Output | Notification delivered | Auditable decision record with evidence |
| Storage | alerts index | intelligence index |
| Question | "Did we notify?" | "Did we respond properly?" |
Sentinel manages the notification infrastructure; the signal it emits feeds the accountability record consumed by the ADI investigation workflow (Cortex/Beacon).
Dual-path signal ingestion
Signals reach the intelligence index by two complementary, first-class paths producing identical Signal v2 documents:
- Path 1 (pipeline-emitted). As a side effect of threshold evaluation + notification, Sentinel maps the AlertEvent to a Signal v2 document and emits it to cortex (the owning service) via cortex's signal_create surface. Cortex validates it against the signal governance rules (severity/dedup/status) and persists it to the intelligence index. Sentinel never writes the intelligence index directly; routing through cortex ensures the governance ceremony is applied rather than bypassed by a raw index write. The AlertEvent cross-references the signal_id cortex returns. Source: platform-evaluated thresholds.
- Path 2 (direct push). External systems push a Signal directly (e.g. PUT /userspace/signal/{signal_id}) without going through the alerting pipeline; they own their threshold/state logic. Source: source-evaluated conditions (webhooks, polling jobs, correlation engines).
Both feed the same ADI accountability flow: Cortex loads the user's accountability pack, filters by signal_type and severity, surfaces the signal in Beacon's role-filtered Signal Queue, where a user opens an Investigation, pins evidence, selects a decision template, and a reviewer attests (separation of duties) before the edition is frozen and tasks dispatched.
Alert → Signal mapping
When the pipeline (Path 1) converts an AlertEvent to a Signal:
Severity:
| Alert severity | Signal severity |
|---|---|
| warning | medium |
| high | high |
| critical | critical |
| extreme | critical |
Status:
| Alert status | Signal status |
|---|---|
| triggered | new |
| cleared | resolved |
| acknowledged | acknowledged |
Fields:
| Alert field | Signal field |
|---|---|
| source_type/sensor_type (e.g. water_level) | signal_type (e.g. sensor_water_level) |
| source_uid / site_id | subject.id |
| "sensor" (literal, when source is a sensor) | subject.type |
| location | subject.name |
| message | description |
| triggered_at | detected_at |
The mapped Signal is submitted to cortex's signal_create, which validates and persists it to the intelligence index with subtype: signal; Sentinel records the returned signal_id on the AlertEvent.
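The three mapping tables can be read as one pure function. The sketch below makes several labeled assumptions: signal_type is formed by joining source_type and sensor_type with an underscore (matching the sensor_water_level example), subject.id prefers source_uid and falls back to site_id, and subject.type reuses source_type for non-sensor sources, which the tables do not specify. alert_to_signal is a hypothetical name and the result is not a full Signal v2 document.

```python
SEVERITY_MAP = {"warning": "medium", "high": "high",
                "critical": "critical", "extreme": "critical"}
STATUS_MAP = {"triggered": "new", "cleared": "resolved",
              "acknowledged": "acknowledged"}


def alert_to_signal(alert):
    """Map an AlertEvent dict to the Signal fields listed in the tables above."""
    return {
        # Assumption: "<source_type>_<sensor_type>", e.g. sensor_water_level.
        "signal_type": f"{alert['source_type']}_{alert['sensor_type']}",
        "severity": SEVERITY_MAP[alert["severity"]],
        "status": STATUS_MAP[alert["status"]],
        "subject": {
            # Assumption: prefer source_uid, fall back to site_id.
            "id": alert.get("source_uid") or alert.get("site_id"),
            # "sensor" for sensor sources; other source types are an assumption.
            "type": alert["source_type"],
            "name": alert.get("location"),
        },
        "description": alert["message"],
        "detected_at": alert["triggered_at"],
    }
```

Because both critical and extreme collapse to critical, the mapping is lossy: the original alert severity can only be recovered through the AlertEvent that holds the signal_id.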
Object Schemas
Reference field shapes for the alerting objects (the alerts index, see #domain-objects). Cross-cutting envelope fields (uid, timestamps, visibility) follow platform.axonis-core.
AlertEvent
{
"alert_id": "alert_xyz789",
"source_type": "sensor",
"source_uid": "hcfcd_001",
"sensor_type": "water_level",
"site_id": "hcfcd_001",
"threshold_name": "minor_flood",
"severity": "high",
"status": "triggered",
"current_value": 25.3,
"threshold_value": 24.0,
"field": "stream_level_current_ft",
"message": "Minor flooding at hcfcd_001: 25.3 ft (threshold: 24.0 ft)",
"signal_id": "abc123-def456",
"triggered_at": "2026-03-01T14:30:00Z",
"cleared_at": null,
"notifications_sent": 3,
"notifications_failed": 0
}
signal_id cross-references the emitted Signal in the intelligence index (see #signal-integration).
Subscriber
{
"subscriber_id": "sub_abc123",
"name": "Jane Doe",
"email": "jane@example.com",
"phone": "+15551234567",
"notification_channels": ["sms"],
"subscriptions": [
{"sensor_type": "water_level", "site_ids": ["hcfcd_001", "*"], "min_severity": "warning", "active": true}
],
"timezone": "America/Chicago",
"quiet_hours": {"enabled": false, "start": "22:00", "end": "06:00"},
"active": true
}
AlertThreshold
{
"site_id": "hcfcd_001",
"sensor_type": "water_level",
"threshold_name": "minor_flood",
"override_value": 26.0,
"min_report_interval_sec": 900,
"enabled": true,
"created_by": "admin"
}
min_report_interval_sec drives the cooldown window (see #cooldown).
Notification
{
"notification_id": "notif_def456",
"alert_id": "alert_xyz789",
"subscriber_id": "sub_abc123",
"channel": "sms",
"destination": "+15551234567",
"status": "sent",
"provider": "twilio",
"provider_message_id": "SM1234567890",
"message_body": "Minor flooding at hcfcd_001...",
"sent_at": "2026-03-01T14:30:05Z",
"delivered_at": "2026-03-01T14:30:08Z",
"retry_count": 0,
"error": null
}
Memory Namespace
sentinel — stores alert acknowledgments, resolutions, escalation history.
Migration from fedai-rest
- Alert objects currently live in fedai-rest's generic userspace CRUD
- Sentinel takes ownership of the alerts ES index
- fedai-rest removes alerting targets from its USERSPACE dict
- Oracle gateway routes alert tools to sentinel instead of fedai-rest
- axonis-client routing maps alerts index → sentinel service URL
Invariants
- Alerts are append-only. AlertEvent records are never deleted, only status-transitioned.
- Acknowledge/resolve require a user identity. The acknowledged_by / resolved_by fields are mandatory and come from the auth token.
- Threshold evaluation is pure. alert_evaluate computes whether a threshold is exceeded but does NOT create an alert. The caller decides whether to act on the result.
- Notifications are immutable. Once created, notification records cannot be modified.
- Cooldown is state-derived. The cooldown decision is computed from the alerts index alone (most recent matching AlertEvent within the interval); no separate cooldown store exists.
Test Expectations
- CRUD roundtrip tests for all 6 object types
- Acknowledge/resolve workflow tests
- Threshold evaluation tests (all operators)
- Alert filtering tests (by severity, status, site, sensor_type)
- Summary aggregation tests
- Auth tests (token required for all endpoints)