Cluster Health Monitoring Dashboard
Select a cluster and time period to see: metrics collected, counts at each stage and per agent (in/out), what is persisted in the DB, and an OK/fail check that flags a broken pipeline. Errors reported by agents during the period are listed below.
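The OK/fail pipeline check can be sketched as a walk over per-stage counts. The stage names and the break rule here are illustrative assumptions, not the dashboard's actual logic:

```python
def pipeline_status(stage_counts: dict) -> str:
    """Flag the pipeline as broken at the first stage that reports zero
    after an earlier stage saw data. Stage names and the rule itself are
    assumptions for illustration; the dashboard's real check may differ."""
    upstream_had_data = False
    for stage, count in stage_counts.items():  # insertion order = pipeline order
        if count > 0:
            upstream_had_data = True
        elif upstream_had_data:
            return f"fail: no data past stage '{stage}'"
    return "OK"

print(pipeline_status({"collected": 120, "persisted": 0}))
# prints: fail: no data past stage 'persisted'
```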
Live health from the cluster; the last error and latest activity come from agent heartbeats (optional cluster filter below).
Per-cluster service counts (metrics_aggregated_ts vs metrics_raw) and, for a selected cluster, service count by namespace. Use this when not all services appear in Metrics Explorer.
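The metrics_aggregated_ts vs metrics_raw comparison boils down to a distinct-service count per cluster. A minimal sqlite sketch, assuming cluster and service columns (the real schema may differ):

```python
import sqlite3

# In-memory stand-ins for the two tables the panel compares.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE metrics_raw (cluster TEXT, service TEXT);
CREATE TABLE metrics_aggregated_ts (cluster TEXT, service TEXT);
INSERT INTO metrics_raw VALUES ('prod','web'),('prod','api'),('prod','db');
INSERT INTO metrics_aggregated_ts VALUES ('prod','web'),('prod','api');
""")

# A gap between the two counts points at services lost during aggregation.
rows = conn.execute("""
SELECT r.cluster,
       COUNT(DISTINCT r.service) AS raw_services,
       COUNT(DISTINCT a.service) AS aggregated_services
FROM metrics_raw r
LEFT JOIN metrics_aggregated_ts a ON a.cluster = r.cluster
GROUP BY r.cluster
""").fetchall()
print(rows)  # prints: [('prod', 3, 2)]
```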
To debug missing services:
- Metrics API (metrics.k8s.io/v1beta1): run kubectl get pods -A as the collector's ServiceAccount; if it fails or is limited, fix the Role/ClusterRole.
- Service identity comes from the app or app.kubernetes.io/name label, or the pod name; missing labels can change how many "services" you see.
- See docs/metrics/DEBUG_MISSING_SERVICES_METRICS_EXPLORER.md (sections 0-2 and 5).
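The label fallback can be sketched as a small helper; the precedence (app label, then app.kubernetes.io/name, then pod name) is an assumption about how the collector derives a service identity:

```python
def service_name(pod_name: str, labels: dict) -> str:
    # Assumed precedence: 'app' label, then 'app.kubernetes.io/name',
    # then the pod name itself. Missing labels make each pod count as
    # its own "service", which changes the counts you see.
    return labels.get("app") or labels.get("app.kubernetes.io/name") or pod_name

print(service_name("web-7d4f9", {"app": "web"}))  # prints: web
print(service_name("web-7d4f9", {}))              # prints: web-7d4f9
```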
This walkthrough matches scripts/debug-incident-creation.py: publish a Voyager-style message on NATS (astrolabe.events.<event_type>) so Astra exercises the same path as the HTTPS collectors.
Then verify that the event appears in agent_incidents and is synced to incidents.
…(clusters / collector config).
- Publish to astrolabe.events.<type> (not bare anomaly.detected), matching Voyager.
- Astra subscribes to astrolabe.events.* and processes the payload (confidence, impact, dedup).
- Results land in agent_incidents; REST sync → incidents for the triage UI.
- After publishing: run the pipeline check on Agent Status & Data Flow for the same cluster, or check the Astra logs.
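A minimal sketch of the publish step, assuming a JSON payload; the field names below are illustrative, and the actual Voyager payload shape lives in scripts/debug-incident-creation.py. Publishing itself would go through a NATS client (e.g. nats-py) using the subject built here:

```python
import json

def build_test_event(event_type: str, cluster: str, confidence: float = 0.9):
    """Build the NATS subject and a Voyager-style test payload.
    Payload field names are assumptions for illustration."""
    subject = f"astrolabe.events.{event_type}"  # not bare 'anomaly.detected'
    payload = json.dumps({
        "event_type": event_type,
        "cluster": cluster,
        "confidence": confidence,
    }).encode()
    return subject, payload

subject, payload = build_test_event("anomaly.detected", "prod-1")
print(subject)  # prints: astrolabe.events.anomaly.detected
```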