Monitoring dashboards

Minimum viable dashboard

Panel	Query / source
`heapUsed` over time	`nodejs_heap_size_used_bytes`
`rss` over time	`process_resident_memory_bytes`
Request rate	`rate(http_requests_total[5m])`
Deploy markers	annotations

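Overlay request rate (RPS) on the memory panels to distinguish a leak from traffic growth: memory that keeps climbing while RPS stays flat points to a leak, while memory that tracks RPS points to load.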
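To make the memory panels concrete, here is a minimal, dependency-free sketch of where those two series come from. It renders a `process.memoryUsage()` snapshot in the Prometheus text format, using the metric names from the table. The function name `renderMemoryMetrics` and the help strings are illustrative assumptions; in a real service an exporter library such as prom-client would normally produce these series for you.

```javascript
// Render a memory snapshot as Prometheus text-format lines.
// `usage` has the shape returned by process.memoryUsage().
// Hypothetical helper for illustration, not part of any library.
function renderMemoryMetrics(usage) {
  const gauges = [
    ["nodejs_heap_size_used_bytes", "Heap in use by live JS objects", usage.heapUsed],
    ["nodejs_heap_size_total_bytes", "Heap currently reserved by V8", usage.heapTotal],
    ["process_resident_memory_bytes", "Resident set size of the process", usage.rss],
  ];
  return (
    gauges
      .map(
        ([name, help, value]) =>
          `# HELP ${name} ${help}\n# TYPE ${name} gauge\n${name} ${value}`
      )
      .join("\n") + "\n"
  );
}

// Print the current process snapshot; a scrape endpoint would return this string.
console.log(renderMemoryMetrics(process.memoryUsage()));
```

All three series are gauges, which is why the dashboard plots them directly, whereas the request counter needs `rate()`.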

Alert rules (examples)

Leak suspicion:

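# heap is a gauge: delta() gives its growth over the window (5e6 bytes = 5 MB)
delta(nodejs_heap_size_used_bytes[1h]) > 5e6
and on (instance)
abs(sum by (instance) (rate(http_requests_total[1h]))
  / sum by (instance) (rate(http_requests_total[1h] offset 1h)) - 1) < 0.10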
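The leak-suspicion rule combines two conditions: heap growth above 5 MB per hour, and request rate within 10% of the previous window. The sketch below applies the same logic to in-memory samples, which can be handy for unit-testing thresholds before committing them to an alerting rule. The function names and sample shape are hypothetical, and 5 MB is taken as 5e6 bytes.

```javascript
const MB = 1e6;

// samples: [{ t: epoch seconds, heapUsed: bytes }], ordered by time.
// Returns heap growth in bytes per hour between the first and last sample,
// or 0 when there is not enough data to compute a slope.
function heapGrowthPerHour(samples) {
  if (samples.length < 2) return 0;
  const first = samples[0];
  const last = samples[samples.length - 1];
  const hours = (last.t - first.t) / 3600;
  if (hours <= 0) return 0;
  return (last.heapUsed - first.heapUsed) / hours;
}

// Relative change in request rate between the previous and current window.
// Returns Infinity when there was no previous traffic to compare against.
function relativeChange(previousRps, currentRps) {
  if (previousRps <= 0) return Infinity;
  return Math.abs(currentRps - previousRps) / previousRps;
}

// True when the heap grows faster than 5 MB/hour while traffic is roughly flat.
function isLeakSuspected(samples, previousRps, currentRps) {
  return (
    heapGrowthPerHour(samples) > 5 * MB &&
    relativeChange(previousRps, currentRps) < 0.1
  );
}
```

For example, a heap that goes from 100 MB to 108 MB over one hour while traffic moves from 100 to 104 RPS is flagged, whereas the same growth with traffic jumping to 150 RPS is not.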

Imminent OOM:

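nodejs_heap_size_used_bytes / nodejs_heap_size_total_bytes > 0.85
for: 15m

`nodejs_heap_size_total_bytes` is the heap V8 has reserved so far, not its hard limit, so treat 0.85 as a starting threshold and tune it per service.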
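The 15-minute hold means the ratio has to stay above the threshold continuously; a single dip resets the clock. A minimal sketch of that behaviour over in-memory samples follows. `sustainedBreach` and its parameters are hypothetical names, and this only illustrates the idea rather than how any particular alerting engine evaluates rules.

```javascript
// samples: [{ t: epoch seconds, heapUsed: bytes, heapTotal: bytes }], ordered by time.
// Returns true once the used/total ratio has stayed above `threshold`
// for at least `holdSeconds` without interruption.
function sustainedBreach(samples, threshold = 0.85, holdSeconds = 15 * 60) {
  let breachStart = null;
  for (const s of samples) {
    if (s.heapUsed / s.heapTotal > threshold) {
      if (breachStart === null) breachStart = s.t; // breach begins here
      if (s.t - breachStart >= holdSeconds) return true;
    } else {
      breachStart = null; // any dip below the threshold resets the hold
    }
  }
  return false;
}
```

The hold is what keeps a short garbage-collection spike from paging anyone: only a ratio that stays high across the whole window fires.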

Grafana-style mental model

xychart-beta
  title "Production RSS with deploy marker"
  x-axis [00h, 06h, 12h, 18h, 24h]
  y-axis "RSS GB" 0 --> 4
  line [1.2, 1.3, 1.8, 2.4, 3.1]

Memory that starts climbing at the 12h deploy marker while traffic stays flat → investigate the deploy diff.
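The same reading can be done numerically by comparing the growth rate before and after the deploy marker. The sketch below uses the values from the chart and assumes the deploy landed at 12h and that a sample exists exactly at that hour. `slopeAroundDeploy` is a hypothetical helper name.

```javascript
// points: [{ hour, rssGb }], ordered by time.
// deployHour: hour of the deploy annotation; assumed to match one of the points.
function slopeAroundDeploy(points, deployHour) {
  const slope = (a, b) => (b.rssGb - a.rssGb) / (b.hour - a.hour);
  const first = points[0];
  const last = points[points.length - 1];
  const atDeploy = points.find((p) => p.hour === deployHour);
  if (!atDeploy) throw new Error(`no sample at hour ${deployHour}`);
  return {
    before: slope(first, atDeploy), // GB per hour leading up to the deploy
    after: slope(atDeploy, last), // GB per hour after the deploy
  };
}

// Values taken from the chart above.
const chart = [
  { hour: 0, rssGb: 1.2 },
  { hour: 6, rssGb: 1.3 },
  { hour: 12, rssGb: 1.8 },
  { hour: 18, rssGb: 2.4 },
  { hour: 24, rssGb: 3.1 },
];

const { before, after } = slopeAroundDeploy(chart, 12);
console.log(`before: ${before.toFixed(3)} GB/h, after: ${after.toFixed(3)} GB/h`);
```

With these numbers the post-deploy slope is roughly twice the pre-deploy slope, which together with flat traffic is the signal to go read the deploy diff.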

Tools

Tool	Role
Prometheus + Grafana	OSS metrics
Datadog / New Relic	APM + process metrics
clinic.js	Deep Node profiling (staging)

See Tool comparison and the dedicated Grafana & Kubernetes lesson for cAdvisor metrics, OOMKill loops, and multi-pod debugging.