Prometheus

Published: 9 Feb. 2025 Last updated: 11 Dec. 2025

Summary

Prometheus is an open-source monitoring and alerting system designed for cloud-native environments. Originally developed at SoundCloud and now part of the Cloud Native Computing Foundation (CNCF), Prometheus is widely used to collect and analyze time-series data from applications and infrastructure.

Prometheus periodically scrapes metrics from configured targets, stores the samples in its time-series database, and makes them available for querying through PromQL, its query language. It supports flexible alerting via the Alertmanager component and integrates with tools such as Grafana for visualization. Known for its simplicity and deep Kubernetes integration, Prometheus has become a cornerstone of the observability stacks used in microservices environments.
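To illustrate the query side, the following Python sketch issues an instant query against the Prometheus HTTP API endpoint `/api/v1/query` and parses the JSON response. It assumes Prometheus is reachable at a base URL such as `http://localhost:9090` (for example through `kubectl port-forward`). The function names are hypothetical helpers, and only "vector" results are handled.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    query_string = urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query_string}"


def parse_vector(body: str) -> dict[str, float]:
    """Map each series' label set (rendered as a string) to its sample value."""
    payload = json.loads(body)
    # Prometheus reports errors with "status": "error" and an "error" message
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    samples = {}
    for series in data["result"]:
        labels = ",".join(f'{k}="{v}"' for k, v in sorted(series["metric"].items()))
        # "value" is a [unix_timestamp, "value as string"] pair
        samples[labels] = float(series["value"][1])
    return samples


def instant_query(base_url: str, promql: str) -> dict[str, float]:
    """Run an instant query; HTTP error responses raise urllib's HTTPError."""
    with urlopen(build_query_url(base_url, promql), timeout=10) as resp:
        return parse_vector(resp.read().decode("utf-8"))
```

For instance, `instant_query("http://localhost:9090", "up")` returns the `up` value (1 if the last scrape succeeded, 0 otherwise) for each target, which may also be a quick way to confirm that targets are being scraped again after a restore.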

This Application Note discusses how to use CloudCasa to properly protect and restore Prometheus running in containers on Kubernetes.

For this application note, CloudCasa was tested with Prometheus 3.1.0 deployed using the Prometheus Operator. The information herein is expected to apply to later versions as well.

Backup

When backing up Prometheus, a full cluster backup is recommended, because the restore may require several cluster-scoped resources that a backup limited to selected namespaces might not include.
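One way to see which cluster-scoped resources a Prometheus deployment relies on is to look for ClusterRoleBindings that grant permissions to service accounts in its namespace. The Python sketch below is a hypothetical helper, not part of CloudCasa or kubectl; it assumes the output of `kubectl get clusterrolebindings -o json` has been saved to a file, and the namespace name `monitoring` used in the usage note is an assumption.

```python
import json


def load_kubectl_json(path: str) -> dict:
    """Load the JSON list output of a `kubectl get ... -o json` command."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def cluster_rbac_for_namespace(bindings: dict, namespace: str) -> list[tuple[str, str]]:
    """Return sorted (ClusterRoleBinding, ClusterRole) name pairs whose
    subjects include a ServiceAccount in the given namespace."""
    pairs = []
    for item in bindings.get("items", []):
        # "subjects" is optional on a ClusterRoleBinding
        for subject in item.get("subjects") or []:
            if subject.get("kind") == "ServiceAccount" and subject.get("namespace") == namespace:
                pairs.append((item["metadata"]["name"], item["roleRef"]["name"]))
                break  # record each binding only once
    return sorted(pairs)
```

For example, `cluster_rbac_for_namespace(load_kubectl_json("crb.json"), "monitoring")` lists the bindings and ClusterRoles that would have to be present for the restored service accounts to keep their permissions.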

Prometheus always maintains a write-ahead log (WAL) that it replays on restart, so no CloudCasa application hooks need to be configured for the backup to be application-consistent. However, the backup definition should request PV snapshots, using either “Protect PVs using: Snapshot only” or “Protect PVs using: Copy to backup storage” together with “Snapshot PVs before backup where possible”. This means that the PVs holding Prometheus data must use storage classes whose provisioners support CSI snapshots.
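A rough pre-backup check is to confirm that each storage class's provisioner has a matching VolumeSnapshotClass, since a CSI snapshot needs a VolumeSnapshotClass whose `driver` equals the CSI driver name. The Python sketch below assumes the outputs of `kubectl get storageclass -o json` and `kubectl get volumesnapshotclass -o json` have been saved and parsed; the function name is hypothetical, and a matching driver is a necessary condition, not proof that snapshots will succeed.

```python
import json


def load_kubectl_json(path: str) -> dict:
    """Load the JSON list output of a `kubectl get ... -o json` command."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def storage_classes_without_snapshot_support(storage_classes: dict, snapshot_classes: dict) -> list[str]:
    """Return sorted StorageClass names whose provisioner has no
    VolumeSnapshotClass with the same driver name."""
    snapshot_drivers = {item["driver"] for item in snapshot_classes.get("items", [])}
    return sorted(
        item["metadata"]["name"]
        for item in storage_classes.get("items", [])
        if item["provisioner"] not in snapshot_drivers
    )
```

Any storage class returned here is a poor choice for Prometheus PVs when snapshots are required; in practice only the storage classes referenced by the Prometheus PVCs matter.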

Restore

When creating the restore definition, select both the namespace of the Prometheus Operator (if used) and the namespace of the Prometheus deployment.

Make sure that the “Include all cluster-scoped resources” switch is enabled in the restore definition. This ensures that all the CRDs required by Prometheus are restored.
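After a restore, one way to verify that the Prometheus Operator CRDs came back is to compare the CRDs in the `monitoring.coreos.com` API group before backup and after restore. The Python sketch below assumes both lists were captured with `kubectl get crd -o json`; the helper names are hypothetical.

```python
import json

# API group used by the Prometheus Operator CRDs (Prometheus, ServiceMonitor, ...)
OPERATOR_GROUP = "monitoring.coreos.com"


def load_kubectl_json(path: str) -> dict:
    """Load the JSON list output of a `kubectl get ... -o json` command."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def operator_crds(crd_list: dict) -> set[str]:
    """Names of the CRDs in the Prometheus Operator API group."""
    return {
        item["metadata"]["name"]
        for item in crd_list.get("items", [])
        if item["spec"]["group"] == OPERATOR_GROUP
    }


def missing_after_restore(before: dict, after: dict) -> list[str]:
    """Operator CRDs present before backup but absent after restore."""
    return sorted(operator_crds(before) - operator_crds(after))
```

An empty result suggests the cluster-scoped CRDs were restored; a non-empty one usually points to a restore run without the “Include all cluster-scoped resources” switch.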