Architecture

STACKIT Observability is a fully managed observability platform within the STACKIT Cloud. It provides a complete, pre-configured toolchain for collecting, storing and visualizing telemetry data such as metrics, logs and traces. The service enables you to connect your systems through open standards, analyze performance in real time and react quickly to incidents, all while STACKIT operates and maintains the underlying infrastructure.

STACKIT Observability builds on well-known open-source components that form the backbone of modern monitoring stacks:

  • Prometheus for collecting metrics
  • Thanos for long-term storage
  • Grafana Loki for logs
  • Grafana Tempo for distributed traces
  • Grafana itself for dashboards and visualization

Together with Prometheus Alertmanager for alerting, they form an integrated and highly available observability architecture managed entirely by STACKIT.

To monitor your systems, you define jobs that determine how telemetry data is gathered. A job regularly scrapes one or several targets, which are HTTP endpoints exposing metrics in the OpenMetrics or Prometheus format. These targets represent the systems or applications you want to monitor. The scrape interval is configurable; scrapes typically run every few minutes, which keeps your view of the infrastructure up to date.
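To make the idea of a scrape target concrete, the following is a minimal sketch of an HTTP endpoint that exposes metrics in the Prometheus text format, using only the Python standard library. The metric names, the `/metrics` path, port 8000 and the `SAMPLES` list are illustrative assumptions and not part of STACKIT Observability; a real application would more likely use an existing Prometheus client library.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical sample data: (metric name, labels, value).
SAMPLES = [
    ("app_requests_total", {"method": "GET"}, 42),
    ("app_queue_depth", {}, 3),
]


def escape_label_value(value):
    """Escape backslash, double quote and newline as the text format requires."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(samples):
    """Render samples as one 'name{labels} value' line each."""
    lines = []
    for name, labels, value in samples:
        if labels:
            label_str = ",".join(
                f'{key}="{escape_label_value(val)}"'
                for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves the rendered samples on GET /metrics."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(SAMPLES).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Start the example target; blocks until interrupted."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling `serve()` starts the example target. A job would then be pointed at the host, port and path of this endpoint.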

Alternatively, telemetry can be pushed directly to the service using OpenTelemetry standards. This makes it easy to integrate a wide variety of environments, from containerized applications to traditional servers, without complex setup.
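As a rough illustration of the push model, the sketch below builds a single gauge data point in the OTLP JSON encoding and posts it over HTTP with the Python standard library. The endpoint URL, any authentication headers, the scope name `example.manual` and the assumption that your instance accepts OTLP/HTTP with JSON are all placeholders to verify against your instance's connection details. In practice an OpenTelemetry SDK or the OpenTelemetry Collector would usually handle this step.

```python
import json
import time
import urllib.request


def build_otlp_gauge_payload(service_name, metric_name, value, time_unix_nano=None):
    """Build an OTLP/JSON metrics payload containing one gauge data point."""
    if time_unix_nano is None:
        time_unix_nano = time.time_ns()
    return {
        "resourceMetrics": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service_name}},
            ]},
            "scopeMetrics": [{
                "scope": {"name": "example.manual"},  # hypothetical scope name
                "metrics": [{
                    "name": metric_name,
                    "gauge": {"dataPoints": [{
                        "asDouble": float(value),
                        # OTLP/JSON encodes 64-bit integers as strings.
                        "timeUnixNano": str(time_unix_nano),
                    }]},
                }],
            }],
        }]
    }


def push_metrics(endpoint_url, payload, headers=None):
    """POST the payload to an OTLP/HTTP metrics endpoint (URL is an assumption)."""
    request = urllib.request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", **(headers or {})},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

A call would look like `push_metrics(url, build_otlp_gauge_payload("checkout", "queue_depth", 7), headers=auth_headers)`, where `url` and `auth_headers` come from your own instance configuration.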

Once telemetry data is ingested, it is processed and stored by the Observability backend. Prometheus handles short-term metric storage, while Thanos extends it with long-term retention for up to 26 months. Logs are collected and indexed by Grafana Loki and traces are stored using Grafana Tempo. This architecture allows you to correlate metrics, logs and traces for deep analysis and troubleshooting.

The system automatically manages scaling and retention. Logs and traces are retained for up to 30 days, enabling detailed short-term debugging, while metrics remain available for long-term trend analysis and capacity planning.
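The retention limits above can be expressed as a small date calculation. The sketch below is only an illustration of the arithmetic, assuming the maximum values stated in this section (26 months for metrics, 30 days for logs and traces); the function names are hypothetical, and the retention actually configured for an instance may be shorter.

```python
import calendar
from datetime import datetime, timedelta, timezone

METRIC_RETENTION_MONTHS = 26                 # maximum stated for metrics
LOG_TRACE_RETENTION = timedelta(days=30)     # maximum stated for logs and traces


def subtract_months(moment, months):
    """Go back a number of calendar months, clamping the day to the month's length."""
    total = moment.year * 12 + (moment.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    day = min(moment.day, calendar.monthrange(year, month)[1])
    return moment.replace(year=year, month=month, day=day)


def is_retained(signal, timestamp, now):
    """Return True if data of the given signal type is still within retention."""
    if signal == "metrics":
        cutoff = subtract_months(now, METRIC_RETENTION_MONTHS)
    elif signal in ("logs", "traces"):
        cutoff = now - LOG_TRACE_RETENTION
    else:
        raise ValueError(f"unknown signal type: {signal}")
    return timestamp >= cutoff


now = datetime(2025, 6, 15, tzinfo=timezone.utc)
print(is_retained("logs", now - timedelta(days=10), now))
print(is_retained("metrics", datetime(2023, 4, 1, tzinfo=timezone.utc), now))
```

With these assumptions, a log line from ten days ago is still queryable, while a metric sample from 1 April 2023 falls just outside the 26-month window that ends on 15 June 2025.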

At the visualization layer, Grafana provides a powerful and intuitive dashboard interface. Here you can build interactive dashboards using multiple chart types, apply filters and correlate data across services. Dashboards can be organized in folders, allowing teams to separate views by environment, service or department.

All visualizations are accessible through the STACKIT Cloud Portal and can be customized according to your needs. The integration with STACKIT’s identity and access management ensures that only authorized users can view or edit specific dashboards.

The alerting system is based on Prometheus Alertmanager. The service continuously evaluates your defined alert rules, and Alertmanager notifies you whenever thresholds are exceeded. Alerts can be sent through multiple communication channels, such as email or webhooks, allowing teams to react quickly to critical events. Configuration and management of alert rules and receivers are available through both the portal interface and the Observability API.
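For the webhook channel, the receiving side has to parse the JSON document that Alertmanager posts. The sketch below shows one possible receiver using only the Python standard library. It relies on the `alerts`, `status`, `labels` and `annotations` fields of the standard Alertmanager webhook payload; the label and annotation names (`alertname`, `severity`, `summary`), the port and the handler itself are illustrative assumptions that depend on how your rules are written.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_alerts(payload):
    """Turn an Alertmanager webhook payload into one summary line per alert."""
    summaries = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        summaries.append(
            f"[{alert.get('status', 'unknown')}] "
            f"{labels.get('alertname', 'unnamed')} "
            f"severity={labels.get('severity', 'n/a')}: "
            f"{annotations.get('summary', '')}"
        )
    return summaries


class AlertWebhookHandler(BaseHTTPRequestHandler):
    """Accepts POSTed alert notifications and prints a summary of each alert."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
        except json.JSONDecodeError:
            self.send_error(400, "invalid JSON")
            return
        for line in summarize_alerts(payload):
            print(line)
        self.send_response(200)
        self.end_headers()


def serve(port=9000):
    """Start the example receiver; blocks until interrupted."""
    HTTPServer(("0.0.0.0", port), AlertWebhookHandler).serve_forever()
```

A production receiver would additionally need TLS and authentication, and would forward the summaries to a chat or ticketing system rather than print them.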

STACKIT Observability is designed for reliability and minimal operational overhead. All components are deployed in a high-availability configuration and are continuously monitored by STACKIT. The service automatically handles updates, scaling and maintenance, ensuring that your observability environment remains secure, up to date and performant without manual intervention.

By using a managed service model, customers benefit from consistent availability and predictable performance without the effort of operating and maintaining multiple observability tools themselves.

Communication between your systems and the Observability service is secured through encrypted connections (TLS). Access to dashboards, metrics and configurations follows STACKIT’s integrated permission and role model. This ensures that monitoring data remains accessible only to authorized users or groups.

You can manage access through the STACKIT Cloud Portal, defining which users or teams can create dashboards, manage alerts or modify service configurations. The service dashboard also provides an overview of your observability instances, configurations and connection details.