FAQ
-
General
What components are included in Observability?
- Grafana for visualisation
- Prometheus for collecting metrics data
- Thanos for long-term storage of metrics
- Grafana Loki for logging data
- Grafana Tempo for tracing data
- Prometheus Alertmanager for threshold-based alerting
- Observability API and Service Dashboard for service configuration
What is Observability?
Observability is a managed service that provides a powerful observability toolset. It can observe targets that serve data in OpenTelemetry format. Telemetry data (metrics, logs and traces) can be pushed to Observability. It is also possible to scrape metrics.
All telemetry data can be visualized with different chart types in a customizable dashboard. Every dashboard and diagram widget is highly customizable. Further, the service provides an integrated alerting system that can notify groups of users on various communication channels when a certain threshold is reached.
How can I order Observability?
You need a STACKIT account to place orders. Create a project or use an existing project to order a new Observability instance. Please follow our documentation for more details: Creating a service Observability.
How can I configure Observability?
After you have ordered Observability, you can configure it via the Observability Service Dashboard and the Observability API. Please follow our documentation for more details: Configuring a service Observability.
Do you offer a managed Grafana Service?
With Observability you get a managed service that includes various components. In our smallest service plan, Frontend, you get only Grafana as a managed service, without further components.
Do you offer a managed Prometheus Service?
With Observability you get a managed service that includes various components. Our service plans monitoring and observability include a managed Prometheus alongside further managed components.
What am I responsible for as a customer?
Our Observability service takes care of the installation, updates, upgrades, stability and availability of your service components (such as Grafana, Prometheus and Thanos).
You are responsible for everything that happens after you log in to the tools.
For example: creating and handling dashboards, connecting your targets via jobs, setting up alerts, maintaining the retention time of your metrics, and Grafana user management.
Where is my data stored and processed?
Currently, we offer a single region, EU01. All data processing and storage occur within this region. EU01 is located in Germany.
-
Integration
Do you support any format other than the Prometheus exposition format / OpenTelemetry?
No. The OpenTelemetry format has been the de facto standard in monitoring for several years, so we decided to use this standard to reach as many targets and customers as possible.
Do you allow the use of the RemoteWrite function of Prometheus/Thanos?
We recommend using the regular pull-based approach. If this is not possible, you can use RemoteWrite.
Is it possible to get access to the Server?
As we provide a managed service, there is no need and no possibility for you to log in to the server. You can configure your service via the Observability API or the STACKIT Portal.
I have more than one Observability instance. What is the best way to scrape the same metrics?
If you want the same metrics in multiple Observability instances, you have to scrape the same target from every Observability instance that needs this data.
Multi-scrape allows you to maintain different retention times, grant access to different people in each Grafana, etc.
Is it possible to add my JSON data source?
Please use a converter (such as Promtail or Telegraf) to convert your data into OpenTelemetry format, so you can scrape it with Observability.
Which official IP address ranges does the Observability cluster use?
If you want to limit access to your systems, you can use the following IP address ranges:
Cluster     IP
stackit1    45.135.246.168/32
stackit2    45.135.247.188/32
stackit3    45.135.244.2/32
stackit4    45.135.246.86/32
stackit5    45.135.244.47/32
stackit6    193.148.162.252/32
stackit7    193.148.174.113/32
stackit8    193.148.174.129/32
stackit9    45.129.41.59/32
stackit10   45.135.245.89/32
stackit11   192.214.176.140/32
stackit12   192.214.174.229/32
-
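If you maintain your own allowlist logic, the ranges listed in the Integration section above can be checked with Python's standard ipaddress module. This is only a minimal sketch: the function name is_observability_source is a hypothetical helper, and in practice you would usually enter these ranges directly in your firewall or security group rules.

```python
import ipaddress

# Ranges copied from the Observability cluster list in the Integration section.
# Each /32 range covers exactly one address.
OBSERVABILITY_RANGES = [
    "45.135.246.168/32",
    "45.135.247.188/32",
    "45.135.244.2/32",
    "45.135.246.86/32",
    "45.135.244.47/32",
    "193.148.162.252/32",
    "193.148.174.113/32",
    "193.148.174.129/32",
    "45.129.41.59/32",
    "45.135.245.89/32",
    "192.214.176.140/32",
    "192.214.174.229/32",
]

NETWORKS = [ipaddress.ip_network(cidr) for cidr in OBSERVABILITY_RANGES]


def is_observability_source(address: str) -> bool:
    """Return True if the address falls inside one of the listed ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in NETWORKS)
```

A request arriving from 45.135.246.168 would be accepted by this check, while one from any address outside the list would be rejected.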
Metrics
Is it possible to collect metrics more often than every minute?
Metrics collected every minute are already considered “high-resolution metrics”. An interval of less than 1 minute is technically possible but increases the load dramatically. For this reason, we have decided to allow 1 minute as the highest metric resolution.
Is it possible to insert historical data?
As we use a time series database, it is not possible to insert historical data. You can only add data with current values at the current point in time. Prometheus/Thanos have backfilling on their future roadmap.
What is PromQL?
PromQL is the Prometheus Query Language. If you create dashboards and panels in Grafana you need some knowledge about PromQL to select your Metrics. Check out the official Querying basics or this PromQL Cheat Sheet.
I have Metrics for different countries, is it possible to display all of them in one dashboard?
Yes. The only requirement is that your metrics contain information about the country, for example in a “country” label. You can then use the ad hoc filter function in your Grafana dashboard, or create individual panels per metric and country.
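As a rough illustration, the PromQL expressions for such panels could be generated as follows. This is a sketch under assumptions: the metric name orders_total and the helper functions are hypothetical, and it presumes the country information is exposed as a Prometheus label named country.

```python
def country_selector(metric: str, country: str) -> str:
    """Build a PromQL selector that keeps only the series of one country."""
    return f'{metric}{{country="{country}"}}'


def sum_by_country(metric: str) -> str:
    """Build a PromQL aggregation that yields one series per country."""
    return f"sum by (country) ({metric})"


# orders_total is a hypothetical metric name used only for illustration.
print(country_selector("orders_total", "DE"))  # orders_total{country="DE"}
print(sum_by_country("orders_total"))          # sum by (country) (orders_total)
```

The first expression suits one panel per country, while the second shows all countries as separate series in a single panel.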
Do you support multiple languages for dashboards?
We use Grafana to visualize your metrics. Grafana currently does not support multiple languages.
It is possible to copy existing dashboards and translate titles and metric-specific text.
We recommend maintaining all text in your company language, or in English if you work across different countries.
-
Grafana
Is it possible to create additional Grafana Accounts?
Yes. We provide admin privileges to your Grafana, so you can create additional users up to your service plan limit.
I have accidentally deleted a folder/dashboard, how can I restore it?
Grafana has no recycle bin. Luckily, Observability has automated hourly backups, so you can restore the latest Grafana backup. Keep in mind that a restore returns Grafana to its state at the time of the backup, so everything created since then (such as folders, dashboards or users) will be lost.
Use the Observability API. Get the latest timestamp of your Grafana backup via /v1/projects/[projectId]/instances/[instanceId]/backups; it will look something like 01-09-2022T13:00:31. With this value you can call /v1/projects/[projectId]/instances/[instanceId]/backup-restores/[backupDate] to restore your Grafana configuration.
I have accidentally deleted/forgotten my Grafana admin user, how can I restore it?
Observability has automated hourly backups, so you can restore the latest Grafana backup. Keep in mind that a restore returns Grafana to its state at the time of the backup, so everything created since then (such as folders, dashboards or users) will be lost.
Use the Observability API. Get the latest timestamp of your Grafana backup via /v1/projects/[projectId]/instances/[instanceId]/backups; it will look something like 01-09-2022T13:00:31. With this value you can call /v1/projects/[projectId]/instances/[instanceId]/backup-restores/[backupDate] to restore your Grafana configuration.
-
Configuration
Can I delete a specific time period or all my telemetry data?
Unfortunately, this is not possible. You can only set your retention time to 1 day and wait 1 day; all data older than 1 day will then be deleted.
Can I edit Prometheus or Grafana configuration files?
You do not need to struggle with YAML files; just use the STACKIT Portal and the Observability API to configure Prometheus and Grafana.
Is it possible to take backups of my configurations and dashboards?
There is an hourly scheduled backup of your configuration. You can restore backups of configurations/dashboards via Observability API.
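The restore flow can be sketched roughly as below, using the endpoint paths given in the Grafana section of this FAQ. Several points are assumptions rather than documented behaviour: the helper names are hypothetical, the timestamp is interpreted as day-month-year (the example value 01-09-2022T13:00:31 is ambiguous), and the sketch deliberately stops short of sending requests, because the base URL, HTTP method, authentication and response shape are not described here.

```python
from datetime import datetime

# Assumption: backup timestamps use day-month-year order.
BACKUP_FORMAT = "%d-%m-%YT%H:%M:%S"


def backups_path(project_id: str, instance_id: str) -> str:
    """Path that lists the available backups of an instance."""
    return f"/v1/projects/{project_id}/instances/{instance_id}/backups"


def restore_path(project_id: str, instance_id: str, backup_date: str) -> str:
    """Path that restores the backup taken at backup_date."""
    return (
        f"/v1/projects/{project_id}/instances/{instance_id}"
        f"/backup-restores/{backup_date}"
    )


def latest_backup(timestamps: list[str]) -> str:
    """Return the most recent timestamp from a list of backup timestamps."""
    return max(timestamps, key=lambda ts: datetime.strptime(ts, BACKUP_FORMAT))


# Hypothetical list of timestamps, as it might be extracted from the API response.
available = ["01-09-2022T12:00:31", "01-09-2022T13:00:31", "31-08-2022T23:00:31"]
print(restore_path("my-project", "my-instance", latest_backup(available)))
```

Comparing parsed timestamps instead of raw strings avoids picking the wrong backup across month boundaries, where plain string sorting would rank 31-08 above 01-09.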
Can I install Grafana plugins?
Observability is a managed service, so unfortunately you have no access to the server and cannot manually install things like plugins.
-
Resources
Can I check my service plan utilization?
You can check the limits of your current service plan in the Observability Service Dashboard. In addition, we provide a consumption overview dashboard in your Grafana.
Can I upgrade or downgrade my plan?
Yes, an upgrade is possible at any time. A downgrade to a smaller plan is also possible if your configuration and data fit the smaller plan. If they do not fit, the downgrade is canceled and no data is lost.
Can I temporarily disable Observability for maintenance of my own systems, so I will not get alerted during this time window?
An hourly backup of your configuration is taken automatically and is available via the Observability API. To avoid alerts during maintenance, delete your alerting configuration using the Observability API. After your maintenance is finished, restore your alerting configuration from the relevant backup.
-
Known Issues
I always get "no data" in my Grafana dashboard. Is my connection broken?
Not necessarily. You also get “no data” if there were no metrics to collect. You can configure the panel to display a 0 value (or any other text) in the panel configuration.
Why do I get the same alerts over and over again?
In most cases the alerting group interval is too small.
Group interval means: How long to wait before sending a notification about new alerts that are added to a group of alerts for which an initial notification has already been sent. (Usually ~5m or more.)
Why is my alert triggered after 2-8 minutes?
Please have a look at the for-attribute in your Alerting configuration. Alerts are considered firing once they have been returned “for this long”.
Alerts that have not yet been active for long enough are considered pending. For example, “for 5 minutes” means the expression must be active for 5 minutes before the alert fires.
Why do I have time gaps in the graph of my metric?
Some providers or configurations of your on-premise systems have protection filters or firewall restrictions, which may take effect if you scrape your target every minute.
Try to change your configuration or scrape less frequently, for example every 5 minutes.
I have a feature request. Where can I send my request?
Please send feature requests to us via the Service Desk. Thank you very much for your contribution.
My Prometheus sample limit is exceeded and the metrics push endpoint responds with HTTP 429. What can I do?
Each service has a Metric-Sample Limit per Minute. If you have exceeded your limit, you can upgrade to a higher plan.
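To judge whether a plan is likely to fit before you hit the limit, you can estimate your sample rate with simple arithmetic. This is a back-of-the-envelope sketch, assuming that every scrape produces one sample per time series; the function name and the example numbers are hypothetical, and how the service actually counts samples may differ.

```python
def samples_per_minute(
    targets: int, series_per_target: int, scrape_interval_seconds: int
) -> float:
    """Estimate samples per minute: one sample per series per scrape."""
    return targets * series_per_target * 60 / scrape_interval_seconds


# Hypothetical setup: 20 targets exposing 500 series each.
print(samples_per_minute(20, 500, 60))   # scraped every minute
print(samples_per_minute(20, 500, 300))  # scraped every 5 minutes
```

Besides upgrading, the estimate shows the two levers on your side: exposing fewer series per target or scraping less often.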
You are missing metrics being written by remote-write and you're getting Thanos/Prometheus errors like "Error on series with out-of-order labels".
The combination of a new Thanos/Prometheus version and an old Telegraf agent leads to this error message. Please update your Telegraf agent and the error will disappear.