Monitor

Monitoring and observability for your database are crucial for providing a good customer experience and for proving that you meet the SLAs you have agreed with your customers. With Observability, STACKIT offers a managed service that covers both observability and monitoring. Read the chapter about Observability to learn how to configure the managed service to your needs.

This article follows the paradigm that you use monitoring, with thresholds on certain metrics, to recognize situations that need your attention, and Observability to analyze how to take action.

PostgreSQL Flex offers several metrics directly in the Portal. This is an additional preconfigured service and runs in parallel to an optional self-configured Observability instance.

Follow these steps to access the metrics:

  1. In the sidebar, click on PostgreSQL Flex.

  2. Click on the instance for which you want to access the metrics.

  3. In the sidebar, click on Metrics.

The panel shows the current utilization of the most important resources of all nodes. Consult Metrics of PostgreSQL Flex to get an idea of the meaning of the metrics.

Use Observability to analyze and solve potential problems with your PostgreSQL Flex service

Regardless of whether you use STACKIT’s internal Observability solution or an external tool, you need to configure scraping manually. Add a scraper to your Prometheus instance, then add a dashboard to Grafana to visualize the metrics you are interested in. For a list of available metrics, read Metrics of PostgreSQL Flex.

Metrics and thresholds are important tools for maintaining your instance. Besides proving to your customers that your SLAs are met, they give you important information for sizing your instance. It is good practice to set the alarm thresholds between 60% and 80%, depending on your risk tolerance and reaction time: the longer you need to react, the lower the threshold should be.
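The threshold rule above can be sketched in a few lines of Python. This is only an illustration: the function name `should_alarm` and the default of 70% are assumptions, not part of PostgreSQL Flex or Observability. In practice you would express the same rule as an alert rule in your monitoring tool.

```python
# Recommended band for alarm thresholds, taken from the text above.
RECOMMENDED_BAND = (60.0, 80.0)


def should_alarm(utilization_percent: float, threshold_percent: float = 70.0) -> bool:
    """Return True if the utilization has reached the chosen alarm threshold.

    The default of 70% is an arbitrary midpoint (an assumption); pick a value
    inside the band that matches your risk tolerance and reaction time.
    """
    low, high = RECOMMENDED_BAND
    if not low <= threshold_percent <= high:
        raise ValueError(
            f"threshold {threshold_percent}% is outside the recommended band {low}-{high}%"
        )
    return utilization_percent >= threshold_percent
```

A team that needs more time to react might call `should_alarm(value, threshold_percent=60.0)`, while a team with fast reaction times might use 80.0.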

Before allocating more resources, try to optimize your database and queries first. There are several patterns in the metrics that give you hints about optimization potential. Read PostgreSQL’s Performance tips on how to recognize these patterns and on how to mitigate the causes.

If you are sure that you need more resources, use the metrics to decide which resource to scale. For every instance type (Single and Replica) there are three parameters that control the allocated resources: instance type, performance class and flavor.

If the metric Disk Util % rises above 80%, it is best practice to increase the performance size.

When you analyze performance problems with storage, you need to distinguish between two parameters: IOPS and disk bandwidth. The former is more important for internal database tasks and for handling many small queries. The latter is important when your queries serve large bulk files.

If the metric Disk IOPS reaches 80% of the IOPS of your performance class plan, it is best practice to clone to an instance with a higher performance class. Read Flavors and performance classes of PostgreSQL Flex to look up how many IOPS each plan provides.
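As a minimal sketch of the IOPS check, the following Python function compares an observed Disk IOPS value with the IOPS limit of a plan. The function name and the plan limit of 3000 IOPS in the usage note are hypothetical; look up the real limit of your plan in Flavors and performance classes of PostgreSQL Flex.

```python
def iops_usage(observed_iops: float, plan_iops: float, limit_ratio: float = 0.8):
    """Return (ratio, limit_reached) for an observed IOPS value.

    ratio is the fraction of the plan's IOPS currently in use;
    limit_reached is True once that fraction is at or above limit_ratio
    (80% by default, following the text above).
    """
    if plan_iops <= 0:
        raise ValueError("plan_iops must be positive")
    ratio = observed_iops / plan_iops
    return ratio, ratio >= limit_ratio
```

For a hypothetical plan with 3000 IOPS, `iops_usage(2700, 3000)` reports that the limit is reached, while `iops_usage(1500, 3000)` does not.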

If the metric Disk Util % permanently reaches 60%, consider upgrading the performance class.

PostgreSQL relies heavily on memory. As a rule of thumb, your RAM should be about 60% of the size of your PostgreSQL’s DB Storage. With the flavor parameter, you can allocate more memory and CPU resources to your instance.
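The rule of thumb is simple arithmetic, sketched here in Python. The function name and the example sizes are assumptions for illustration only.

```python
def recommended_ram_gb(db_storage_gb: float, ram_percent_of_storage: int = 60) -> float:
    """Return the RAM size suggested by the 60% rule of thumb.

    db_storage_gb is the size of the DB Storage in GB; the result is in GB.
    """
    if db_storage_gb < 0:
        raise ValueError("db_storage_gb must not be negative")
    return db_storage_gb * ram_percent_of_storage / 100
```

For example, 100 GB of DB Storage suggests about 60 GB of RAM, and 50 GB suggests about 30 GB. Treat the result as a starting point and pick the closest available flavor.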

When interpreting memory metrics, you can rely on the Memory parameter. If it permanently lies above 80%, consider upgrading the flavor.

To analyze whether you need more CPU resources, monitor the CPU parameter. If it permanently lies above 80%, consider upgrading the flavor.
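The text does not define "permanently". One possible reading, used in the sketch below as an assumption, is that every sample in an observation window lies above the threshold. The function name, the window and the sample values are hypothetical; the same check applies to both the Memory and the CPU parameter.

```python
from typing import Sequence


def permanently_above(samples: Sequence[float], threshold: float = 80.0,
                      min_samples: int = 1) -> bool:
    """Return True if all samples in the window lie above the threshold.

    samples are utilization values in percent, for example one per minute.
    A window with fewer than min_samples values is treated as inconclusive
    and returns False.
    """
    if len(samples) < min_samples:
        return False
    return all(value > threshold for value in samples)
```

With this reading, a window of `[85.0, 90.0, 82.0]` counts as permanently above 80%, while `[85.0, 70.0, 90.0]` does not, because a single dip below the threshold resets the condition.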