Observability metrics

Last updated on Feb 18, 2026

This document describes the Prometheus metrics exported by PostgreSQL (via pg_exporter) for the Managed Database Service. All metrics include the label node_type="primary" since the service is configured with Patroni for high availability.

Database Size

`pg_database_size_bytes`

Database file size in bytes (Type: Gauge)

This metric shows the current size of each database on the server and is important for capacity planning and growth trends.

Locks

`pg_locks_count`

Number of locks by mode and database (Type: Gauge)

Lock Modes

Lock Mode	Usage	Conflicts With
`AccessShareLock`	SELECT queries	AccessExclusiveLock
`RowShareLock`	SELECT FOR UPDATE/SHARE	Exclusive, ShareRowExclusive, AccessExclusive
`RowExclusiveLock`	INSERT, UPDATE, DELETE	Share, ShareRowExclusive, Exclusive, AccessExclusive
`ShareUpdateExclusiveLock`	VACUUM, CREATE INDEX CONCURRENTLY	ShareUpdateExclusive and higher
`ShareLock`	CREATE INDEX	RowExclusive and higher
`ShareRowExclusiveLock`	Rarely used	RowExclusive and higher
`ExclusiveLock`	Blocks all concurrent access	RowExclusive and higher
`AccessExclusiveLock`	ALTER TABLE, DROP TABLE, TRUNCATE	All modes

High lock counts can indicate contention and performance issues. Under normal conditions, most locks should be at 0.

Connections

`pg_settings_max_connections`

Maximum number of concurrent connections to the database server (Type: Gauge)

Label	Description
`node_type`	Type of database node

The system uses up to 15 connections for internal essential processes such as backup, monitoring, etc. These connections are counted against the max_connections limit. When approaching this limit, connection pooling should be considered.

`pg_stat_activity_count`

Number of connections grouped by various attributes (Type: Gauge)

Label	Description	Possible Values
`datname`	Name of the database	`nextcloud`, `wordpress`, `crm`
`state`	Connection state	See table below
`application_name`	Name of client application	`Patroni heartbeat`, `Patroni restapi`, `pg_exporter`
`backend_type`	Type of backend process	`client backend`
`usename`	Database user	`api`, `2178329`, `christian.walther`
`wait_event`	Event being waited for	`ClientRead`
`wait_event_type`	Type of wait event	`Client`
`node_type`	Type of database node	`primary`

Connection States

State	Meaning
`active`	Query is currently executing
`idle`	Connection open but no query active
`idle in transaction`	Connection in transaction but no current query
`idle in transaction (aborted)`	Transaction failed, waiting for ROLLBACK
`fastpath function call`	Client executing fast-path function
`disabled`	Tracking disabled for this connection

idle in transaction for extended periods indicates application issues. Many idle connections show inefficient connection management.

Database Statistics

`pg_stat_database_blks_hit`

Number of disk block accesses found already in the buffer cache (Type: Counter)

Label	Description
`datname`	Name of the database
`node_type`	Type of database node

The cache hit ratio is calculated as blks_hit / (blks_hit + blks_read). The goal is a cache hit ratio of >99% for good performance.

`pg_stat_database_blks_read`

Number of disk blocks that had to be read from disk (Type: Counter)

Label	Description
`datname`	Name of the database
`node_type`	Type of database node

Low values relative to blks_hit are optimal. High values indicate insufficient shared buffers or cold cache.

`pg_stat_database_conflicts`

Number of queries canceled due to conflicts with recovery (Type: Counter)

Label	Description
`datname`	Name of the database
`node_type`	Type of database node

This metric is mainly relevant on standby/replica servers. A value of 0 is normal for primary nodes.

`pg_stat_database_temp_bytes`

Total amount of data written to temporary files (in bytes) (Type: Counter)

Label	Description
`datname`	Name of the database
`node_type`	Type of database node

High values indicate queries that require more RAM than work_mem allows. Frequent temp file usage should be addressed by increasing work_mem or optimizing queries.

`pg_stat_database_tup_deleted`

Number of rows deleted by queries (Type: Counter)

Label	Description
`datname`	Name of the database
`node_type`	Type of database node

Shows DELETE activity in the database. High delete rates require regular VACUUM operations.

`pg_stat_database_tup_fetched`

Number of rows fetched by queries (Type: Counter)

Label	Description
`datname`	Name of the database
`node_type`	Type of database node

Measures the number of actually returned rows. High values show intensive read activity.

`pg_stat_database_tup_inserted`

Number of rows inserted by queries (Type: Counter)

Label	Description
`datname`	Name of the database
`node_type`	Type of database node

Shows INSERT activity in the database.

`pg_stat_database_tup_updated`

Number of rows updated by queries (Type: Counter)

Label	Description
`datname`	Name of the database
`node_type`	Type of database node

Updates create new row versions (MVCC), so VACUUM is important. High update rates can lead to table bloat.

`pg_stat_database_xact_commit`

Number of successfully committed transactions (Type: Counter)

Label	Description
`datname`	Name of the database
`node_type`	Type of database node

Shows the number of successful transactions. High values indicate intensive database activity.

Service Availability

`pg_up`

Indicates whether the last scrape successfully connected to the server (Type: Gauge)

Label	Description	Values
`node_type`	Type of database node	`1` = connection successful, `0` = connection failed

This is the most important metric for alerting. When the value is 0, all other metrics are unavailable.

Recommended Alerts

Alert	Condition	Severity
PostgreSQL Down	`pg_up == 0`	Critical
Connection Limit	`pg_stat_activity_count / pg_settings_max_connections > 0.8`	Warning
Connection Limit Critical	`pg_stat_activity_count / pg_settings_max_connections > 0.9`	Critical
Low Cache Hit Ratio	`rate(pg_stat_database_blks_hit) / (rate(pg_stat_database_blks_hit) + rate(pg_stat_database_blks_read)) < 0.95`	Warning
Temp File Usage	`rate(pg_stat_database_temp_bytes[5m]) > 0`	Warning
Too Many Locks	`pg_locks_count > 100`	Warning
Long Running Transactions	`pg_stat_activity_count{state="idle in transaction"} > 0` for > 5min	Warning

Monitoring Best Practices

Monitor cache efficiency: The ratio blks_hit / (blks_hit + blks_read) should consistently be >99%.
Connection pooling: When approaching max_connections, a connection pool library/application (e.g., pgBouncer) should be deployed.
Temp file usage: Should remain at 0. If not, increase work_mem or optimize queries.
Lock monitoring: Regular locks can indicate design issues.
Idle in transaction: These connections should not persist longer than a few seconds.

Notes

All metrics are provided via pg_exporter and collected by Prometheus.
The service runs with Patroni for high availability, hence the label node_type="primary".
The listed metrics form the basis for monitoring and alerting. Additional metrics for replication, background writer, and WAL archiving may be available depending on the setup.