Observability metrics of MongoDB Flex

This article explains the meaning of the observability metrics exported by MongoDB Flex. All metrics are available in OpenMetrics format and can be scraped by time-series databases such as Prometheus, VictoriaMetrics, or Grafana Mimir.

Indicates whether the last scrape was able to reach the database daemon and whether the daemon reported a healthy state. (type: gauge)

This is the most important metric for alerting. When the value is 0, all other metrics are unavailable.
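
A minimal availability alert fires whenever this gauge reports 0. The metric name below is assumed for illustration; use the name of this gauge as it appears in your scrape output:

mongodb_up == 0  # metric name assumed for illustration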

Server uptime in milliseconds (type: counter)

Can be used to calculate uptime percentages. A reset to 0 indicates a server restart.
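
Counter resets can be detected directly in PromQL. A sketch, assuming the uptime counter is exported as mongodb_uptime_milliseconds (name assumed for illustration; use the name from your scrape output):

resets(mongodb_uptime_milliseconds[15m]) > 0  # fires if the server restarted within the last 15 minutes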

Number of connections to the server (type: gauge)

A persistently high connection count can indicate misconfigured connection pools or connection leaks.

RAM used by the MongoDB process in megabytes (type: gauge)

Virtual memory used by the MongoDB process in megabytes (type: gauge)

Virtual memory is typically much larger than resident memory due to memory-mapped files.

Total operations by type since server startup (type: counter)

Use rate() to calculate operations per second. High rates with high latency indicate performance issues.
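
A sketch of the rate calculation, assuming the counter is exported as mongodb_opcounters with a type label (both names assumed for illustration; use the names from your scrape output):

sum by (type) (rate(mongodb_opcounters[1m]))  # operations per second, split by operation type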

Network traffic in bytes (type: counter)

The following metrics represent system-level resources. All include the label node_type.

Time the CPU spent waiting for I/O operations, in milliseconds (type: counter)

Calculate normalized CPU iowait: rate(hardware_system_cpu_io_wait_milliseconds[1m]) / 10 / hardware_platform_num_logical_cpus

Number of logical CPU cores (type: gauge)

hardware_system_memory_mem_available_kilobytes

Available memory in kilobytes (type: gauge)

More useful than mem_free as it includes reclaimable cache memory.
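
For alerting, a low-memory condition can be expressed directly on this gauge. A sketch; the 2 GiB threshold is an arbitrary example and should be tuned to the instance size:

hardware_system_memory_mem_available_kilobytes < 2 * 1024 * 1024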

In-memory cache for disk files in kilobytes (type: gauge)

High values are normal and indicate efficient RAM usage.

hardware_disk_metrics_disk_space_free_bytes / hardware_disk_metrics_disk_space_used_bytes

Free and used disk space in bytes (type: gauge)
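
A disk usage percentage can be derived from these two gauges, assuming both carry the same label set so the vectors match:

hardware_disk_metrics_disk_space_used_bytes / (hardware_disk_metrics_disk_space_used_bytes + hardware_disk_metrics_disk_space_free_bytes) * 100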

hardware_disk_metrics_read_count / hardware_disk_metrics_write_count

I/O operations processed (type: counter)

Calculate IOPS: rate(hardware_disk_metrics_read_count[30s]) + rate(hardware_disk_metrics_write_count[30s])

hardware_disk_metrics_read_time_milliseconds / hardware_disk_metrics_write_time_milliseconds

Wait time for I/O requests in milliseconds (type: counter)

Calculate latency: rate(hardware_disk_metrics_read_time_milliseconds[5m]) / rate(hardware_disk_metrics_read_count[5m])
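
Disk write latency can be calculated analogously: rate(hardware_disk_metrics_write_time_milliseconds[5m]) / rate(hardware_disk_metrics_write_count[5m])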

hardware_disk_metrics_weighted_time_io_milliseconds

Weighted time doing I/Os - indicates disk queue depth (type: counter)

High values suggest storage system struggles to keep up with I/O demand.
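
The average queue depth can be approximated from this counter, assuming it accumulates weighted I/O time in milliseconds as the name suggests (the same approach iostat uses for its average queue size):

rate(hardware_disk_metrics_weighted_time_io_milliseconds[5m]) / 1000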

We highly recommend monitoring the following metrics:

  • Disk IOPS: The Disk IOPS threshold depends on the IOPS allocation provisioned for the cluster’s tier and storage capacity. Disk IOPS is the sum of the rates of hardware_disk_metrics_read_count and hardware_disk_metrics_write_count (see the IOPS expression above). Monitor whether disk IOPS approaches the provisioned maximum to determine whether the cluster can handle future workloads; see the example alert expressions after this list.
  • Normalized System CPU iowait: This metric indicates the percentage of time the CPU is idle while waiting for I/O operations to finish, scaled to a range of 0-100% by dividing by the number of CPU cores. It helps identify potential disk bottlenecks: the system may be reaching its aggregate disk throughput limit for the available capacity, in which case you might notice IOPS staying below their provisioned maximum while Normalized System CPU iowait is elevated, indicating I/O resource exhaustion.
  • Disk Queue Depth: The Disk Queue Depth metric represents the number of pending I/O operations in the disk queue, i.e. how many read and write requests are waiting to be processed by the underlying storage system. A high value can indicate that the storage system is struggling to keep up with the workload, potentially leading to performance issues. What constitutes a “high” value depends on your specific workload, hardware setup, and performance expectations; as a rule of thumb, a queue depth that consistently exceeds 2-4 times the number of CPU cores on the server suggests that pending I/O is accumulating faster than the storage system can process it.
  • Disk Latency: In addition to the two metrics above, we recommend creating an alert on Disk read latency on Data Partition and Disk write latency on Data Partition, with a threshold similar to the one you have defined for your operation execution time; suitable values depend on the cluster configuration and your specific workload. Note that acceptable disk latency varies with your application’s workload, the complexity of your queries, the read and write patterns, and your overall performance expectations.
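
The following PromQL sketches illustrate these recommendations; all thresholds (90% of 3000 provisioned IOPS, 10% iowait, a queue depth of 16, 50 ms latency) are placeholders that must be adapted to your cluster tier, storage capacity, and workload:

  • Disk IOPS near the provisioned maximum: rate(hardware_disk_metrics_read_count[5m]) + rate(hardware_disk_metrics_write_count[5m]) > 0.9 * 3000
  • Normalized System CPU iowait: rate(hardware_system_cpu_io_wait_milliseconds[5m]) / 10 / hardware_platform_num_logical_cpus > 10
  • Disk Queue Depth: rate(hardware_disk_metrics_weighted_time_io_milliseconds[5m]) / 1000 > 16
  • Disk read latency: rate(hardware_disk_metrics_read_time_milliseconds[5m]) / rate(hardware_disk_metrics_read_count[5m]) > 50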

We offer a template covering the most important metrics. You can download it and import it into STACKIT Observability: metric_exporter.json.