GeekGuy

Jan 25 · 4 min read

Monitoring MongoDB with Telegraf and Prometheus

If you are looking for a way to monitor MongoDB that lets you observe most of its important metrics, this article introduces one option: the combination of Telegraf and Prometheus.


TL;DR:

  • Install Prometheus

  • Install Telegraf

  • Configure Telegraf to monitor MongoDB

  • Configure Prometheus to scrape MongoDB Metrics


First of all, you need to install Prometheus and Telegraf.

Install Prometheus

$ sudo su -
 
# useradd --no-create-home --shell /bin/false prome
 
# mkdir /etc/prometheus
 
# mkdir /var/lib/prometheus

# chown prome:prome /var/lib/prometheus
 
# wget https://github.com/prometheus/prometheus/releases/download/v2.28.1/prometheus-2.28.1.linux-amd64.tar.gz
 
# tar -xzvf prometheus-2.28.1.linux-amd64.tar.gz
 
# cp prometheus-2.28.1.linux-amd64/prometheus /usr/local/bin/
 
# cp prometheus-2.28.1.linux-amd64/promtool /usr/local/bin/
 
# chown prome:prome /usr/local/bin/prometheus
 
# chown prome:prome /usr/local/bin/promtool
 
# cp -r prometheus-2.28.1.linux-amd64/consoles /etc/prometheus
 
# cp -r prometheus-2.28.1.linux-amd64/console_libraries /etc/prometheus
 
# chown -R prome:prome /etc/prometheus/consoles
 
# chown -R prome:prome /etc/prometheus/console_libraries
 
# vim /etc/prometheus/prometheus.yml
 

 
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
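Before starting the service, it can help to validate the file with promtool, which was copied to /usr/local/bin in the steps above. The following is a minimal sketch; the helper name check_prom_config is only illustrative and not part of Prometheus.

```shell
# Validate a Prometheus config file with promtool (shipped in the Prometheus tarball).
# check_prom_config is a hypothetical helper name; the default path is the one used in this article.
check_prom_config() {
    config_file="${1:-/etc/prometheus/prometheus.yml}"
    if promtool check config "$config_file"; then
        echo "config OK: $config_file"
    else
        echo "config INVALID: $config_file" >&2
        return 1
    fi
}
```

Calling check_prom_config without arguments checks /etc/prometheus/prometheus.yml; a non-zero exit status means promtool rejected the file.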
 

 
# vim /etc/systemd/system/prometheus.service
 

 
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prome
Group=prome
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file=/etc/prometheus/prometheus.yml \
    --storage.tsdb.path=/var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries

[Install]
WantedBy=multi-user.target
 

 
# systemctl daemon-reload

# systemctl enable prometheus

# systemctl start prometheus

# systemctl status prometheus


prometheus.service - Prometheus
   Loaded: loaded (/etc/systemd/system/prometheus.service; disabled; vendor preset: enabled)
   Active: active (running) since Thu 2021-07-15 22:31:10 UTC; 3s ago
  Process: 3949 ExecStart=/usr/local/bin/prometheus --config.file /etc/prometheus>
 Main PID: 3949 (prometheus)
    Tasks: 7
   Memory: 13.8M
      CPU: 470ms
   CGroup: /system.slice/prometheus.service

Access the Prometheus UI in a browser at http://<server-IP>:9090.
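If no browser is at hand, the server can also be probed from the command line through its /-/healthy endpoint. This is a small sketch, assuming curl is installed; prometheus_healthy is a hypothetical helper name, and the host and port defaults simply mirror this article's setup.

```shell
# Probe Prometheus' built-in health endpoint (/-/healthy).
# prometheus_healthy is a hypothetical helper: prometheus_healthy [host] [port]
prometheus_healthy() {
    host="${1:-localhost}"
    port="${2:-9090}"
    # -f makes curl return a non-zero status on HTTP errors.
    if curl -fsS "http://${host}:${port}/-/healthy" >/dev/null 2>&1; then
        echo "up"
    else
        echo "down"
        return 1
    fi
}
```

For example, prometheus_healthy 10.10.0.5 9090 prints "up" or "down" for a server at that (made-up) address.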

Install Telegraf

After installing Prometheus, install Telegraf.

Ubuntu

# wget -qO- https://repos.influxdata.com/influxdb.key | sudo apt-key add -
 
# source /etc/lsb-release
 
# echo "deb https://repos.influxdata.com/${DISTRIB_ID,,} ${DISTRIB_CODENAME} stable" | sudo tee /etc/apt/sources.list.d/influxdb.list
 
# apt-get update && sudo apt-get install telegraf
 
# service telegraf start

CentOS

$ cat <<EOF | sudo tee /etc/yum.repos.d/influxdb.repo
 
[influxdb]
 
name = InfluxDB Repository - RHEL \$releasever
 
baseurl = https://repos.influxdata.com/rhel/\$releasever/\$basearch/stable
 
enabled = 1
 
gpgcheck = 1
 
gpgkey = https://repos.influxdata.com/influxdb.key
 
EOF
 
$ sudo yum install telegraf
 
$ sudo service telegraf start

Windows

Download the ZIP file from the InfluxData downloads page.

Extract the downloaded ZIP file to C:\Program Files\InfluxData\Telegraf.

Open CMD and run:

> cd C:\Program Files\InfluxData\Telegraf
 
> .\telegraf.exe -config <path_to_telegraf.conf>

Or install it as a Windows service:

> cd C:\Program Files\InfluxData\Telegraf
 
> .\telegraf.exe --service install
 
> .\telegraf.exe --service start

Configure Telegraf to monitor MongoDB

First, identify the IP address and port of the MongoDB instance you want to monitor.

For example, MongoDB is running on 10.10.0.4, port 27017.

Create /etc/telegraf/telegraf.d/mongodb.conf with the following content:


 
# vim /etc/telegraf/telegraf.d/mongodb.conf
 

 
[[inputs.mongodb]]
  servers = [ "mongodb://10.10.0.4:27017" ]
  gather_perdb_stats = true
  gather_col_stats = true
  interval = "10s"

  [inputs.mongodb.ssl]
    enabled = true

  [inputs.mongodb.tags] # add any tag you want
    host = "mongodb.local"
    hostname = "mongodb.local:27017"
    version = "4.4.0"
    service = "mongodb"

[[outputs.prometheus_client]]
  listen = ":9273" # Prometheus Exporter port
  collectors_exclude = ["gocollector", "process"]

  [outputs.prometheus_client.tagpass]
    host = ["mongodb.local" ]

We have just configured Telegraf to read MongoDB metrics and expose them on port 9273 for Prometheus to scrape.
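To check the MongoDB input on its own, Telegraf can be run once in test mode, which gathers metrics and prints them to stdout instead of sending them to any output. The sketch below assumes a Linux package install with the default paths; test_mongodb_input is a hypothetical helper name.

```shell
# Dry-run only the MongoDB input and print what Telegraf would collect.
# test_mongodb_input is a hypothetical helper; the paths assume a Linux package install.
test_mongodb_input() {
    telegraf --config /etc/telegraf/telegraf.conf \
             --config-directory /etc/telegraf/telegraf.d \
             --input-filter mongodb \
             --test
}
```

If the connection string or TLS settings are wrong, the error usually shows up here rather than only in the service log.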

The MongoDB input plugin reports the following measurements, tags, and fields:

  • mongodb

    • tags:

      • hostname

      • node_type

      • rs_name

    • fields:

      • active_reads (integer)

      • active_writes (integer)

      • aggregate_command_failed (integer)

      • aggregate_command_total (integer)

      • assert_msg (integer)

      • assert_regular (integer)

      • assert_rollovers (integer)

      • assert_user (integer)

      • assert_warning (integer)

      • available_reads (integer)

      • available_writes (integer)

      • commands (integer)

      • connections_available (integer)

      • connections_current (integer)

      • connections_total_created (integer)

      • count_command_failed (integer)

      • count_command_total (integer)

      • cursor_no_timeout_count (integer)

      • cursor_pinned_count (integer)

      • cursor_timed_out_count (integer)

      • cursor_total_count (integer)

      • delete_command_failed (integer)

      • delete_command_total (integer)

      • deletes (integer)

      • distinct_command_failed (integer)

      • distinct_command_total (integer)

      • document_deleted (integer)

      • document_inserted (integer)

      • document_returned (integer)

      • document_updated (integer)

      • find_and_modify_command_failed (integer)

      • find_and_modify_command_total (integer)

      • find_command_failed (integer)

      • find_command_total (integer)

      • flushes (integer)

      • flushes_total_time_ns (integer)

      • get_more_command_failed (integer)

      • get_more_command_total (integer)

      • getmores (integer)

      • insert_command_failed (integer)

      • insert_command_total (integer)

      • inserts (integer)

      • jumbo_chunks (integer)

      • latency_commands_count (integer)

      • latency_commands (integer)

      • latency_reads_count (integer)

      • latency_reads (integer)

      • latency_writes_count (integer)

      • latency_writes (integer)

      • member_status (string)

      • net_in_bytes_count (integer)

      • net_out_bytes_count (integer)

      • open_connections (integer)

      • operation_scan_and_order (integer)

      • operation_write_conflicts (integer)

      • page_faults (integer)

      • percent_cache_dirty (float)

      • percent_cache_used (float)

      • queries (integer)

      • queued_reads (integer)

      • queued_writes (integer)

      • repl_apply_batches_num (integer)

      • repl_apply_batches_total_millis (integer)

      • repl_apply_ops (integer)

      • repl_buffer_count (integer)

      • repl_buffer_size_bytes (integer)

      • repl_commands (integer)

      • repl_deletes (integer)

      • repl_executor_pool_in_progress_count (integer)

      • repl_executor_queues_network_in_progress (integer)

      • repl_executor_queues_sleepers (integer)

      • repl_executor_unsignaled_events (integer)

      • repl_getmores (integer)

      • repl_inserts (integer)

      • repl_lag (integer)

      • repl_network_bytes (integer)

      • repl_network_getmores_num (integer)

      • repl_network_getmores_total_millis (integer)

      • repl_network_ops (integer)

      • repl_queries (integer)

      • repl_updates (integer)

      • repl_oplog_window_sec (integer)

      • repl_state (integer)

      • resident_megabytes (integer)

      • state (string)

      • storage_freelist_search_bucket_exhausted (integer)

      • storage_freelist_search_requests (integer)

      • storage_freelist_search_scanned (integer)

      • tcmalloc_central_cache_free_bytes (integer)

      • tcmalloc_current_allocated_bytes (integer)

      • tcmalloc_current_total_thread_cache_bytes (integer)

      • tcmalloc_heap_size (integer)

      • tcmalloc_max_total_thread_cache_bytes (integer)

      • tcmalloc_pageheap_commit_count (integer)

      • tcmalloc_pageheap_committed_bytes (integer)

      • tcmalloc_pageheap_decommit_count (integer)

      • tcmalloc_pageheap_free_bytes (integer)

      • tcmalloc_pageheap_reserve_count (integer)

      • tcmalloc_pageheap_scavenge_count (integer)

      • tcmalloc_pageheap_total_commit_bytes (integer)

      • tcmalloc_pageheap_total_decommit_bytes (integer)

      • tcmalloc_pageheap_total_reserve_bytes (integer)

      • tcmalloc_pageheap_unmapped_bytes (integer)

      • tcmalloc_spinlock_total_delay_ns (integer)

      • tcmalloc_thread_cache_free_bytes (integer)

      • tcmalloc_total_free_bytes (integer)

      • tcmalloc_transfer_cache_free_bytes (integer)

      • total_available (integer)

      • total_created (integer)

      • total_docs_scanned (integer)

      • total_in_use (integer)

      • total_keys_scanned (integer)

      • total_refreshing (integer)

      • total_tickets_reads (integer)

      • total_tickets_writes (integer)

      • ttl_deletes (integer)

      • ttl_passes (integer)

      • update_command_failed (integer)

      • update_command_total (integer)

      • updates (integer)

      • uptime_ns (integer)

      • version (string)

      • vsize_megabytes (integer)

      • wtcache_app_threads_page_read_count (integer)

      • wtcache_app_threads_page_read_time (integer)

      • wtcache_app_threads_page_write_count (integer)

      • wtcache_bytes_read_into (integer)

      • wtcache_bytes_written_from (integer)

      • wtcache_pages_read_into (integer)

      • wtcache_pages_requested_from (integer)

      • wtcache_current_bytes (integer)

      • wtcache_max_bytes_configured (integer)

      • wtcache_internal_pages_evicted (integer)

      • wtcache_modified_pages_evicted (integer)

      • wtcache_unmodified_pages_evicted (integer)

      • wtcache_pages_evicted_by_app_thread (integer)

      • wtcache_pages_queued_for_eviction (integer)

      • wtcache_server_evicting_pages (integer)

      • wtcache_tracked_dirty_bytes (integer)

      • wtcache_worker_thread_evictingpages (integer)

      • commands_per_sec (integer, deprecated in 1.10; use commands)

      • cursor_no_timeout (integer, opened/sec, deprecated in 1.10; use cursor_no_timeout_count)

      • cursor_pinned (integer, opened/sec, deprecated in 1.10; use cursor_pinned_count)

      • cursor_timed_out (integer, opened/sec, deprecated in 1.10; use cursor_timed_out_count)

      • cursor_total (integer, opened/sec, deprecated in 1.10; use cursor_total_count)

      • deletes_per_sec (integer, deprecated in 1.10; use deletes)

      • flushes_per_sec (integer, deprecated in 1.10; use flushes)

      • getmores_per_sec (integer, deprecated in 1.10; use getmores)

      • inserts_per_sec (integer, deprecated in 1.10; use inserts)

      • net_in_bytes (integer, bytes/sec, deprecated in 1.10; use net_in_bytes_count)

      • net_out_bytes (integer, bytes/sec, deprecated in 1.10; use net_out_bytes_count)

      • queries_per_sec (integer, deprecated in 1.10; use queries)

      • repl_commands_per_sec (integer, deprecated in 1.10; use repl_commands)

      • repl_deletes_per_sec (integer, deprecated in 1.10; use repl_deletes)

      • repl_getmores_per_sec (integer, deprecated in 1.10; use repl_getmores)

      • repl_inserts_per_sec (integer, deprecated in 1.10; use repl_inserts)

      • repl_queries_per_sec (integer, deprecated in 1.10; use repl_queries)

      • repl_updates_per_sec (integer, deprecated in 1.10; use repl_updates)

      • ttl_deletes_per_sec (integer, deprecated in 1.10; use ttl_deletes)

      • ttl_passes_per_sec (integer, deprecated in 1.10; use ttl_passes)

      • updates_per_sec (integer, deprecated in 1.10; use updates)

  • mongodb_db_stats

    • tags:

      • db_name

      • hostname

    • fields:

      • avg_obj_size (float)

      • collections (integer)

      • data_size (integer)

      • index_size (integer)

      • indexes (integer)

      • num_extents (integer)

      • objects (integer)

      • ok (integer)

      • storage_size (integer)

      • type (string)

  • mongodb_col_stats

    • tags:

      • hostname

      • collection

      • db_name

    • fields:

      • size (integer)

      • avg_obj_size (integer)

      • storage_size (integer)

      • total_index_size (integer)

      • ok (integer)

      • count (integer)

      • type (string)

  • mongodb_shard_stats

    • tags:

      • hostname

    • fields:

      • in_use (integer)

      • available (integer)

      • created (integer)

      • refreshing (integer)

  • mongodb_top_stats

    • tags:

      • collection

    • fields:

      • total_time (integer)

      • total_count (integer)

      • read_lock_time (integer)

      • read_lock_count (integer)

      • write_lock_time (integer)

      • write_lock_count (integer)

      • queries_time (integer)

      • queries_count (integer)

      • get_more_time (integer)

      • get_more_count (integer)

      • insert_time (integer)

      • insert_count (integer)

      • update_time (integer)

      • update_count (integer)

      • remove_time (integer)

      • remove_count (integer)

      • commands_time (integer)

      • commands_count (integer)
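
The deprecated *_per_sec fields in the list above are replaced by cumulative counters such as queries or inserts, so a per-second rate has to be derived from two successive samples. In Prometheus this is typically done with rate(), for example rate(mongodb_queries[5m]), assuming the field is exported under that name. The arithmetic behind it can be sketched as follows; counter_rate is a hypothetical helper, and its handling of counter resets is a simplification.

```shell
# Per-second rate between two samples of a cumulative counter.
# counter_rate is a hypothetical helper: counter_rate <previous> <current> <seconds>
counter_rate() {
    awk -v prev="$1" -v cur="$2" -v secs="$3" 'BEGIN {
        if (secs <= 0) { print "interval must be positive" > "/dev/stderr"; exit 1 }
        delta = cur - prev
        # A counter that went backwards usually means the server restarted;
        # treat the current value as the increase since the reset.
        if (delta < 0) delta = cur
        printf "%.2f\n", delta / secs
    }'
}
```

With samples of 1200 and 1500 taken 10 seconds apart, counter_rate 1200 1500 10 reports 30.00 operations per second.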

Restart Telegraf to apply the new configuration:

$ sudo systemctl restart telegraf
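
Once Telegraf is running again, the exporter endpoint on port 9273 should serve the MongoDB series in Prometheus text format. A quick way to see what is exposed is to count the series per metric name. The helper below is a sketch that reads the exposition text from stdin; summarize_mongodb_metrics is a hypothetical name.

```shell
# Summarize Prometheus exposition text read from stdin:
# print each mongodb_* metric name followed by the number of series it has.
# summarize_mongodb_metrics is a hypothetical helper name.
summarize_mongodb_metrics() {
    grep '^mongodb_' \
        | sed 's/[{ ].*$//' \
        | sort \
        | uniq -c \
        | awk '{ print $2, $1 }'
}
```

A typical call would be curl -s http://localhost:9273/metrics | summarize_mongodb_metrics. Comment lines (# HELP, # TYPE) and non-MongoDB metrics are skipped by the first grep.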

Configure Prometheus to scrape MongoDB Metrics

Next, configure Prometheus to scrape the MongoDB metrics from port 9273, which the Telegraf prometheus_client output plugin configured above exposes.

$ sudo vim /etc/prometheus/prometheus.yml
 

 
global:
  scrape_interval: 10s

scrape_configs:
  - job_name: 'mongodb'
    scrape_interval: 10s
    scrape_timeout: 5s
    metrics_path: "/metrics"
    static_configs:
      - targets: ['localhost:9273']
        labels:
          service: mongodb
    metric_relabel_configs:
      - source_labels: [__name__]
        regex: "mongodb_(.+)"
        action: keep
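
The metric_relabel_configs rule keeps only series whose name matches mongodb_(.+) and drops everything else the endpoint exposes. Prometheus anchors relabel regexes at both ends, so the rule behaves roughly like the grep below. This is only an emulation for illustration; keep_mongodb_metrics is a hypothetical helper that filters a list of metric names from stdin.

```shell
# Emulate the metric_relabel_configs "keep" rule on a list of metric names (stdin).
# Prometheus anchors relabel regexes at both ends, hence the explicit ^ and $.
# keep_mongodb_metrics is a hypothetical helper name.
keep_mongodb_metrics() {
    grep -E '^mongodb_(.+)$'
}
```

Given the names mongodb_active_reads, go_goroutines and my_mongodb_x, only mongodb_active_reads passes the filter.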

Restart Prometheus to apply the new configuration:

$ sudo systemctl restart prometheus

Open the Prometheus UI to query the metrics. They will look like this:

mongodb_active_reads{exported_service="mongodb",host="mongodb.local",hostname="mongodb.local:27017",instance="localhost:9273",job="mongodb",member_status="SEC",node_type="SEC",rs_name="atlas-dofij-shard-0",service="mongodb",version="4.2.15"} 1
 
mongodb_active_writes{exported_service="mongodb",host="mongodb.local",hostname="mongodb.local:27017",instance="localhost:9273",job="mongodb",service="mongodb",version="4.2.15"} 0
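
Labels such as rs_name or hostname in these series are handy for quick checks from a terminal. As a rough sketch, one label value can be pulled out of such a line with sed; label_value is a hypothetical helper, and it assumes label values contain no escaped double quotes.

```shell
# Print the value of one label from series lines read on stdin.
# label_value is a hypothetical helper: label_value <label-name>
# The label name must be preceded by "{" or ",", so "host" does not match "hostname".
label_value() {
    sed -n "s/.*[{,]$1=\"\([^\"]*\)\".*/\1/p"
}
```

Piping the first sample line above through label_value rs_name prints atlas-dofij-shard-0.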

This solution supports both standalone MongoDB installations and MongoDB Atlas.
