So you heard about VictoriaMetrics and its claims of higher performance at lower resource usage, and you want to see for yourself. After all, who believes everything authors say about their own code? :)
For a meaningful comparison between VictoriaMetrics and Prometheus, you first need to get the same set of metrics into VM. Prometheus has been in your stack for months and has, say, 6 months of metrics. How do you get that data into VM?
Remote Write
The standard way to add VictoriaMetrics to your Prometheus stack is to configure Prometheus remote_write. remote_write is all well and good, and there are several tuning parameters on the Prometheus end to ensure it can keep up with the number of metrics you're ingesting.
However, remote_write only reads from the WAL, which means that once you enable it, VictoriaMetrics will receive the last ~2 hours' worth of metrics and then keep receiving them close to real time going forward.
If your retention period is 6 months in Prometheus, do you wait for 6 months to fill VictoriaMetrics and then run your tests? That’s not a feasible option.
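For reference, a minimal remote_write section in prometheus.yml pointing at a single-node VictoriaMetrics on its default port might look like this. The queue_config values are illustrative starting points for tuning, not recommendations:

```yaml
remote_write:
  - url: http://localhost:8428/api/v1/write
    queue_config:
      capacity: 20000            # samples buffered per shard
      max_shards: 30             # upper bound on parallel senders
      max_samples_per_send: 10000
```

If Prometheus can't keep up, its own remote-write metrics (e.g. queue length and dropped samples) will tell you which of these knobs to turn.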
Bulk import with vmctl
The VictoriaMetrics authors have written a tool that can import Prometheus (and InfluxDB) data into VictoriaMetrics. To make such imports even faster than standard ingestion via remote_write allows, VictoriaMetrics also has a bulk-import API endpoint, and that's exactly what vmctl uses.
Let’s build it and see how it works. The sources are on
GitHub. Clone the repo and
build it with make build
. My test nodes are running FreeBSD, so I’m
building a FreeBSD executable:
cd $GOPATH/src/github.com/VictoriaMetrics/vmctl
GOOS=freebsd go build -mod=vendor -o vmctl
All done. Let’s see how to use it:
./vmctl prometheus -h
NAME:
vmctl prometheus - Migrate timeseries from Prometheus
USAGE:
vmctl prometheus [command options] [arguments...]
OPTIONS:
--prom-snapshot value Path to Prometheus snapshot. Pls see for details https://www.robustperception.io/taking-snapshots-of-prometheus-data
--prom-concurrency value Number of concurrently running snapshot readers (default: 1)
--prom-filter-time-start value The time filter to select timeseries with timestamp equal or higher than provided value. E.g. '2020-01-01T20:07:00Z'
--prom-filter-time-end value The time filter to select timeseries with timestamp equal or lower than provided value. E.g. '2020-01-01T20:07:00Z'
--prom-filter-label value Prometheus label name to filter timeseries by. E.g. '__name__' will filter timeseries by name.
--prom-filter-label-value value Prometheus regular expression to filter label from "prom-filter-label" flag. (default: ".*")
--vm-addr value VictoriaMetrics address to perform import requests. Should be the same as --httpListenAddr value for single-node version or VMSelect component. (default: "http://localhost:8428")
--vm-user value VictoriaMetrics username for basic auth [$VM_USERNAME]
--vm-password value VictoriaMetrics password for basic auth [$VM_PASSWORD]
--vm-account-id value Account(tenant) ID - is required for cluster VM. (default: -1)
--vm-concurrency value Number of workers concurrently performing import requests to VM (default: 2)
--vm-compress Whether to apply gzip compression to import requests (default: true)
--vm-batch-size value How many datapoints importer collects before sending the import request to VM (default: 200000)
--help, -h show help (default: false)
2020/04/03 17:56:52 Total time: 509.024µs
So what we need is a snapshot of the Prometheus TSDB. The help output conveniently links to a Robust Perception blog post about Prometheus snapshots, in case you've not used them before. In a nutshell, a Prometheus snapshot is simply a consistent copy of the metrics at a given point in time.
Let’s take a snapshot:
curl -X POST http://localhost:9090/api/v1/admin/tsdb/snapshot
{"status":"success","data":{"name":"20200404T012938Z-66bb0213b2d7bbe8"}}
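Note that the snapshot endpoint is part of the Prometheus admin API, which is disabled by default; Prometheus must be started with --web.enable-admin-api for the call above to work. If you want to script this, the snapshot name can be pulled out of the JSON response; a small sketch using sed, so there's no jq dependency:

```shell
# Example response from the snapshot endpoint (captured above)
resp='{"status":"success","data":{"name":"20200404T012938Z-66bb0213b2d7bbe8"}}'
# Extract the "name" field from the JSON
snapshot_id=$(printf '%s' "$resp" | sed -n 's/.*"name":"\([^"]*\)".*/\1/p')
echo "$snapshot_id"
```

The resulting $snapshot_id is what you'd append to your Prometheus data directory's snapshots/ path when invoking vmctl.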
This won't take long, since snapshots use hard links, but it does of course depend on the size of your Prometheus TSDB and the speed of the I/O subsystem on the host. The Prometheus instance we're testing with here has 4 months of retention and a TSDB of nearly 35 GB; the snapshot took several seconds to complete.
Now let's import this snapshot. Note that the server used here has spinning disks for storage, and that's the ultimate bottleneck for the total import duration; YMMV.
Also, if you use Grafana, you should install the official VM dashboard so you can have an idea of how well it’s performing during import.
./vmctl prometheus --prom-snapshot /var/db/prometheus/snapshots/20200404T012938Z-66bb0213b2d7bbe8/ --vm-addr http://localhost:8428
Prometheus import mode
Prometheus snapshot stats:
blocks found: 1413;
blocks skipped: 0;
min time: 1575583200000 (2019-12-05T22:00:00Z);
max time: 1585963778392 (2020-04-04T01:29:38Z);
samples: 11001521842;
series: 15405068.
Filter is not taken into account for series and samples numbers.
Found 1413 blocks to import. Continue? [Y/n] y
1413 / 1413 [---------------------------------------------------------------------------------------------------] 100.00% 0 p/s
2020/04/04 06:41:13 Import finished!
2020/04/04 06:41:13 VictoriaMetrics importer stats:
time spent while waiting: 8h6m10.020967102s;
time spent while importing: 2h16m21.181395204s;
total datapoints: 11001521842;
datapoints/s: 1344735.11;
total bytes: 223.4 GB;
bytes/s: 27.3 MB;
import requests: 109817;
import requests retries: 0;
2020/04/04 06:41:13 Total time: 5h11m16.392353968s
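One thing worth noting in the stats above: the reported datapoints/s (~1.34M) only counts time spent actively importing, and the per-worker waiting and importing times sum to more than the total because of concurrency. The wall-clock rate is quite a bit lower, which a bit of shell arithmetic shows:

```shell
# Wall-clock throughput for the run above: 11,001,521,842 datapoints
# over a total time of 5h11m16s.
total_datapoints=11001521842
total_seconds=$((5 * 3600 + 11 * 60 + 16))
echo $((total_datapoints / total_seconds))   # roughly 589 thousand datapoints/s
```

That gap between the active-import rate and the wall-clock rate is the "time spent while waiting" line, which here points at the spinning disks.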
Wow, 5 hours is a very long time, but since this is just a test, we don't particularly care. If you have SSD-backed storage, you should experiment with tuning the read and write concurrency settings in vmctl (--prom-concurrency and --vm-concurrency); you'll get much higher import throughput.
Specifically, you want the import to take less than 2 hours, so that the more recent metrics can be backfilled via remote_write, keeping VM nearly in sync with Prometheus from then on.
If you're importing a lot of data and it still takes more than 2 hours, you can simply take another snapshot and import that. vmctl can be given a timestamp from which to start the import with the --prom-filter-time-start option, which effectively means it'll do an incremental import (even though the snapshot it reads from is complete). The value you give here should match the max time timestamp (in RFC 3339 format) from the previous import's output. For example, the samples in our original snapshot ended at 2020-04-04T01:29:38Z, so for the next import we'd run:
./vmctl prometheus --prom-snapshot /var/db/prometheus/snapshots/$snapshot_id/ --vm-addr http://localhost:8428 --prom-filter-time-start 2020-04-04T01:29:38Z
Summary
Setting up VictoriaMetrics is fairly trivial: you can be fully provisioned and have an instance full of metrics in a few hours' time. Now it's time to include VM in your metrics read path and compare the numbers.