RRDtool – quick tutorial


RRDtool is both a database and a graphing tool, well suited to visualizing time-varying data such as temperature, pressure and all kinds of telemetry.

There are many tutorials available, but most of them (like the official one) dig too deep into the details. Since I use RRDtool only for storing and graphing telemetry data, I will explain how to do it the easy way.

I use RRDtool for graphing many sources of data, like my small solar system:

[rrdtool_ex1 – graph of my solar system telemetry]

but I will start with an easier example:
[rrdtool_ex2 – a simple temperature graph]

Creating a database

RRDtool stores all data in a round-robin (hence the name) database. In simple terms, it is a circular log. The database is a single file; all storage is allocated at the moment of creation, the file always has a fixed size, and new data overwrites old data.

The command (substitute $FILENAME for a real name):
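The original command did not survive here, but it can be reassembled from the options explained below (a sketch; substitute $FILENAME yourself):

```shell
# Create the round-robin database; all storage is allocated now.
rrdtool create "$FILENAME" \
    -s 300 \
    --no-overwrite \
    DS:temperature:GAUGE:330:-60:120 \
    RRA:AVERAGE:0.5:1:4730400
```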

The magic parts mean:

  • -s 300 – how often new data will be added to the database (here: 5 minutes)
  • --no-overwrite – refuse to replace an existing file; useful when you have many files in a single directory and want to avoid destroying one of them
  • DS:temperature:GAUGE:330:-60:120 – the datasource, basically the data you want to put in
    • A single file can contain several datasources and you can perform some math on them.
    • The name of this datasource is temperature
    • Its type is GAUGE (simply the value, not a counter or a derivative etc.)
    • 330 – heartbeat. If no data arrives within that many seconds, RRDtool marks the interval as unknown.
    • -60 – minimum value (in this case the lowest temperature in degrees)
    • 120 – maximum value (in this case the highest temperature in degrees)
  • RRA:AVERAGE:0.5:1:4730400 – the Round-Robin Archive. A complicated name for the “output” of the database that can be viewed or graphed (or, strictly speaking, the real contents of the file).
    • AVERAGE – the easiest type to work with. If more samples come in than the datasource specification expects (e.g. every 1 minute instead of every 5 minutes), they are simply averaged out, which is a good choice for “real world” telemetry (the other types available are MIN, MAX and LAST – LAST keeps only the most recent sample).
    • 0.5 – xfiles factor (another great name 😉 ). It specifies what fraction of the samples may be missing while still producing a valid output point. Example: the datasource is fed every minute and the output is an average of 10 samples; RRDtool will produce an output point if at least 5 samples have been put in.
    • 1 – steps. Specifies how many datasource “points” should make an output point. In this case it is 1:1, each sample put in is stored.
    • 4730400 – number of rows. Specifies how many “processed” samples should be kept.

In this case I create an archive that will store temperature samples fed at least every 5 minutes. If more samples come in, they will be averaged out. The database will store 4730400 five-minute averaged intervals, which is 45 years' worth of data.
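The retention figure is easy to verify with shell arithmetic: 4730400 rows times 300 seconds per row, divided by the seconds in a 365-day year:

```shell
# 4730400 rows x 300 s = 1419120000 s; a 365-day year is 31536000 s
echo $((4730400 * 300 / (365 * 24 * 3600)))   # prints 45
```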

It is not the most efficient configuration, but the file is only 36 MB in size (which is pretty much nothing today). The idea behind separating the data source from the output is that you can drastically reduce the file size if you want to, for example, store the last week of telemetry with 1-minute resolution and the last year with 1-hour resolution. RRDtool does all that processing automatically. You can also combine several datasources, e.g. measure two temperatures and store only the difference, add an electricity counter (type COUNTER – only the increment is stored) and compute the temperature difference per unit of electricity consumed with some advanced statistics. RRDtool can graph data from many files together, so I use a separate file for each temperature sensor.
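A sketch of such a reduced two-resolution layout (the file name and all numbers are illustrative, not from my setup): a 60-second step, one week kept at full resolution and one year as hourly averages:

```shell
# Hypothetical two-resolution database with a 1-minute step (-s 60).
# First RRA: every sample kept, 10080 rows = one week at 1-minute resolution.
# Second RRA: average of 60 samples (one hour), 8760 rows = one year.
rrdtool create telemetry.rrd -s 60 --no-overwrite \
    DS:temperature:GAUGE:120:-60:120 \
    RRA:AVERAGE:0.5:1:10080 \
    RRA:AVERAGE:0.5:60:8760
```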

In my simple scenario no specific optimization is necessary – storage and processing power are virtually free. A classical database could also handle the task, but on a Banana Pi or Raspberry Pi, RRDtool is still a better choice than a full installation of MySQL or Postgres.

Adding data

Adding data is trivial:
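The command boils down to a one-liner; N is RRDtool's shorthand for the current time:

```shell
# Append one sample at the current time (N); 25.26 is the measured value.
rrdtool update "$FILENAME" N:25.26
```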

No extra options are required if there is only one datasource in the file; 25.26 is the temperature.

The PHP server requires slight modification. The first difference is a list of mappings between sensor IDs and files. An ADC scaling factor also has to be defined, as I want to store the result in volts, not raw ADC codes.
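The mapping and the scaling factor can be sketched like this (the sensor IDs, file paths and ADC parameters are illustrative, not my real ones):

```php
<?php
// Static sensor-id => RRD-file map; only these files can ever be touched.
$sensors = [
    "28-0000041a2b3c" => "/var/rrd/no1-temperature.rrd",
    "28-0000041d4e5f" => "/var/rrd/no2-temperature.rrd",
];

// ADC scaling factor: raw codes -> volts
// (assuming a 10-bit ADC with a 3.3 V reference).
$adc_scale = 3.3 / 1023.0;
```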

The second difference lies in the main loop:
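A sketch of that loop, under the assumptions above (the $sensors map is repeated here so the snippet is self-contained; the request parameter names are illustrative):

```php
<?php
// Sensor-id => RRD-file map, statically defined.
$sensors = ["28-0000041a2b3c" => "/var/rrd/no1-temperature.rrd"];

foreach ($_GET as $id => $value) {
    if (!isset($sensors[$id])) {
        continue; // unknown sensor id: file names are statically mapped
    }
    if ((float)$value == 85.0) {
        continue; // 85.0 means "sensor not initialized or faulty"
    }
    // escapeshellarg() neutralizes a rogue value like "25.1; rm -Rf /"
    exec("rrdtool update " . escapeshellarg($sensors[$id])
        . " N:" . escapeshellarg($value));
}
```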

I check whether the reported temperature differs from 85.0 degrees, as that is the value returned when the sensor is not initialized or faulty. There is no point in storing bad data (it would also cause problems with automatic scaling of the graphs).

This script is secure enough. It does not validate the sensors themselves, but it does not create security holes in the system it is running on. File names are statically mapped, so no other files can be manipulated, and the value passed to RRDtool is escaped via escapeshellarg, so if a rogue “sensor” supplies a temperature of 25.1; rm -Rf / it will not do any damage.

Making graphs

I use simple shell scripts to make graphs, like this one:
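My script did not survive here; a minimal equivalent, with assumed file names, colors and period handling, looks like this:

```shell
#!/bin/sh
# Usage: ./graph-temperature.sh 12h|30d|...  (default: 12h)
PERIOD=${1:-12h}

# Thinner line for long periods.
case "$PERIOD" in
    30d|1y) LINE=LINE1 ;;
    *)      LINE=LINE2 ;;
esac

rrdtool graph "temperature-$PERIOD.png" --start "-$PERIOD" \
    DEF:t1=no1-temperature.rrd:temperature:AVERAGE \
    DEF:t2=no2-temperature.rrd:temperature:AVERAGE \
    "$LINE:t1#FF0000:sensor 1" \
    "$LINE:t2#0000FF:sensor 2"
```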

Common names such as 12h or 30d can be supplied to this script (handled by the long if-else statement). If the period is long, the line width is decreased (it can be LINE1, LINE2, LINE3 etc.). Two data series are graphed (each named temperature, from no1-temperature.rrd and no2-temperature.rrd). Colors are specified in HTML hex format, and the last arguments are the labels.

I use a one-script-per-data-series convention (with different time intervals to graph). The script that graphs battery voltage is almost identical. Both can be added to crontab to update the images automatically.

Backup

RRDtool files are hardly portable between versions and systems. 32-bit Linux and 64-bit Linux use different on-disk formats (the files are native binary, optimized to the last byte), so simply copying .rrd files between systems will not work. The best way to transfer data is an XML export on the source machine and an import on the target. The XML dump of a 45-year archive can be gigantic, so I use this handy script for compressing backups:
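The script is essentially this (the directory paths are assumptions):

```shell
#!/bin/sh
# Dump every RRD file to XML and compress it on the fly with xz;
# the uncompressed XML never touches the disk.
for f in /var/rrd/*.rrd; do
    rrdtool dump "$f" | xz -9 > "/var/backups/$(basename "$f" .rrd).xml.xz"
done
```

On the target machine, `rrdtool restore` turns the decompressed XML back into a .rrd file.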

The nice part here is that the huge XML file is compressed on the fly and never takes up much disk space. When the backup is complete, I copy the XZ archives to a different machine.