Monitoring Redis? Check last_save_time

As Redis writes to disk asynchronously, it is not sufficient to monitor whether it is still accepting TCP/IP connections; you must also check whether it is managing to persist the in-memory data to disk.

However, as it is not currently possible to monitor the status of these background saves deterministically, we are limited to checking that the number of seconds since the last save has not exceeded some threshold. This can be done using the last_save_time field, which is available via the INFO command:

$ redis-cli info | grep last_save_time
last_save_time:1312045364
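To make the arithmetic concrete without a live server, here is a minimal sketch that parses the raw text printed by redis-cli info and works out how many seconds have passed since the last save. The helper names parse_info and seconds_since_last_save are hypothetical; the sketch assumes the key:value line format shown above and also accepts rdb_last_save_time, the name newer Redis releases use for the same field.

```python
import time


def parse_info(text):
    """Parse `redis-cli info` output into a dict of strings.

    Blank lines and section headers such as "# Persistence" are skipped.
    """
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#') or ':' not in line:
            continue
        key, _, value = line.partition(':')
        info[key] = value
    return info


def seconds_since_last_save(info, now=None):
    """Return the age in seconds of the most recent successful save."""
    # Older Redis releases call the field last_save_time; newer ones
    # (2.6 onwards, I believe) call it rdb_last_save_time.
    for key in ('rdb_last_save_time', 'last_save_time'):
        if key in info:
            if now is None:
                now = time.time()
            return now - int(info[key])
    raise KeyError('no last save time found in INFO output')


sample = "last_save_time:1312045364\n"
print(seconds_since_last_save(parse_info(sample), now=1312048964))  # 3600
```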

Checking for this has already saved me twice: once when my vm.overcommit_memory = 1 change was accidentally rolled back (see "Background saving is failing with a fork() error" in the Redis FAQ), and again when the machine simply did not have enough disk space.
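Neither of those failures shows up as a refused connection, which is the point of watching the save time. If you also want to test for the two causes directly, a minimal sketch follows, assuming a Linux host that exposes the setting at /proc/sys/vm/overcommit_memory. The helper names and the 1 GiB figure are hypothetical; how much free space a dump really needs depends on your dataset.

```python
import os
import shutil


def overcommit_enabled(path='/proc/sys/vm/overcommit_memory'):
    """Return True if the kernel's vm.overcommit_memory setting is 1."""
    with open(path) as f:
        return f.read().strip() == '1'


def enough_disk(directory, needed_bytes):
    """Return True if the filesystem holding `directory` has needed_bytes free."""
    return shutil.disk_usage(directory).free >= needed_bytes


if __name__ == '__main__':
    # Only meaningful on Linux; skip the sysctl check elsewhere.
    if os.path.exists('/proc/sys/vm/overcommit_memory'):
        print('vm.overcommit_memory is 1:', overcommit_enabled())
    # Placeholder threshold: at least 1 GiB free on the root filesystem.
    print('enough disk on /:', enough_disk('/', 1 << 30))
```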

To monitor this with Nagios:

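#!/usr/bin/env python3

import sys
import time

import redis

def main(host, warning, critical):
    try:
        client = redis.Redis(host=host)
        info = client.info()
        # Newer Redis releases (2.6 onwards) report this as rdb_last_save_time.
        last_save_time = info.get('rdb_last_save_time', info.get('last_save_time'))
        last_save = time.time() - last_save_time
    except Exception as e:
        # Connection failures (or a missing field) are reported as critical.
        print("CRITICAL: %s" % e)
        return 2

    ret = 0
    state = 'OK'

    # Check warning first so that exceeding the critical limit overrides it.
    for limit, state_, ret_ in (
        (int(warning), 'WARNING', 1),
        (int(critical), 'CRITICAL', 2),
    ):
        if last_save > limit:
            ret = ret_
            state = state_

    print("%s: Last dump was %d seconds ago" % (state, last_save))

    return ret

if __name__ == '__main__':
    sys.exit(main(*sys.argv[1:]))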

The following invocation will trigger a warning when we haven't saved in an hour and become critical if we haven't saved within 24 hours. (If we cannot connect to the server we are immediately critical, so there is no need for a separate connectivity check.)

$ check_redis 127.0.0.1 3600 86400
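The threshold loop in the script boils down to a small pure function, which makes it easy to see what the invocation above reports for a given age. This is a restatement of the same logic for illustration only (the name nagios_state is hypothetical), returning the same Nagios exit codes 0, 1 and 2 that the script uses.

```python
def nagios_state(seconds, warning, critical):
    """Map the age of the last save to a (state, exit code) pair."""
    state, code = 'OK', 0
    # Check warning first so that a critical breach overrides it.
    for limit, state_, code_ in ((warning, 'WARNING', 1),
                                 (critical, 'CRITICAL', 2)):
        if seconds > limit:
            state, code = state_, code_
    return state, code


for age in (600, 7200, 90000):
    print(age, nagios_state(age, 3600, 86400))
```

With the 3600/86400 limits, ten minutes is OK, two hours is a WARNING and 25 hours is CRITICAL. Note the comparison is strict, so an age of exactly 3600 seconds is still OK.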

This setup works well, although we are now monitoring a non-deterministic heuristic. To remedy this, Redis could expose background save failures in the INFO output (last_bgsave_status=fail, perhaps?). However, the treatment of previous contributions puts me off sending further patches.
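For what it's worth, later Redis releases do expose something along these lines: the persistence section of INFO includes an rdb_last_bgsave_status field reporting ok or err. A minimal sketch of checking it, assuming a dict shaped like the one redis-py's info() returns; the function name is hypothetical, and servers without the field are treated as not failing, leaving the age check as the only signal.

```python
def bgsave_failed(info):
    """Return True if the server reports that its last background save failed.

    Servers that do not expose rdb_last_bgsave_status return False here.
    """
    return info.get('rdb_last_bgsave_status', 'ok') != 'ok'


print(bgsave_failed({'rdb_last_bgsave_status': 'err'}))  # True
print(bgsave_failed({'last_save_time': 1312045364}))     # False
```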

Comments (3)

Greg Mitchell

Thanks for the script and the advice. I did have to alter it, though: my Redis instance was idle and hadn't received any data for a while, so your script reported it as critical unless I inserted a key to force a background save (with the default "save 900 1" setting).

Feb. 24, 2014, 6:35 p.m.
Thanks, that's a good point.
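One way to handle an idle instance is to skip the age check when nothing has changed since the last save, as the dump on disk is then already current. A minimal sketch, assuming a redis-py-style info() dict: rdb_changes_since_last_save is the field name in newer Redis releases and changes_since_last_save in older ones, while the function name is hypothetical. The trade-off is that a broken save on a server nobody writes to goes unnoticed, but then no unsaved data is at risk either.

```python
import time


def last_save_age(info, now=None):
    """Seconds since the last save, or 0 if there are no unsaved changes."""
    changes = info.get('rdb_changes_since_last_save',
                       info.get('changes_since_last_save'))
    if changes == 0:
        # Nothing to persist, so an old dump is not a problem.
        return 0
    last_save = info.get('rdb_last_save_time', info.get('last_save_time'))
    if now is None:
        now = time.time()
    return now - last_save
```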

Great script; however, in my instance I had to change "last_save_time" to "rdb_last_save_time". I also had to add support for a password. (Sure do hope folks are actually securing their instances...) In any event, thank you!

Oct. 28, 2015, 10:02 p.m.
Interesting. I suppose the rdb_ prefix was added in more recent versions. Regarding security, I just don't expose Redis to the internet at all: it either listens on localhost only or sits in a private VPN. A "dumb" password scheme is a bit unwieldy anyway, especially as you have to share it securely around your infrastructure, and it makes things like this just that little bit more difficult to deploy.
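For anyone who does need a password, redis-py's Redis client accepts a password keyword argument. A minimal sketch of supplying one without putting it on the Nagios command line, assuming a hypothetical REDIS_PASSWORD environment variable; when it is unset or empty, no password is passed.

```python
import os


def client_kwargs(host, port=6379, env=os.environ):
    """Keyword arguments for redis.Redis(), adding a password if one is set."""
    kwargs = {'host': host, 'port': port}
    # REDIS_PASSWORD is a hypothetical variable name chosen for this sketch.
    password = env.get('REDIS_PASSWORD')
    if password:
        kwargs['password'] = password
    return kwargs
```

The check would then build its client with redis.Redis(**client_kwargs(host)) in place of redis.Redis(host=host).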

Regarding sharing secrets, check out vaultproject.io. It may or may not meet your needs, but I feel it's worth a look.

Nov. 23, 2015, 3:51 p.m.