August 1st 2011

Monitoring Redis? Check last_save_time

As Redis writes to disk asynchronously, it is not sufficient to monitor whether it is still accepting TCP/IP connections; you must also check that it is managing to persist the in-memory data to disk.

However, as it is not currently possible to observe directly whether a background save succeeded or failed, we are limited to checking that the number of seconds since the last successful save has not exceeded some threshold. This is possible using last_save_time, a Unix timestamp which is available via the INFO command:

$ redis-cli info | grep last_save_time
last_save_time:1312045364
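
Since the value is a Unix timestamp, the age of the last save is simply the current time minus it. A minimal sketch, assuming the raw INFO text uses the key:value layout shown above; the function name seconds_since_last_save is my own and not part of any Redis client:

```python
import time

def seconds_since_last_save(info_text, now=None):
    """Return seconds elapsed since the last_save_time found in raw INFO text."""
    if now is None:
        now = time.time()
    for line in info_text.splitlines():
        # Each INFO line looks like "key:value"; skip anything else.
        key, sep, value = line.strip().partition(':')
        if sep and key == 'last_save_time':
            return now - int(value)
    raise KeyError('last_save_time not found in INFO output')

# The value shown above, with a made-up "current" time 100 seconds later.
print(seconds_since_last_save("last_save_time:1312045364", now=1312045464))
```

Passing now explicitly is only there to make the function easy to try without a running server.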

Checking for this has already saved me twice: once when my vm.overcommit_memory = 1 change was accidentally rolled back (see "Background saving is failing with a fork() error" in the Redis FAQ), and again when the machine simply did not have enough disk space.

To monitor this with Nagios:

#!/usr/bin/env python

import sys
import time
import redis

def main(host, warning, critical):
    try:
        client = redis.Redis(host=host)
        # Seconds elapsed since Redis last saved to disk.
        last_save = time.time() - client.info()['last_save_time']
    except Exception as e:
        # Connection failures (or any other error) are immediately critical.
        print("CRITICAL: %s" % e)
        return 2

    ret = 0
    state = 'OK'

    # Thresholds are checked in ascending severity so the worst match wins.
    for limit, state_, ret_ in (
        (int(warning), 'WARNING', 1),
        (int(critical), 'CRITICAL', 2),
    ):
        if last_save > limit:
            ret = ret_
            state = state_

    print("%s: Last dump was %d seconds ago" % (state, last_save))

    # Nagios reads the exit code: 0 = OK, 1 = WARNING, 2 = CRITICAL.
    return ret

if __name__ == '__main__':
    sys.exit(main(*sys.argv[1:]))

The following invocation will trigger a warning when we haven't saved in an hour and become critical if we haven't saved in 24 hours. (If we cannot connect to the server we are immediately critical, so we avoid an extra check just for this.)

$ check_redis 127.0.0.1 3600 86400
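
To make the effect of those two thresholds concrete, here is a small sketch of the same decision as a pure function, which can be tried without a Redis server. The name classify is illustrative only; it is meant to mirror the loop in the script rather than replace it:

```python
def classify(age, warning, critical):
    """Map seconds since the last save to a Nagios (state, exit code) pair."""
    if age > critical:
        return ('CRITICAL', 2)
    if age > warning:
        return ('WARNING', 1)
    return ('OK', 0)

# Thresholds from the invocation above: 3600 (warning) and 86400 (critical).
for age in (120, 7200, 90000):
    print(age, classify(age, 3600, 86400))
```

Two minutes is fine, two hours is a warning, and 25 hours is critical. An age exactly equal to a threshold does not trip it, as the comparison is strict.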

This setup works well, although it remains a heuristic: we infer failure from the age of the last save rather than observing it directly. To remedy this, Redis could expose background save failures in the INFO output (last_bgsave_status=fail, perhaps?). However, the treatment of previous contributions puts me off sending further patches.
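
For illustration, such a field would reduce the check to something like the sketch below. The last_bgsave_status key and its ok/fail values are only the suggestion above, not something Redis reports at the time of writing, and check_bgsave_status is a hypothetical name:

```python
def check_bgsave_status(info):
    """Return a Nagios (state, exit code, message) from a hypothetical status field."""
    status = info.get('last_bgsave_status')  # hypothetical key, not in INFO today
    if status is None:
        # The server does not report the field, so we cannot say either way.
        return ('UNKNOWN', 3, 'server does not report last_bgsave_status')
    if status == 'ok':
        return ('OK', 0, 'last background save succeeded')
    return ('CRITICAL', 2, 'last background save reported %r' % status)

# The dictionary stands in for what client.info() might return.
print(check_bgsave_status({'last_bgsave_status': 'fail'}))
```

A check like this would fail on the first bad save instead of waiting for a threshold to expire.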



