
Monitoring companion for Nomad periodic jobs and Cron

blockchainjamie/deadman-check

Deadman Check

A monitoring companion for Nomad periodic jobs that alerts if a periodic job isn't running at the expected interval.

deadman-check has two modes:

  1. Run alongside the Nomad periodic job as an additional task that updates a key in Consul with the current EPOCH time and the frequency at which the job is expected to run.

  2. Run as a separate process that monitors the Consul key's EPOCH time value and alerts if that value is older than the frequency threshold expected for that job.
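
Both modes rely on one rule: a key is stale when the time elapsed since its stored EPOCH is greater than its stored frequency (the switch_monitor help describes alerting "if EPOCH age is greater than given frequency"). The sketch below expresses that rule in Ruby, assuming the key value is the JSON document shown in the key_set help, e.g. {"epoch":1493010437,"frequency":"300"}. The method name stale? is hypothetical and not part of the deadman_check gem.

```ruby
require 'json'

# Hypothetical helper (not the gem's API): returns true when the EPOCH stored
# in a deadman key is older than the key's own frequency, in seconds.
def stale?(value_json, now = Time.now.to_i)
  data = JSON.parse(value_json)
  epoch = Integer(data.fetch('epoch'))
  # key_set stores frequency as a string such as "300", so convert it explicitly
  frequency = Integer(data.fetch('frequency'))
  (now - epoch) > frequency
end

value = '{"epoch":1493010437,"frequency":"300"}'
puts stale?(value, 1493010437 + 299) # => false
puts stale?(value, 1493010437 + 301) # => true
```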

Requirements

  • Consul instance or cluster to report to

Alerting Options

Alerts can be sent to a Slack channel (--alert-to-slack) or to an Amazon SNS topic (--alert-to-sns). The credentials each option needs are described under Alerting Setup.

Example Usage

Let's say I have a Nomad periodic job that is set to run every 10 minutes. The Nomad configuration looks like this:

job "SilverBulletPeriodic" {
  datacenters = ["dc1"]
  type = "batch"

  periodic {
    cron             = "*/10 * * * * *"
    prohibit_overlap = true
  }

  group "utility" {
    task "SilverBulletPeriodicProcess" {
      driver = "docker"
      config {
        image    = "silverbullet:build_1"
        work_dir = "/utility/silverbullet"
        command  = "blaster"
      }
      resources {
        cpu = 100
        memory = 500
      }
    }
  }
}

To monitor the SilverBulletPeriodicProcess task, add a deadman-check task that posts updates to a Consul endpoint (10.0.0.1 in this example):

job "SilverBulletPeriodic" {
  datacenters = ["dc1"]
  type = "batch"

  periodic {
    cron             = "*/10 * * * * *"
    prohibit_overlap = true
  }

  group "silverbullet" {
    task "SilverBulletPeriodicProcess" {
      driver = "docker"
      config {
        image    = "silverbullet:build_1"
        work_dir = "/utility/silverbullet"
        command  = "blaster"
      }
      resources {
        cpu = 100
        memory = 500
      }
    }
    task "DeadmanSetSilverBulletPeriodicProcess" {
      driver = "docker"
      config {
        image    = "sepulworld/deadman-check"
        command  = "key_set"
        args     = [
          "--host",
          "10.0.0.1",
          "--port",
          "8500",
          "--key",
          "deadman/SilverBulletPeriodicProcess",
          "--frequency",
          "700"]
      }
      resources {
        cpu = 100
        memory = 256
      }
    }
  }
}

Now the key deadman/SilverBulletPeriodicProcess on the Consul server at 10.0.0.1 will be updated with the current EPOCH time on every SilverBulletPeriodic run. If the job hangs or fails to run, we will know because the EPOCH time entry goes stale.
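
To make the key_set step concrete, here is a minimal Ruby sketch of what one update could look like, assuming Consul's HTTP KV endpoint (PUT /v1/kv/<key>) and the JSON value format shown in the key_set help. The method names deadman_payload, consul_kv_uri and deadman_key_set are hypothetical; the gem's own implementation may differ.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical helper: builds the value key_set is documented to store,
# e.g. {"epoch":1493010437,"frequency":"700"} (frequency kept as a string).
def deadman_payload(frequency, now = Time.now.to_i)
  { 'epoch' => now, 'frequency' => frequency.to_s }.to_json
end

# Hypothetical helper: Consul KV URL for a key, e.g.
# http://10.0.0.1:8500/v1/kv/deadman/SilverBulletPeriodicProcess
def consul_kv_uri(host, port, key)
  URI("http://#{host}:#{port}/v1/kv/#{key}")
end

# Writes the payload with an HTTP PUT; Consul answers with a body of
# "true" when the write succeeds. Not called in this example.
def deadman_key_set(host, port, key, frequency)
  uri = consul_kv_uri(host, port, key)
  request = Net::HTTP::Put.new(uri)
  request.body = deadman_payload(frequency)
  Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
end

puts deadman_payload(700, 1493010437)
# => {"epoch":1493010437,"frequency":"700"}
```

In the Nomad example the frequency is 700 seconds, slightly longer than the 600-second (10-minute) schedule, so a run that finishes a little late does not immediately count as stale.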

Next, we need a job that runs continuously to monitor this key:

job "DeadmanMonitoring" {
  datacenters = ["dc1"]
  type = "service"

  group "monitor" {
    task "DeadmanMonitorSilverBulletPeriodicProcess" {
      driver = "docker"
      config {
        image    = "sepulworld/deadman-check"
        command  = "switch_monitor"
        args     = [
          "--host",
          "10.0.0.1",
          "--port",
          "8500",
          "--key",
          "deadman/SilverBulletPeriodicProcess",
          "--alert-to-slack",
          "slackroom",
          "--daemon",
          "--daemon-sleep",
          "900"]
      }
      resources {
        cpu = 100
        memory = 256
      }
      env {
        SLACK_API_TOKEN = "YourSlackApiToken"
      }
    }
  }
}

This job monitors a Consul key that contains an EPOCH time entry, checking every 900 seconds (--daemon-sleep), and sends a Slack message if the EPOCH age exceeds the key's frequency threshold.
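
With --daemon, switch_monitor repeats its check and sleeps --daemon-sleep seconds between checks. The loop below is a hedged sketch of that behaviour, not the gem's code. It reads the raw key value through Consul's GET /v1/kv/<key>?raw, applies the age-versus-frequency rule, and passes stale keys to an alert callable that stands in for the Slack or SNS call. All method names are hypothetical. One consequence of polling: with the example's --frequency 700 and --daemon-sleep 900, a missed run can go unreported for up to roughly 700 + 900 = 1600 seconds (about 27 minutes) after the last successful key_set.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical helper (not the gem's API): true when the stored EPOCH is
# older than the stored frequency, both in seconds.
def stale_value?(value_json, now)
  data = JSON.parse(value_json)
  (now - Integer(data['epoch'])) > Integer(data['frequency'])
end

# Hypothetical helper: reads a key's raw value via Consul's KV HTTP API and
# returns nil when the request does not succeed (e.g. 404 for a missing key).
def fetch_raw(host, port, key)
  response = Net::HTTP.get_response(URI("http://#{host}:#{port}/v1/kv/#{key}?raw"))
  response.is_a?(Net::HTTPSuccess) ? response.body : nil
end

# One check; `alert` is any callable standing in for the Slack or SNS sender.
# Treating a missing key as stale is a choice made by this sketch.
def check_once(key, value_json, alert, now = Time.now.to_i)
  alert.call(key) if value_json.nil? || stale_value?(value_json, now)
end

# Daemon mode: check, then sleep daemon_sleep seconds (documented default 300).
def monitor(host, port, key, alert, daemon_sleep: 300)
  loop do
    check_once(key, fetch_raw(host, port, key), alert)
    sleep daemon_sleep
  end
end

# Example (not run here):
# monitor('10.0.0.1', 8500, 'deadman/SilverBulletPeriodicProcess',
#         ->(k) { warn "stale: #{k}" }, daemon_sleep: 900)
```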

If you have multiple periodic jobs to monitor, use the --key-path argument instead of --key, and make sure every job's key_set task writes its key under the same Consul key path (for example deadman/).

The following monitoring job uses --key-path deadman/ to check every key under that path, and sends alerts to AWS SNS instead of Slack:

job "DeadmanMonitoring" {
  datacenters = ["dc1"]
  type = "service"

  group "monitor" {
    task "DeadmanMonitorSilverBulletPeriodicProcesses" {
      driver = "docker"
      config {
        image    = "sepulworld/deadman-check"
        command  = "switch_monitor"
        args     = [
          "--host",
          "10.0.0.1",
          "--port",
          "8500",
          "--key-path",
          "deadman/",
          "--alert-to-sns",
          "arn:aws:sns:us-east-1:123412345678:deadman-check",
          "--alert-to-sns-region",
          "us-east-1",
          "--daemon",
          "--daemon-sleep",
          "900"]
      }
      resources {
        cpu = 100
        memory = 256
      }
      env {
        AWS_ACCESS_KEY_ID = "YourAWSKEY"
        AWS_SECRET_ACCESS_KEY = "YourAWSSecret"
      }
    }
  }
}
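
With --key-path, every key under the path is checked. As a sketch of how a recursive lookup can be evaluated, assuming Consul's GET /v1/kv/<path>?recurse response (a JSON array whose entries carry a Key and a base64-encoded Value), the Ruby below decodes each value and returns the keys whose EPOCH is older than their frequency. The method name stale_keys and the key names job_a and job_b are hypothetical.

```ruby
require 'json'

# Hypothetical helper (not the gem's API): takes the body returned by
# GET /v1/kv/deadman/?recurse and returns the keys whose stored EPOCH is
# older than their stored frequency.
def stale_keys(recurse_body, now = Time.now.to_i)
  JSON.parse(recurse_body).filter_map do |entry|
    # Folder-style entries such as "deadman/" may carry a null Value; skip them.
    next if entry['Value'].nil?

    # Consul base64-encodes values; 'm' is the base64 directive of unpack1.
    data = JSON.parse(entry['Value'].unpack1('m'))
    age = now - Integer(data['epoch'])
    entry['Key'] if age > Integer(data['frequency'])
  end
end

body = [
  { 'Key' => 'deadman/', 'Value' => nil },
  { 'Key' => 'deadman/job_a', 'Value' => ['{"epoch":1000,"frequency":"300"}'].pack('m0') },
  { 'Key' => 'deadman/job_b', 'Value' => ['{"epoch":100,"frequency":"300"}'].pack('m0') }
].to_json
p stale_keys(body, 1200) # => ["deadman/job_b"]
```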

Non-Nomad Use:

Local system installation

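Install the gem directly:

$ gem install deadman_check

Or, if you manage dependencies with Bundler, add gem 'deadman_check' to your Gemfile and run:

$ bundle install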

Install and run deadman-check from Docker

# Optional: If you don't pull explicitly, `docker run` will do it for you
$ docker pull sepulworld/deadman-check

$ alias deadman-check='\
  docker run \
    -it --rm --name=deadman-check \
    sepulworld/deadman-check'

(Depending on how your system is set up, you might have to add sudo in front of the above docker commands or add your user to the docker group).

If you skip docker pull, the first time you run deadman-check the docker run command will automatically pull the sepulworld/deadman-check image from Docker Hub. Subsequent runs use the locally cached image and do not download anything.

Alerting Setup

  • Slack alerting (optional) requires the SLACK_API_TOKEN environment variable to be set, using a Slack Bot integration token
  • AWS SNS alerting requires AWS IAM access with a policy that allows publishing to the target SNS topic. Credentials can be supplied in any one of the following ways:
    • ENV['AWS_ACCESS_KEY_ID'] and ENV['AWS_SECRET_ACCESS_KEY']
    • The shared credentials ini file at ~/.aws/credentials
    • An instance profile, when running on EC2

Usage via Local System Install

$ deadman-check -h
  NAME:

    deadman-check

  DESCRIPTION:

    Monitor a Consul key or key-path that contains an EPOCH time entry and frequency. Send Slack message if EPOCH age is greater than given frequency

  COMMANDS:

    help           Display global or [command] help documentation
    key_set        Update a given Consul key with current EPOCH
    switch_monitor Target a Consul key to monitor

  GLOBAL OPTIONS:

    -h, --help
        Display help documentation

    -v, --version
        Display version information

    -t, --trace
        Display backtrace when an error occurs

Usage for key_set command

$ deadman-check key_set -h

  NAME:

    key_set

  SYNOPSIS:

    deadman-check key_set [options]

  DESCRIPTION:

    key_set will set a consul key that contains the current epoch and time frequency that job should be running at, example key {"epoch":1493010437,"frequency":"300"}

  EXAMPLES:

    # Update a Consul key deadman/myservice, with current EPOCH time
    deadman-check key_set --host 127.0.0.1 --port 8500 --key deadman/myservice --frequency 300

  OPTIONS:

    --host HOST
        IP address or hostname of Consul system

    --port PORT
        port Consul is listening on

    --key KEY
        Consul key to report EPOCH time and frequency for service

    --frequency FREQUENCY
        Frequency at which this key should be updated in seconds

Usage for switch_monitor command

$ deadman-check switch_monitor -h

  NAME:

    switch_monitor

  SYNOPSIS:

    deadman-check switch_monitor [options]

  DESCRIPTION:

    switch_monitor will monitor either a given key which contains a services last epoch checkin and frequency, or a series of services that set keys under a given key-path in Consul

  EXAMPLES:

    # Target a Consul key deadman/myservice, and this key has an EPOCH value to check looking to alert
    deadman-check switch_monitor --host 127.0.0.1 --port 8500 --key deadman/myservice --alert-to-slack my-slack-monitor-channel

    # Target a Consul key path deadman/, which contains 2 or more service keys to monitor, i.e. deadman/myservice1, deadman/myservice2, deadmman/myservice3 all fall under the path deadman/
    deadman-check switch_monitor --host 127.0.0.1 --port 8500 --key-path deadman/ --alert-to-slack my-slack-monitor-channel

    # Target a Consul key path deadman/, alert to Amazon SNS, i.e. deadman/myservice1, deadman/myservice2, deadmman/myservice3 all fall under the path deadman/
    deadman-check switch_monitor --host 127.0.0.1 --port 8500 --key-path deadman/ --alert-to-sns arn:aws:sns:*:123456789012:my_corporate_topic

  OPTIONS:

    --host HOST
        IP address or hostname of Consul system

    --port PORT
        port Consul is listening on

    --key-path KEYPATH
        Consul key path to monitor, performs a recursive key lookup at given path.

    --key KEY
        Consul key to monitor, provide this or --key-path if you have multiple keys in a given path.

    --alert-to-slack SLACKCHANNEL
        Slack channel to send alert, don't include the # tag in name

    --alert-to-sns SNSARN
        Amazon Web Services SNS arn to send alert, example arn arn:aws:sns:*:123456789012:my_corporate_topic

    --alert-to-sns-region AWSREGION
        Amazon Web Services region the SNS topic is in, defaults to us-west-2

    --daemon
        Run as a daemon, otherwise will run check just once

    --daemon-sleep SECONDS
        Set the number of seconds to sleep in between switch checks, default 300
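
The switch_monitor options and their documented defaults (--daemon-sleep 300, --alert-to-sns-region us-west-2) can be mirrored with Ruby's standard OptionParser, shown below as a sketch. This is a swapped-in technique: the gem may parse its options differently, so treat parse_switch_monitor as hypothetical. The rule that exactly one of --key or --key-path must be given is an assumption drawn from the --key description.

```ruby
require 'optparse'

# Hypothetical parser mirroring the documented switch_monitor options with
# Ruby's stdlib OptionParser; not the gem's actual option handling.
def parse_switch_monitor(argv)
  # Documented defaults: sleep 300 seconds, SNS region us-west-2.
  opts = { daemon: false, daemon_sleep: 300, sns_region: 'us-west-2' }
  OptionParser.new do |o|
    o.on('--host HOST') { |v| opts[:host] = v }
    o.on('--port PORT', Integer) { |v| opts[:port] = v }
    o.on('--key KEY') { |v| opts[:key] = v }
    o.on('--key-path KEYPATH') { |v| opts[:key_path] = v }
    o.on('--alert-to-slack SLACKCHANNEL') { |v| opts[:slack] = v }
    o.on('--alert-to-sns SNSARN') { |v| opts[:sns] = v }
    o.on('--alert-to-sns-region AWSREGION') { |v| opts[:sns_region] = v }
    o.on('--daemon') { opts[:daemon] = true }
    o.on('--daemon-sleep SECONDS', Integer) { |v| opts[:daemon_sleep] = v }
  end.parse!(argv.dup)
  # Assumption of this sketch: exactly one of --key or --key-path is required.
  raise ArgumentError, 'give --key or --key-path' unless opts[:key].nil? ^ opts[:key_path].nil?

  opts
end
```

For example, parse_switch_monitor(%w[--host 10.0.0.1 --port 8500 --key-path deadman/ --daemon]) yields port 8500, daemon mode on, and the default 300-second sleep.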

Development

After checking out the repo, run bin/setup to install dependencies. Then, run rake test to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/[USERNAME]/deadman_check. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.

License

The gem is available as open source under the terms of the MIT License.
