[apicast] prometheus metrics policy #5

Closed
wants to merge 16 commits
2 changes: 1 addition & 1 deletion apicast/Makefile
@@ -22,7 +22,7 @@ builder: ## Build the builder image
--loglevel=$(LOGLEVEL) --pull-policy=$(PULL_POLICY)

test: ## Run tests (try to start the image)
-	docker run -it --rm $(LOCAL_IMAGE_NAME) bin/apicast --daemon --dev
+	docker run -it --rm $(LOCAL_IMAGE_NAME) bin/apicast --test --lazy

start: ## Start APIcast
docker run -it --publish 8080:8080 --env-file=.env --env APICAST_LOG_LEVEL=$(LOG_LEVEL) --rm $(LOCAL_IMAGE_NAME) bin/apicast --lazy --dev
1 change: 1 addition & 0 deletions apicast/Roverfile
@@ -1,6 +1,7 @@
luarocks {
group 'production' {
module { 'lua-resty-iputils' },
+module { 'nginx-lua-prometheus' },
},

group { 'development', 'test' } {
7 changes: 4 additions & 3 deletions apicast/Roverfile.lock
@@ -1,7 +1,7 @@
-apicast scm-1|348144131998f97e2190fa3b3f1c8ba70d2339d3|development,test
+apicast scm-1|74a819f94f8781ddf720146aff7fb334e888f903|development,test
argparse 0.5.0-1||development,test
inspect 3.1.1-0||development,test
-liquid scm-1|811a73e38fdd9fdea116be4baf310ca326b96c77|development,test
+liquid 0.1.0-1||development,test
lua-resty-env 0.4.0-1||development,test
lua-resty-execvp 0.1.0-1||development,test
lua-resty-http 0.12-0||development,test
@@ -10,5 +10,6 @@ lua-resty-jwt 0.1.11-0||development,test
lua-resty-repl 0.0.6-0|3878f41b7e8f97b1c96919db19dbee9496569dda|development,test
lua-resty-url 0.2.0-1||development,test
luafilesystem 1.7.0-2||development,test
+nginx-lua-prometheus 0.20171117-4||production
penlight 1.5.4-1||development,test
-router 2.1-0||development,test
+router 2.1-0||development,test
10 changes: 7 additions & 3 deletions apicast/config/cloud_hosted.lua
@@ -2,12 +2,16 @@ local PolicyChain = require('apicast.policy_chain')
local policy_chain = context.policy_chain

if not arg then -- {arg} is defined only when executing the CLI
+policy_chain:insert(PolicyChain.load_policy('cloud_hosted.metrics', '0.1', { log_level = os.getenv('METRICS_LOG_LEVEL') or 'error' }))
policy_chain:insert(PolicyChain.load_policy('cloud_hosted.rate_limit', '0.1', {
-limit = os.getenv('RATE_LIMIT') or 5,
-burst = os.getenv('RATE_LIMIT_BURST') or 50 }), 1)
+limit = os.getenv('APICAST_RATE_LIMIT') or os.getenv('RATE_LIMIT') or 5,
+burst = os.getenv('APICAST_RATE_LIMIT_BURST') or os.getenv('RATE_LIMIT_BURST') or 50,
+status = os.getenv('APICAST_RATE_LIMIT_STATUS') or os.getenv('RATE_LIMIT_STATUS'),
+}), 1)
policy_chain:insert(PolicyChain.load_policy('cloud_hosted.balancer_blacklist', '0.1'), 1)
end

return {
-policy_chain = policy_chain
+policy_chain = policy_chain,
+port = { metrics = 9421 },
}
2 changes: 1 addition & 1 deletion apicast/cpanfile
@@ -1 +1 @@
-requires 'Test::APIcast', '0.04';
+requires 'Test::APIcast', '0.11';
1 change: 1 addition & 0 deletions apicast/http.d/lua_capture_error_log.conf
@@ -0,0 +1 @@
lua_capture_error_log 4k;
Contributor Author

This is enough for about 10 log entries, so it will collect the last 10 entries at the desired log level.

For example, when there are 15 log entries between Prometheus pulls (say 5 error and 10 warning), the errors will not appear. It is always good to set the desired log level to something we actually need.


So maybe this conf value could be an env var? We will certainly have to tune the Prometheus scrape interval and the log capture size.

Contributor Author

It could be, if it were part of the Liquid template in the main repo. But here it cannot be templated.

And disregard my comment about 10 entries; it fits more than that.

Well, I think 4k is fine when properly configured. We should not capture log levels we don't care about. So let's say we take the top ones: emerg, alert, crit, error, and configure capturing error and up. We would not really care if some of the higher levels went missing, because the error itself is enough to trigger a warning.
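
A minimal sketch of the setup being discussed, assuming OpenResty's documented `ngx.errlog` API and a `lua_capture_error_log 4k;` directive in the `http {}` block (init-phase code, not the exact policy implementation):

```lua
local errlog = require('ngx.errlog')

-- Capture error and above; warn/notice/info/debug never enter the 4k
-- buffer, so they cannot push real errors out between Prometheus pulls.
local ok, err = errlog.set_filter_level(ngx.ERR)
if not ok then
  ngx.log(ngx.WARN, 'failed to set errlog filter level: ', err)
end
```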


And this is controlled by the "log_map" var? Can we expose this one as an ENV var?


I see, so set_filter_level will capture everything >= METRICS_LOG_LEVEL, right?

Contributor Author

@jmprusi done in f45985c as METRICS_LOG_LEVEL. Better name suggestions are very welcome :)

@maneta yes.
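
To illustrate that behavior, a rough sketch based on the documented `ngx.errlog` interface, where `get_logs()` returns a flat array of (level, timestamp, message) triples:

```lua
local errlog = require('ngx.errlog')

-- After set_filter_level(ngx.ERR), only entries at error severity or
-- above are buffered; draining the buffer walks the flat triples:
local logs = errlog.get_logs(100) or {}
for i = 1, #logs, 3 do
  local level, msg = logs[i], logs[i + 2]
  -- level is numeric (e.g. ngx.ERR == 4); lower numbers are more severe
  ngx.log(ngx.INFO, 'captured level ', level, ': ', msg)
end
```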


Cool! :)

@@ -1,6 +1,10 @@
local iputils = require("resty.iputils")
local default_balancer = require('resty.balancer.round_robin').call
local resty_balancer = require('resty.balancer')
+local prometheus = require('apicast.prometheus')
+local split = require('ngx.re').split
+local tonumber = tonumber
+local setmetatable = setmetatable

local _M = require('apicast.policy').new('IP Blacklist', '0.1')
local mt = { __index = _M }
@@ -15,6 +19,7 @@ local ipv4 = {
reserved = { '192.0.0.0/24' }
}

+local whitelist = iputils.parse_cidrs(split(os.getenv('APICAST_BALANCER_WHITELIST') or '', ','))
local blacklist = {}

for _,cidrs in pairs(ipv4) do
@@ -33,15 +38,47 @@ function _M:init()
iputils.enable_lrucache()
end

local function whitelisted(ip)
local whitelisted = iputils.ip_in_cidrs(ip, whitelist)

if whitelisted then
return true
end

local blacklisted, err = iputils.ip_in_cidrs(ip, blacklist)

return not blacklisted, err
end

local balancer_metric = prometheus('counter', 'cloud_hosted_balancer', 'Cloud hosted balancer', {'status'})
local upstream_metric = prometheus('counter', "upstream_status", "HTTP status from upstream servers", {"status"})
local proxy_status_metric = prometheus('counter', "apicast_status", "HTTP status generated by APIcast", {"status"})

local metric_labels = {}

local function increment(metric, label)
if not metric then return end
metric_labels[1] = label -- do not allocate new table for every increment but reuse one

metric:inc(1, metric_labels)
end

local balancer_with_blacklist = resty_balancer.new(function(peers)
local peer, i = default_balancer(peers)

if not peer then
return nil, i
end

local ip = peer[1]
local blacklisted, err = iputils.ip_in_cidrs(ip, blacklist)

if blacklisted then
local allowed, err = whitelisted(ip)

if not allowed then
increment(balancer_metric, 'blacklisted')
return nil, 'blacklisted'
elseif err then
increment(balancer_metric, err)
return nil, err
else
return peer, i
@@ -59,6 +96,23 @@ function _M:balancer()
ngx.status = ngx.HTTP_SERVICE_UNAVAILABLE
ngx.log(ngx.ERR, "failed to set current backend peer: ", err)
ngx.exit(ngx.status)
else
increment(balancer_metric, 'success')
end
end

local status_map = setmetatable({
-- http://mailman.nginx.org/pipermail/nginx/2013-May/038773.html
[9] = 499,
}, { __index = function(_, k) return tonumber(k) end })

function _M.log()
local upstream_status = tonumber(ngx.var.upstream_status)

if upstream_status then
increment(upstream_metric, upstream_status)
else
increment(proxy_status_metric, status_map[ngx.status])
end
end

1 change: 1 addition & 0 deletions apicast/policies/cloud_hosted.metrics/0.1/init.lua
@@ -0,0 +1 @@
return require('metrics')
121 changes: 121 additions & 0 deletions apicast/policies/cloud_hosted.metrics/0.1/metrics.lua
@@ -0,0 +1,121 @@
local _M = require('apicast.policy').new('Metrics', '0.1')

local errlog = require('ngx.errlog')
local prometheus = require('apicast.prometheus')
local tonumber = tonumber
local select = select
local find = string.find
local pairs = pairs

local new = _M.new

local log_map = {
'emerg',
'alert',
'crit',
'error',
'warn',
'notice',
'info',
'debug',
}


local function find_i(t, value)
for i=1, #t do
if t[i] == value then return i end
end
end

local empty = {}

local function get_logs(max)
return errlog.get_logs(max) or empty
end

function _M.new(configuration)
local m = new()

local config = configuration or empty
local filter_level = config.log_level or 'error'

local i = find_i(log_map, filter_level)

if not i then
ngx.log(ngx.WARN, _M._NAME, ': invalid level: ', filter_level, ' using error instead')
i = find_i(log_map, 'error')
end

m.filter_level = i
-- how many logs to take in one iteration
m.max_logs = tonumber(config.max_logs) or 100

return m
end

local logs_metric = prometheus('counter', 'nginx_error_log', "Items in nginx error log", {'level'})
local http_connections_metric = prometheus('gauge', 'nginx_http_connections', 'Number of HTTP connections', {'state'})
local shdict_capacity_metric = prometheus('gauge', 'openresty_shdict_capacity', 'OpenResty shared dictionary capacity', {'dict'})
local shdict_free_space_metric = prometheus('gauge', 'openresty_shdict_free_space', 'OpenResty shared dictionary free space', {'dict'})


local metric_labels = {}

local function metric_op(op, metric, value, label)
if not metric then return end
metric_labels[1] = label
metric[op](metric, tonumber(value) or 0, metric_labels)
end

local function metric_set(metric, value, label)
return metric_op('set', metric, value, label)
end

local function metric_inc(metric, label)
return metric_op('inc', metric, 1, label)
end

function _M:init()
local ok, err = errlog.set_filter_level(self.filter_level)

get_logs(100) -- to throw them away after setting the filter level (and get rid of debug ones)

if not ok then
ngx.log(ngx.WARN, self._NAME, ' failed to set errlog filter level: ', err)
end

for name,dict in pairs(ngx.shared) do
metric_set(shdict_capacity_metric, dict:capacity(), name)
end
end

function _M:metrics()
local logs = get_logs(self.max_logs)

for i = 1, #logs, 3 do
metric_inc(logs_metric, log_map[logs[i]] or 'unknown')
end

local response = ngx.location.capture("/nginx_status")

if response.status == 200 then
local accepted, handled, total = select(3, find(response.body, [[accepts handled requests%s+(%d+) (%d+) (%d+)]]))
local var = ngx.var

metric_set(http_connections_metric, var.connections_reading, 'reading')
metric_set(http_connections_metric, var.connections_waiting, 'waiting')
metric_set(http_connections_metric, var.connections_writing, 'writing')
metric_set(http_connections_metric, var.connections_active, 'active')
metric_set(http_connections_metric, accepted, 'accepted')
metric_set(http_connections_metric, handled, 'handled')
metric_set(http_connections_metric, total, 'total')
else
prometheus:log_error('Could not get status from nginx')
end

for name,dict in pairs(ngx.shared) do
metric_set(shdict_free_space_metric, dict:free_space(), name)
end
end

return _M
15 changes: 13 additions & 2 deletions apicast/policies/cloud_hosted.rate_limit/0.1/rate_limit.lua
@@ -1,6 +1,7 @@
local tonumber = tonumber

local limit_req = require "resty.limit.req"
local prometheus = require('apicast.prometheus')

local _M = require('apicast.policy').new('Rate Limit', '0.1')

@@ -38,19 +39,28 @@ function _M.new(configuration)
return policy
end

-function _M:access(context)
+local rate_limits_metric = prometheus('counter', 'cloud_hosted_rate_limit', "Cloud hosted rate limits", {'state'})
+
+local delayed = { 'delayed' }
+local rejected = { 'rejected' }
+
+local proxy_stub = { post_action = function() end }
+
+function _M:rewrite(context)
local limiter = self.limiter

if not limiter then return nil, 'missing limiter' end

local key = context.host or ngx.var.host
-local status = self.status or 503
+local status = self.status or 429

local delay, err = limiter:incoming(key, true)

if not delay then
ngx.log(ngx.WARN, err, ' request over limit, key: ', key)
if err == "rejected" then
+rate_limits_metric:inc(1, rejected)
+context.proxy = proxy_stub -- to silence complaining apicast policy
return ngx.exit(status)
end
ngx.log(ngx.ERR, "failed to limit req: ", err)
@@ -61,6 +71,7 @@ function _M:access(context)
local excess = err

ngx.log(ngx.WARN, 'delaying request: ', key, ' for ', delay, 's, excess: ', excess)
+rate_limits_metric:inc(1, delayed)
ngx.sleep(delay)
end
end
74 changes: 74 additions & 0 deletions apicast/t/metrics.t
@@ -0,0 +1,74 @@
BEGIN {
$ENV{TEST_NGINX_APICAST_BINARY} ||= 'rover exec apicast';
$ENV{APICAST_POLICY_LOAD_PATH} = './policies';
$ENV{APICAST_BALANCER_WHITELIST} = '127.0.0.1/32';
$ENV{METRICS_LOG_LEVEL} = 'info';
}

use strict;
use warnings FATAL => 'all';
use Test::APIcast::Blackbox 'no_plan';

repeat_each(1);
run_tests();

__DATA__

=== TEST 1: metrics endpoint
--- environment_file: config/cloud_hosted.lua
--- configuration
{
"services": [
{
"proxy": {
"policy_chain": [
{ "name": "cloud_hosted.upstream", "version": "0.1",
"configuration": {
"url": "http://127.0.0.1:$TEST_NGINX_SERVER_PORT", "host": "prometheus"
}
}
]
}
}
]
}
--- request
GET /metrics
--- response_body
# HELP cloud_hosted_balancer Cloud hosted balancer
# TYPE cloud_hosted_balancer counter
cloud_hosted_balancer{status="success"} 1
# HELP nginx_error_log Items in nginx error log
# TYPE nginx_error_log counter
nginx_error_log{level="info"} 1
# HELP nginx_http_connections Number of HTTP connections
# TYPE nginx_http_connections gauge
nginx_http_connections{state="accepted"} 2
nginx_http_connections{state="active"} 2
nginx_http_connections{state="handled"} 2
nginx_http_connections{state="reading"} 0
nginx_http_connections{state="total"} 2
nginx_http_connections{state="waiting"} 0
nginx_http_connections{state="writing"} 2
# HELP nginx_metric_errors_total Number of nginx-lua-prometheus errors
# TYPE nginx_metric_errors_total counter
nginx_metric_errors_total 0
# HELP openresty_shdict_capacity OpenResty shared dictionary capacity
# TYPE openresty_shdict_capacity gauge
openresty_shdict_capacity{dict="api_keys"} 10485760
openresty_shdict_capacity{dict="configuration"} 10485760
openresty_shdict_capacity{dict="init"} 16384
openresty_shdict_capacity{dict="locks"} 1048576
openresty_shdict_capacity{dict="prometheus_metrics"} 16777216
openresty_shdict_capacity{dict="rate_limit_req_store"} 10485760
# HELP openresty_shdict_free_space OpenResty shared dictionary free space
# TYPE openresty_shdict_free_space gauge
openresty_shdict_free_space{dict="api_keys"} 10412032
openresty_shdict_free_space{dict="configuration"} 10412032
openresty_shdict_free_space{dict="init"} 4096
openresty_shdict_free_space{dict="locks"} 1032192
openresty_shdict_free_space{dict="prometheus_metrics"} 16662528
openresty_shdict_free_space{dict="rate_limit_req_store"} 10412032
--- error_code: 200
--- no_error_log
[error]