Using OpenResty + Lua to automatically switch Grafana between the vmselect of the realtime cluster and that of the historical cluster

Posted by 章娅萝 on 2025-9-30 21:39:09

Author: 张富春 (ahfuzhang). When reposting, please credit the author and include the reference link. Thanks!

[*]cnblogs blog
[*]zhihu
[*]Github
[*]WeChat official account: 一本正经的瞎扯

I previously designed the following realtime cluster and historical cluster setup for VictoriaMetrics:

see: deploy_VictoriaMetrics_cluster
The intended result is:

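[*]The realtime cluster stores the most recent 7 days of data. It must be fast and reliable enough to serve alerting queries and current system monitoring.

    [*]To keep the realtime cluster stable, the retention period is shortened so that less data is stored.

[*]The historical cluster stores long-term (for example, half a year), downsampled data.

    [*]It serves long-range queries at low cost.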

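Once both clusters were deployed, I deployed a new vmselect node that connects to the vmselect nodes of all the realtime and historical clusters.
However, when I ran the following query, the result was twice the correct value:

sum by (path) (increase(http_request_total{job="myApp"}))

Clearly, sum() added the value of the same time series twice: once from the realtime cluster and once from the historical cluster.
I read through the dedup source code and found no obvious problem.
With no better option, I had to work around it some other way.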
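The doubling described above can be reproduced with toy numbers. The following is only a sketch: the paths, the values, and the helper `naive_sum` are made up for illustration and say nothing about how vmselect merges results internally.

```lua
-- Toy data: the same series as returned by two clusters (values are made up).
local realtime   = { ["/api/a"] = 10, ["/api/b"] = 4 }
local historical = { ["/api/a"] = 10, ["/api/b"] = 4 }

-- Hypothetical naive merge: add every sample, with no deduplication.
local function naive_sum(...)
  local total = {}
  for _, cluster in ipairs({ ... }) do
    for path, value in pairs(cluster) do
      total[path] = (total[path] or 0) + value
    end
  end
  return total
end

local merged = naive_sum(realtime, historical)
print(merged["/api/a"])  -- 20, twice the real value of 10
print(merged["/api/b"])  -- 8, twice the real value of 4
```

Any series present in both clusters is counted twice, which matches the symptom seen in the query result.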
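Then it occurred to me: if the proxy could detect the time range of each query automatically, it could forward queries that start within the last seven days to the realtime cluster and queries that reach further back to the historical cluster. The two clusters would then never need to be mixed behind one vmselect.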
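The routing rule itself is small. Here is a minimal sketch in plain Lua, assuming the start time has already been converted to a Unix timestamp in seconds; the function name `pick_cluster` and the fixed `now` value are hypothetical and used only for illustration.

```lua
-- Hypothetical helper: choose a cluster from the query's start time.
-- start_ts and now are Unix timestamps in seconds; retention_days is the
-- retention of the realtime cluster (7 in this article).
local function pick_cluster(start_ts, now, retention_days)
  if start_ts == nil then
    return "realtime"  -- no usable start time: fall back to realtime
  end
  local cutoff = now - retention_days * 24 * 3600
  if start_ts > cutoff then
    return "realtime"
  end
  return "historical"
end

local now = 1700000000  -- fixed value so the example is reproducible
print(pick_cluster(now - 3600, now, 7))        -- realtime
print(pick_cluster(now - 30 * 86400, now, 7))  -- historical
print(pick_cluster(nil, now, 7))               -- realtime
```

A query that starts exactly seven days ago falls on the historical side, because the comparison is a strict "greater than".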
The details of this approach follow.

The manifest that deploys OpenResty is:
# openresty.yaml

# The nginx configuration is stored in a ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: openresty-config
data:
  # nginx configuration file
  nginx.conf: |
    worker_processes 1;

    events {
        worker_connections 1024;
    }

    http {
        access_log /dev/stdout;
        error_log /dev/stderr warn;

        lua_package_path "/usr/local/openresty/nginx/lua/?.lua;;";

        upstream realtime {
            server vmselect-realtime:8481;  # vmselect of the realtime cluster
        }

        upstream historical {
            server vmselect-historical:8481;  # vmselect of the historical cluster
        }

        server {
            listen 8401;

            # The key point: intercept the query_range API
            location /select/0/prometheus/api/v1/query_range {
                content_by_lua_file /usr/local/openresty/nginx/lua/router.lua;
            }

            # All other paths go to realtime by default
            location / {
                proxy_pass http://realtime;
            }

            location @toRealtime {
                proxy_pass http://realtime;
            }

            location @toHistorical {
                proxy_pass http://historical;
            }
        }
    }

  # Lua script
  router.lua: |
    -- The time value must support three formats: a number, a date-time
    -- string, and the relative shorthand used by Grafana (for example -7d)
    local function parse_start(val)
        if not val then return nil end
        local num = tonumber(val)
        if num then
            if num > 1e12 then
                -- very large values are treated as milliseconds
                return math.floor(num / 1000)
            else
                return num
            end
        end

        local year, mon, day, hour, min, sec =
            val:match("^(%d+)%-(%d+)%-(%d+)T(%d+):(%d+):(%d+)")
        if year then
            -- note: os.time() interprets these fields in the local time zone
            return os.time({
                year = tonumber(year),
                month = tonumber(mon),
                day = tonumber(day),
                hour = tonumber(hour),
                min = tonumber(min),
                sec = tonumber(sec)
            })
        end

        -- relative shorthand: optional minus sign, digits, one unit letter
        local amount, unit = val:match("^([%-]?%d+)([smhdw])$")
        if amount and unit then
            amount = tonumber(amount)
            local seconds = 0
            if unit == "s" then seconds = amount
            elseif unit == "m" then seconds = amount * 60
            elseif unit == "h" then seconds = amount * 3600
            elseif unit == "d" then seconds = amount * 86400
            elseif unit == "w" then seconds = amount * 7 * 86400
            end
            return ngx.time() + seconds
        end

        return nil
    end

    local args = ngx.req.get_uri_args()
    local is_post = (ngx.req.get_method() == "POST")
    local post_args = {}

    if is_post then
        ngx.req.read_body()
        post_args = ngx.req.get_post_args()
        -- merge the form fields into args so that "start" is found either way
        for k, v in pairs(post_args) do
            args[k] = v
        end
    end

    local start = parse_start(args["start"])
    local now = ngx.time()
    -- Seven-day boundary: within seven days query the realtime cluster,
    -- beyond seven days query the historical cluster
    local days = 7
    local n_days_ago = now - days * 24 * 3600
    -- When querying the historical cluster, use its downsampled interval (5 minutes)
    local step = "300s"

    if start ~= nil then
        if start > n_days_ago then
            return ngx.exec("@toRealtime")
        else
            if is_post then
                -- rebuild the form body with the overridden step
                post_args["step"] = step
                local body_tbl = {}
                for k, v in pairs(post_args) do
                    table.insert(body_tbl, ngx.escape_uri(k) .. "=" .. ngx.escape_uri(v))
                end
                local new_body = table.concat(body_tbl, "&")
                ngx.req.set_body_data(new_body)
            else
                args["step"] = step
                ngx.req.set_uri_args(args)
            end
            return ngx.exec("@toHistorical")
        end
    else
        -- no parsable start time: default to realtime
        return ngx.exec("@toRealtime")
    end
---
# Deployment for OpenResty
apiVersion: apps/v1
kind: Deployment
metadata:
  name: openresty
spec:
  replicas: 1
  selector:
    matchLabels:
      app: openresty
  template:
    metadata:
      labels:
        app: openresty
    spec:
      containers:
        - name: openresty
          image: openresty/openresty:1.27.1.2-alpine
          ports:
            - containerPort: 8401
          volumeMounts:
            - name: config
              mountPath: /usr/local/openresty/nginx/conf/nginx.conf
              subPath: nginx.conf
            - name: config
              mountPath: /usr/local/openresty/nginx/lua/router.lua
              subPath: router.lua
          command: ["/usr/local/openresty/bin/openresty"]
          args: ["-g", "daemon off;", "-c", "/usr/local/openresty/nginx/conf/nginx.conf"]
      volumes:
        - name: config
          configMap:
            name: openresty-config
---
apiVersion: v1
kind: Service
metadata:
  name: openresty
spec:
  selector:
    app: openresty
  ports:
    - protocol: TCP
      port: 8401
      targetPort: 8401
  type: ClusterIP

Deploy it from the command line:
KUBECONFIG=~/my-test-k8s.yaml kubectl apply -f ./openresty.yaml -n my-namespace

Then create a new data source in Grafana, or query it with a command:

curl -G "http://127.0.0.1:8401/select/0/prometheus/api/v1/query_range?start=-7d" -v

The X-Server-Hostname response header shows which service returned the data.
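The `start=-7d` value used in the curl command is the relative shorthand form. Its conversion to seconds can be checked offline with plain Lua, without nginx. This is a sketch only: `relative_offset` and `UNIT_SECONDS` are hypothetical names, and the pattern assumes a single unit letter out of s, m, h, d, w, matching the units that router.lua compares against.

```lua
-- Hypothetical lookup table: seconds per unit letter.
local UNIT_SECONDS = { s = 1, m = 60, h = 3600, d = 86400, w = 7 * 86400 }

-- Convert a relative duration such as "-7d" to an offset in seconds.
-- Returns nil when the string is not in the expected form.
local function relative_offset(val)
  local amount, unit = val:match("^(%-?%d+)([smhdw])$")
  if not amount then
    return nil
  end
  return tonumber(amount) * UNIT_SECONDS[unit]
end

print(relative_offset("-7d"))   -- -604800
print(relative_offset("-30m"))  -- -1800
print(relative_offset("abc"))   -- nil
```

Adding the offset to the current time gives the absolute start time. For `-7d` that lands exactly on the seven-day boundary, which the router sends to the historical cluster.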

Have Fun.

