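1. Requirement: For historical reasons and because of how the software is built, more than a hundred servers and the services on them do not run in containers, so programs (mainly Java, Python, and C++) need to be restarted automatically when they crash.
2. Goal: Recover quickly and automatically without human intervention; the check must run once every 10 seconds.
3. Implementation
3.1 Write custom scripts for each language to handle automatic restart and notification (not standardized enough; abandoned)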
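The script used for this approach is not reproduced in the post. Purely as an illustration of what such a hand-rolled watchdog tends to look like, here is a minimal sketch. The function name `ensure_running` and the loop are hypothetical, not the author's code, and it assumes `pgrep` is available on the host.

```shell
#!/bin/sh
# Hypothetical watchdog sketch for approach 3.1 (not the author's original script).
# Usage: ensure_running PATTERN START_COMMAND [ARGS...]
# Runs START_COMMAND only when no process command line matches PATTERN.
ensure_running() {
    pattern="$1"
    shift
    if pgrep -f "$pattern" >/dev/null 2>&1; then
        return 0    # a matching process exists, nothing to do
    fi
    echo "$(date '+%F %T') ${pattern} is not running, starting it" >&2
    "$@"
}

# Example loop, left commented out (script path taken from the Monit config in this post):
# while true; do
#     ensure_running "rapidtrade-mock" /data/scripts/rapidtrade-mock.sh start
#     sleep 10
# done
```

Every service needs its own copy of the pattern, start command, and notification logic, which is the standardization problem that led to this approach being dropped.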
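3.2 Use general-purpose open-source software (Monit) to manage automatic restarts in one place (simple for both development and operations to maintain and use)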
    # Install build dependencies (use the line that matches your distribution)
    dnf install -y gcc make openssl-devel bison flex zlib-devel
    # apt install -y gcc make libssl-dev bison flex zlib1g-dev
    # yum install -y gcc make openssl-devel bison flex zlib-devel

    # Download and build Monit
    wget https://mmonit.com/monit/dist/monit-5.34.0.tar.gz
    tar xf monit-5.34.0.tar.gz && cd monit-5.34.0/
    ./configure --prefix=/usr/local/monit --without-pam && make && make install

    # Install the sample config; it defines the check interval and which check files are loaded
    mkdir -p /usr/local/monit/etc/
    cp monitrc /usr/local/monit/etc/
    chmod 600 /usr/local/monit/etc/monitrc

    # Expose the binary at /usr/bin/monit, the path the systemd unit uses
    ln -s /usr/local/monit/bin/monit /usr/bin/monit
    monit --version

    # All process check configs live here; if a start script changes, edit it here
    mkdir -p /etc/monit/conf.d/

    # systemd unit: vi /etc/systemd/system/monit.service
    [Unit]
    Description=Monit process monitor
    Documentation=https://mmonit.com/monit/
    After=network.target

    [Service]
    Type=forking
    ExecStart=/usr/bin/monit -c /usr/local/monit/etc/monitrc
    ExecReload=/usr/bin/monit -c /usr/local/monit/etc/monitrc reload
    ExecStop=/usr/bin/monit -c /usr/local/monit/etc/monitrc quit
    PIDFile=/var/run/monit.pid
    Restart=on-failure
    User=root
    Group=root

    [Install]
    # Start at boot in multi-user mode (unit files do not allow trailing comments on a setting line)
    WantedBy=multi-user.target

    # Make systemd pick up the new unit file, then enable and start Monit
    systemctl daemon-reload
    systemctl enable monit
    systemctl start monit
Listing 3.2.1 (above): program installation, .sh
    # Main config: vim /usr/local/monit/etc/monitrc
    set daemon 10                       # run the checks every 10 seconds
    set logfile /var/log/monit.log
    include /etc/monit/conf.d/*.conf    # required so Monit loads the per-service files

    # Service config: /etc/monit/conf.d/rapidtrade-mock.conf
    check process rapidtrade_mock matching "rapidtrade-mock"
        start program = "/data/scripts/rapidtrade-mock.sh start"
        stop program = "/data/scripts/rapidtrade-mock.sh stop"
        if does not exist then start

    # Port-based check. This is an alternative to the process check; Monit requires
    # unique check names, so it is named differently in case both are loaded.
    check host rapidtrade_mock_port with address 127.0.0.1
        if failed
            port 7040
            type tcp
            with timeout 5 seconds
            for 2 cycles
        then start
        start program = "/data/scripts/rapidtrade-mock.sh start" as uid root and gid root
        stop program = "/data/scripts/rapidtrade-mock.sh stop" as uid root and gid root
        if 3 restarts within 5 cycles then timeout

    # Health check on port + path: this had problems in testing; it could not
    # start the service and bring it back to a healthy state.
    check host my_web_service with address 127.0.0.1
        if failed
            port 80
            protocol http
            request "/actuator/prometheus"    # health-check endpoint path to probe
            with timeout 10 seconds
            for 3 cycles
        then restart
        start program = "/usr/bin/systemctl start my-service"
        stop program = "/usr/bin/systemctl stop my-service"
Listing 3.2.2 (above): configuration and usage
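With more than a hundred services, writing each check file by hand gets repetitive. The sketch below generates a process check in the same shape as the `rapidtrade-mock` example. The function name `gen_monit_conf` is hypothetical, and it assumes every service follows the `<script> start` / `<script> stop` convention used in this post.

```shell
#!/bin/sh
# Hypothetical helper: print a Monit process check for one service.
# Usage: gen_monit_conf SERVICE_NAME CONTROL_SCRIPT [MATCH_PATTERN]
# MATCH_PATTERN defaults to SERVICE_NAME.
gen_monit_conf() {
    name="$1"
    script="$2"
    pattern="${3:-$1}"
    # Use underscores in the check name, as the rapidtrade_mock example does.
    check_name=$(printf '%s' "$name" | sed 's/-/_/g')
    cat <<EOF
check process ${check_name} matching "${pattern}"
    start program = "${script} start"
    stop program = "${script} stop"
    if does not exist then start
EOF
}

# Example, left commented out because it writes to /etc and reloads Monit:
# gen_monit_conf rapidtrade-mock /data/scripts/rapidtrade-mock.sh \
#     > /etc/monit/conf.d/rapidtrade-mock.conf
# monit -c /usr/local/monit/etc/monitrc -t && monit -c /usr/local/monit/etc/monitrc reload
```

Running `monit -t` before `reload` checks the control file syntax, so a bad generated file is caught before Monit re-reads its configuration.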
4. Testing and usage
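One way to test the setup is to kill a monitored process and confirm it comes back within a few check cycles. The helper below is a minimal sketch of that drill; `wait_for_process` is a hypothetical name, it assumes `pgrep` is installed, and the commented commands assume the `rapidtrade-mock` check from section 3.2 is loaded.

```shell
#!/bin/sh
# Hypothetical helper: poll once per second until a process whose command line
# matches PATTERN exists, or until TIMEOUT_SECONDS have passed.
# Usage: wait_for_process PATTERN TIMEOUT_SECONDS
# Returns 0 if the process appeared, 1 on timeout.
wait_for_process() {
    pattern="$1"
    timeout="$2"
    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        if pgrep -f "$pattern" >/dev/null 2>&1; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 1
}

# Manual drill, left commented out because it kills a live service:
# monit -c /usr/local/monit/etc/monitrc -t     # syntax check of the control file
# monit -c /usr/local/monit/etc/monitrc summary
# pkill -f rapidtrade-mock
# wait_for_process rapidtrade-mock 30 && echo "recovered" || echo "not recovered"
```

With `set daemon 10`, a 30-second timeout leaves room for one or two check cycles plus the service's own startup time.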
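With this in place, a traditional (non-containerized) service only needs matching start.sh and stop.sh scripts. A short process health-check config then keeps the service restarted automatically. The approach is simple and efficient, and teams no longer need to write and maintain many scripts of their own.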
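For a service that does not have such scripts yet, a single control script with `start` and `stop` actions can be sketched as follows. This is an assumption-laden example, not the author's `rapidtrade-mock.sh`: `APP_CMD`, `PID_FILE`, their default paths, and the function names are all placeholders to replace with the real launch command.

```shell
#!/bin/sh
# Hypothetical start/stop wrapper that tracks the service with a PID file.
# APP_CMD and PID_FILE are placeholders; override them per service.
APP_CMD="${APP_CMD:-/data/app/my-service/run.sh}"
PID_FILE="${PID_FILE:-/var/run/my-service.pid}"

# True when the PID file exists and the recorded process is still alive.
is_running() {
    [ -f "$PID_FILE" ] && kill -0 "$(cat "$PID_FILE")" 2>/dev/null
}

start_app() {
    if is_running; then
        return 0    # already running, do not start a second copy
    fi
    # APP_CMD is intentionally unquoted so "cmd arg1 arg2" splits into words.
    nohup $APP_CMD >/dev/null 2>&1 &
    echo $! > "$PID_FILE"
}

stop_app() {
    if is_running; then
        kill "$(cat "$PID_FILE")"
    fi
    rm -f "$PID_FILE"
}

main() {
    case "$1" in
        start) start_app ;;
        stop)  stop_app ;;
        *)     echo "usage: $0 {start|stop}" >&2; return 2 ;;
    esac
}

# Dispatch only when an action was passed on the command line.
if [ "$#" -gt 0 ]; then
    main "$@"
fi
```

Because the script writes a PID file, the Monit check could also use `check process <name> with pidfile <path>` instead of `matching`, which avoids accidental matches on similarly named processes.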