poetry

Drinking at night at Dongpo, I sober up and get drunk again; when I return it feels like the third watch. The houseboy's snoring already rumbles like thunder. No one answers my knock at the door, so I lean on my staff and listen to the river.
I have long regretted that this body is not my own; when can I forget the world's endless bustle? The night wanes, the wind is still, the ripples lie flat. From here a little boat shall carry me away, to give the rest of my life to the rivers and the sea. (Su Shi, "Lin Jiang Xian")

Introduction to Keepalived

  • Keepalived is a lightweight high-availability solution for Linux. It was originally designed for LVS, specifically to monitor the state of each service node in a cluster; when a node fails, Keepalived detects it and automatically removes it from the cluster.

  • VRRP (Virtual Router Redundancy Protocol) support was added to Keepalived later. VRRP was designed to eliminate the single point of failure of static routing; it keeps the network running without interruption, which is why Keepalived also provides HA functionality.

  • Two core functions: health checking and failover.

    Health checking: keeps the real servers behind the load balancer checked for liveness using TCP three-way handshakes, ICMP requests, HTTP requests, UDP echo requests, and similar probes.

    Failover: mainly used with master/backup load balancer pairs. VRRP maintains the heartbeat between master and backup; when the master load balancer fails, the backup takes over its traffic, minimizing the loss of service.

  • IPVS wrapper: for all LVS-related operations, Keepalived does not call the ipvsadm client program; it uses the functions exposed by IPVS directly.
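The TCP-handshake style of health check described above can be sketched in plain bash (a hypothetical standalone probe, not Keepalived's internal implementation; the host and port below are examples):

```shell
#!/bin/bash
# tcp_check HOST PORT [TIMEOUT] - succeed iff a TCP connection can be opened,
# mimicking Keepalived's TCP_CHECK (three-way handshake only, no payload)
tcp_check() {
    local host=$1 port=$2 timeout=${3:-3}
    timeout "$timeout" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# example: a failing check would remove the backend from the pool
if ! tcp_check 192.168.70.30 80 2; then
    echo "backend 192.168.70.30:80 is down, would be removed from the pool"
fi
```

Keepalived performs the same test in C inside its checker process; a script like this is only useful for understanding or for MISC_CHECK-style custom probes.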

The VRRP Protocol

VRRP basics:

  1. VRRP is a fault-tolerant master/backup protocol: when a host's next-hop router fails, another router takes over the failed router's work, so devices can be switched transparently during a network failure without disturbing communication between hosts.
  2. VRRP virtual router: a virtual router made up of several physical routers that share one or more IP addresses externally. These addresses are virtual and belong to no single router; each is called a VIP.
  3. VRRP router: simply a router that runs the VRRP protocol.
  4. Master router: the virtual router contains several physical routers, but normally only one of them serves traffic; the master is chosen by the election algorithm to provide the network functions externally.
  5. Backup routers: every router in the VRRP group other than the master. They serve no traffic and only receive the master's advertisements; when the master dies, a new election is run and one of them takes over as master.

Three states: Initialize, Master, Backup.

Election mechanism: priority.

In preemptive mode, as soon as a higher-priority router joins, it becomes the Master;

in non-preemptive mode, as long as the Master stays up, higher-priority routers can only wait.

Only the router acting as Master keeps sending VRRP advertisement packets.
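The election rule can be sketched in a few lines of shell (a toy model of the outcome, not the real packet exchange: the highest priority wins, and by convention the higher primary IP address breaks ties):

```shell
# each candidate is "priority primary-ip"; highest priority wins,
# ties broken by the numerically larger primary IP (GNU version-sort)
elect_master() {
    printf '%s\n' "$@" | sort -k1,1nr -k2,2Vr | head -n1
}

elect_master "100 192.168.70.10" "110 192.168.70.20"   # -> 110 192.168.70.20
# equal priorities: the higher primary IP wins -> 100 192.168.70.20
elect_master "100 192.168.70.10" "100 192.168.70.20"
```

In a real deployment the tie case rarely matters: you assign distinct priorities (110 vs 100, as in the examples later in this post) precisely so the election is deterministic.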

Keepalived Architecture

Keepalived source modules: core, check, vrrp, libipfwc, libipvs*

core: the Keepalived core program, e.g. global configuration parsing and process startup

vrrp: implements the VRRP protocol

check: the directory for Keepalived's healthchecker child process, containing every health-check method and its configuration parsing; LVS configuration parsing also lives here

libipfwc: the iptables (ipchains) library, mainly used to configure the firewall-mark in LVS

libipvs*: files used by LVS

http://q5q3e2ctx.bkt.clouddn.com/keepalived_架构.webp

  • Scheduler I/O Multiplexer is an I/O-multiplexing dispatcher responsible for scheduling all of Keepalived's internal task requests

  • Memory Mngt is a memory-management framework that provides common methods for accessing memory

  • Control Plane is Keepalived's control plane, which compiles and parses the configuration file

  • Core Components

    • Watchdog: a simple yet highly effective monitoring tool from the computer-reliability field; it is precisely how Keepalived monitors the Checkers and VRRP processes
    • Checkers: Keepalived's most basic and most important function, detecting the running state of servers and isolating failures
    • VRRP Stack: the VRRP functionality later added to Keepalived; it implements failover in an HA cluster, handling FailOver between load balancers
    • IPVS Wrapper: an implementation of the IPVS functionality; the IPVS wrapper module builds IPVS rules and sends them into kernel space to the IPVS module, which performs the actual load balancing
    • Netlink Reflector: sets and switches the virtual IP (VIP) during failover in the HA cluster

Keepalived processes:

http://q5q3e2ctx.bkt.clouddn.com/keepalived_process.webp

Keepalived configuration file

The Keepalived configuration falls into three parts:

  1. global: global configuration
  2. vrrpd configuration
  3. lvs configuration; if Keepalived is used for HA only, the LVS part can be left out

Overall

[root@haproxy /]# cat /etc/keepalived/keepalived.conf | grep -Ev '^#|^$'
! Configuration File for keepalived
global_defs {                        # global definitions
    notification_email {             # notification email recipients
        acassen@firewall.loc
        failover@firewall.loc
        sysadmin@firewall.loc
    }
    notification_email_from Alexandre.Cassen@firewall.loc   # email sender
    smtp_server 192.168.200.1        # SMTP server
    smtp_connect_timeout 30          # connect timeout
    router_id LVS_DEVEL              # router id, an identifier for this node, usually the hostname
    vrrp_skip_check_adv_addr         # skip address checks on advertisements from an already-verified master
    vrrp_strict                      # strict adherence to the VRRP specification
    vrrp_garp_interval 0
    vrrp_gna_interval 0
}
vrrp_instance VI_1 {                 # VRRP instance
    state MASTER                     # initial state: master
    interface eth0                   # bound interface
    virtual_router_id 51             # virtual router ID, must match within the LAN segment
    priority 100                     # priority
    advert_int 1
    authentication {                 # authentication
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {              # VIPs; clients only ever need the VIP
        192.168.200.16
        192.168.200.17
        192.168.200.18
    }
}
# LVS-related settings follow; omitted here for now

Global

Detailed Keepalived configuration

# Global configuration

# static IP addresses
static_ipaddress
{
    # equivalent to: ip addr add 192.168.1.1/24 brd + dev eth0 scope global
    192.168.1.1/24 brd + dev eth0 scope global
}

# static routes
static_routes {
    src $SRC_IP to $DST_IP dev $SRC_DEVICE ...
    src $SRC_IP to $DST_IP via $GW dev $SRC_DEVICE
}

VRRP

VRRPD: VRRP sync groups and VRRP instances

VRRP Sync Groups
If any member of a sync group fails, a master/backup switchover is performed for the whole group. Without a sync group (say the internal-network and external-network instances are in different groups), a failure on one network segment will not make the VIP fail over.
vrrp_sync_group VG_1 {
    group {
        inside_network        # instance names, e.g. VI_1
        outside_network
    }

    notify_master /path/to/to_master.sh   # script run when switching to master...
    notify_backup /path/to/to_backup.sh
    notify_fault "/path/fault.sh VG_1"
    notify /path/to/notify.sh             # send a mail notification after a switchover
    smtp_alert
}

VRRP instance
Mainly defines the floating (failover) IP addresses.

vrrp_instance inside_network {
    state MASTER              # initial state only; an election by priority starts right after boot, so state does not pin this node as master
    interface eth0            # interface the instance binds to
    track_interface { eth0 eth1 }   # extra interfaces to monitor; if any of them fails, the instance enters the FAULT state
    virtual_router_id 51      # VRID, the virtual router ID
    priority 100
    advert_int 1              # advertisement interval, default 1s
    authentication { auth_type PASS auth_pass 1234 }   # authentication; auth_type can be PASS or AH
    virtual_ipaddress { 192.168.200.17/24 dev eth1 192.168.200.18/24 dev eth2 label eth2:1 }

    # routes added/removed on switchover
    virtual_routes { src 192.168.100.1 to 192.168.109.0/24 via 192.168.200.254 dev eth1 }
    nopreempt                 # disable preemption; only valid with state BACKUP, set on the highest-priority machine
    preempt_delay 300
    debug                     # debug level
}
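The notify_* hooks above receive the transition as arguments: Keepalived invokes each script as `script TYPE NAME STATE`, where TYPE is GROUP or INSTANCE. A minimal logging hook might look like this (the message format is our own choice):

```shell
#!/bin/bash
# notify.sh - called by Keepalived as: notify.sh TYPE NAME STATE
# e.g.: notify.sh INSTANCE VI_1 MASTER
notify() {
    local type=$1 name=$2 state=$3
    echo "$(date '+%F %T') keepalived ${type} ${name} transitioned to ${state}"
}

notify "${1:-INSTANCE}" "${2:-VI_1}" "${3:-MASTER}"
```

A hook like this is also a convenient place to restart dependent services or push an alert when the node becomes MASTER or drops to FAULT.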

LVS

virtual_server 192.168.200.100 443 {    # define a virtual server: VIP + port
    delay_loop 6                 # service polling delay
    lb_algo rr                   # LVS scheduling algorithm: rr wrr lc wlc lblc sh dh
    lb_kind NAT                  # LVS cluster mode: NAT, DR, TUN
    persistence_timeout 50       # session persistence time
    protocol TCP                 # protocol: TCP or UDP
    sorry_server 192.168.200.200 1358    # fallback server, used only after all real servers fail

    real_server 192.168.201.100 443 {    # real physical host
        weight 1                 # weight, default 1; 0 means disabled
        inhibit_on_failure       # on health-check failure, set weight to 0 instead of removing the server from IPVS
        # configure any one health-check method: HTTP_GET|SSL_GET|TCP_CHECK|SMTP_CHECK|MISC_CHECK
        SSL_GET {
            url {
                path /
                digest ff20ad2481f97b1754ef3e12ecd3a9cc   # digest of the SSL check result (computed with the genhash tool)
            }
            connect_timeout 3        # connect timeout
            nb_get_retry 3           # number of retries
            delay_before_retry 3     # delay between retries
        }

        HTTP_GET {
            url {
                path /testurl/test.jsp
                digest 640205b7b0fc66c1ea91c463fac6334d
            }
        }

        TCP_CHECK {                  # TCP health check
            connect_port 80
            bindto 192.168.1.1
            connect_timeout 4
        }

        SMTP_CHECK {
            host {
                connect_ip <IP ADDRESS>
                connect_port <PORT>
                bindto <IP ADDRESS>
            }
        }
    }
}
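The digest values above are MD5 checksums of the fetched page, normally produced with the genhash tool that ships with Keepalived (e.g. `genhash -s 192.168.201.100 -p 443 -u /`). Conceptually it boils down to an MD5 over the response body, which we can illustrate offline:

```shell
# what a health-check digest boils down to: MD5 of the page content
# (offline illustration; a real run would use: genhash -s <ip> -p <port> -u <path>)
page_digest() {
    md5sum | awk '{print $1}'
}

printf 'hello\n' | page_digest    # -> b1946ac92492d2347c6235b4d2611184
```

Because the digest pins the exact page content, it must be regenerated whenever the checked page changes; otherwise the health check starts failing even though the server is up.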

LVS example

LVS configuration example

virtual_server 192.168.1.1 80 {
    delay_loop 3
    lb_algo wlc
    lb_kind DR
    persistence_timeout 1200
    protocol TCP
    ha_suspend
    real_server 192.168.1.11 80 {
        weight 3
        TCP_CHECK {
            connect_timeout 3
        }
    }
    real_server 192.168.1.12 80 {
        weight 3
        TCP_CHECK {
            connect_timeout 3
        }
    }
}

Keepalived + Nginx

http://q5q3e2ctx.bkt.clouddn.com/keepalived_nginx_httpd.webp

Environment

VIP:192.168.70.199

Machine  IP address       Service
Node1    192.168.70.10    nginx, keepalived
Node2    192.168.70.20    nginx, keepalived
Web1     192.168.70.30    httpd
Web2     192.168.70.40    httpd

Web1 / Web2

yum -y install httpd
systemctl restart httpd
echo "THIS IS $HOSTNAME.." > /var/www/html/index.html

Node01

node01, the master node

# Configure the Nginx reverse proxy
yum -y install epel-release
yum -y install nginx
cat >> /etc/nginx/conf.d/lb.conf << 'EOF'
# backend server pool for load balancing
upstream webserver {
    server 192.168.70.30;
    server 192.168.70.40;
}

# forward requests on local port 8080 to the proxied server pool
server {
    listen 8080;
    location / {
        proxy_pass http://webserver;
    }
}
EOF
systemctl restart nginx


# Configure Keepalived high availability
yum -y install keepalived
mv /etc/keepalived/keepalived.conf{,.bak}

# Caveats: there must be a space between check_nginx and the brace, and track_script must come after the virtual_ipaddress block
cat >> /etc/keepalived/keepalived.conf << 'EOF'
global_defs {
    router_id node01
}
# health-check script, run every 1s; if it exits non-zero (i.e. Nginx is down), subtract 20 from the priority
vrrp_script check_nginx {
    script "/root/check.sh"
    interval 1
    weight -20
}
vrrp_instance VI_1 {
    # node01 plays the MASTER role
    state MASTER
    # the interface is ens32; change it on a different system. When Keepalived starts, the VIP is attached to ens32
    interface ens32
    # VRID, must match across the LAN segment
    virtual_router_id 51
    # master priority is 110; the backup node gets 100
    priority 110
    advert_int 1
    # VIP addresses; with a master/master setup several can be defined
    virtual_ipaddress {
        192.168.70.199
    }
    # script tracking; remember the space before the brace
    track_script {
        check_nginx
    }
}
EOF

systemctl restart keepalived

# Do not feed this script in with an unquoted heredoc, or the $ signs will be expanded and lost
[root@node01 ~]# cat check.sh
#!/bin/bash
echo "THIS SCRIPT IS RAN!!!"
Num=$(ps -C nginx --no-header | wc -l)
if [ $Num -eq 0 ];then
    # Num=0 means Nginx is down
    exit 1
else
    exit 0
fi

chmod +x /root/check.sh
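The weight -20 interacts with the two priorities as follows: while check.sh keeps failing, Keepalived subtracts 20 from node01's priority, and the backup (priority 100) outranks it. The arithmetic:

```shell
MASTER_PRIO=110    # node01, as configured above
BACKUP_PRIO=100    # node02
WEIGHT=-20         # applied while the check script exits non-zero

effective=$((MASTER_PRIO + WEIGHT))
echo "master effective priority: $effective"    # 110 - 20 = 90

if [ "$effective" -lt "$BACKUP_PRIO" ]; then
    echo "backup wins the election, VIP moves to node02"
fi
```

This is why the two priorities must differ by less than the tracked script's |weight|; with, say, 110 vs 80, a failed Nginx would never trigger a failover.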

Node02

node02, the backup node

# Configure the Nginx reverse proxy
yum -y install epel-release
yum -y install nginx
cat >> /etc/nginx/conf.d/lb.conf << 'EOF'
upstream webserver {
    server 192.168.70.30;
    server 192.168.70.40;
}

server {
    listen 8080;
    server_name 192.168.70.20;
    location / {
        proxy_pass http://webserver;
    }
}
EOF
systemctl restart nginx

# Configure Keepalived high availability
yum -y install keepalived
mv /etc/keepalived/keepalived.conf{,.bak}

cat >> /etc/keepalived/keepalived.conf << 'EOF'
global_defs {
    router_id node02
}

vrrp_instance VI_1 {
    state BACKUP
    interface ens32
    virtual_router_id 51
    priority 100
    advert_int 1
    virtual_ipaddress {
        192.168.70.199
    }
}
EOF

systemctl restart keepalived

Results

Testing the nodes

http://q5q3e2ctx.bkt.clouddn.com/keepavlied+nginx_1.webp

Stopping Keepalived

http://q5q3e2ctx.bkt.clouddn.com/keepalived+nginx_donw.webp

Stopping Nginx

http://q5q3e2ctx.bkt.clouddn.com/keepalived+nginx_down2.webp

Keepalived + HAProxy

http://q5q3e2ctx.bkt.clouddn.com/keepalived_haproxy_mysql.webp

Environment

VIP:192.168.70.199

Machine  IP address       Service
Node1    192.168.70.10    haproxy, keepalived
Node2    192.168.70.20    haproxy, keepalived
Web1     192.168.70.30    mariadb
Web2     192.168.70.40    mariadb

MySQL master-master

node03

yum -y install mariadb-server

sed -i '/^\[mysqld\]$/a\binlog-ignore-db = information_schema' /etc/my.cnf.d/server.cnf
sed -i '/^\[mysqld\]$/a\binlog-ignore-db = mysql' /etc/my.cnf.d/server.cnf
sed -i '/^\[mysqld\]$/a\skip-name-resolve' /etc/my.cnf.d/server.cnf
sed -i '/^\[mysqld\]$/a\auto-increment-increment = 2' /etc/my.cnf.d/server.cnf
sed -i '/^\[mysqld\]$/a\log-bin = mysql-bin' /etc/my.cnf.d/server.cnf
sed -i '/^\[mysqld\]$/a\auto_increment_offset = 1' /etc/my.cnf.d/server.cnf
sed -i '/^\[mysqld\]$/a\server-id = 1' /etc/my.cnf.d/server.cnf
cat /etc/my.cnf.d/server.cnf | grep -Ev '#|^$'
systemctl restart mariadb

mysql -uroot -e "grant replication slave on *.* to 'repuser'@'192.168.70.40' identified by '123456';"
mysql -uroot -e "show master status;"
# master_log_file/master_log_pos come from `show master status` on the peer (node04)
mysql -uroot -e "change master to master_host='192.168.70.40',master_port=3306,master_user='repuser',master_password='123456',master_log_file='mysql-bin.000003',master_log_pos=407;"
mysql -uroot -e "start slave;"

# grant a user for testing
mysql -uroot -e "grant all privileges on *.* to 'zhoujing'@'%' IDENTIFIED BY '000000'; "

http://q5q3e2ctx.bkt.clouddn.com/keepalive_mysql_back1.webp

node04

yum -y install mariadb-server

sed -i '/^\[mysqld\]$/a\binlog-ignore-db = information_schema' /etc/my.cnf.d/server.cnf
sed -i '/^\[mysqld\]$/a\binlog-ignore-db = mysql' /etc/my.cnf.d/server.cnf
sed -i '/^\[mysqld\]$/a\skip-name-resolve' /etc/my.cnf.d/server.cnf
sed -i '/^\[mysqld\]$/a\auto-increment-increment = 2' /etc/my.cnf.d/server.cnf
sed -i '/^\[mysqld\]$/a\log-bin = mysql-bin' /etc/my.cnf.d/server.cnf
sed -i '/^\[mysqld\]$/a\auto_increment_offset = 2' /etc/my.cnf.d/server.cnf
sed -i '/^\[mysqld\]$/a\server-id = 2' /etc/my.cnf.d/server.cnf
systemctl restart mariadb

mysql -uroot -e "grant replication slave on *.* to 'repuser'@'192.168.70.30' identified by '123456';"
mysql -uroot -e "show master status;"
# master_log_file/master_log_pos come from `show master status` on the peer (node03)
mysql -uroot -e "change master to master_host='192.168.70.30',master_port=3306,master_user='repuser',master_password='123456',master_log_file='mysql-bin.000003',master_log_pos=402;"
mysql -uroot -e "start slave;"

mysql -uroot -e "grant all privileges on *.* to 'zhoujing'@'%' IDENTIFIED BY '000000'; "
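In a two-master setup, both nodes should use auto_increment_increment = 2, with auto_increment_offset 1 and 2 respectively, so the two masters hand out disjoint auto-increment ids and concurrent inserts never collide. A quick sketch of the sequence arithmetic:

```shell
# ids generated by a node with a given offset and increment:
# offset, offset+inc, offset+2*inc, ...
gen_ids() {
    local offset=$1 increment=$2 count=$3
    local id=$offset out=""
    for ((i = 0; i < count; i++)); do
        out+="$id "
        id=$((id + increment))
    done
    echo "${out% }"
}

gen_ids 1 2 4   # node03: 1 3 5 7
gen_ids 2 2 4   # node04: 2 4 6 8
```

The increment equals the number of masters and the offsets enumerate them; the resulting sequences interleave without overlapping.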

http://q5q3e2ctx.bkt.clouddn.com/keepalived+mysql_back2.webp

Check that MySQL master-master replication is working:

http://q5q3e2ctx.bkt.clouddn.com/keepalived+mysql_back3.webp

http://q5q3e2ctx.bkt.clouddn.com/keepalived+mysql_back4.webp

HAProxy

The HAProxy configuration is identical on Node01 and Node02

yum -y install haproxy
[root@node01 ~]# cat /etc/haproxy/haproxy.cfg
global
    log 127.0.0.1 local2
    chroot /var/lib/haproxy
    pidfile /var/run/haproxy.pid
    maxconn 4000
    user haproxy
    group haproxy
    daemon
    stats socket /var/lib/haproxy/stats
listen mysql_proxy
    bind 0.0.0.0:3306
    mode tcp
    balance source
    server mysqldb1 192.168.70.30:3306 weight 1 check
    server mysqldb2 192.168.70.40:3306 weight 2 check
listen stats
    mode http
    bind 0.0.0.0:8080
    stats enable
    stats uri /dbs
    stats realm haproxy\ statistics
    stats auth admin:admin

systemctl restart haproxy
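`balance source` hashes the client's source IP to pick a backend, so a given MySQL client sticks to the same server across connections. A toy model of the idea (not HAProxy's actual hash function; the server names come from the config above):

```shell
# map a client IP deterministically onto one of N backends
pick_backend() {
    local ip=$1 n=$2 a b c d
    IFS=. read -r a b c d <<< "$ip"
    echo $(( (a * 16777216 + b * 65536 + c * 256 + d) % n ))
}

servers=(mysqldb1 mysqldb2)
idx=$(pick_backend 192.168.70.99 ${#servers[@]})
echo "client 192.168.70.99 always lands on ${servers[$idx]}"
```

Source-based balancing matters here because MySQL connections are stateful; note that it distributes by client, so the `weight 1`/`weight 2` split is only approximate.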

http://q5q3e2ctx.bkt.clouddn.com/keepalived+mysql_back5.webp

http://q5q3e2ctx.bkt.clouddn.com/keepalived_harpxoy_web_page.webp

Keepalived high availability

node01

# Write the trigger script on node01

yum -y install keepalived
[root@node01 ~]# cat /etc/keepalived/keepalived.conf
global_defs {
    router_id node01
}

vrrp_script chk_http_port {
    # script location
    script "/root/check_proxy_pid.sh"
    interval 1
    weight -20
}
vrrp_instance VI_1 {
    state MASTER
    interface ens32
    virtual_router_id 51
    priority 110
    advert_int 1
    virtual_ipaddress {
        192.168.70.199
    }
    track_script {
        chk_http_port
    }
}

[root@node01 ~]# cat check_proxy_pid.sh
#!/bin/bash
echo "THIS SCRIPT IS RAN!!!"
Num=$(ps -C haproxy --no-header | wc -l)
if [ $Num -eq 0 ];then
    exit 1
else
    exit 0
fi

chmod +x /root/check_proxy_pid.sh

systemctl restart keepalived

node02

yum -y install keepalived
[root@node02 ~]# cat /etc/keepalived/keepalived.conf
global_defs {
    router_id node02
}

vrrp_instance VI_1 {
    state BACKUP
    interface ens32
    virtual_router_id 51
    priority 100
    advert_int 1
    virtual_ipaddress {
        192.168.70.199
    }
}

systemctl restart keepalived

http://q5q3e2ctx.bkt.clouddn.com/last_connect.webp


(End)