Problems capturing VRRP (keepalived) packets with tcpdump

Capturing VRRP packets with tcpdump (multi-cluster keepalived environment)

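Reported symptoms:

  • Packets from some clusters can be captured, packets from others cannot

  • The behaviour also differs between servers that run the keepalived process and servers that do not

First, confirm that the clusters are not using unicast, i.e. that unicast_peer is not configured.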

Non-keepalived nodes

At first, the tcpdump invocations I habitually use captured nothing:

localhost ~ # tcpdump -i any -n -p vrrp -c 1

or

localhost ~ # tcpdump -i any -n -p net 224.0.0.0/4 -c 1

or

localhost ~ # tcpdump -i eth0 -p vrrp or net 224.0.0.0/4 -c 1

or

localhost ~ # tcpdump -n  vrrp -c 1

and so on. None of them worked.

By accident, I found that the following command does work:

localhost ~ # tcpdump -n net 224.0.0.0/4 -c 1
dropped privs to pcap
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
15:47:46.775771 IP 192.168.3.3 > 224.0.0.18: AH(spi=0x0a0a0303,seq=0xa06eef,icv=0x3468e16acdfda3b381f98472): VRRPv2, Advertisement, vrid 52, prio 100, authtype ah, intvl 1s, length 20
1 packet captured

I then went through man tcpdump and found a detail about -i any that I had never noticed:

On Linux systems with 2.2 or later kernels, an interface argument of ``any'' can be
used to capture packets from all interfaces. Note that captures on the ``any''
device will not be done in promiscuous mode.

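In other words, -i any is of no use when the packets can only be seen in promiscuous mode, and for the same reason the -p option does not work either: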

-p Don't put the interface into promiscuous mode.  Note that the interface might be in promiscuous mode for some other reason; hence,
`-p' cannot be used as an abbreviation for `ether host {local-hw-addr} or ether broadcast'.

But going back to the commands above, why did tcpdump -n vrrp -c 1 fail as well?

Adding -v to print more detail shows that the IP protocol is AH (51), not the expected VRRP (112):

localhost ~ # tcpdump -n  net 224.0.0.0/4 -c 1 -v
dropped privs to pcap
tcpdump: listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
15:56:58.812336 IP (tos 0xc0, ttl 255, id 9485, offset 0, flags [none], proto AH (51), length 64)
192.168.3.3 > 224.0.0.18: AH(length=4(24-bytes),spi=0x0a0a0303,seq=0xa07117,icv=0xb63fae82ced8b9c742459147): VRRPv2, Advertisement, vrid 52, prio 100, authtype ah, intvl 1s, length 20, addrs: 111.202.40.180

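Testing again, tcpdump -n ah -c 1 also captures the packets. The vrrp filter matches only packets whose IP protocol field is 112, whereas here the VRRP advertisement is wrapped in an AH header, so the IP protocol field is 51.

Checking keepalived.conf shows that the cause is the auth_type AH setting (IPsec-based authentication, which upstream advises against). In fact, RFC 3768 removed authentication from VRRP; the option is kept only for compatibility.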
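
The two settings that mattered so far (unicast_peer and auth_type AH) can be checked on a keepalived node with a small script. This is only a sketch: check_vrrp_conf is a hypothetical helper name, the config path is passed in by the caller, and the patterns assume each keyword starts its own line in the file.

```shell
# Sketch: report the keepalived.conf settings that change what tcpdump sees.
# check_vrrp_conf is a hypothetical helper; $1 is the path to the config file.
check_vrrp_conf() {
    conf=$1

    # unicast_peer switches advertisements from multicast to unicast.
    if grep -Eq '^[[:space:]]*unicast_peer' "$conf"; then
        echo "unicast_peer: set (adverts are unicast, not sent to 224.0.0.18)"
    else
        echo "unicast_peer: not set (adverts go to multicast 224.0.0.18)"
    fi

    # auth_type AH wraps VRRP in an AH header, so the IP protocol becomes 51.
    if grep -Eiq '^[[:space:]]*auth_type[[:space:]]+AH' "$conf"; then
        echo "auth_type: AH (IP protocol 51, filter with 'ah')"
    else
        echo "auth_type: not AH (IP protocol 112, filter with 'vrrp')"
    fi
}
```

A typical call would be check_vrrp_conf /etc/keepalived/keepalived.conf, assuming the configuration lives at that common default location.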

In situations like this, a better approach is to name the specific interface and filter on the multicast range:

localhost ~ # tcpdump -i eth0 -n net 224.0.0.0/4

or

localhost ~ # tcpdump -i eth0 -n vrrp or ah

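keepalived nodes (machines running the keepalived process)

On keepalived nodes, both tcpdump -i any and tcpdump -p capture the packets. What is different about a node that runs keepalived?

Troubleshooting steps (including the detours):

  • After starting keepalived, ran ip -d link | grep promiscuity: no interface was in promiscuous mode
  • Read man keepalived and man keepalived.conf: nothing relevant
  • Saved the output of sysctl -a before and after starting keepalived as tmp1 and tmp2, then ran vimdiff tmp1 tmp2: nothing relevant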

Reading man ip, I finally noticed the ip-maddress entry.

Comparing the output of ip maddress show dev eth0 before and after starting keepalived, with diff /tmp/eth0.ori /tmp/eth0.keep:

4a5,6
> link 01:00:5e:00:00:12
> inet 224.0.0.18
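
The same check can be made without saving two files and diffing them. The sketch below reads ip maddress show output on standard input and succeeds if a given IPv4 group is listed; joined_group is a hypothetical helper name, and it assumes the "inet <group>" line layout seen in the diff above.

```shell
# Sketch: succeed (exit 0) if the IPv4 multicast group in $1 appears
# as an "inet <group>" line in `ip maddress show` output read from stdin.
# joined_group is a hypothetical helper name.
joined_group() {
    awk -v g="$1" '
        $1 == "inet" && $2 == g { found = 1 }
        END { exit !found }
    '
}
```

For example: ip maddress show dev eth0 | joined_group 224.0.0.18 && echo "VRRP group joined".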

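In other words, starting keepalived makes the interface join the VRRP multicast group 224.0.0.18 (link-layer address 01:00:5e:00:00:12), so the NIC accepts these frames without promiscuous mode. A non-keepalived node has not joined the group and typically sees them only when the interface is promiscuous.

For multicast address assignments, see: https://en.wikipedia.org/wiki/Multicast_address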
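
The link-layer address in the diff follows from the standard IPv4 multicast mapping: the fixed prefix 01:00:5e followed by the low 23 bits of the group address. A minimal sketch of that mapping, with mcast_mac as a hypothetical function name and no validation of the input:

```shell
# Sketch: map an IPv4 multicast group to its Ethernet multicast address.
# The MAC is 01:00:5e followed by the low 23 bits of the group address.
# mcast_mac is a hypothetical helper; it expects a dotted-quad argument.
mcast_mac() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address into its four octets
    IFS=$old_ifs
    # Clear the top bit of the second octet so only 23 bits are kept.
    printf '01:00:5e:%02x:%02x:%02x\n' $(( $2 & 127 )) "$3" "$4"
}

mcast_mac 224.0.0.18     # prints 01:00:5e:00:00:12
```

Because only 23 bits survive, several groups share one MAC address, which is one reason to filter on the IP address or protocol rather than on the Ethernet address.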

Checking whether an interface is in promiscuous mode

promiscuity 0: not in promiscuous mode
promiscuity 1: promiscuous mode

test ~ # ip -d link | grep promiscuity
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 promiscuity 0 addrgenmode eui64
link/ether 00:50:56:b9:b0:f5 brd ff:ff:ff:ff:ff:ff promiscuity 1 addrgenmode eui64
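
The grep output above drops the interface names. As a sketch, the following awk filter keeps track of the name from each header line and prints only the interfaces whose promiscuity counter is above zero. promisc_ifaces is a hypothetical helper name, and it assumes the usual ip -d link layout in which each interface starts with an "N: name:" line.

```shell
# Sketch: list interfaces whose promiscuity counter is greater than 0.
# Reads `ip -d link` output on stdin; promisc_ifaces is a hypothetical name.
promisc_ifaces() {
    awk '
        # Header line such as "2: eth0: <BROADCAST,...>": remember the name.
        /^[0-9]+: / { iface = $2; sub(/:$/, "", iface) }
        {
            # Look for the "promiscuity N" pair anywhere on the line.
            for (i = 1; i < NF; i++)
                if ($i == "promiscuity" && $(i + 1) > 0)
                    print iface
        }
    '
}
```

Usage: ip -d link | promisc_ifaces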