Intermittent resolution failures on a self-hosted DNS server: no more recursive clients (1000/0/1000): quota reached

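To simplify name resolution, one of our business platforms runs its own DNS server built on BIND.
Recently we received reports that client queries occasionally, and sometimes frequently, got no response.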

Network level

dev@ubuntu:~# ping baidu.com
ping: unknown host baidu.com

Application level

IOError: [Errno socket error] [Errno -2] Name or service not known

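The failures occurred at random times and with random frequency. Clients switched to a public DNS server did not see the problem.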

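Troubleshooting approach

Add logging on the DNS server. The warning level means "the system has a problem but is still (barely) usable"; the err and critical levels indicate fatal errors, such as syntax errors in the configuration file or permission problems at startup.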

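// channel and category clauses must sit inside a logging statement in named.conf
logging {
    channel "warning_log" {
        // keep 3 rotated versions, rotate at 100 MB
        file "/data/log/named/warning.log" versions 3 size 100M;
        severity warning;
        print-time yes;
        print-severity yes;
        print-category yes;
    };

    category default { "warning_log"; };
};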

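Test measures

  • Switch some of the machines carrying the same load to a public DNS server
  • Run dig tests on all machines, in batches and on a schedule, and write the results to a log file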
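
The scheduled test can be sketched roughly as follows. This is a minimal illustration rather than the script actually used: it resolves names through the system resolver with Python's socket.getaddrinfo instead of calling dig, which is closer to what the application itself experiences. The function names probe, format_record and run, as well as the interval and round counts, are hypothetical.

```python
import socket
import time
from datetime import datetime


def probe(hostname, resolve=socket.getaddrinfo):
    """Resolve hostname once and return a record describing the outcome."""
    started = time.monotonic()
    try:
        results = resolve(hostname, None)
        detail = results[0][4][0]  # first resolved address
        ok = True
    except socket.gaierror as exc:
        detail = str(exc)  # e.g. "[Errno -2] Name or service not known"
        ok = False
    elapsed_ms = round((time.monotonic() - started) * 1000, 1)
    return {
        "time": datetime.now().isoformat(timespec="seconds"),
        "host": hostname,
        "ok": ok,
        "detail": detail,
        "elapsed_ms": elapsed_ms,
    }


def format_record(rec):
    """Render one probe record as a single log line."""
    status = "OK" if rec["ok"] else "FAIL"
    return f'{rec["time"]} {status} {rec["host"]} {rec["detail"]} {rec["elapsed_ms"]}ms'


def run(hostnames, logfile, interval=5.0, rounds=3):
    """Probe every hostname `rounds` times, appending results to logfile."""
    with open(logfile, "a", encoding="utf-8") as fh:
        for _ in range(rounds):
            for name in hostnames:
                fh.write(format_record(probe(name)) + "\n")
            fh.flush()
            time.sleep(interval)
```

Calling run(["baidu.com"], "dns_probe.log") from cron or a loop on each machine gives a per-host timeline of failures that can later be lined up against the server-side log.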

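Observations

  • Resolution failures did occur, with no obvious pattern
  • The machines switched to the public DNS server resolved without problems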

Key logs on the DNS server

root@bind:/data/log/named# tail -f warning.log
02-Apr-2019 11:51:50.000 client: warning: client x.x.x.x#38689: no more recursive clients (1000/0/1000): quota reached
02-Apr-2019 11:51:51.002 client: warning: client x.x.x.x#59881: no more recursive clients (1000/0/1000): quota reached
02-Apr-2019 11:51:52.001 client: warning: client x.x.x.x#40641: no more recursive clients (1000/0/1000): quota reached
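
To see how often the quota is hit, the warning lines can be parsed and counted per minute. The sketch below is only an illustration based on the three log lines shown above. It assumes the three numbers in parentheses follow the same current/soft/hard order that rndc status uses, and the names parse_quota_line and count_per_minute are hypothetical.

```python
import re
from collections import Counter

# Matches the "quota reached" warnings in the format shown in warning.log above.
QUOTA_RE = re.compile(
    r"^(?P<ts>\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"client: warning: client (?P<client>[^#\s]+)#(?P<port>\d+): "
    r"no more recursive clients \((?P<used>\d+)/(?P<soft>\d+)/(?P<hard>\d+)\): "
    r"quota reached"
)


def parse_quota_line(line):
    """Return a dict for a quota warning line, or None for any other line."""
    m = QUOTA_RE.match(line.strip())
    if m is None:
        return None
    return {
        "ts": m.group("ts"),
        "client": m.group("client"),
        "port": int(m.group("port")),
        # Assumed order: current / soft limit / hard limit
        "used": int(m.group("used")),
        "soft": int(m.group("soft")),
        "hard": int(m.group("hard")),
    }


def count_per_minute(lines):
    """Count quota warnings per minute, keyed by 'DD-Mon-YYYY HH:MM'."""
    counts = Counter()
    for line in lines:
        rec = parse_quota_line(line)
        if rec is not None:
            counts[rec["ts"][:17]] += 1
    return counts
```

Feeding the file through count_per_minute(open("/data/log/named/warning.log")) shows whether the rejections cluster around particular times.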

Resolution

Look up the relevant parameter in the BIND manual and confirm what it means:
recursive-clients

recursive-clients    The maximum number of simultaneous recursive lookups the server will perform on behalf of clients.
The default is 1000. Because each recursing client uses a fair bit of memory, on the order of 20 kilobytes,
the value of the recursive-clients option may have to be decreased on hosts with limited memory.

Raise the number of concurrent recursive queries allowed: recursive-clients 10000; // default 1000
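
Since the manual warns about memory, it is worth a quick estimate before raising the value. The following back-of-the-envelope calculation simply multiplies the limit by the manual's "on the order of 20 kilobytes" figure. It is a rough upper bound under that assumption, not a measurement, and the function name recursion_memory_mb is hypothetical.

```python
# Rough per-client cost taken from the manual excerpt ("on the order of 20 kilobytes").
KB_PER_CLIENT = 20


def recursion_memory_mb(recursive_clients, kb_per_client=KB_PER_CLIENT):
    """Approximate memory (in MB) used if every recursive slot is occupied."""
    return recursive_clients * kb_per_client / 1024


for limit in (1000, 10000):
    print(f"recursive-clients {limit}: ~{recursion_memory_mb(limit):.0f} MB")
```

By this estimate the default of 1000 costs about 20 MB and a limit of 10000 about 195 MB when fully used, which should be checked against the memory available on the DNS host.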

Track down why logging had not been enabled on BIND in the first place

Add monitoring

rndc status reports the current recursive query statistics:

root@bind:~# rndc status
version: 9.8.1-P1
CPUs found: 4
worker threads: 4
number of zones: 21
debug level: 0
xfers running: 0
xfers deferred: 0
soa queries in progress: 0
query logging is ON
recursive clients: 2409/9900/10000
tcp clients: 0/100
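server is up and running

  • 2409: number of recursive client queries currently in progress
  • 9900: soft limit
  • 10000: hard limit

Once the soft limit is reached but the hard limit is not, the server drops the oldest requests that are still pending.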
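
A monitoring check built on this output could look like the sketch below. It assumes the "recursive clients: current/soft/hard" line format shown above (taken from version 9.8.1-P1; other versions may word it differently). The 80% warning threshold and the names parse_recursive_clients, check and read_status are hypothetical choices for illustration.

```python
import re
import subprocess

RECURSIVE_RE = re.compile(
    r"^recursive clients:\s*(\d+)/(\d+)/(\d+)\s*$", re.MULTILINE
)


def parse_recursive_clients(status_text):
    """Extract (current, soft, hard) from `rndc status` output."""
    m = RECURSIVE_RE.search(status_text)
    if m is None:
        raise ValueError("no 'recursive clients' line found")
    current, soft, hard = (int(g) for g in m.groups())
    return current, soft, hard


def check(status_text, warn_ratio=0.8):
    """Return (level, ratio), where level is OK, WARNING or CRITICAL."""
    current, soft, hard = parse_recursive_clients(status_text)
    ratio = current / hard if hard else 0.0
    # A soft limit of 0 is treated as "no soft limit"; fall back to the hard limit.
    critical_at = soft if soft > 0 else hard
    if current >= critical_at:
        return "CRITICAL", ratio
    if ratio >= warn_ratio:
        return "WARNING", ratio
    return "OK", ratio


def read_status():
    """Run `rndc status` locally and return its standard output."""
    result = subprocess.run(
        ["rndc", "status"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

A scheduler can call check(read_status()) every minute and raise an alert on WARNING or CRITICAL, so the quota problem is noticed before clients start failing.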