Why tomcat queues requests
Problem description
Recently I ran some long-request tests against a fairly old version of tomcat: each request makes tomcat sleep for 300 seconds.
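For context, a handler along the following lines is enough to reproduce that kind of long request (an illustrative sketch only; the actual test handler is not shown in this article):
import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Illustrative sketch: a servlet that holds its worker thread for 300 seconds,
// mimicking the long request used in the test.
public class SleepServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        try {
            Thread.sleep(300_000); // sleep 300 seconds, like the test operation
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        resp.getWriter().println("done");
    }
}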
Runtime environment:
jdk-1.8.0_152
tomcat-6.0.37
Tomcat connector configuration:
# ssl enabled
<Connector port="443" maxHttpHeaderSize="8192" protocol="org.apache.coyote.http11.Http11NioProtocol"
maxThreads="800" acceptCount="1000" minSpareThreads="50"
enableLookups="false" disableUploadTimeout="true"
maxKeepAliveRequests="1000" connectionTimeout="120000"
SSLEnabled="true" scheme="https" secure="true"
clientAuth="true" sslProtocol="TLS" />
The connector here uses the non-blocking Http11NioProtocol. The NIO protocol allows at most 10000 connections by default, and this configuration allows at most 800 concurrently running threads (maxThreads) and an accept queue of up to 1000 (acceptCount). Judging from the configuration, tomcat should be able to run several hundred requests at the same time, yet the tcp packet capture shows the requests being queued:
// packet capture decoded with tshark
1 22:47:15.308511 10.11.1.10 → 10.11.1.13 TCP 74 54681 → 443 [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=1035175627 TSecr=0 WS=128
2 22:47:15.336250 10.11.1.13 → 10.11.1.10 TCP 74 443 → 54681 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 TSval=1051388483 TSecr=1035175627 WS=128
3 22:47:15.336261 10.11.1.10 → 10.11.1.13 TCP 66 54681 → 443 [ACK] Seq=1 Ack=1 Win=14720 Len=0 TSval=1035175636 TSecr=1051388483
4 22:47:15.336868 10.11.1.10 → 10.11.1.13 TLSv1.2 286 Client Hello
5 22:47:15.364301 10.11.1.13 → 10.11.1.10 TCP 66 443 → 54681 [ACK] Seq=1 Ack=221 Win=30080 Len=0 TSval=1051388512 TSecr=1035175636
6 22:47:15.398569 10.11.1.13 → 10.11.1.10 TLSv1.2 2304 Server Hello, Certificate, Server Key Exchange, Certificate Request, Server Hello Done
7 22:47:15.398583 10.11.1.10 → 10.11.1.13 TCP 66 54681 → 443 [ACK] Seq=221 Ack=2239 Win=19200 Len=0 TSval=1035175654 TSecr=1051388546
8 22:47:15.704171 10.11.1.10 → 10.11.1.13 TLSv1.2 1514
9 22:47:15.704180 10.11.1.10 → 10.11.1.13 TLSv1.2 179 Ignored Unknown Record
10 22:47:15.731674 10.11.1.13 → 10.11.1.10 TCP 66 443 → 54681 [ACK] Seq=2239 Ack=1782 Win=28544 Len=0 TSval=1051388879 TSecr=1035175746
11 22:47:17.371044 10.11.1.10 → 10.11.1.13 TLSv1.2 591 Certificate Verify
12 22:47:17.371056 10.11.1.10 → 10.11.1.13 TLSv1.2 72 Change Cipher Spec
13 22:47:17.371101 10.11.1.10 → 10.11.1.13 TLSv1.2 167 Encrypted Handshake Message
14 22:47:17.398470 10.11.1.13 → 10.11.1.10 TCP 66 443 → 54681 [ACK] Seq=2239 Ack=2313 Win=28032 Len=0 TSval=1051390546 TSecr=1035176246
15 22:47:17.398809 10.11.1.13 → 10.11.1.10 TLSv1.2 72 Change Cipher Spec
16 22:47:17.398815 10.11.1.10 → 10.11.1.13 TCP 66 54681 → 443 [ACK] Seq=2414 Ack=2245 Win=19200 Len=0 TSval=1035176254 TSecr=1051390546
17 22:47:17.398859 10.11.1.13 → 10.11.1.10 TLSv1.2 167 Encrypted Handshake Message
18 22:47:17.398865 10.11.1.10 → 10.11.1.13 TCP 66 54681 → 443 [ACK] Seq=2414 Ack=2346 Win=19200 Len=0 TSval=1035176254 TSecr=1051390546
19 22:47:17.494386 10.11.1.10 → 10.11.1.13 TLSv1.2 375 Application Data
20 22:47:17.494409 10.11.1.10 → 10.11.1.13 TLSv1.2 311 Application Data
21 22:47:17.561281 10.11.1.13 → 10.11.1.10 TCP 66 443 → 54681 [ACK] Seq=2346 Ack=2968 Win=27520 Len=0 TSval=1051390709 TSecr=1035176283
22 22:55:44.553134 10.11.1.13 → 10.11.1.10 TLSv1.2 327 Application Data
23 22:55:44.553153 10.11.1.10 → 10.11.1.13 TCP 66 54681 → 443 [ACK] Seq=2968 Ack=2607 Win=22016 Len=0 TSval=1035328413 TSecr=1051897694
24 22:55:54.517983 10.11.1.10 → 10.11.1.13 TLSv1.2 151 Encrypted Alert
25 22:55:54.517992 10.11.1.10 → 10.11.1.13 TCP 66 54681 → 443 [FIN, ACK] Seq=3053 Ack=2607 Win=22016 Len=0 TSval=1035331403 TSecr=1051897694
26 22:55:54.545383 10.11.1.13 → 10.11.1.10 TCP 66 443 → 54681 [ACK] Seq=2607 Ack=3053 Win=27520 Len=0 TSval=1051907686 TSecr=1035331403
27 22:55:54.545485 10.11.1.13 → 10.11.1.10 TCP 66 443 → 54681 [FIN, ACK] Seq=2607 Ack=3054 Win=27520 Len=0 TSval=1051907686 TSecr=1035331403
28 22:55:54.545522 10.11.1.10 → 10.11.1.13 TCP 66 54681 → 443 [ACK] Seq=3054 Ack=2608 Win=22016 Len=0 TSval=1035331411 TSecr=1051907686
Hosts 10.11.1.10 and 10.11.1.13 went through the following steps:
1. TCP three-way handshake completed --> packets 1 ~ 3;
2. TLS/SSL handshake completed --> packets 3 ~ 18;
3. Host .10 sends the request data --> packets 19, 20;
4. Host .13 acknowledges with an ACK --> packet 21;
5. The response data comes back about 8 minutes 27 seconds later --> packets 22, 23;
6. Host .10 closes the connection, completing the four-way teardown --> packets 24 ~ 28;
The corresponding netstat output on host .13 is shown below; many connections have data sitting in their receive queue (Recv-Q):
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
......
tcp6 0 0 :::443 :::* LISTEN 4849/java
tcp6 0 0 127.0.0.1:8005 :::* LISTEN 4849/java
tcp6 0 0 10.11.1.13:443 10.11.1.10:44716 ESTABLISHED 4849/java
tcp6 0 0 10.11.1.13:443 10.11.1.10:44650 ESTABLISHED 4849/java
tcp6 0 0 10.11.1.13:443 10.11.1.10:44744 ESTABLISHED 4849/java
tcp6 0 0 10.11.1.13:443 10.11.1.10:44702 ESTABLISHED 4849/java
tcp6 554 0 10.11.1.13:443 10.11.1.10:44712 ESTABLISHED 4849/java
tcp6 0 0 10.11.1.13:443 10.11.1.10:44722 ESTABLISHED 4849/java
tcp6 554 0 10.11.1.13:443 10.11.1.10:44752 ESTABLISHED 4849/java
tcp6 0 0 10.11.1.13:443 10.11.1.10:44704 ESTABLISHED 4849/java
tcp6 0 0 10.11.1.13:443 10.11.1.10:44693 ESTABLISHED 4849/java
tcp6 554 0 10.11.1.13:443 10.11.1.10:44663 ESTABLISHED 4849/java
tcp6 554 0 10.11.1.13:443 10.11.1.10:44655 ESTABLISHED 4849/java
tcp6 0 0 10.11.1.13:443 10.11.1.10:44642 ESTABLISHED 4849/java
tcp6 0 0 10.11.1.13:443 10.11.1.10:44653 ESTABLISHED 4849/java
tcp6 0 0 10.11.1.13:443 10.11.1.10:44644 ESTABLISHED 4849/java
tcp6 0 0 10.11.1.13:443 10.11.1.10:44643 ESTABLISHED 4849/java
tcp6 554 0 10.11.1.13:443 10.11.1.10:44685 ESTABLISHED 4849/java
tcp6 554 0 10.11.1.13:443 10.11.1.10:44711 ESTABLISHED 4849/java
......
From what we see above, tomcat is queueing requests: the tcp connection completes normally and the request data is sent, but the connection then just sits and waits. Only after tomcat finishes the requests it is already handling does it start processing the queued ones. Overall, this produced the following problem:
120 requests were sent at the same time, but they were executed sequentially in two batches, so an operation that should have finished in roughly 3 minutes took more than 7 minutes.
Why the queueing happens
The netstat output suggests that tomcat is queueing requests, which might mean that parameters such as maxThreads are not taking effect. The java thread dump below shows:
"http-443-ClientPoller-0" #23 daemon prio=5 os_prio=0 tid=0x00007f3ea8759800 nid=0x10f81 runnable [0x00007f3e6f8fd000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
- locked <0x0000000726239048> (a sun.nio.ch.Util$3)
- locked <0x0000000726239038> (a java.util.Collections$UnmodifiableSet)
- locked <0x0000000726234158> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at org.apache.tomcat.util.net.NioEndpoint$Poller.run(NioEndpoint.java:1591)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- None
Internally, these threads use the NIO-related thread pool; the relevant code:
# tomcat-6.0.37/java/org/apache/tomcat/util/net/NioEndpoint.java
2346 // ---------------------------------------------- TaskQueue Inner Class
2347 public static class TaskQueue extends LinkedBlockingQueue<Runnable> {
2348 ThreadPoolExecutor parent = null;
2349 NioEndpoint endpoint = null;
2350
2351 public TaskQueue() {
2352 super();
2353 }
2354
2355 public TaskQueue(int initialCapacity) {
2356 super(initialCapacity);
2357 }
2358
2359 public TaskQueue(Collection<? extends Runnable> c) {
2360 super(c);
2361 }
2362
2363
2364 public void setParent(ThreadPoolExecutor tp, NioEndpoint ep) {
2365 parent = tp;
2366 this.endpoint = ep;
2367 }
2368
2369 public boolean offer(Runnable o) {
2370 //we can't do any checks
2371 if (parent==null) return super.offer(o); // 1
2372 //we are maxed out on threads, simply queue the object
2373 if (parent.getPoolSize() == parent.getMaximumPoolSize()) return super.offer(o); // 2
2374 //we have idle threads, just add it to the queue
2375 //this is an approximation, so it could use some tuning
2376 if (endpoint.activeSocketProcessors.get()<(parent.getPoolSize())) return super.offer(o); // 3
2377 //if we have less threads than maximum force creation of a new thread
2378 if (parent.getPoolSize()<parent.getMaximumPoolSize()) return false; // 4
2379 //if we reached here, we need to add it to the queue
2380 return super.offer(o);
2381 }
2382 }
The offer(Runnable o) method decides whether a new thread gets created:
1. returning false makes the thread pool create a new thread to handle the request;
2. returning super.offer(o) puts the established connection (socket object) directly into the queue to wait for processing.
The variables involved are listed below; since each value is read on the fly through a method call, none of them is exact under concurrency:
parent.getPoolSize(): the number of threads currently in the pool;
parent.getMaximumPoolSize(): the configured maximum number of threads in the pool;
endpoint.activeSocketProcessors.get(): roughly the number of threads currently doing work; it is incremented when processing starts and decremented when it finishes, so it changes quickly and may be inaccurate;
Of the four if checks in the code above:
1. practically never happens;
2. is also hard to hit under high concurrency and does not match our situation;
3. is the most likely branch: when the number of currently working threads is smaller than the number of threads already in the pool, tomcat does not create a new thread; presumably the intent is to let the pool grow smoothly;
4. if 3 is not satisfied, then activeSocketProcessors >= parent.getPoolSize(), and a new thread has to be created to handle the request (see the sketch below).
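To make the branch logic concrete, here is a minimal, self-contained sketch of the same idea (illustrative only, not Tomcat's code; the class name, the active counter, and the pool sizes are stand-ins for NioEndpoint's TaskQueue, activeSocketProcessors, minSpareThreads and maxThreads). A custom queue whose offer() returns false forces ThreadPoolExecutor to create a new worker, while returning super.offer(o) parks the task in the queue:
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class TaskQueueSketch extends LinkedBlockingQueue<Runnable> {
    // Stand-in for endpoint.activeSocketProcessors: +1 when a task starts, -1 when it ends.
    static final AtomicInteger active = new AtomicInteger();
    private ThreadPoolExecutor parent;

    void setParent(ThreadPoolExecutor tp) { this.parent = tp; }

    @Override
    public boolean offer(Runnable o) {
        if (parent == null) return super.offer(o);                            // 1: no pool known -> queue
        if (parent.getPoolSize() == parent.getMaximumPoolSize())
            return super.offer(o);                                            // 2: pool maxed out -> queue
        if (active.get() < parent.getPoolSize()) return super.offer(o);       // 3: "idle" threads assumed -> queue
        if (parent.getPoolSize() < parent.getMaximumPoolSize()) return false; // 4: false -> executor adds a worker
        return super.offer(o);
    }

    public static void main(String[] args) {
        TaskQueueSketch queue = new TaskQueueSketch();
        ThreadPoolExecutor pool =
                new ThreadPoolExecutor(5, 800, 60, TimeUnit.SECONDS, queue); // 5 ~ minSpareThreads, 800 ~ maxThreads
        queue.setParent(pool);
        pool.prestartAllCoreThreads(); // core threads already exist, as in the logs below
        for (int i = 0; i < 120; i++) { // 120 simultaneous long requests
            pool.execute(() -> {
                active.incrementAndGet();
                try { Thread.sleep(300_000); }                                // the long request
                catch (InterruptedException e) { Thread.currentThread().interrupt(); }
                finally { active.decrementAndGet(); }
            });
        }
    }
}
Because offer() compares two counters that are updated without any common lock, the active count can lag behind the pool size; whenever it does, branch 3 queues the task instead of creating a thread, which is exactly the behavior investigated below.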
We focus on cases 3 and 4. After adding some logging to this method, as shown below, it prints the three variables above plus the number of threads currently running in the pool (parent.getActiveCount()):
public boolean offer(Runnable o) {
//we can't do any checks
if (parent == null) {
System.out.println("xa1111, parent: null"
+ ", runnable: " + o.getClass().getName() + "@" + Integer.toHexString(System.identityHashCode(o)));
return super.offer(o);
}
//we are maxed out on threads, simply queue the object
if (parent.getPoolSize() == parent.getMaximumPoolSize()) {
System.out.println("xa2222, parent.poolSize: " + parent.getPoolSize()
+ ", activeThreadCount: " + parent.getActiveCount()
+ ", maxPoolSize: " + parent.getMaximumPoolSize()
+ ", activeSocketProcessors: " + endpoint.activeSocketProcessors.get()
+ ", runnable: " + o.getClass().getName() + "@" + Integer.toHexString(System.identityHashCode(o)));
return super.offer(o);
}
//we have idle threads, just add it to the queue
//this is an approximation, so it could use some tuning
if (endpoint.activeSocketProcessors.get() < (parent.getPoolSize())) {
System.out.println("xa3333, parent.poolSize: " + parent.getPoolSize()
+ ", activeThreadCount: " + parent.getActiveCount()
+ ", maxPoolSize: " + parent.getMaximumPoolSize()
+ ", activeSocketProcessors: " + endpoint.activeSocketProcessors.get()
+ ", runnable: " + o.getClass().getName() + "@" + Integer.toHexString(System.identityHashCode(o)));
return super.offer(o);
}
//if we have less threads than maximum force creation of a new thread
if (parent.getPoolSize()<parent.getMaximumPoolSize()) {
System.out.println("xa4444, parent.poolSize: " + parent.getPoolSize()
+ ", activeThreadCount: " + parent.getActiveCount()
+ ", maxPoolSize: " + parent.getMaximumPoolSize()
+ ", activeSocketProcessors: " + endpoint.activeSocketProcessors.get()
+ ", runnable: " + o.getClass().getName() + "@" + Integer.toHexString(System.identityHashCode(o)));
return false;
}
//if we reached here, we need to add it to the queue
System.out.println("xa5555, parent.poolSize: " + parent.getPoolSize()
+ ", activeThreadCount: " + parent.getActiveCount()
+ ", maxPoolSize: " + parent.getMaximumPoolSize()
+ ", activeSocketProcessors: " + endpoint.activeSocketProcessors.get()
+ ", runnable: " + o.getClass().getName() + "@" + Integer.toHexString(System.identityHashCode(o)));
return super.offer(o);
}
After recompiling this file to produce a new tomcat-coyote.jar, we replaced the original jar under tomcat's lib directory and reran the test with the same parameters, firing 120 requests at the same time:
# the batch-operation platform starts executing
xa3333, parent.poolSize: 5, activeThreadCount: 0, maxPoolSize: 800, activeSocketProcessors: 0, runnable: org.apache.tomcat.util.net.NioEndpoint$SocketProcessor@73564b1b ==> 1
xa3333, parent.poolSize: 5, activeThreadCount: 1, maxPoolSize: 800, activeSocketProcessors: 1, runnable: org.apache.tomcat.util.net.NioEndpoint$SocketProcessor@6d42899e
xa3333, parent.poolSize: 5, activeThreadCount: 2, maxPoolSize: 800, activeSocketProcessors: 2, runnable: org.apache.tomcat.util.net.NioEndpoint$SocketProcessor@7a63a066
xa3333, parent.poolSize: 5, activeThreadCount: 3, maxPoolSize: 800, activeSocketProcessors: 3, runnable: org.apache.tomcat.util.net.NioEndpoint$SocketProcessor@3f49bd1e
xa3333, parent.poolSize: 5, activeThreadCount: 4, maxPoolSize: 800, activeSocketProcessors: 4, runnable: org.apache.tomcat.util.net.NioEndpoint$SocketProcessor@3eb01d19
xa4444, parent.poolSize: 5, activeThreadCount: 5, maxPoolSize: 800, activeSocketProcessors: 5, runnable: org.apache.tomcat.util.net.NioEndpoint$SocketProcessor@68b72979
xa4444, parent.poolSize: 6, activeThreadCount: 6, maxPoolSize: 800, activeSocketProcessors: 6, runnable: org.apache.tomcat.util.net.NioEndpoint$SocketProcessor@25026a2d
xa4444, parent.poolSize: 7, activeThreadCount: 7, maxPoolSize: 800, activeSocketProcessors: 7, runnable: org.apache.tomcat.util.net.NioEndpoint$SocketProcessor@487dada9
...
xa3333, parent.poolSize: 90, activeThreadCount: 90, maxPoolSize: 800, activeSocketProcessors: 89, runnable: org.apache.tomcat.util.net.NioEndpoint$SocketProcessor@15c39f47
xa3333, parent.poolSize: 90, activeThreadCount: 90, maxPoolSize: 800, activeSocketProcessors: 89, runnable: org.apache.tomcat.util.net.NioEndpoint$SocketProcessor@6d2a0994
xa3333, parent.poolSize: 90, activeThreadCount: 90, maxPoolSize: 800, activeSocketProcessors: 89, runnable: org.apache.tomcat.util.net.NioEndpoint$SocketProcessor@28db91d9
xa3333, parent.poolSize: 90, activeThreadCount: 90, maxPoolSize: 800, activeSocketProcessors: 89, runnable: org.apache.tomcat.util.net.NioEndpoint$SocketProcessor@646a3a78
xa3333, parent.poolSize: 90, activeThreadCount: 90, maxPoolSize: 800, activeSocketProcessors: 89, runnable: org.apache.tomcat.util.net.NioEndpoint$SocketProcessor@10ae62a7
xa4444, parent.poolSize: 90, activeThreadCount: 90, maxPoolSize: 800, activeSocketProcessors: 90, runnable: org.apache.tomcat.util.net.NioEndpoint$SocketProcessor@414496b0 ==> 2
The first batch of 90 finishes; after a wait of roughly 300 s, the second batch starts executing
xa4444, parent.poolSize: 91, activeThreadCount: 91, maxPoolSize: 800, activeSocketProcessors: 91, runnable: org.apache.tomcat.util.net.NioEndpoint$SocketProcessor@151978be ==> 3
xa4444, parent.poolSize: 92, activeThreadCount: 92, maxPoolSize: 800, activeSocketProcessors: 92, runnable: org.apache.tomcat.util.net.NioEndpoint$SocketProcessor@2a95501b
xa4444, parent.poolSize: 92, activeThreadCount: 92, maxPoolSize: 800, activeSocketProcessors: 92, runnable: org.apache.tomcat.util.net.NioEndpoint$SocketProcessor@1f27612
....
xa3333, parent.poolSize: 63, activeThreadCount: 2, maxPoolSize: 800, activeSocketProcessors: 1, runnable: org.apache.tomcat.util.net.NioEndpoint$SocketProcessor@1db266b7
xa3333, parent.poolSize: 63, activeThreadCount: 3, maxPoolSize: 800, activeSocketProcessors: 3, runnable: org.apache.tomcat.util.net.NioEndpoint$SocketProcessor@325177
xa3333, parent.poolSize: 63, activeThreadCount: 2, maxPoolSize: 800, activeSocketProcessors: 2, runnable: org.apache.tomcat.util.net.NioEndpoint$SocketProcessor@6aef6ea3
xa3333, parent.poolSize: 59, activeThreadCount: 0, maxPoolSize: 800, activeSocketProcessors: 0, runnable: org.apache.tomcat.util.net.NioEndpoint$SocketProcessor@baa6f57 ==> 4
The second batch finishes
Here parent.poolSize starts at 5, which corresponds to tomcat's minSpareThreads parameter.
Note: in tomcat 6 the minSpareThreads and maxSpareThreads parameters cannot be changed; both default to 5, as shown below:
# tomcat/java/org/apache/tomcat/util/net/NioEndpoint.java
529 /**
530 * Dummy maxSpareThreads property.
531 */
532 public int getMaxSpareThreads() { return Math.min(getMaxThreads(),5); }
533
534
535 /**
536 * Dummy minSpareThreads property.
537 */
538 public int getMinSpareThreads() { return Math.min(getMaxThreads(),5); }
...
These two parameters can be specified in the configuration file, but they do not take effect, as the startup log shows:
May 20, 2021 2:01:08 PM org.apache.catalina.startup.SetAllPropertiesRule begin
WARNING: [SetAllPropertiesRule]{Server/Service/Connector} Setting property 'minSpareThreads' to '50' did not find a matching property.
In the log above, activeSocketProcessors and activeThreadCount should in principle be equal; because 120 requests arrive at once, the two values differ slightly under concurrency.
Most of the output comes from if conditions 3 and 4: when condition 3 holds the task is queued, and when condition 4 holds a new thread is created to handle the request. Judging from this output, the official documentation's description of how the HTTP Connector works is somewhat off; the documentation says:
Each incoming request requires a thread for the duration of that request. If more simultaneous requests are received than
can be handled by the currently available request processing threads, additional threads will be created up to the configured
maximum (the value of the maxThreads attribute). If still more simultaneous requests are received, they are stacked up
inside the server socket created by the Connector, up to the configured maximum (the value of the acceptCount attribute).
Any further simultaneous requests will receive "connection refused" errors, until resources are available to process them.
With the documented model in mind, when a large number of requests arrive at the same time it is easy to assume the flow looks like this:
+---------+    +----------------------+    +--------------------------+    +---------------------+    +--------+
| request | -> | current pool threads | -> | max threads (maxThreads) | -> | queue (acceptCount) | -> | reject |
+---------+    +----------------------+    +--------------------------+    +---------------------+    +--------+
In reality the flow looks more like this:
+---------+    +-------------------------+    +----------------------+    +--------------------------+    +---------------------+    +--------+
| request | -> | request queue (backlog) | -> | current pool threads | -> | max threads (maxThreads) | -> | queue (acceptCount) | -> | reject |
+---------+    +-------------------------+    +----------------------+    +--------------------------+    +---------------------+    +--------+
The request queue here is actually the tcp socket's backlog, which in tomcat is set through the acceptCount parameter, as shown below (a short sketch of the backlog follows the snippet):
# tomcat/java/org/apache/catalina/connector/Connector.java
301 protected static HashMap replacements = new HashMap();
302 static {
303 replacements.put("acceptCount", "backlog"); ==> equivalent to setting the backlog of the tcp socket
304 replacements.put("connectionLinger", "soLinger");
305 replacements.put("connectionTimeout", "soTimeout");
306 replacements.put("connectionUploadTimeout", "timeout");
...
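As a minimal illustration of what that mapping means (a sketch only, not Tomcat's acceptor code; the port and names are arbitrary), the backlog is simply the second argument of bind(): connections that have completed the TCP three-way handshake wait in this kernel queue until the application calls accept().
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class BacklogSketch {
    public static void main(String[] args) throws Exception {
        int acceptCount = 1000; // maps to the acceptCount attribute in server.xml
        ServerSocketChannel server = ServerSocketChannel.open();
        // The second argument of bind() is the TCP backlog: connections that have
        // finished the three-way handshake queue here until accept() is called.
        server.socket().bind(new InetSocketAddress(8443), acceptCount);
        while (true) {
            // The acceptor takes one connection at a time out of the backlog ...
            SocketChannel channel = server.accept();
            // ... and hands it over for processing (in tomcat, to the poller/worker pool).
        }
    }
}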
All established tcp connections first land in the socket's connection queue; tomcat's worker loop detects and accepts one request at a time and hands it to the thread pool, and as requests keep arriving the pool goes through the sequence of activities described above. Matching this against the log, requests are handled one by one, and when a request hits if condition 3 it gets queued:
2374 //we have idle threads, just add it to the queue
2375 //this is an approximation, so it could use some tuning
2376 if (endpoint.activeSocketProcessors.get()<(parent.getPoolSize())) return super.offer(o);
As the comment says, this check is only an approximation. Comparing with the log, activeSocketProcessors was indeed smaller than parent.poolSize at that point: 90 threads were working while the remaining 30 requests sat in the queue:
xa4444, parent.poolSize: 90, activeThreadCount: 90, maxPoolSize: 800, activeSocketProcessors: 90, runnable: org.apache.tomcat.util.net.NioEndpoint$SocketProcessor@414496b0
After the 300 seconds pass, threads start finishing, the pool keeps pulling tasks from the queue, and new submissions skip condition 3, hit condition 4, and new threads are created to continue processing. Looking at a few of the upstream bug reports:
In BZ 44199:
In a properly configured
production system, there should be rare situations where connections wait for a
worker thread to be handled. Our client complained on high latency of web
requests, but the measurement on servlet did not show high latency.
That bug was filed in 2008, and the reporter solved the waiting problem simply by plugging in a custom replacement class. Patches along these lines were merged upstream after 2017, and comparing tomcat 7 and tomcat 8 shows that the thread pool built into the Tomcat Connector has changed a great deal, so it is better to rely on an Executor-based thread pool. From this angle, the problem above can be tackled in several ways.
How to deal with it
1. Disable the NIO thread pool
In the tomcat 6 HTTP connector documentation, the useExecutor attribute controls whether the connector's own NIO thread pool (for the org.apache.coyote.http11.Http11NioProtocol protocol) is used; see option-useExecutor for details:
(bool)Set to true to use the NIO thread pool executor. The default value is true.
If set to false, it uses a thread pool based on a stack for its execution.
Generally, using the executor yields a little bit slower performance, but yields
a better fairness for processing connections in a high load environment as the
traffic gets queued through a FIFO queue. If set to true(default) then the max
pool size is the maxThreads attribute and the core pool size is the minSpareThreads.
This value is ignored if the executor attribute is present and points to a valid
shared thread pool.
The useExecutor option defaults to true, i.e. the NIO thread pool is used, and the problem we ran into is this pool's behavior. With the option turned off, threads are created according to the actual number of requests, up to maxThreads at most, although this is slightly less efficient than the pooled mode. As shown below:
# tomcat-6.0.37/java/org/apache/tomcat/util/net/NioEndpoint.java
# the maximum number of workers is maxThreads
1169 /**
1170 * Create (or allocate) and return an available processor for use in
1171 * processing a specific HTTP request, if possible. If the maximum
1172 * allowed processors have already been created and are in use, return
1173 * <code>null</code> instead.
1174 */
1175 protected Worker createWorkerThread() {
1176
1177 synchronized (workers) {
1178 if (workers.size() > 0) {
1179 curThreadsBusy++;
1180 return (workers.pop());
1181 }
1182 if ((maxThreads > 0) && (curThreads < maxThreads)) {
1183 curThreadsBusy++;
1184 if (curThreadsBusy == maxThreads) {
1185 log.info(sm.getString("endpoint.info.maxThreads",
1186 Integer.toString(maxThreads), address,
1187 Integer.toString(port)));
1188 }
1189 return (newWorkerThread());
1190 } else {
1191 if (maxThreads < 0) {
1192 curThreadsBusy++;
1193 return (newWorkerThread());
1194 } else {
1195 return (null);
1196 }
1197 }
1198 }
1199 }
In our actual tests, after turning this option off, all requests (up to maxThreads of them) ran at the same time and the queueing disappeared.
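For reference, a connector configuration along these lines is a sketch of that setup (it is the connector from the beginning of the article with useExecutor="false" added; adjust it to your own environment rather than treating it as the exact tested configuration):
<Connector port="443" maxHttpHeaderSize="8192" protocol="org.apache.coyote.http11.Http11NioProtocol"
           useExecutor="false" maxThreads="800" acceptCount="1000"
           enableLookups="false" disableUploadTimeout="true"
           maxKeepAliveRequests="1000" connectionTimeout="120000"
           SSLEnabled="true" scheme="https" secure="true"
           clientAuth="true" sslProtocol="TLS" />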
2. Use an Executor-based thread pool
As with the useExecutor option: even when it is enabled, if the executor attribute is specified, the NIO thread pool is ignored and the named Executor's pool is used instead. Add the following to conf/server.xml; in practice this is the configuration we recommend:
# add an Executor under the Service element
<Executor name="tomcatThreadPool" namePrefix="catalina-exec-"
maxThreads="800" minSpareThreads="50" maxIdleTime="60000"/>
# reference the executor in the Connector
<Connector port="8080" maxHttpHeaderSize="8192" protocol="org.apache.coyote.http11.Http11NioProtocol"
executor="tomcatThreadPool" enableLookups="false" disableUploadTimeout="true"
maxKeepAliveRequests="1000" connectionTimeout="120000" />
3. Upgrade tomcat
tomcat-6.0.37 is a fairly old release from 2013. Compared with it, the NIO thread pool was reworked substantially in tomcat 7 and 8, and in our tests the tomcat 8 NIO thread pool handles this scenario correctly. Even on tomcat 8, though, we still recommend the configuration described in option 2.