Huge amount of TIME_WAIT connections
With MySQL, applications typically open and close connections frequently and in rapid succession, so connections to the server are very short-lived. In extreme cases this can exhaust the available TCP ports.
The range of local TCP ports available for outgoing connections can be found with:
# cat /proc/sys/net/ipv4/ip_local_port_range
32768 61000
In this example we can have at most 61000 - 32768 + 1 = 28233 connections open concurrently, because both ends of the range are included.
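The port count can also be computed directly from the kernel setting. The following is a minimal sketch, assuming a Linux system and an inclusive range; the function name port_count is made up for this example, and the fallback values are the ones shown in this article.

```shell
#!/bin/sh
# Count the ports in an inclusive "low high" range.
port_count() {
    low=$1
    high=$2
    echo $(( high - low + 1 ))
}

# Read the live range on Linux; fall back to the values from this article
# when the proc file is not readable (for example on another OS).
if [ -r /proc/sys/net/ipv4/ip_local_port_range ]; then
    read -r low high < /proc/sys/net/ipv4/ip_local_port_range
else
    low=32768
    high=61000
fi
echo "ephemeral ports: $(port_count "$low" "$high")"
```

Counting both endpoints, the range 32768 to 61000 gives 28233 ports.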
When a TCP connection closes, its port cannot be reused immediately afterwards, because the operating system has to wait for the duration of the TIME_WAIT interval (derived from the maximum segment lifetime, MSL). We can see this with the command:
# netstat -nat
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:10050           0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:10051           0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:10051         127.0.0.1:60756         TIME_WAIT
tcp        0      0 127.0.0.1:10050         127.0.0.1:50191         TIME_WAIT
tcp        0      0 127.0.0.1:10050         127.0.0.1:52186         ESTABLISHED
tcp        0      0 127.0.0.1:10051         127.0.0.1:34445         TIME_WAIT
The reason for waiting is that packets may arrive out of order or be retransmitted after the connection has been closed. CLOSE_WAIT indicates that the other side of the connection has closed the connection. TIME_WAIT indicates that this side has closed the connection first. The connection is kept around so that any delayed packets can be matched to it and handled appropriately.
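On a busy server the netstat listing quickly becomes too long to read by eye. The following is a small sketch that tallies connections per state, assuming the classic netstat column layout with the state in the sixth field; the helper name count_states is made up for this example.

```shell
#!/bin/sh
# Tally TCP connections per state from netstat-style lines on stdin.
# Header lines are skipped because their first field does not start with "tcp".
count_states() {
    awk '$1 ~ /^tcp/ { n[$6]++ } END { for (s in n) print n[s], s }' | sort -rn
}

# Only run against the live system when netstat is installed.
if command -v netstat >/dev/null 2>&1; then
    netstat -nat | count_states
fi
```

A TIME_WAIT count that approaches the size of the local port range is a sign that port exhaustion is near.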
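The FIN timeout can be found as follows:
# cat /proc/sys/net/ipv4/tcp_fin_timeout
60
Note that tcp_fin_timeout controls how long an orphaned connection stays in the FIN_WAIT_2 state. The length of TIME_WAIT itself is hard-coded to 60 seconds in the Linux kernel, so both values happen to be 60 here.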
This basically means your system cannot sustain more than about (61000 - 32768 + 1) / 60 ≈ 470 new outgoing connections per second, because every closed connection blocks its port for 60 seconds.
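The estimate can be reproduced with shell integer arithmetic. This is only a rough sketch: it assumes every connection holds its port for the full 60 seconds and that all connections go to the same destination address and port. The function name max_rate is made up for this example.

```shell
#!/bin/sh
# Rough upper bound on sustained new connections per second.
# $1 = number of local ports, $2 = seconds a port stays blocked.
max_rate() {
    echo $(( $1 / $2 ))   # integer division, rounds down
}

max_rate 28233 60   # range 32768 61000
max_rate 50001 60   # range 15000 65000
```

With the wider range of 15000 65000 the bound rises from 470 to 833 connections per second.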
Solutions
There are several ways out of this problem:
- Open connections to your MySQL database less frequently and do more work per connection. Connection pooling is often used to achieve this.
- Increase the port range. Setting the range to 15000 61000 is pretty common these days (extreme tuning: 1024 - 65535).
- Increase port availability by decreasing the FIN timeout (bearing in mind that on Linux this shortens FIN_WAIT_2 rather than TIME_WAIT itself).
Those values can be changed online with:
# echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout
# echo 15000 65000 > /proc/sys/net/ipv4/ip_local_port_range
Or permanently, by adding the corresponding settings to /etc/sysctl.conf.
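For the permanent variant, the two values correspond to the sysctl keys net.ipv4.tcp_fin_timeout and net.ipv4.ip_local_port_range. The following sketch appends them to a file; the function name write_sysctl_conf is made up for this example, and the target path is an argument so that it can be tried on a scratch file first.

```shell
#!/bin/sh
# Append the two settings from this article to a sysctl configuration file.
# $1 = target file. Running this twice adds the lines twice.
write_sysctl_conf() {
    cat >> "$1" <<'EOF'
# Tuning for many short-lived MySQL connections
net.ipv4.tcp_fin_timeout = 30
net.ipv4.ip_local_port_range = 15000 65000
EOF
}

# Example, as root:
#   write_sysctl_conf /etc/sysctl.conf && sysctl -p
```

sysctl -p reloads /etc/sysctl.conf, so the settings become active without a reboot.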
Another possibility to change this behaviour is to use tcp_tw_recycle and tcp_tw_reuse. By default they are disabled:
# cat /proc/sys/net/ipv4/tcp_tw_recycle
0
# cat /proc/sys/net/ipv4/tcp_tw_reuse
0
These parameters allow fast cycling of sockets in the TIME_WAIT state and re-using them. Before you make this change, make sure that it does not conflict with the protocols used by the application that needs these ports.
tcp_tw_recycle could cause some problems when using load balancers:
tcp_tw_reuse | Allow to reuse TIME_WAIT sockets for new connections when it is safe from protocol viewpoint. Default value is 0. It should not be changed without advice/request of technical experts.
tcp_tw_recycle | Enable fast recycling TIME_WAIT sockets. Default value is 0. It should not be changed without advice/request of technical experts.
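Before changing either parameter it is worth checking whether it exists at all, because tcp_tw_recycle was removed in Linux 4.12 and is simply absent on newer kernels. The following is a minimal sketch; the helper name show_param is made up for this example, and the directory is an argument only so the function can be tried against a test directory.

```shell
#!/bin/sh
# Print a parameter from a proc-style directory, or report that it is missing.
# $1 = directory (normally /proc/sys/net/ipv4), $2 = parameter name.
show_param() {
    if [ -r "$1/$2" ]; then
        echo "$2 = $(cat "$1/$2")"
    else
        echo "$2 is not available on this kernel"
    fi
}

for name in tcp_tw_reuse tcp_tw_recycle; do
    show_param /proc/sys/net/ipv4 "$name"
done
```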
Literature
- Shinguz's blog
Comments
Thank you for the article. Sorry to nitpick.
My colleague pointed out that if the local port range is 32768 – 61000, there are 28233 available ports, not 28232.
It may also be worth mentioning the difference between the FIN timeout (/proc/sys/net/ipv4/tcp_fin_timeout) and the TIME_WAIT length, which is hard-coded to 60s in the Linux kernel.
This question contains some good pointers and the answer contains a little program in C that you can compile and use to see how long the timeout is:
http://unix.stackexchange.com/questions/17218/how-long-is-a-tcp-local-so...
A similar article:
http://www.krenel.org/tcp-time_wait-and-ephemeral-ports-bad-friends/
#ports and FIN_TIMEOUT
Hello thatsafunnynamecomment,
Thanks for reading and correcting my findings! You are absolutely right. I took too much of a shortcut with the maths!
For your comment #2, it looks like I did not investigate carefully enough. Thanks for correcting me. I found several sources pointing to tcp_fin_timeout and mentioning that TIME_WAIT is affected. So I should just be more careful next time, especially in a domain where I am an absolute noob.
Shinguz