Linux - Networking
This forum is for any issue related to networks or networking. Routing, network cards, OSI, etc. Anything is fair game.
12-14-2007, 06:54 AM | #1
Member | Registered: Aug 2007 | Location: Hungary | Distribution: Fedora, CentOS | Posts: 91
Too many FIN_WAIT2 in netstat output
Hi,
I use the machine in question to periodically test connections to (mostly dead) machines, to see whether they're up and usable yet. I use this to maintain a very long list of working servers.
As the test, I use wget to try to download a file from each of them, with a connection timeout of 5 seconds and a read timeout of 15 seconds.
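(The exact invocation wasn't posted; a probe along these lines would match the described timeouts. The URL, the $server variable, and the remaining flags are assumptions - only the two timeout values come from the post.)
# hypothetical probe - only the two timeout values are from the post
wget --connect-timeout=5 --read-timeout=15 --tries=1 \
     -q -O /dev/null "http://$server/testfile"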
Now I have discovered that when I run "netstat", I see a tremendous number of connections marked FIN_WAIT2 in the list (500+ on average).
Can I do anything to make Linux destroy those connections sooner, preferably without rebooting the machine?
Levente
Last edited by Sheridan; 12-14-2007 at 06:57 AM.
12-15-2007, 03:00 AM | #2
LQ Guru | Registered: Mar 2004 | Distribution: SusE 8.2 | Posts: 5,863
Hi -
You can always try reducing the "tcp_fin_timeout" kernel parameter on your server:
http://www.cs.uwaterloo.ca/~brecht/s.../ip-sysctl.txt
Quote:
tcp_fin_timeout - INTEGER
Time to hold socket in state FIN-WAIT-2, if it was closed
by our side. Peer can be broken and never close its side,
or even died unexpectedly. Default value is 60sec.
Usual value used in 2.2 was 180 seconds, you may restore
it, but remember that if your machine is even underloaded WEB server,
you risk to overflow memory with kilotons of dead sockets,
FIN-WAIT-2 sockets are less dangerous than FIN-WAIT-1,
because they eat maximum 1.5K of memory, but they tend
to live longer. Cf. tcp_max_orphans.
http://www.cs.uwaterloo.ca/~brecht/servers/tcp.html
Quote:
% cat /proc/sys/net/ipv4/tcp_fin_timeout
60
[To change this to 3 seconds]
# echo "3" > /proc/sys/net/ipv4/tcp_fin_timeout
[To have these new values take effect you may have to do (as root)]
# /etc/rc.d/init.d/network restart
If you want these new values to survive across reboots you can add them to /etc/sysctl.conf.
# Allowed local port range
net.ipv4.ip_local_port_range = 1025 65535
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_fin_timeout = 3
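(Side note, a minimal sketch: on systems with the sysctl utility, the same change can be written and reloaded without restarting the network scripts; the value 3 simply mirrors the example above.)
# write the value immediately - equivalent to the echo above
sysctl -w net.ipv4.tcp_fin_timeout=3
# re-read /etc/sysctl.conf so the persistent settings take effect now
sysctl -p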
'Hope that helps .. PSM
Last edited by paulsm4; 12-15-2007 at 03:02 AM.
12-25-2007, 07:24 AM | #3
Member | Registered: Aug 2007 | Location: Hungary | Distribution: Fedora, CentOS | Posts: 91 | Original Poster
Quote:
Originally Posted by paulsm4
[see post #2 above]
This was it. Thank you very much!!!
01-23-2008, 05:59 PM | #4
Member | Registered: Aug 2007 | Location: Hungary | Distribution: Fedora, CentOS | Posts: 91 | Original Poster
Quote:
Originally Posted by Sheridan
This was it. Thank you very much!!!
Err... Nope. After some time I checked that machine again, and lo and behold... Let's first do a very barbaric grep for FIN_WAIT stuff, something like...
lsof | grep 'FIN_WAIT' | wc -l
After some time this comes to some 7000+ FIN_WAIT2 entries - most of which belong to one single process.
I know now that adjusting the FIN timeout is not the answer - at least it doesn't help with this after all. Even if I set it to something like 10, it has no apparent effect on those particular FIN_WAIT2 connections.
And, as I said, I also discovered that all these stale connections come from the same process...
So... how can I really kill those stale connections (some of them have been up for quite a few weeks or more now)?
One obvious choice would be to just kill and restart the process that created them. Unfortunately, that process cannot be killed - it would cause major disturbance (and outrage...) among a number of people, since 10+ of them are using it at any given time for simulating/running/calculating all kinds of stuff, and some of those projects have been running for many months, nearing completion with deadlines and all. That wouldn't be a long-term solution anyway: I cannot go around restarting this thing every now and then, making everyone angry as hell and unable to work seriously in the long term.
So... a solution which doesn't entail shutting down the originating process(es) is needed, unfortunately. Any ideas? How would I make a little cron script or something to clean up long-stale FIN_WAIT2 "connections"?
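(For anyone hitting this later: tcp_fin_timeout only reaps orphaned FIN_WAIT2 sockets, i.e. ones whose owning application has already called close(). If the process still holds the descriptor - for instance after a shutdown() without a close() - the socket stays in FIN_WAIT2 for as long as the peer stays silent and the process keeps running, which fits the behavior described above. On far newer kernels than this thread's, iproute2's ss can destroy matching sockets without touching the process; a minimal sketch of the cron-style cleanup asked about, assuming such a kernel and root:)
# forcibly close every socket currently in FIN_WAIT2
# (needs kernel >= 4.9 built with CONFIG_INET_DIAG_DESTROY, and root)
ss -K state fin-wait-2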
10-05-2011, 08:58 AM | #5
LQ Newbie | Registered: Oct 2011 | Posts: 1
Hi Sheridan,
I have the same problem using embedded Jetty on my high-load website. Did you manage to solve this issue? The FIN timeout isn't working for me either.
Best regards,
Pero Gjuzelov
Quote:
Originally Posted by Sheridan
[see post #4 above]
10-06-2011, 01:27 AM | #6
Member | Registered: Aug 2007 | Location: Hungary | Distribution: Fedora, CentOS | Posts: 91 | Original Poster
Quote:
Originally Posted by perosr
[see post #5 above]
Hi there,
Unfortunately, I got fired over this shortly after I posted that last message. I hear that four other guys (my "successors", one of them my friend) were also fired over the same issue in the course of a single year after I was gone, so I do not believe a solution was ever found. I did not follow it any further, and I certainly wasn't able to think of anything else I could have tried.
I've never had this issue since...
I'm sorry, but I don't have a clue - I couldn't even help myself.
S.
11-20-2013, 01:22 AM | #7
LQ Newbie | Registered: Nov 2013 | Posts: 1
Hello, I'm stuck with the same problem. My application uses WebSocket over TCP and transfers MBs of data in one go, so a connection stays up for an average of 1 minute. After that, either the client or the server may initiate the disconnect, as the case may be.
Everything works fine as far as application functionality goes. The problem is that my server's sockets get stuck in the FIN_WAIT2 state forever, until I kill my server process (running on port 8095), so I can see multiple connections in FIN_WAIT2. Following is the output of 'netstat -na | grep 8095':
Quote:
[root@localhost ~]# netstat -na | grep 8095
tcp 0 0 ::ffff:172.31.209.18:8095 :::* LISTEN
tcp 0 0 ::ffff:172.31.209.18:8095 ::ffff:59.182.155.150:20858 FIN_WAIT2
tcp 0 0 ::ffff:172.31.209.18:8095 ::ffff:49.32.0.27:6071 FIN_WAIT2
tcp 0 0 ::ffff:172.31.209.18:8095 ::ffff:49.32.0.27:6570 FIN_WAIT2
tcp 0 0 ::ffff:172.31.209.18:8095 ::ffff:49.32.0.7:32440 FIN_WAIT2
tcp 0 0 ::ffff:172.31.209.18:8095 ::ffff:59.182.155.150:20721 FIN_WAIT2
tcp 0 0 ::ffff:172.31.209.18:8095 ::ffff:49.32.0.119:47107 FIN_WAIT2
tcp 0 0 ::ffff:172.31.209.18:8095 ::ffff:49.32.0.27:15202 FIN_WAIT2
tcp 0 0 ::ffff:172.31.209.18:8095 ::ffff:49.32.0.27:19023 FIN_WAIT2
tcp 0 0 ::ffff:172.31.209.18:8095 ::ffff:49.32.0.119:48941 FIN_WAIT2
tcp 0 0 ::ffff:172.31.209.18:8095 ::ffff:49.32.0.7:13325 FIN_WAIT2
tcp 0 0 ::ffff:172.31.209.18:8095 ::ffff:49.32.0.7:10760 FIN_WAIT2
tcp 0 0 ::ffff:172.31.209.18:8095 ::ffff:49.32.0.7:24342 FIN_WAIT2
tcp 0 0 ::ffff:172.31.209.18:8095 ::ffff:49.32.0.12:10222 FIN_WAIT2
tcp 0 0 ::ffff:172.31.209.18:8095 ::ffff:49.32.0.12:1778 FIN_WAIT2
tcp 0 0 ::ffff:172.31.209.18:8095 ::ffff:49.32.0.64:39367 FIN_WAIT2
tcp 0 0 ::ffff:172.31.209.18:8095 ::ffff:49.32.0.24:58245 FIN_WAIT2
tcp 0 0 ::ffff:172.31.209.18:8095 ::ffff:49.32.0.12:49259 FIN_WAIT2
tcp 0 0 ::ffff:172.31.209.18:8095 ::ffff:49.32.0.80:1083 FIN_WAIT2
tcp 0 0 ::ffff:172.31.209.18:8095 ::ffff:49.32.0.12:2918 FIN_WAIT2
tcp 0 0 ::ffff:172.31.209.18:8095 ::ffff:49.32.0.124:34061 FIN_WAIT2
tcp 0 0 ::ffff:172.31.209.18:8095 ::ffff:49.32.0.12:53370 FIN_WAIT2
tcp 0 0 ::ffff:172.31.209.18:8095 ::ffff:49.32.0.88:11815 FIN_WAIT2
tcp 0 0 ::ffff:172.31.209.18:8095 ::ffff:49.32.0.88:47397 FIN_WAIT2
tcp 0 0 ::ffff:172.31.209.18:8095 ::ffff:49.32.0.64:12556 FIN_WAIT2
tcp 0 0 ::ffff:172.31.209.18:8095 ::ffff:49.32.0.64:15456 FIN_WAIT2
tcp 0 0 ::ffff:172.31.209.18:8095 ::ffff:49.32.0.12:58406 FIN_WAIT2
tcp 0 0 ::ffff:172.31.209.18:8095 ::ffff:49.32.0.10:53305 FIN_WAIT2
tcp 0 0 ::ffff:172.31.209.18:8095 ::ffff:49.32.0.12:18495 FIN_WAIT2
tcp 0 0 ::ffff:172.31.209.18:8095 ::ffff:49.32.0.80:42849 FIN_WAIT2
tcp 0 0 ::ffff:172.31.209.18:8095 ::ffff:49.32.0.64:27007 FIN_WAIT2
tcp 0 0 ::ffff:172.31.209.18:8095 ::ffff:49.32.0.64:60737 FIN_WAIT2
tcp 0 0 ::ffff:172.31.209.18:8095 ::ffff:49.32.0.10:20746 FIN_WAIT2
tcp 0 0 ::ffff:172.31.209.18:8095 ::ffff:49.32.0.64:30023 FIN_WAIT2
tcp 0 0 ::ffff:172.31.209.18:8095 ::ffff:49.32.0.12:17950 FIN_WAIT2
tcp 0 0 ::ffff:172.31.209.18:8095 ::ffff:49.32.0.40:47676 FIN_WAIT2
The client application has been shut down after completing the application flow. Following is the TCP kernel tuning on my Red Hat Enterprise Linux Server release 5.8 (Tikanga), running on an HP ProLiant G8 server with 192 GB of RAM. No other process is running on this system.
[sysctl.conf]
Quote:
# Kernel sysctl configuration file for Red Hat Linux
#
# For binary values, 0 is disabled, 1 is enabled. See sysctl(8) and
# sysctl.conf(5) for more details.
# Controls IP packet forwarding
net.ipv4.ip_forward = 0
# Controls source route verification
net.ipv4.conf.default.rp_filter = 1
# Do not accept source routing
net.ipv4.conf.default.accept_source_route = 0
# Controls the System Request debugging functionality of the kernel
kernel.sysrq = 0
# Controls whether core dumps will append the PID to the core filename
# Useful for debugging multi-threaded applications
kernel.core_uses_pid = 1
# Controls the use of TCP syncookies
net.ipv4.tcp_syncookies = 1
# Controls the maximum size of a message, in bytes
kernel.msgmnb = 65536
# Controls the default maximum size of a message queue
kernel.msgmax = 65536
# Controls the maximum shared segment size, in bytes
kernel.shmmax = 68719476736
# Controls the maximum number of shared memory segments, in pages
kernel.shmall = 4294967296
# Default max queues system wide = 1024
kernel.msgmni = 2048
#Maximum number of file descriptor allowed
fs.file-max = 1048576
# The Linux autotuning TCP buffer limit
# Defines per-socket memory usage for auto-tuning. The third value is the maximum send buffer space
# (overridden by r/wmem_max).
# The receive buffers for auto-tuning
net.ipv4.tcp_rmem = 4096 87380 16777216
# The write (send) buffers for auto-tuning
net.ipv4.tcp_wmem = 4096 65536 16777216
# Determines how the TCP stack should behave for memory usage; each count is in memory pages (typically 4KB).
# Increase the count for large BDP (but remember, it's memory pages, not bytes).
net.ipv4.tcp_mem = 8388608 8388608 8388608
# For 10G NIC
net.core.netdev_max_backlog = 32768
net.ipv4.tcp_congestion_control = htcp
# Recommended for hosts with jumbo frames enabled
net.ipv4.tcp_mtu_probing = 1
# Enables window scaling as defined by RFC 1323. Must be enabled to support windows larger than 64KB.
net.ipv4.tcp_window_scaling = 1
# Enables selective acknowledgment, which improves performance by selectively acknowledging packets received out of order
# (causing the sender to retransmit only the missing segments), but it can increase CPU utilization.
net.ipv4.tcp_sack = 1
#
net.ipv4.tcp_fack = 1
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_low_latency = 1
net.ipv4.tcp_westwood = 1
net.ipv4.tcp_bic = 1
net.ipv4.tcp_max_tw_buckets = 1440000
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 5
# Limit of socket listen() backlog. Defaults to 128. See also tcp_max_syn_backlog for additional tuning
net.ipv4.somaxconn = 4096
net.core.somaxconn = 4096
# Maximal number of remembered connection requests that have not yet received an acknowledgment from the connecting client
net.ipv4.tcp_max_syn_backlog = 4096
#
net.ipv4.tcp_no_metrics_save = 1
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_keepalive_time = 400
net.ipv4.tcp_keepalive_probes = 5
vm.swappiness = 20
net.ipv4.tcp_max_orphans = 5
net.ipv4.tcp_orphan_retries = 2
Please help, and let me know if any additional information is required.
Thanks in advance.
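(One thing worth checking here: tcp_fin_timeout - set to 5 above - only applies once a FIN_WAIT2 socket is orphaned, i.e. after the server application has close()d its end. If the server does a shutdown() but keeps the file descriptor open, or simply never closes it, the kernel holds the socket in FIN_WAIT2 until the process closes it or exits, which is exactly the "until I kill my server process" behavior described. A quick way to confirm whether the process still owns those sockets - a sketch; port 8095 comes from the post, the ss options are standard iproute2:)
# -t TCP only, -n numeric, -p show the owning process (needs root);
# a pid listed next to each FIN_WAIT2 entry means the server still
# holds the fd, so tcp_fin_timeout will never reap that socket
ss -tnp state fin-wait-2 '( sport = :8095 )'
(If a pid does show up, the fix belongs in the application: make sure every WebSocket close path ends with an actual close() on the socket.)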