ICE sockets accumulate then cause failure, possible Slackware fix is proposed

selfprogrammed · 11-09-2023, 11:46 PM

*** _peter, Thank You
Please note that freedesktop.org has had a bug report (2009) on this very issue, and decided that it was not important.

*** Reading source code.
I have been reading my way through the ICE and xtrans code.
The functions mentioned in the error messages are entered into a structure for the UNIX_CONN. They are static and not externally visible.
Other code reaches them only through the Xtransport field names:
->CreateListener
->Close
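For readers who have not met this pattern, here is a minimal sketch with hypothetical names (the real Xtransport struct has many more fields and different signatures). The implementations are static, so the only way to reach them from outside the file is through the function pointers stored in the table.

```c
#include <stdio.h>

/* Hypothetical, cut-down stand-in for an Xtransport-style table. */
typedef struct {
    const char *TransName;
    int (*CreateListener)(const char *port);
    int (*Close)(int fd);
} Transport;

/* static: these names are not visible outside this source file. */
static int SocketUNIXCreateListener(const char *port)
{
    printf("CreateListener(%s)\n", port ? port : "<pid>");
    return 3;                           /* pretend file descriptor */
}

static int SocketUNIXClose(int fd)
{
    printf("Close(%d)\n", fd);
    return 0;
}

/* The table is the only way in: callers write t->CreateListener(...). */
Transport UNIXConnFuncs = {
    "local",
    SocketUNIXCreateListener,
    SocketUNIXClose,
};
```

This is also why grepping a backtrace or another program for "SocketUNIXClose" turns up nothing: callers only ever see ->Close.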

*** CreateListener does unlink
- I tracked down the Close function => (_IceTrans)SocketUNIXClose
- It closes the socket,
- then unlinks the socket file if the following conditions are met:

Code:

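  // ciptr is the XtransConnInfo of the listener being closed
  if( ciptr->flags != 0
      && sockname
      && sockname->sun_family == AF_UNIX
      && sockname->sun_path[0] != '\0' )
  {
      // unlink only if NEITHER flag is set
      if( ! ( (ciptr->flags & TRANS_NOUNLINK)
              || (ciptr->transptr->flags & TRANS_ABSTRACT) ) )
          unlink( sockname->sun_path );
  }

The logic around the flags testing is awkward: it only ever tests for flags that are NOT set (the unlink happens only when neither TRANS_NOUNLINK nor TRANS_ABSTRACT is set).
The unlink removes the socket's name from the filesystem. Per unlink(2), processes that still have the socket open may continue to use it, but the name is gone, so no new client can connect through that path.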
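Written out as a small predicate, the Close-side condition looks like the sketch below, as read from Xtranssock.c (worth checking against your own xtrans version). The flag values are placeholders, not the real Xtrans constants, and the function only decides; it does not unlink anything.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Placeholder flag values for this sketch only; the real constants are
 * defined in the Xtrans headers and may differ. */
#define TRANS_LISTEN   0x1u
#define TRANS_NOUNLINK 0x2u
#define TRANS_ABSTRACT 0x4u

/* conn_flags: flags of the connection (the XtransConnInfo)
 * transport_flags: flags of the transport (the Xtransport)
 * Returns true if Close should unlink sockname->sun_path. */
static bool should_unlink_on_close(unsigned conn_flags, unsigned transport_flags,
                                   const struct sockaddr_un *sockname)
{
    if (conn_flags == 0 || sockname == NULL)
        return false;
    if (sockname->sun_family != AF_UNIX || sockname->sun_path[0] == '\0')
        return false;
    /* The ! applies to the whole OR: unlink only if NEITHER flag is set.
     * Note that "!conn_flags & TRANS_NOUNLINK" would parse as
     * "(!conn_flags) & TRANS_NOUNLINK", which is a different test. */
    return !((conn_flags & TRANS_NOUNLINK) || (transport_flags & TRANS_ABSTRACT));
}
```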

Interestingly, the same unlink call also appears in "(_IceTrans)SocketUNIXCreateListener".
There it is called right before CreateListener creates the new socket.
So why is that unlink not removing the stale socket?

File: /X11/Xtrans/Xtranssock.c
In (_IceTrans)SocketUNIXCreateListener, the code is effectively:

Code:

  UNIX_PATH = "/tmp/.ICE-unix/"
  port = NULL                                // no port given, so the PID becomes the name
  sockname.sun_path = UNIX_PATH + getpid()   // e.g. "/tmp/.ICE-unix/1234"
  if( abstract ) then
    ...
  else
    unlink( sockname.sun_path )   // SHOULD REMOVE STALE SOCKET

  if( CreateListener( sockname, ... ) ) ...   // CREATE NEW SOCKET

If that unlink is not going to work in Create, then it is not likely to work in Close either.
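A minimal sketch of that create-side sequence, assuming the socket name is the directory plus the PID. `create_listener` and its parameters are hypothetical (the real code goes through Xtrans internals), and the unlink result is ignored, as in the pseudocode, so if the old socket cannot be removed the failure only shows up as bind() failing with EADDRINUSE.

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Build "<dir>/<pid>", remove any old socket with that name, then bind a
 * fresh listening socket to it. Returns the listening fd, or -1 on error. */
static int create_listener(const char *dir, long pid)
{
    struct sockaddr_un sa;
    memset(&sa, 0, sizeof sa);
    sa.sun_family = AF_UNIX;
    int n = snprintf(sa.sun_path, sizeof sa.sun_path, "%s/%ld", dir, pid);
    if (n < 0 || (size_t)n >= sizeof sa.sun_path)
        return -1;                       /* path too long for sun_path */

    unlink(sa.sun_path);                 /* result ignored: a failure is silent */

    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    /* If the unlink above failed (e.g. the old socket belongs to another
     * user), bind() fails here with EADDRINUSE. */
    if (bind(fd, (struct sockaddr *)&sa, sizeof sa) < 0 || listen(fd, 5) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

When the stale socket belongs to the same user, the unlink succeeds and the second bind works; the interesting case is the one where it does not.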

pan64 · 11-10-2023, 01:21 AM

Just add some error handling to the unlink (see https://linux.die.net/man/2/unlink), if it is not there.
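That suggestion could look something like this minimal sketch (`remove_stale_socket` is a hypothetical name): ENOENT counts as success, and any other error is reported instead of being silently ignored. The errno meanings in the comment come from the unlink(2) man page.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Remove a socket name and report why it failed, if it did.
 * Returns 0 if the name is gone afterwards (removed, or never existed),
 * otherwise -1 with errno preserved from unlink(). */
static int remove_stale_socket(const char *path)
{
    if (unlink(path) == 0 || errno == ENOENT)
        return 0;                        /* removed, or nothing to remove */

    int saved = errno;
    /* EPERM: sticky directory and the file belongs to another user.
     * EACCES: no write permission on the containing directory. */
    fprintf(stderr, "cannot remove %s: %s\n", path, strerror(saved));
    errno = saved;
    return -1;
}
```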

guanx · 11-10-2023, 01:33 AM

Quote:

Originally Posted by the3dfxdude

Quote:

Originally Posted by pan64

Quote:

Originally Posted by _peter

this is from 2009-07-01 11:03:33 UTC
https://gitlab.freedesktop.org/xorg/...ice/-/issues/1

the interesting question: why is it not yet fixed? I don't really know the answer, there can be a few possibilities, like:
1. it is not that trivial at all
2. it is not that important at all
3. it is a bug, but not in the mentioned lib (cannot be fixed in that lib)
4. it is not a bug at all, but a misuse

Well, a 5th possibility, very similar to number 3: more generally, the bug report was filed with the wrong people.

6. The discussion is such a classic that nobody wants to break its perfection with further replies --

Quote:

I realize that doesn't really help the situation, but it makes me think this isn't that big of an issue.

The person who posted the reply above works for Apple Inc. (which produces almost exclusively single-user systems), according to Microsoft Bing.

guanx · 11-10-2023, 01:37 AM

Quote:

Originally Posted by pan64

The correct way would be to check if they are really in use.
//snip

Exactly. Probably inotifywait(1) helps. I did not check if it works on special files, though.
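Another way to check whether a socket is really in use (a different technique from inotifywait, named here as a swap-in) is simply to try connecting to it. On Linux, connecting to a Unix socket file that nobody is listening on fails with ECONNREFUSED. A minimal sketch, with `socket_is_live` as a hypothetical helper:

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Probe a Unix socket path by connecting to it.
 * Returns 1 if something is listening, 0 if nothing is (stale),
 * -1 if the probe itself failed (ENOENT, EACCES, ...), in which case
 * the caller should leave the path alone. */
static int socket_is_live(const char *path)
{
    struct sockaddr_un sa;
    memset(&sa, 0, sizeof sa);
    sa.sun_family = AF_UNIX;
    if (strlen(path) >= sizeof sa.sun_path)
        return -1;
    strcpy(sa.sun_path, path);

    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    int rc = connect(fd, (struct sockaddr *)&sa, sizeof sa);
    int err = errno;                     /* save before close() can change it */
    close(fd);

    if (rc == 0)
        return 1;                        /* a listener accepted the connection */
    if (err == ECONNREFUSED)
        return 0;                        /* file exists but nobody listens */
    return -1;                           /* don't guess */
}
```

Note that probing a live listener opens a real connection that the owning process will see close immediately, so this is better suited to an occasional cleanup than to polling.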

selfprogrammed · 11-10-2023, 05:01 PM

I still have the stale sockets on my old Slackware 14.2 system, so I tested the unlink.
Running unlink on a stale socket (from my new system, on a stale socket from the old system) did make the stale socket disappear.
unlink "removes one link". There does not seem to be any other link associated with the stale socket that would stop unlink.
It was easier to verify the operation than to try to explore that.

That leaves the question of why the unlink did not remove the stale socket. It is in that ICE function, right before it creates the new socket.

<* BLINK *>
As I am writing this, it becomes obvious. That unlink would remove the stale socket,
unless it is owned by someone else! So the existence of the unlink shows an intent to remove stale sockets, but ownership was not considered.
I've got to stop working on this at 4 AM.
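The ownership explanation fits the rules in unlink(2): in a directory with the sticky bit set, only the file's owner, the directory's owner, or a privileged process may remove a file (EPERM otherwise). Assuming /tmp/.ICE-unix is sticky (it is normally mode 1777, like /tmp), this hypothetical `unlink_permitted` helper sketches the check; it ignores group permissions, ACLs, and privileges other than plain root.

```c
#define _XOPEN_SOURCE 700
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Would a process with effective UID `caller` be allowed to unlink a file
 * owned by `file_owner` in a directory with mode `dir_mode` owned by
 * `dir_owner`? Simplified model of the unlink(2) rules. */
static bool unlink_permitted(mode_t dir_mode, uid_t dir_owner,
                             uid_t file_owner, uid_t caller)
{
    if (caller == 0)
        return true;                          /* privileged */

    /* Needs write + search permission on the directory. Group bits are
     * ignored: owner bits apply to the dir owner, "other" bits to the rest. */
    mode_t need = (caller == dir_owner) ? (S_IWUSR | S_IXUSR)
                                        : (S_IWOTH | S_IXOTH);
    if ((dir_mode & need) != need)
        return false;                         /* EACCES */

    /* Sticky directory: must own the file or the directory. */
    if ((dir_mode & S_ISVTX) && caller != file_owner && caller != dir_owner)
        return false;                         /* EPERM */
    return true;
}
```

So user 1000 can clear its own leftover /tmp/.ICE-unix/1234, but user 1001 whose session manager happens to get PID 1234 cannot, which is exactly the collision described in this thread.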

*** Owner of socket
I had thought that it was Xorg creating the socket. There were many stale sockets on that system at the time, and I was checking them for process activity.
During my checks, it was the Xorg process that repeatedly came up as matching the sockets. I cannot repeat that now, and I don't remember exactly what I was using (a long session of trying commands and exploring dead ends).
It was probably while writing the script, exploring which commands could be used to check whether a socket was stale.
Sorry to mislead.

I verified that, after cleaning out the stale sockets, the lone socket I have now carries the PID of xfce4-session.
Running "lsof" and grepping for ICE shows that only XFCE4 is accessing the ICE socket.
So it must be xfce4-session that creates the socket.

Many other parts of X use ICE (libICE), so any fix invented here should be careful not to alter how they behave, or we will be chasing down bugs all over the place.

*** error detection
ICE could detect the failure of the unlink.
1. Requires modifying ICE code, and if we could do that I would rewrite most of it.
2. Requires cooperation from upstream ICE maintainers, and we already can see that it was a known bug, and was deemed not important.
3. Even if ICE realized that the unlink did not happen, there is nothing it could do about it. It could try to use a different number (other than its PID) as the name, but if we were changing it enough to accomplish that, I would have it NOT USE THE PID in the first place. A random number would have less chance of conflict.
4. ICE ought to use a socket name with the UID and the PID in it. That way no stale socket would ever conflict with any other user. We must also ask how the users of the socket learn its name: are they given it as some parameter, or do they infer it from knowing the xfce4-session PID?
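Item 4 could be sketched as below. The hypothetical `ice_socket_name` embeds the UID as well as the PID, so a stale socket left by one user never occupies another user's name; the `<uid>-<pid>` layout is an assumption for illustration only. As item 4 asks, every client would then have to learn the new name; for XSMP, clients normally find the session manager through the SESSION_MANAGER environment variable rather than by guessing the PID.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Build "<dir>/<uid>-<pid>" into buf. Returns 0 on success, -1 if the
 * name does not fit in len bytes. */
static int ice_socket_name(char *buf, size_t len, const char *dir,
                           uid_t uid, pid_t pid)
{
    size_t dlen = strlen(dir);
    /* Accept the directory with or without a trailing slash. */
    const char *sep = (dlen > 0 && dir[dlen - 1] == '/') ? "" : "/";
    int n = snprintf(buf, len, "%s%s%lu-%ld", dir, sep,
                     (unsigned long)uid, (long)pid);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A caller would pass geteuid() and getpid(), e.g. producing "/tmp/.ICE-unix/1000-4242".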

*** XFCE4
There is some evidence that window managers like fvwm do not suffer from the stale sockets.
ICE is part of X, and is used in about 6 different places in X. It was not invented by XFCE.
So if XFCE is to be fixed, the fix is likely to be at the place where it shuts down.
We will need to check that it actually gets a chance to run that code, and verify that there is code there that actually tries to remove its socket (like calling the Xtransport->Close function of the ICE instance of Xtrans). (If you think that is complicated to say, try reading the source code.)

That script to remove stale sockets could be called from almost any of the XFCE startup scripts.
It could go into /etc/xdg/xfce4/xinitrc almost anywhere before xfce-session is launched.
This has minimal impact, and seems to have no bad effects.
That may be one of the best solutions, and would fix the problem for almost everyone.
It would work even on systems that are only rarely shut down (as far as I can see; can anyone with such a system comment on that?).
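The script itself is not shown in the thread. As a rough C equivalent, here is a sketch that sweeps a directory and removes only those sockets that refuse connections, instead of trusting PIDs. `sweep_stale_sockets` is hypothetical; run as an ordinary user, the sticky bit on /tmp/.ICE-unix will usually limit it to that user's own sockets.

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/un.h>
#include <unistd.h>

/* Remove every socket in `dir` that refuses connections (no listener).
 * Returns the number removed, or -1 if dir cannot be opened. */
static int sweep_stale_sockets(const char *dir)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    int removed = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        struct sockaddr_un sa;
        memset(&sa, 0, sizeof sa);
        sa.sun_family = AF_UNIX;
        int n = snprintf(sa.sun_path, sizeof sa.sun_path, "%s/%s", dir, e->d_name);
        if (n < 0 || (size_t)n >= sizeof sa.sun_path)
            continue;                        /* too long to be a socket path */

        struct stat st;
        if (lstat(sa.sun_path, &st) != 0 || !S_ISSOCK(st.st_mode))
            continue;                        /* skips ".", "..", non-sockets */

        int fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd < 0)
            break;                           /* out of descriptors: stop early */
        int rc = connect(fd, (struct sockaddr *)&sa, sizeof sa);
        int err = errno;
        close(fd);

        /* Only ECONNREFUSED means "file exists, nobody listening". */
        if (rc != 0 && err == ECONNREFUSED && unlink(sa.sun_path) == 0)
            removed++;
    }
    closedir(d);
    return removed;
}
```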

Petri Kaukasoina · 11-11-2023, 12:48 AM

Quote:

Originally Posted by selfprogrammed

That script to remove stale sockets could be called from almost any of the XFCE startup scripts.
It could be stuck into /etc/xdg/xfce4/xinitrc almost anywhere before xfce-session is launched.

If another user starts XFCE, the script in /etc/xdg/xfce4/xinitrc can't remove sockets your xfce-session left. If the other user uses KDE, /etc/xdg/xfce4/xinitrc is not even run.

henca · 11-11-2023, 04:02 AM

Quote:

Originally Posted by selfprogrammed

4. ICE ought to use a socket name with the UID and the PID in it.

Yes, that would be a good solution to the situation that we have now. Even though some unused sockets might be left over, they will not prevent other users from using ICE.

regards Henrik

pan64 · 11-11-2023, 10:11 AM

Quote:

Originally Posted by henca

Yes, that would be a good solution to the situation that we have now. Even though some unused sockets might be left over, they will not prevent other users from using ICE.

regards Henrik

But in other cases it's just wrong: there can be more than one X session, but not two (or more) processes with the same PID. Don't forget libICE is used by other desktops too, so any change will have an impact on all of them.

From my side, unlinking something that belongs to an unknown owner is not really reliable (or safe), so it is much better to take the next PID and continue with it (and this can be implemented easily).
The cleanup itself does not belong to X, XFCE, or any display manager, but to the OS (reboot?).
If you want to be on the safe side, you can randomize that directory or use a chroot, but I think both require some work.
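The "take the next PID" idea might look like this minimal sketch: a foreign file is left alone and the next number is tried. `listen_on_free_name` and the `max_tries` limit are hypothetical. It copies the chosen path into `out`, since clients could no longer assume the name equals the PID and would have to be told it.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Bind a listening socket at "<dir>/<n>" for n = start, start+1, ...,
 * skipping names already taken (EADDRINUSE) instead of deleting them.
 * On success returns the fd and copies the chosen path into out. */
static int listen_on_free_name(const char *dir, long start, int max_tries,
                               char *out, size_t outlen)
{
    for (long n = start; n < start + max_tries; n++) {
        struct sockaddr_un sa;
        memset(&sa, 0, sizeof sa);
        sa.sun_family = AF_UNIX;
        int len = snprintf(sa.sun_path, sizeof sa.sun_path, "%s/%ld", dir, n);
        if (len < 0 || (size_t)len >= sizeof sa.sun_path)
            return -1;

        int fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;
        if (bind(fd, (struct sockaddr *)&sa, sizeof sa) != 0) {
            int err = errno;
            close(fd);
            if (err == EADDRINUSE)
                continue;                   /* name taken: try the next one */
            return -1;                      /* a real error */
        }
        if (listen(fd, 5) != 0) {
            close(fd);
            unlink(sa.sun_path);            /* don't leave our own file behind */
            return -1;
        }
        snprintf(out, outlen, "%s", sa.sun_path);
        return fd;
    }
    return -1;                              /* gave up after max_tries names */
}
```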

GazL · 11-12-2023, 05:46 AM

This is what I ended up doing:
In the Xstartup file of XDM, which is run prior to Xsession as root (thus avoiding any permission problems):

Code:

for file in /tmp/.ICE-unix/[1-9]*
do
  [ -e "$file" ] && [ ! -d "/proc/${file##*/}" ] && rm -f -- "$file"
done

If there is no corresponding PID directory in /proc, then the socket should be stale.

This assumes the socket follows the normal convention of being named with the PID of its process. It's not perfect, but it should help prevent collisions following an xdm initiated login. It doesn't help with 'startx' however.

henca · 11-12-2023, 10:25 AM

Quote:

Originally Posted by pan64

but in other cases it's just wrong, there can be more than one X session, but not two (or more) processes with the same pid.

That is why the socket name should contain both the UID and the PID. If a PID is reused after wrapping around, a leftover socket with that name belongs to the same UID, so it is still usable by the right user.

regards Henrik

USUARIONUEVO · 11-14-2023, 12:19 PM

Today, on a test box with only XFCE, I saw that I can't log in again after closing the session.

Easy test: try to log in to the session again... XFCE does not start.

KDE has no problems; I think it is an XFCE-specific bug.

+++

Sorry, I now remember that I have lxdm installed... so now I'm not sure whether it is XFCE or lxdm.

I built lxdm from SlackBuilds.

Sorry again.

USUARIONUEVO · 11-14-2023, 05:34 PM

If anyone is on -current with XFCE and lxdm: the problem is lxdm.

After closing the session I can't log in again, and I need to press the power button to force a shutdown.

In the same environment on slackware64-15.0, lxdm works well.

Sorry for the noise.