Strange intermittent permissions problem on Ubuntu 8.04 server edition
I'm having a really strange intermittent permissions problem. I am running Ubuntu 8.04 server edition (uname -a says: Linux server1 2.6.24-21-server #1 SMP Mon Aug 25 18:06:43 UTC 2008 i686 GNU/Linux).
Filesystem used is ext3, disk checking (fsck) is done regularly.
The problem has been present for the past 24 hours. No system updates have occurred in this time frame.
The main symptoms are:
Failed FTP transfers to the box.
Failure of postfix to access maildir to deliver e-mail.
Failure to start SSH session to the box - authentication is OK, but the client crashes straight after (I'm using PuTTY from a Windows XP box).
Failure to access directories and files which the user has appropriate access rights to.
The symptoms are intermittent - therefore I am able to log on to the box at times to inspect logs (as I am doing now).
The box is set up to perform multiple tasks - it is a web server (Apache), e-mail server (Postfix, MailScanner, Dovecot) and a database server (MySQL); it is a client on my MS W2000 AD domain, using Samba; it collects data from webcams; it receives data from a weather station; it is a DHCP server and a DNS server. So, yeah, a real workhorse :-)
There are no load issues - it is well within its capacity, with an average load of less than 0.1 (if you're interested, the hardware is a Dell PowerEdge 2300 with twin Pentium III 550 MHz processors and 1.5 GB RAM), and has provided faultless service for the past three years.
Errors I see from FTP failures look like this:
Thu Oct 30 21:15:30 2008 [pid 31054] CONNECT: Client "192.168.1.6"
Thu Oct 30 21:15:30 2008 [pid 31053] [weather] OK LOGIN: Client "192.168.1.6"
Thu Oct 30 21:15:30 2008 [pid 31055] [weather] FAIL UPLOAD: Client "192.168.1.6", "/var/www/www.hosiene.co.uk/weather/current.html", 0.00Kbyte/sec
Errors I see from postfix look like this (directory paths changed):
Oct 30 19:48:12 server1 postfix/local: warning: maildir access problem for UID/GID=1000/1000: create maildir file /path/to/maildir/tmp/1225396092.P29123.server1: Permission denied
Oct 30 19:48:12 server1 postfix/local: warning: perhaps you need to create the maildirs in advance
Oct 30 19:48:12 server1 postfix/local: 971BD577F1: to=<weather@server1>, relay=local, delay=17, delays=17/0.03/0/0.13, dsn=5.2.0, status=bounced (maildir delivery failed: create maildir file /path/to/maildir/tmp/1225396092.P29123.server1: Permission denied)
I can't find anything logged regarding SSH failures. I have looked in the usual suspects (syslog, messages, daemon.log).
I have personally experienced failures to access files and directories to which the user has the appropriate permissions.
I don't think there is any problem with authentication, just access.
It is the intermittent nature of this problem (which appears to affect more than one user, by the way) that is puzzling me. It is actually making me think "intermittent hardware problem" that I guess is going to lead to "hardware failure".
Would anyone care to cheer me up by suggesting something I might have missed? I've been puzzling over this one for what feels like a long time, and there is always the risk that the solution is staring me in the face. Any ideas, guys?
Update (01/11/2008): I've noticed that, while the permissions problem is occurring, the following sequence makes it go away: I start an SSH session to the box as a normal user, 'sudo su' to root, cd to a directory I am being denied access to as a normal user, then start another SSH session to the box as a normal user. The problem then mysteriously disappears, at least for a while, as it recurs after a time. Not sure whether this is significant or not.
Thought I'd better post an update to this thread, seeing as I managed to solve the problem.
Back in the mists of time I created two local users on this server. This was before I had sorted out authentication against Active Directory. Guess what? The local users had the same usernames as two users in AD. This wasn't a problem for a long time (probably about 15 months), as it appears that the local account was always used when Postfix delivered mail and when the other user FTP'd to the box. This behaviour changed recently: the local account was now used only about 30% of the time, and the AD account the rest of the time (the passwords were identical). Whenever the AD account was used, there were the inevitable permissions problems, as you might expect, because the files and directories belonged to the local accounts. Also, because I had not completed the Samba configuration correctly, AD users did not have a shell defined - hence SSH clients managing to authenticate but then crashing.
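For anyone chasing something similar, here is a rough Python sketch of how a name clash like this could be spotted. It is not what I ran at the time. The function names (find_conflicting_users, users_without_shell) are my own invention, the sample entries are made up (UID 1000 for "weather" matches the Postfix log above, the AD-side UID and the "alice" entry are purely illustrative), and it assumes both account sources can be dumped in passwd format with bare usernames (no domain prefix).

```python
from collections import defaultdict


def _parse(lines):
    """Yield field lists from passwd-format lines, skipping blanks and comments."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        # A valid entry needs at least name:passwd:uid with a numeric UID.
        if len(fields) >= 3 and fields[2].isdigit():
            yield fields


def find_conflicting_users(*passwd_sources):
    """Return {username: sorted UIDs} for names that map to more than one UID."""
    uids_by_name = defaultdict(set)
    for source in passwd_sources:
        for fields in _parse(source):
            uids_by_name[fields[0]].add(int(fields[2]))
    return {name: sorted(uids)
            for name, uids in uids_by_name.items() if len(uids) > 1}


def users_without_shell(passwd_source):
    """Return usernames whose shell field (7th field) is missing or empty."""
    return [fields[0] for fields in _parse(passwd_source)
            if len(fields) < 7 or not fields[6]]


if __name__ == "__main__":
    # Hypothetical data: one list per account source.
    local_accounts = ["weather:x:1000:1000::/home/weather:/bin/bash"]
    ad_accounts = ["weather:*:10005:10000::/home/weather:"]
    print(find_conflicting_users(local_accounts, ad_accounts))
    print(users_without_shell(ad_accounts))
```

In practice the two inputs could be the lines of /etc/passwd and whatever passwd-style listing of the directory accounts is available. Any name reported by the first function is a candidate for exactly the kind of confusion I had, and any name reported by the second would explain a login that authenticates and then dies.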
The update I added at the bottom of my original post is a further illustration of the intermittent nature of the account selection at logon.
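If you want to see this kind of flip-flopping for yourself, a small Python sketch along these lines could tally which account a username resolves to over repeated lookups. Again, this is an illustration rather than something I used: sample_uid_resolution is a name I made up, the sample count and delay are arbitrary, and a caching layer such as nscd (if one is running) may mask the effect.

```python
import pwd
import time
from collections import Counter


def sample_uid_resolution(username, samples=20, delay=0.5, lookup=pwd.getpwnam):
    """Look up `username` repeatedly and count each (UID, shell) pair seen.

    `lookup` defaults to the system's name service lookup; it is a parameter
    only so the function can be exercised with a stand-in.
    """
    tally = Counter()
    for _ in range(samples):
        try:
            entry = lookup(username)
            tally[(entry.pw_uid, entry.pw_shell)] += 1
        except KeyError:
            # The name did not resolve at all on this attempt.
            tally[None] += 1
        time.sleep(delay)
    return tally


if __name__ == "__main__":
    # "weather" is the account from the logs above; substitute your own.
    for key, count in sample_uid_resolution("weather").items():
        print(key, count)
```

A healthy setup should show a single (UID, shell) pair. More than one pair, or a pair with an empty shell, would point to the same username being answered by two different account sources.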
This is what I did when I realised the problems were down to me:
Mystery solved - all apart from this aspect: how did I get away with it for so long? And why did logon behaviour change when it did?
Hmmm, I'll go away and ponder those questions. Maybe I'll find the solution too......