I have an embedded Ubuntu 14.04 image that I access over ssh, and it also has Apache serving a site.
Sometimes, apparently at random, I hit a strange issue in which any page I try to access through Apache returns a Forbidden error.
Simultaneously, if I am connected over ssh, I am unable to execute any command, even as root via 'sudo su'. All commands return 'No such file or directory', even if I specify the full command path. Interestingly, a few commands such as pwd and echo do work; others such as ls, reboot, cat, ping, ... all give that error.
If I am not already connected over ssh, new ssh connections fail.
It seems as if there is suddenly a general loss of permissions. I'm not sure whether it eventually recovers on its own, as I always end up doing a power cycle.
I know this is quite generic, but that is all I know. syslog doesn't seem to indicate anything abnormal.
pwd and echo are built into the shell, whereas cat, ls, ping and so on are executables stored in files. That could be the start of a clue: something may be wrong with the root directory. Or the loader, typically residing at /lib/ldblabla.so or perhaps under /lib64, is inaccessible. Or the loader's configuration, ld.so.conf and ld.so.cache, is corrupted.
It might help if you could provide:
precise commands and error messages
context: when it happens, does it go away on its own? Do you need to log out? Reboot?
do you log on, work normally, then suddenly commands don't work?
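Since built-ins such as pwd and echo keep working while external binaries fail, a few helper functions written purely with shell built-ins could still be used from the stuck ssh session to look around. This is only a rough sketch: the function names bcat, bls and bexists are made up for illustration, and the loader path in the usage note is an assumption for a 32-bit ARM hard-float Ubuntu image.

```shell
#!/bin/sh
# Diagnostics that rely only on shell built-ins, so they should keep
# working even when no external binary can be executed.

# bcat: print a file line by line (a stand-in for cat).
bcat() {
    while IFS= read -r bcat_line || [ -n "$bcat_line" ]; do
        printf '%s\n' "$bcat_line"
    done < "$1"
}

# bls: list directory entries via globbing (a stand-in for ls).
bls() {
    for bls_entry in "${1:-.}"/*; do
        if [ -e "$bls_entry" ]; then
            printf '%s\n' "$bls_entry"
        fi
    done
}

# bexists: report whether a path is still visible to the shell.
bexists() {
    if [ -e "$1" ]; then
        printf 'present: %s\n' "$1"
    else
        printf 'MISSING: %s\n' "$1"
    fi
}
```

Pasted into the affected session, something like `bexists /lib/ld-linux-armhf.so.3` (assumed loader path), `bls /lib` and `bcat /proc/mounts` might show whether the loader, the library directory, or the root mount itself has disappeared.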
To put you in the picture, this is a custom-designed i.MX53 board running a 32-bit Ubuntu 14.04.1 LTS image and a 3.16.2-armv7 kernel. The board has a LAN interface and also a pppd connection over a cell modem. I am accessing it using an ssh connection over the LAN interface. The board is constantly running in a semi-remote location and I only have network (LAN interface) access to it.
So basically the context in which this happens is that I have this running 24/7 and sometimes (to me it appears random) I lose access to it. It seems to happen every couple of weeks, but sometimes more often. I can ping it fine, and I can tell some custom processes are still running because I can see it poll an external modbus device, but trying to access any page via the web server gives a Forbidden error:
Quote:
Forbidden
You don't have permission to access / on this server.
Apache/2.4.7 (Ubuntu) Server at 192.168.1.20 Port 443
I sometimes leave an ssh connection active for a few days. In that case, when the issue happens, as I said previously, all commands return 'No such file or directory'. That is the only error message I get from any command I try. Good point about pwd and echo being built into the shell. I will check the loader's cache; maybe it is being corrupted, as you say.
If I happen to not be connected to ssh when this happens, I simply cannot access ssh, failing with this error:
Quote:
$ ssh ubuntu@192.168.1.20
Read from socket failed: Connection reset by peer
To get the system back into normal operation I need to get someone to power cycle it. It sometimes takes a few days until someone can do that, and the system has never recovered during that time, except once when it seemed to recover by itself after 2 days (and uptime showed it had not rebooted). I'm not sure whether some other factor was involved in that particular case; in general it doesn't seem to recover by itself.
I cannot really figure out a sequence of events that leads to the issue. At the moment I have it in this condition, but unfortunately I have just now lost the ssh connection to it (...I was connected via TeamViewer to a Windows PC on the remote network, with ssh to the board from there, and stupid Windows rebooted to install updates!!!)
I have a suspicion (although not a solid one) that the pppd connection has something to do with this. Meanwhile I have the 'same' image running on an identical local board with an identical setup, and I have never managed to replicate the issue. The images are not exactly identical, but both started off from the same image and have had the same changes from there onwards. Also, I don't have any indication that it is hardware related.
After the power cycle it works perfectly normally, and it runs for several days without any problem. I can log in over ssh and everything is normal, then it just happens all of a sudden. Normally it happens while I am not using it, so I leave it working fine and the next day I find ssh still connected but unusable, or I can no longer connect over ssh.
Thanks!
Last edited by grgsaliba; 07-15-2015 at 04:58 AM.
Reason: added further answer
That's an interesting troubleshooting challenge. I wonder what happens to your ssh connection. Does it close the connection because it can't spawn a shell?
A few general ideas, most related to logging.
When you can't connect, use ssh -vvv on the client (the -d flags belong to the sshd server). That produces an ocean of debug messages and perhaps a clue as to what's wrong with the ssh server.
On the server side, you can run a second (or third, ...) sshd in parallel, listening on a different port, in debug mode (-d, up to -ddd) and collecting its messages in a log file. When you can't connect, do all the sshd instances refuse the connection?
How about cranking up the Apache log level?
I suppose you have checked if there is a time correlation between all these problems. I see three problem domains - commands not working, unable to connect, web server.
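The idea of a second sshd collecting debug output could look roughly like the sketch below. The port 2222 and the log path in the usage note are arbitrary assumptions, and the wrapper function run_debug_sshd is a made-up name. Note that an sshd started in debug mode stays in the foreground and serves only a single connection before exiting, so it would need to be started before the fault occurs and restarted after each test connection.

```shell
#!/bin/sh
# Start one extra sshd in debug mode on a spare port and append its
# stderr output to a log file.
SSHD_BIN=${SSHD_BIN:-/usr/sbin/sshd}   # sshd must be started by absolute path

run_debug_sshd() {
    # $1 = port, $2 = log file
    # -ddd: maximum debug level; debug output goes to stderr and the
    #       server handles one connection, then exits.
    # -p:   listen on the given port instead of the configured one.
    "$SSHD_BIN" -ddd -p "$1" 2>>"$2"
}
```

As root, this might be used as `run_debug_sshd 2222 /var/log/sshd-debug.log &`, followed from the client by `ssh -vvv -p 2222 ubuntu@192.168.1.20`. If the log file lives on the same filesystem that becomes unreachable, the output may be lost, so a tmpfs location could be a better choice.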
$ ssh -vvv ubuntu@192.168.1.20
OpenSSH_6.7p1, OpenSSL 1.0.1k 8 Jan 2015
debug2: ssh_connect: needpriv 0
debug1: Connecting to 192.168.1.20 [192.168.1.20] port 22.
debug1: Connection established.
debug1: key_load_public: No such file or directory
debug1: identity file /home/George/.ssh/id_rsa type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/George/.ssh/id_rsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/George/.ssh/id_dsa type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/George/.ssh/id_dsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/George/.ssh/id_ecdsa type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/George/.ssh/id_ecdsa-cert type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/George/.ssh/id_ed25519 type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/George/.ssh/id_ed25519-cert type -1
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_6.7
debug1: Remote protocol version 2.0, remote software version OpenSSH_6.6.1p1 Ubuntu-2ubuntu2
debug1: match: OpenSSH_6.6.1p1 Ubuntu-2ubuntu2 pat OpenSSH_6.6.1* compat 0x04000000
debug2: fd 3 setting O_NONBLOCK
debug3: load_hostkeys: loading entries for host "192.168.1.20" from file "/home/George/.ssh/known_hosts"
debug3: load_hostkeys: found key type ECDSA in file /home/George/.ssh/known_hosts:2
debug3: load_hostkeys: loaded 1 keys
debug3: order_hostkeyalgs: prefer hostkeyalgs: ecdsa-sha2-nistp256-cert-v01@openssh...01@openssh.com,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521
debug1: SSH2_MSG_KEXINIT sent
Read from socket failed: Connection reset by peer
With regards to time correlation, all 3 issues happen at the same time, I am sure of that.
Well at least we can see some negotiation going on between client and server. I am pretty certain the sshd gives up when it can't exec a program like the shell. Perhaps the sshd's error message will provide some info.
Yes, I think sshd is unable to exec, just as I am unable to run commands when I happen to be connected over ssh. Both Apache and sshd are still running. Apache even serves the SSL certificate properly and produces the Forbidden error message. The response time is also very fast, so there isn't some process hogging the resources. I don't think the issue is related to sshd or Apache; these are just side effects. It would seem to be related to some form of random corruption, maybe memory corruption. However, I have no indication of memory hardware issues, and the system was pretty much idle over the past couple of days, so there wasn't much going on. The system has 2GB RAM and at that time it was only using around 128MB with little variation. There are also no other issues indicating memory problems, even when the system is loaded.
I'll check the Apache log when I get access to it and post some snippets. It should log the reason why it forbids access.
Managed to have the board power cycled; here are the interesting points:
Apache error.log just before the issue occurred (I think):
Quote:
ppp0: error fetching interface information: Device not found
ppp0: error fetching interface information: Device not found
ppp0: error fetching interface information: Device not found
ppp0: error fetching interface information: Device not found
[Tue Jul 14 16:00:06.282292 2015] [ssl:warn] [pid 1083] AH01909: RSA certificate configured for 127.0.1.1:443 does NOT include an ID which matches the server name
[Tue Jul 14 16:00:06.853672 2015] [ssl:warn] [pid 1091] AH01909: RSA certificate configured for 127.0.1.1:443 does NOT include an ID which matches the server name
[Tue Jul 14 16:00:06.909609 2015] [mpm_prefork:notice] [pid 1091] AH00163: Apache/2.4.7 (Ubuntu) PHP/5.5.9-1ubuntu4.4 OpenSSL/1.0.1f configured -- resuming normal operations
[Tue Jul 14 16:00:06.909838 2015] [core:notice] [pid 1091] AH00094: Command line: '/usr/sbin/apache2'
Not sure why those ppp0 lines are there.
access.log has no entries for the duration of the issue.
The last activity in syslog before the issue occurred is also related to pppd, but everything seems normal:
Quote:
Jul 14 15:55:21 arm pppd[1278]: pppd 2.4.5 started by root, uid 0
Jul 14 15:55:22 arm chat[1289]: timeout set to 30 seconds
Jul 14 15:55:22 arm chat[1289]: abort on (\nBUSY\r)
Jul 14 15:55:22 arm chat[1289]: abort on (\nERROR\r)
Jul 14 15:55:22 arm chat[1289]: abort on (\nNO ANSWER\r)
Jul 14 15:55:22 arm chat[1289]: abort on (\nNO CARRIER\r)
Jul 14 15:55:22 arm chat[1289]: abort on (\nNO DAILTONE\r)
Jul 14 15:55:22 arm chat[1289]: abort on (\nRING\r\n\r\nRING\r)
Jul 14 15:55:22 arm chat[1289]: send (AT^M)
Jul 14 15:55:22 arm chat[1289]: expect (OK)
Jul 14 15:55:22 arm chat[1289]: ^M
Jul 14 15:55:22 arm chat[1289]: OK
Jul 14 15:55:22 arm chat[1289]: -- got it
Jul 14 15:55:22 arm chat[1289]: send (AT+CGDCONT=1,"IP","sp.telus.com"^M)
Jul 14 15:55:22 arm chat[1289]: timeout set to 30 seconds
Jul 14 15:55:22 arm chat[1289]: expect (OK)
Jul 14 15:55:22 arm chat[1289]: ^M
Jul 14 15:55:22 arm chat[1289]: ^M
Jul 14 15:55:22 arm chat[1289]: OK
Jul 14 15:55:22 arm chat[1289]: -- got it
Jul 14 15:55:22 arm chat[1289]: send (ATD*99***1#^M)
Jul 14 15:55:22 arm chat[1289]: expect (CONNECT)
Jul 14 15:55:22 arm chat[1289]: ^M
Jul 14 15:55:22 arm chat[1289]: ^M
Jul 14 15:55:22 arm chat[1289]: CONNECT
Jul 14 15:55:22 arm chat[1289]: -- got it
Jul 14 15:55:22 arm chat[1289]: send (^M)
Jul 14 15:55:22 arm pppd[1278]: Script /usr/sbin/chat -vV -f /etc/ppp/chat-HSPA910CF-nopin finished (pid 1288), status = 0x0
Jul 14 15:55:22 arm pppd[1278]: Serial connection established.
Jul 14 15:55:22 arm pppd[1278]: using channel 12
Jul 14 15:55:22 arm pppd[1278]: Using interface ppp0
Jul 14 15:55:22 arm pppd[1278]: Connect: ppp0 <--> /dev/mux0
Jul 14 15:55:23 arm pppd[1278]: rcvd [LCP ConfReq id=0x1 <asyncmap 0x0> <auth pap> <magic 0x96a24aa4> <pcomp> <accomp>]
Jul 14 15:55:23 arm pppd[1278]: sent [LCP ConfReq id=0x1 <asyncmap 0x0> <magic 0xf7d60737> <pcomp> <accomp>]
Jul 14 15:55:23 arm pppd[1278]: No auth is possible
Jul 14 15:55:23 arm pppd[1278]: sent [LCP ConfRej id=0x1 <auth pap>]
Jul 14 15:55:23 arm pppd[1278]: rcvd [LCP ConfAck id=0x1 <asyncmap 0x0> <magic 0xf7d60737> <pcomp> <accomp>]
Jul 14 15:55:23 arm pppd[1278]: rcvd [LCP ConfReq id=0x2 <asyncmap 0x0> <magic 0x96a24aa4> <pcomp> <accomp>]
Jul 14 15:55:23 arm pppd[1278]: sent [LCP ConfAck id=0x2 <asyncmap 0x0> <magic 0x96a24aa4> <pcomp> <accomp>]
Jul 14 15:55:23 arm pppd[1278]: sent [LCP EchoReq id=0x0 magic=0xf7d60737]
Jul 14 15:55:23 arm pppd[1278]: sent [CCP ConfReq id=0x1 <deflate 15> <deflate(old#) 15> <bsd v1 15>]
Jul 14 15:55:23 arm pppd[1278]: sent [IPCP ConfReq id=0x1 <addr 0.0.0.0> <ms-dns1 0.0.0.0> <ms-dns2 0.0.0.0>]
Jul 14 15:55:23 arm pppd[1278]: rcvd [LCP EchoRep id=0x0 magic=0x96a24aa4]
Jul 14 15:55:23 arm pppd[1278]: rcvd [LCP ProtRej id=0x3 80 fd 01 01 00 0f 1a 04 78 00 18 04 78 00 15]
Jul 14 15:55:23 arm pppd[1278]: Protocol-Reject for 'Compression Control Protocol' (0x80fd) received
Jul 14 15:55:23 arm pppd[1278]: rcvd [IPCP ConfNak id=0x1 <ms-dns1 10.11.12.13> <ms-dns2 10.11.12.14>]
Jul 14 15:55:23 arm pppd[1278]: sent [IPCP ConfReq id=0x2 <addr 0.0.0.0> <ms-dns1 10.11.12.13> <ms-dns2 10.11.12.14>]
Jul 14 15:55:24 arm pppd[1278]: rcvd [IPCP ConfNak id=0x2 <ms-dns1 10.11.12.13> <ms-dns2 10.11.12.14>]
Jul 14 15:55:24 arm pppd[1278]: sent [IPCP ConfReq id=0x3 <addr 0.0.0.0> <ms-dns1 10.11.12.13> <ms-dns2 10.11.12.14>]
Jul 14 15:55:25 arm pppd[1278]: rcvd [IPCP ConfNak id=0x3 <ms-dns1 10.11.12.13> <ms-dns2 10.11.12.14>]
Jul 14 15:55:25 arm pppd[1278]: sent [IPCP ConfReq id=0x4 <addr 0.0.0.0> <ms-dns1 10.11.12.13> <ms-dns2 10.11.12.14>]
Jul 14 15:55:26 arm vnstatd[1029]: Interface "ppp0" enabled.
Jul 14 15:55:26 arm pppd[1278]: rcvd [IPCP ConfNak id=0x4 <ms-dns1 10.11.12.13> <ms-dns2 10.11.12.14>]
Jul 14 15:55:26 arm pppd[1278]: sent [IPCP ConfReq id=0x5 <addr 0.0.0.0> <ms-dns1 10.11.12.13> <ms-dns2 10.11.12.14>]
Jul 14 15:55:26 arm pppd[1278]: rcvd [IPCP ConfReq id=0x1]
Jul 14 15:55:26 arm pppd[1278]: sent [IPCP ConfNak id=0x1 <addr 0.0.0.0>]
Jul 14 15:55:26 arm pppd[1278]: rcvd [IPCP ConfNak id=0x5 <addr 10.116.1.173> <ms-dns1 209.91.107.11> <ms-dns2 209.121.225.11>]
Jul 14 15:55:26 arm pppd[1278]: sent [IPCP ConfReq id=0x6 <addr 10.116.1.173> <ms-dns1 209.91.107.11> <ms-dns2 209.121.225.11>]
Jul 14 15:55:26 arm pppd[1278]: rcvd [IPCP ConfReq id=0x2 <addr 10.116.1.173>]
Jul 14 15:55:26 arm pppd[1278]: sent [IPCP ConfAck id=0x2 <addr 10.116.1.173>]
Jul 14 15:55:26 arm pppd[1278]: rcvd [IPCP ConfAck id=0x6 <addr 10.116.1.173> <ms-dns1 209.91.107.11> <ms-dns2 209.121.225.11>]
Jul 14 15:55:26 arm pppd[1278]: replacing old default route to eth0 [192.168.1.254]
Jul 14 15:55:26 arm pppd[1278]: local IP address 10.116.1.173
Jul 14 15:55:26 arm pppd[1278]: remote IP address 10.116.1.173
Jul 14 15:55:26 arm pppd[1278]: primary DNS address 209.91.107.11
Jul 14 15:55:26 arm pppd[1278]: secondary DNS address 209.121.225.11
Jul 14 15:55:26 arm pppd[1278]: Script /etc/ppp/ip-up started (pid 1377)
Jul 14 15:55:28 arm ntpd[4170]: Listen normally on 17 ppp0 10.116.1.173 UDP 123
Jul 14 15:55:28 arm ntpd[4170]: 206.108.0.131 interface 192.168.1.20 -> 10.116.1.173
Jul 14 15:55:28 arm ntpd[4170]: 91.189.89.199 interface 192.168.1.20 -> 10.116.1.173
Jul 14 15:55:28 arm ntpd[4170]: 199.182.221.110 interface 192.168.1.20 -> 10.116.1.173
Jul 14 15:55:28 arm ntpd[4170]: 173.243.192.18 interface 192.168.1.20 -> 10.116.1.173
Jul 14 15:55:28 arm ntpd[4170]: 208.73.56.29 interface 192.168.1.20 -> 10.116.1.173
Jul 14 15:55:28 arm ntpd[4170]: peers refreshed
Jul 14 15:55:28 arm ntpd[4170]: new interface(s) found: waking up resolver
Jul 14 15:55:29 arm postmulti[1447]: warning: /etc/postfix/main.cf, line 111: overriding earlier entry: relayhost=
Jul 14 15:55:29 arm postmulti[1451]: warning: /etc/postfix/main.cf, line 111: overriding earlier entry: relayhost=
Jul 14 15:55:29 arm postfix[1462]: warning: /etc/postfix/main.cf, line 111: overriding earlier entry: relayhost=
Jul 14 15:55:29 arm postfix/master[1467]: warning: /etc/postfix/main.cf, line 111: overriding earlier entry: relayhost=
Jul 14 15:55:29 arm postfix/postsuper[1468]: warning: /etc/postfix/main.cf, line 111: overriding earlier entry: relayhost=
Jul 14 15:55:29 arm postfix/master[1004]: reload -- version 2.11.0, configuration /etc/postfix
Jul 14 15:55:29 arm postfix/master[1004]: warning: /etc/postfix/main.cf, line 111: overriding earlier entry: relayhost=
Jul 14 15:55:29 arm postfix/postsuper[1470]: warning: /etc/postfix/main.cf, line 111: overriding earlier entry: relayhost=
Jul 14 15:55:29 arm postfix/qmgr[1473]: warning: /etc/postfix/main.cf, line 111: overriding earlier entry: relayhost=
Jul 14 15:55:29 arm postfix/pickup[1474]: warning: /etc/postfix/main.cf, line 111: overriding earlier entry: relayhost=
Jul 14 15:55:29 arm postfix/sendmail[1485]: warning: /etc/postfix/main.cf, line 111: overriding earlier entry: relayhost=
Jul 14 15:55:29 arm pppd[1278]: Script /etc/ppp/ip-up finished (pid 1377), status = 0x0
There is no other entry in syslog until the system is power cycled.
I am getting more convinced it is somehow related to the pppd connection. The device has become more unstable, locking up in this state more often. I did a test in which I disabled the pppd connection and so far it hasn't produced the issue. I cannot imagine how the pppd connection can create such an issue, though...
Digging deeper into the issue, it seems that I only get this strange lock-up when I set the 'replacedefaultroute' option in pppd. So it might simply be related to a packet routing issue. Still, even in this mode LAN traffic coming from the subnet should be routed properly, and the issue only seems to appear at some point during normal operation.
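To see what 'replacedefaultroute' actually leaves behind, it may help to record which interface holds the default route before and after pppd connects (the syslog above shows pppd "replacing old default route to eth0 [192.168.1.254]"). The sketch below reads the kernel routing table from /proc/net/route using shell built-ins only, so it should also work from a session where external commands fail. The function name default_route_ifaces is made up for illustration.

```shell
#!/bin/sh
# Print the interface of every default route listed in a kernel
# routing table file (default: /proc/net/route), using built-ins only.
default_route_ifaces() {
    rt_file=${1:-/proc/net/route}
    # Columns: Iface Destination Gateway Flags ...; a default route
    # has Destination 00000000. The header line never matches because
    # its second field is the word "Destination".
    while read -r rt_iface rt_dest rt_rest; do
        if [ "$rt_dest" = "00000000" ]; then
            printf '%s\n' "$rt_iface"
        fi
    done < "$rt_file"
}
```

Calling `default_route_ifaces` with no argument on the board should print ppp0 while the replaced route is active and eth0 otherwise; an empty result would mean no default route is installed at all.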