LinuxQuestions.org - Root unable to delete file

Hello,

Strangest thing I have ever seen. Any idea why I can't delete a file if I'm root?

Code:

[root@test directory]# rm -f .result.php.swp

rm: cannot remove `.result.php.swp': Read-only file system

[root@test directory]#

[root@test directory]#

[root@test directory]# id

uid=0(root) gid=0(root) groups=0(root),1(bin),2(daemon),3(sys),4(adm),6(disk),10(wheel) context=root:system_r:unconfined_t:SystemLow-SystemHigh

[root@test directory]#

[root@test directory]#

[root@test directory]# getfacl .result.php.swp

# file: .result.php.swp

# owner: root

# group: root

user::rw-

group::---

other::---

Hello,

Is this maybe a partition mounted as read only? Or is the directory set to immutable? Check with mount and lsattr.

Kind regards,

Eric
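A minimal sketch of the read-only check, assuming a mounts table in /proc/mounts format. The helper names mount_opts_for and is_readonly are made up for illustration.

```shell
#!/bin/sh
# Print the mount options for a mount point, reading a table in
# /proc/mounts format. $1 = mount point, $2 = table file (optional).
# If the mount point is listed more than once, the last entry wins.
mount_opts_for() {
    awk -v mp="$1" '$2 == mp { opts = $4 } END { if (opts != "") print opts }' \
        "${2:-/proc/mounts}"
}

# Succeed if a comma-separated option list contains the "ro" flag.
# "errors=remount-ro" does not count, only a bare "ro".
is_readonly() {
    case ",$1," in
        *,ro,*) return 0 ;;
        *)      return 1 ;;
    esac
}
```

With these defined, is_readonly "$(mount_opts_for /)" && echo "/ is read-only" answers the first question. For the second, lsattr -d . and lsattr .result.php.swp show an "i" flag on anything marked immutable.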

Yes, the answer is right in the middle of all the text.

Quote:

Originally Posted by grob115 (Post 4356793)

rm: cannot remove `.result.php.swp': Read-only file system

Something happened to make this partition switch to read-only mode.
You could try to remount it, but it would be a good idea to run
"fsck -y" on it first. Or just reboot the system.

If this continues to happen, it would be worth finding why it
is switching to read-only.
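A rough sketch of the check-then-remount order, written as a dry run so nothing happens unless you ask for it. Note the swap: it uses fsck -n (report only) instead of the "fsck -y" mentioned above, because repairing a mounted filesystem is risky and an actual repair is better done from single-user or rescue mode. The names run and check_and_remount are made up, and the device and mount point are whatever you pass in.

```shell
#!/bin/sh
# Dry run by default: set DRY_RUN=0 to actually execute the commands.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1 = block device, $2 = mount point (both supplied by the caller)
check_and_remount() {
    # -n: answer "no" to every question, so fsck changes nothing
    run fsck -n "$1" || return 1
    # only attempted if the check above came back clean
    run mount -o remount,rw "$2"
}
```

For example, check_and_remount /dev/VolGroup00/LogVol00 / prints the two commands it would run. If fsck -n reports errors, stop there and repair from rescue mode.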

Thanks. Definitely something has gone wrong. I rebooted the system and saw the following during boot up.

Code:

Memory for crash kernel (0x0 to 0x0) notwithin permissible range

PCI: BIOS Bug: MCFG area at e0000000 is not E820-reserved

PCI: Not using MMCONFIG.

Red Hat nash version 5.1.19.6 starting

ata2: softreset failed (device not ready)

  Reading all physical volumes.  This may take a while...

  Found volume group "VolGroup00" using metadata type lvm2

  2 logical volume(s) in volume group "VolGroup00" now active

                Welcome to CentOS release 5.5 (Final)

                Press 'I' to enter interactive startup.

What was that? Then CentOS booted up and displayed the following.

Code:

Setting hostname name.domain.com:                                      [ OK ]

Setting up Logical Volume Management:                                  [ OK ]

/dev/VolGroup00/LogVol00 contains a file system with errors, check forced.

Inode 24218823, i_blocks is 224, should be 216.  FIXED.

/dev/VolGroup00/LogVol00: |================              / 45.8%

After that it appears it went back to the BIOS boot up screen and continued on with the following.

Code:

  2 logical volume(s) in volume group "VolGroup00" now active

                Welcome to CentOS release 5.5 (Final)

                Press 'I' to enter interactive startup.

Setting clock (utc): Sun May 15 06:45:31 PDT 2011              [ OK ]

<I have taken out some other checks>

Setting up Logical Volume Management:  /dev/hdb: open failed:  No medium found

  2 logical volume(s) in volume group "VolGroup00" now active

                              [ OK ]

Checking file systems





/dev/VolGroup00/LogVol00: UNEXPECTED INCONSISTENCY: RUN fsck MANUALLY.

              (i.e., without -a or -p options)

                              [FAILED]

<I have taken out some other messages>

Give root password for maintenance

(or type Control-D to continue):

After entering the password, I ran "fsck -f -C" and saw the following.

Quote:

Pass 1: Checking inodes, blocks, and sizes
Inode 24218823, i_blocks is 216, should be 224. Fix(y)? yes

Inodes that were part of a corrupted orphan linked list found. Fix(y)? yes

Inode 24983876 was part of the orphaned inode list. FIXED.
Inode 24983877 was part of the orphaned inode list. FIXED.
Inode 24983878 was part of the orphaned inode list. FIXED.
Inode 24983879 was part of the orphaned inode list. FIXED.
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences: -27984910
Fix(y)? yes

Free blocks count wrong for group #854 (26469, counted=26470).
Fix(y)? yes

Free blocks count wrong (31378175, counted=31378176).
Fix(y)? yes

Inode bitmap differences: -(27983875--27983879)
Fix(y)? yes

Free inodes count wrong for group #854 (32747, counted=32752).
Fix(y)? yes

Free inodes count wrong (37850526, counted=37850531).
Fix(y)? yes

/dev/VolGroup00/LogVol00: ***** FILE SYSTEM WAS MODIFIED *****
/dev/VolGroup00/LogVol00: ***** REBOOT LINUX *****
/dev/VolGroup00/LogVol00: 225885/38076416 files (1.3% non-contiguous), 6698240/38876416 blocks

I did power down the system by pressing the power button, because I typed "power -h now" but didn't see the system power down completely. Could the above have been caused by some information not being synced to the hard disk? Or does the hard disk itself have a physical issue?

Quote:

If the system hung before it was able to get all of the information written to disk, that could cause issues like this. You probably have some data loss. I am not sure what filesystem you are using, but assuming it is one of the modern journaling types, they were meant to be more resilient to hard drive problems, which do happen, than earlier systems. At the same time, it could be a failure getting ready to happen. Watch the output of dmesg for a while and see if you get errors. When I had a failing drive and/or controller, it showed up as a lot of "device not ready" messages, much like your "ata2: softreset failed (device not ready)."
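One way to watch for these, as a small sketch: a filter that keeps only libata trouble and block I/O errors from kernel log text. The function name ata_errors and the pattern are my own guess at what is worth flagging, not an official list.

```shell
#!/bin/sh
# Read kernel log lines on stdin and keep the ones that look like
# libata failures/exceptions or generic block-layer I/O errors.
ata_errors() {
    grep -E 'ata[0-9]+(\.[0-9]+)?: .*(failed|exception|error)|I/O error'
}
```

Use it as dmesg | ata_errors, or ata_errors < /var/log/messages to go back further in time.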

Thanks. That message "ata2: softreset failed (device not ready)" had appeared for a while and I have had no clue what it was about. But what exactly does it mean? How can it be not ready and yet useable? And what is a soft reset?

You raise some good questions that I can't fully answer. The error sounds like the kernel device driver received an unexpected response of some form from the SATA controller. A soft reset suggests the issuance of a "reset" command, which could be issued in reaction to an error. Unfortunately, your guess is as good as mine.

One interesting thing I noticed is that if you Google the term "ata2: softreset failed" you get a lot of hits regarding known bugs, so apparently there was a kernel change at some point that impacted this. To be safe, you could run some disk check utilities periodically, but you should do them with the drive not mounted, such as from a live CD.

I think this is one to keep your eye on for a while, being on the lookout for more errors. You should also make sure you keep backups of important data, though this is always a good precaution.

From past not-fun experiences with RAID5: *sometimes* one drive will fail. No problem, you insert the new drive, only to find out another drive has an error and it cannot rebuild and it sits on a blinking cursor. (The other drive did NOT report any errors.) Always nice.

I am no fan of RAID5. Like the above post stated, be sure the data is backed up; it sounds like it is getting ready to do something, and it does not sound good.

I converted all physical machines to virtual machines (VMware, not easy) in a SAN HA environment and have another SAN doing snapshots of the production SAN. In case the prod SAN goes belly up, I can put the other SAN in production.

I got burned on RAID5 too many times. I never take anything for granted: it will go belly up and put you in a not-so-nice position, 'emergency mode'...

Quote:

Originally Posted by grob115 (Post 4358038)

That message "ata2: softreset failed (device not ready)" had appeared for a while and I have had no clue what it was about. But what exactly does it mean?

libata -- the part of the Linux kernel you use for IDE and SATA support -- uses something called a softreset to force a SATA port to a known good state. Certain chipsets are known to always fail this at boot time (a bug in the chipset, if you will, albeit a completely benign one), for example the AMD SB600/SB700/SB800 chipsets.

However, a softreset failure can also be caused by a failure in the hard drive itself. Your symptoms indicate your hard drive is dead or dying. If your hard drive supports S.M.A.R.T. (and if it is of the rotational variety, it most likely does), you can use smartctl from smartmontools to do an offline check to update the status information, check the attributes, especially the reallocated sector count, and then run a short or long self-test. A fast-increasing or maxed-out reallocated sector count is the most reliable indicator of total failure in the near future. A self-test failure means the disk is dead: any further data you put on it is likely to be lost, and you have a very limited opportunity to save its contents.
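A small sketch of that sequence. The helper realloc_count is a made-up name; it assumes the usual ATA attribute table printed by smartctl -A, where the raw value is the tenth column. Run smartctl against the whole disk (/dev/sdX here is a placeholder), not a partition.

```shell
#!/bin/sh
# Read `smartctl -A` output on stdin and print the raw value (10th column)
# of the Reallocated_Sector_Ct attribute. Prints nothing if the drive does
# not report that attribute.
realloc_count() {
    awk '$2 == "Reallocated_Sector_Ct" { print $10 }'
}

# Typical sequence, as root:
#   smartctl -t offline /dev/sdX            # refresh the attribute data
#   smartctl -A /dev/sdX | realloc_count    # reallocated sector count
#   smartctl -t short /dev/sdX              # or: -t long
#   smartctl -l selftest /dev/sdX           # read the result afterwards
```

The self-test runs inside the drive, so the result only shows up in the self-test log once the time quoted by smartctl has passed.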

As to saving the contents of a dying disk, get another disk as large or larger, and create an image of (the partitions on) the old one using sudo dd conv=noerror,sync if=/dev/sdX of=image-file bs=512 (unless it was purely a RAID5 member, in which case don't bother, just get a new disk and rebuild the array). The sync option pads unreadable blocks with zeros, so everything after a bad sector keeps its offset in the image. While the dd is running, sudo killall -USR1 dd will make dd (all running dd's) output progress information.
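The imaging step as a tiny sketch. image_disk is a made-up wrapper name; the source and destination are whatever you pass in, and you would run it as root when the source is a real device.

```shell
#!/bin/sh
# Copy a device (or file) to an image, 512 bytes at a time.
# $1 = source device or file, $2 = image file to create.
# noerror: keep going after read errors.
# sync:    pad short or failed reads with zeros so offsets stay aligned.
image_disk() {
    dd if="$1" of="$2" bs=512 conv=noerror,sync
}
```

A small block size is slow but loses the least data around each bad sector. Because of the padding, the image can end up rounded up to a multiple of 512 bytes.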

Quote:

Originally Posted by rhbegin (Post 4358163)

find out another drive has an error and it cannot rebuild and it sits on a blinking cursor. (The other drive did NOT report any errors)

I recommend keeping strict tabs on the reallocated sector count of the RAID member drives, if possible, using smartctl from the smartmontools package. A normal drive may get a small cluster (a few reallocated sectors) in one go, but there will be a long interval (on the order of weeks) between these occurrences. When the reallocations start occurring more often, you usually have hours to days before the drive dies. Sometimes the drive stabilizes after a while and does not die, but I cannot trust such drives. They're like a gas truck that's only a little bit on fire, no biggie.

It's much better to invest in known reliable hard drives, and replace them if they get more than a dozen reallocated sectors. It's a pity Samsung sold their hard drive business to Seagate, as the larger Samsung hard drives were cream of the crop in my experience; they were a lot cheaper than the only other alternative for me, Western Digital. I wouldn't take Seagate disks even if I got them for free, I got so many problems with them. (The funniest one was a Maxtor disk more than a decade ago: it was unbalanced, and would not stay put on a table when turned on. Only vibrator I ever owned.)
Well, I hear Hitachi enterprise grade drives are good, but I've got no experience with those.

Oh, and if you monitor the drives, some drives nowadays also have a temperature sensor you can use to keep tabs on the server status.
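Keeping tabs on the count could look something like this sketch. Everything here is illustrative: realloc_status is a made-up name, the state file path is up to you, and the default threshold of 12 is just my "more than a dozen" rule of thumb from above, not a vendor figure.

```shell
#!/bin/sh
# Compare the current reallocated sector count with the previous reading.
# $1 = current raw count
# $2 = state file remembering the previous reading
# $3 = replacement threshold (optional, default 12)
# Prints one of: ok, growing, replace.
realloc_status() {
    cur=$1 state=$2 limit=${3:-12}
    if [ -s "$state" ]; then
        prev=$(cat "$state")
    else
        prev=$cur            # first run: nothing to compare against
    fi
    echo "$cur" > "$state"   # remember this reading for next time
    if [ "$cur" -gt "$limit" ]; then
        echo replace
    elif [ "$cur" -gt "$prev" ]; then
        echo growing
    else
        echo ok
    fi
}
```

Run from cron once a day or so, with the count taken from smartctl -A. "growing" on several consecutive runs is the pattern to worry about.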

To add to this story, I was working on a file with a command like "view text.log" when a message suddenly appeared from syslogd saying I/O can not commit. So I checked my /var/log/messages and saw the following being logged over and over.

Most of it is alien language to me, but one thing I did notice is that back on 15 May, it logged the following line.

Code:

May 15 04:14:14 test kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)

The last bit of this /var/log/messages file.

Code:

May 21 21:31:15 test kernel: ata2.00: status: { DRDY }

May 21 21:31:15 test kernel: ata2: hard resetting link

May 21 21:31:15 test kernel: ata2: softreset failed (device not ready)

May 21 21:31:15 test kernel: ata2: failed due to HW bug, retry pmp=0

May 21 21:31:15 test kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)

May 21 21:31:15 test kernel: ata2.00: SB600 AHCI: limiting to 255 sectors per cmd

May 21 21:31:15 test kernel: ata2.00: SB600 AHCI: limiting to 255 sectors per cmd

May 21 21:31:15 test kernel: ata2.00: configured for UDMA/33

May 21 21:31:15 test kernel: ata2: EH complete

May 21 21:31:15 test kernel: SCSI device sda: 312581808 512-byte hdwr sectors (160042 MB)

May 21 21:31:15 test kernel: sda: Write Protect is off

May 21 21:31:15 test kernel: SCSI device sda: drive cache: write back

May 21 21:31:15 test kernel: ata2.00: exception Emask 0x50 SAct 0x1 SErr 0x400800 action 0x6 frozen

May 21 21:31:15 test kernel: ata2.00: irq_stat 0x08000000, interface fatal error

May 21 21:31:15 test kernel: ata2: SError: { HostInt Handshk }

May 21 21:31:15 test kernel: ata2.00: cmd 61/08:00:d8:61:6b/00:00:00:00:00/40 tag 0 ncq 4096 out

May 21 21:31:15 test kernel:          res 40/00:04:d8:61:6b/00:00:00:00:00/40 Emask 0x50 (ATA bus error)

May 21 21:31:15 test kernel: ata2.00: status: { DRDY }

May 21 21:31:15 test kernel: ata2: hard resetting link

May 21 21:31:15 test kernel: ata2: softreset failed (device not ready)

May 21 21:31:15 test kernel: ata2: failed due to HW bug, retry pmp=0

May 21 21:31:15 test kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)

May 21 21:31:15 test kernel: ata2.00: SB600 AHCI: limiting to 255 sectors per cmd

May 21 21:31:15 test kernel: ata2.00: SB600 AHCI: limiting to 255 sectors per cmd

May 21 21:31:15 test kernel: ata2.00: configured for UDMA/33

May 21 21:31:15 test kernel: ata2: EH complete

May 21 21:31:15 test kernel: SCSI device sda: 312581808 512-byte hdwr sectors (160042 MB)

May 21 21:31:15 test kernel: sda: Write Protect is off

May 21 21:31:15 test kernel: SCSI device sda: drive cache: write back

May 21 21:34:49 test kernel: ata2.00: exception Emask 0x50 SAct 0x1 SErr 0x400800 action 0x6 frozen

I executed "smartctl -t long /dev/sda1" and shut down the computer from which I had initiated the PuTTY session. Upon return, I saw the following when I executed "smartctl -a /dev/sda1".

Code:

SMART Self-test log structure revision number 1

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error

# 1  Extended offline    Interrupted (host reset)      60%      707        -

Then it appears the system is busted.

Code:

[root@test log]# smartctl -t long /dev/sda1

smartctl version 5.38 [x86_64-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen

Home page is http://smartmontools.sourceforge.net/



Short INQUIRY response, skip product id

A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.

[root@test log]# man smartctl

[root@test log]# smartctl -T permissive -t long /dev/sda1

smartctl version 5.38 [x86_64-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen

Home page is http://smartmontools.sourceforge.net/



Short INQUIRY response, skip product id

Extended Background Self Test has begun

Use smartctl -X to abort test

[root@test log]# smartctl -a /dev/sda1

smartctl version 5.38 [x86_64-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen

Home page is http://smartmontools.sourceforge.net/



Short INQUIRY response, skip product id

A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.

[root@test log]#

[root@test log]#

[root@test log]# smartctl -T permissive -a /dev/sda1

smartctl version 5.38 [x86_64-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen

Home page is http://smartmontools.sourceforge.net/



Short INQUIRY response, skip product id

SMART Health Status: OK

Read defect list: asked for grown list but didn't get it



Error Counter logging not supported

Device does not support Self Test logging

[root@test log]#

[root@test log]#

[root@test log]# smartctl -T permissive -T permissive -a /dev/sda1

smartctl version 5.38 [x86_64-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen

Home page is http://smartmontools.sourceforge.net/



Short INQUIRY response, skip product id

SMART Health Status: OK

Read defect list: asked for grown list but didn't get it



Error Counter logging not supported

Device does not support Self Test logging

[root@test log]# shutdown -r now

[root@test log]# man shtudown

Bus error

[root@test log]# man shutdown

Bus error

[root@test log]# df -k

Filesystem          1K-blocks      Used Available Use% Mounted on

/dev/mapper/VolGroup00-LogVol00

                    147536156  23059324 116861552  17% /

/dev/sda1                98580    12681    80809  14% /boot

tmpfs                  964988        0    964988  0% /dev/shm

[root@test log]# man df

Bus error

[root@test log]# reboot -r now

usage: reboot [-n] [-w] [-d] [-f] [-h] [-i]

        -n: don't sync before halting the system

        -w: only write a wtmp reboot record and exit.

        -d: don't write a wtmp record.

        -f: force halt/reboot, don't call shutdown.

        -h: put harddisks in standby mode.

        -i: shut down all network interfaces.

[root@test log]# sync

-bash: /bin/sync: Input/output error

[root@test log]# shutdown

-bash: /sbin/shutdown: Input/output error

[root@test log]# shutdown -r now

-bash: /sbin/shutdown: Input/output error

I had to reboot by pressing the power off button.

Strangely enough the extended test was completed without errors.

Code:

SMART Error Log Version: 1

No Errors Logged



SMART Self-test log structure revision number 1

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error

# 1  Extended offline    Completed without error      00%      710        -

# 2  Extended offline    Interrupted (host reset)      60%      707        -

Quote:

Originally Posted by grob115 (Post 4358038)

I can't answer your questions but I get the same message on a Slackware64 13.1 system that is apparently working normally:

Code:

root@CW8:/var/log# grep 'ata2: softreset failed' syslog

May 13 10:16:27 CW8 kernel: ata2: softreset failed (device not ready)

May 13 15:44:39 CW8 kernel: ata2: softreset failed (device not ready)

May 13 22:00:55 CW8 kernel: ata2: softreset failed (device not ready)

May 14 01:01:26 CW8 kernel: ata2: softreset failed (device not ready)

May 14 11:10:24 CW8 kernel: ata2: softreset failed (device not ready)

May 14 15:10:06 CW8 kernel: ata2: softreset failed (device not ready)

May 15 00:17:20 CW8 kernel: ata2: softreset failed (device not ready)

May 15 10:05:55 CW8 kernel: ata2: softreset failed (device not ready)

May 16 09:22:01 CW8 kernel: ata2: softreset failed (device not ready)

May 16 17:46:53 CW8 kernel: ata2: softreset failed (device not ready)

May 17 09:06:17 CW8 kernel: ata2: softreset failed (device not ready)

May 17 16:39:23 CW8 kernel: ata2: softreset failed (device not ready)

May 18 09:17:28 CW8 kernel: ata2: softreset failed (device not ready)

May 19 09:26:13 CW8 kernel: ata2: softreset failed (device not ready)

May 19 13:43:55 CW8 kernel: ata2: softreset failed (device not ready)

May 20 10:23:19 CW8 kernel: ata2: softreset failed (device not ready)

May 20 16:16:57 CW8 kernel: ata2: softreset failed (device not ready)

May 21 10:03:57 CW8 kernel: ata2: softreset failed (device not ready)

May 21 15:26:21 CW8 kernel: ata2: softreset failed (device not ready)

May 21 22:03:22 CW8 kernel: ata2: softreset failed (device not ready)

May 22 09:47:32 CW8 kernel: ata2: softreset failed (device not ready)

May 22 15:09:15 CW8 kernel: ata2: softreset failed (device not ready)

Quote:

Originally Posted by grob115 (Post 4363381)

May 21 21:31:15 test kernel: ata2: softreset failed (device not ready)
May 21 21:31:15 test kernel: ata2.00: SB600 AHCI

First of all, you have an AMD SB600 chipset, which will harmlessly fail the first softreset at boot. The above did not happen at boot, so it's a different issue.

Quote:

Originally Posted by grob115 (Post 4363513)

Strangely enough the extended test was completed without errors.

The SMART long self-test is a pretty reliable indicator of the health of the drive. If you do an offline test and then check the attributes, the number of reallocated sectors should be very low, too. All this makes me think your drive is OK.

You know, the log looks suspiciously like a cable problem. I'd reseat (detach and reattach) all SATA cables (including SATA power cables), to see if that helps.

Quote:

Originally Posted by catkin (Post 4363533)

I can't answer your questions but I get the same message on a Slackware64 13.1 system that is apparently working normally:

An AMD SB600/700/750/800 chipset, right? (Use lspci to check.)
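A quick sketch for that check. amd_sb_lines is a made-up helper, and the pattern is my assumption about how lspci names these southbridges (for example "SB600" or "SB7x0/SB8x0/SB9x0"), so treat a miss as inconclusive.

```shell
#!/bin/sh
# Read `lspci` output on stdin and keep the lines that mention an
# SB6xx to SB9xx part, case-insensitively.
amd_sb_lines() {
    grep -Ei 'SB[6-9][0-9x]0'
}
```

Use it as lspci | amd_sb_lines. Any SATA or IDE controller line in the output points at one of those chipsets.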