LinuxQuestions.org - Getting thrown into read-only mode

Getting thrown into read-only mode - not able to remount

Hi!

Can anybody suggest any reasons why my FC6 system would get thrown into read-only mode? I boot happily in read-write, everything gets mounted and life is great.

Then, some time later (usually on the order of several days to one month), for some reason that eludes me, my system suddenly gets switched to read-only mode.

I try remounting with:

Code:

mount -a -o remount

There is no error, but the system does not get remounted, either.
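Before retrying a remount, it can help to confirm which filesystems the kernel actually has mounted read-only. The sketch below is a hypothetical helper (the name list_ro_mounts is an assumption, not something from this thread) that reads lines in the /proc/mounts format and prints the mount points whose option list contains ro.

```shell
# list_ro_mounts [FILE]: print mount points that are mounted read-only.
# FILE must be in /proc/mounts format: device mountpoint fstype options dump pass.
# Defaults to /proc/mounts when no file is given.
list_ro_mounts() {
    awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "ro") { print $2; break }   # exact match, so errors=remount-ro is ignored
    }' "${1:-/proc/mounts}"
}

# Typical use on the affected machine (as root):
#   list_ro_mounts
#   mount -o remount,rw /     # explicit rw option and explicit mount point
```

If the kernel switched the filesystem to read-only after detecting an error, the remount may be refused until the filesystem has been checked.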

The only way so far I have found to get out of this situation is to reboot the system, which I'm not fond of doing.

Any ideas?

Check /var/log/messages. When a filesystem gets made RO, it's usually because a filesystem corruption was detected. The switch to RO is to protect the filesystem from further corruption, which could cause loss of data.
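A rough way to pull the relevant lines out of the log is sketched below. The function name fs_error_lines is hypothetical, and the patterns are assumptions about typical ext2/ext3 kernel wording rather than an exhaustive list, so treat an empty result as "nothing matched", not as proof that nothing was logged.

```shell
# fs_error_lines [FILE]: print log lines that look like filesystem or disk errors.
# Defaults to /var/log/messages; the patterns are assumed, typical kernel wording.
fs_error_lines() {
    grep -E -i 'EXT[23]-fs error|remounting filesystem read-only|I/O error|aborting journal' \
        "${1:-/var/log/messages}"
}

# Typical use (as root), including rotated logs:
#   fs_error_lines /var/log/messages
#   fs_error_lines /var/log/messages.1
```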

Boot off a rescue disk (the Fedora installation media can be used by entering "linux rescue" at the boot prompt), do not mount the disk filesystem, and run:

e2fsck -C -p -f -y /dev/ABCN

Where '/dev/ABCN' is the partition containing the filesystem (e.g., /dev/hda3, /dev/sdb1, etc.).

Ok, thank you for this.

e2fsck did seem to notice a few problems.

So... I guess I'll know for sure if my machine keeps running properly for more than a few days.

Thanks!

Not that it will make much difference, but
man:/e2fsck (ver. 1.40-WIP (02-Oct-2006)) says:

Quote:

-y
Assume an answer of `yes' to all questions; allows e2fsck to be used non-interactively. This option may not be specified at the same time as the -n or -p options.

The man page is incorrect in this case.

Update: I've opened a bug against the e2fsck package to correct the documentation.

Thanks for the info. That explains why

Code:

e2fsck -C -p -f /dev/md0

prompts me with questions I did not expect, and

Code:

e2fsck -C -n -p -f /dev/md0

works the way I thought it would without the "-n" option.
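One possible explanation for the difference between these two invocations, offered as an assumption since nobody in the thread confirms it: the e2fsck man page documents the progress option as "-C fd", i.e. with a file-descriptor argument. If the parser treats -C that way, the word immediately after -C is consumed as its argument instead of being seen as a flag. The sketch below reproduces that effect with the shell's own getopts; the option string is a simplified stand-in, not e2fsck's real parser, and show_opts is a hypothetical name.

```shell
# show_opts ARGS...: report which flags a getopt-style parser sees
# when -C is declared as taking an argument ("C:").
show_opts() {
    OPTIND=1
    seen=""
    while getopts "C:pnyf" opt "$@"; do
        case $opt in
            C) seen="$seen C=$OPTARG" ;;     # whatever follows -C becomes its argument
            p|n|y|f) seen="$seen $opt" ;;
            *) return 1 ;;
        esac
    done
    echo "${seen# }"
}

show_opts -C -p -f       # prints: C=-p f     (-p swallowed, no preen flag seen)
show_opts -C -n -p -f    # prints: C=-n p f   (-n swallowed, -p seen)
show_opts -C 0 -p -f     # prints: C=0 p f    (explicit fd, -p seen)
```

Under that assumption, giving -C an explicit descriptor (the man page describes 0 as drawing a completion bar on the terminal) would leave -p in effect.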

BTW, the author of the code recently joined LQ as "tytso". I don't know if he's also the author of the man page. Perhaps he might comment here.

Here it goes again!

Damn! After running e2fsck a few weeks ago as written in the previous posts, the same problem occurred again today.

Rather than throwing my box off a 20-story building, any other ideas about how I can track down the source of this problem?

I didn't notice anything in the logs that would seem relevant.

Did you run e2fsck again?

Did you reboot & did e2fsck run during reboot?

If so, & that fixed the problem, then I begin to suspect a dying HD.

Is it backed up? :)

Caution: If you did a default FC6 installation, your root filesystem is a logical volume inside an LVM volume group, and running fsck on the device/partition that holds the volume group can destroy the whole volume group.

The only way to run fsck on a volume in a volume group is to run it on the logical volume's own device node, /dev/<VG_name>/<LV_name>, created by the device mapper.

Do not run fsck /dev/hda2 if /dev/hda2 contains an LVM volume group.
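From a rescue environment the volume group usually has to be activated before the logical volume's device node exists. The sketch below only prints the commands so they can be reviewed before anything is run. print_lv_fsck_plan is a hypothetical helper, and VolGroup00/LogVol00 are the names a default Fedora install of that era typically used, so check them against the output of "lvm lvscan" first.

```shell
# print_lv_fsck_plan VG LV: print (do not run) the commands to check one logical volume.
print_lv_fsck_plan() {
    vg=$1
    lv=$2
    if [ -z "$vg" ] || [ -z "$lv" ]; then
        echo "usage: print_lv_fsck_plan VG LV" >&2
        return 1
    fi
    echo "lvm vgscan"                 # discover volume groups
    echo "lvm vgchange -ay $vg"       # activate the group so /dev/$vg/$lv appears
    echo "e2fsck -f /dev/$vg/$lv"     # check the logical volume, not the partition
}

# Assumed default names; verify before use.
print_lv_fsck_plan VolGroup00 LogVol00
```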

Thanks for the info./advice/warning, PTrenholme.

BTW, dleangen, did the orig. answers here fix the problem?

Make sure you are running (at least) an updated kernel. There was a low-probability bug in the older vanilla kernels on which Fedora is based that could cause disk corruption.

yum update kernel

or, even better:

yum update

Quote:

Did you run e2fsck again?

Did you reboot & did e2fsck run during reboot?

If so, & that fixed the problem, then I begin to suspect a dying HD.

Is it backed up?

Yes to all of the above.

Is there any way to run thorough tests on the HD so I _know_ what's going on?

(Unfortunately, I don't have access to the machine again for a few weeks, so I can't try updating the kernel just yet.)

Thanks to all!!

Read the smartctl man page & run some tests & ask some more Q's.
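As a starting point, the attributes most often looked at for a suspect disk are the sector counters (IDs 5, 197 and 198). A minimal sketch, assuming the usual ten-column attribute table that "smartctl -A" prints; smart_sector_counters is a hypothetical helper name.

```shell
# smart_sector_counters: read `smartctl -A` (or -a) output on stdin and print
# "name=raw_value" for attributes 5, 197 and 198.
# NF >= 10 restricts matches to attribute-table rows.
smart_sector_counters() {
    awk 'NF >= 10 && ($1 == 5 || $1 == 197 || $1 == 198) { print $2 "=" $NF }'
}

# Typical use (as root):
#   smartctl -A /dev/sda | smart_sector_counters
```

Non-zero raw values there are generally worth asking about; all zeros do not by themselves prove the disk is healthy.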

smartctl shows no errors

Well, it seems to me that smartctl is not showing any errors.

This is the output:

Code:

#smartctl -a -t long -d ata /dev/sda
smartctl version 5.36 [i686-redhat-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     Maxtor 6V200E0
Serial Number:    V40D0KYG
Firmware Version: VA111630
User Capacity:    203,928,109,056 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0
Local Time is:    Wed Jun  6 14:07:16 2007 JST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 249) Self-test routine in progress...
                                        90% of test remaining.
Total time to complete Offline
data collection:                 (1742) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (  2) minutes.
Extended self-test routine
recommended polling time:        (  93) minutes.

SMART Attributes Data Structure revision number: 32
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0027   207   207   063    Pre-fail  Always       -       4957
  4 Start_Stop_Count        0x0032   253   253   000    Old_age   Always       -       27
  5 Reallocated_Sector_Ct   0x0033   253   253   063    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000a   253   247   000    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0027   247   224   187    Pre-fail  Always       -       65277
  9 Power_On_Hours          0x0032   248   248   000    Old_age   Always       -       1957
 10 Spin_Retry_Count        0x002b   253   252   157    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x002b   253   252   223    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   253   253   000    Old_age   Always       -       47
189 Unknown_Attribute       0x003a   100   100   000    Old_age   Always       -       0
190 Unknown_Attribute       0x0022   057   056   000    Old_age   Always       -       740753451
192 Power-Off_Retract_Count 0x0032   253   253   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   253   253   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0032   039   253   000    Old_age   Always       -       43
195 Hardware_ECC_Recovered  0x000a   253   252   000    Old_age   Always       -       687
196 Reallocated_Event_Count 0x0008   253   253   000    Old_age   Offline      -       0
197 Current_Pending_Sector  0x0008   253   253   000    Old_age   Offline      -       0
198 Offline_Uncorrectable   0x0008   253   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0008   199   199   000    Old_age   Offline      -       0
200 Multi_Zone_Error_Rate   0x000a   253   252   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   252   000    Old_age   Always       -       0
202 TA_Increase_Count       0x000a   253   252   000    Old_age   Always       -       0
203 Run_Out_Cancel          0x000b   253   252   180    Pre-fail  Always       -       0
204 Shock_Count_Write_Opern 0x000a   253   252   000    Old_age   Always       -       0
205 Shock_Rate_Write_Opern  0x000a   253   252   000    Old_age   Always       -       0
207 Spin_High_Current       0x002a   253   252   000    Old_age   Always       -       0
208 Spin_Buzz               0x002a   253   252   000    Old_age   Always       -       0
210 Unknown_Attribute       0x0032   253   252   000    Old_age   Always       -       0
211 Unknown_Attribute       0x0032   253   252   000    Old_age   Always       -       0
212 Unknown_Attribute       0x0032   253   252   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      1954         -
# 2  Extended offline    Aborted by host               40%      1953         -
# 3  Extended offline    Completed without error       00%      1685         -
# 4  Short offline       Aborted by host               50%      1683         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 93 minutes for test to complete.
Test will complete after Wed Jun  6 15:40:16 2007

Use smartctl -X to abort test.

Does this mean that my disk is ok and that the error must be due to something else? Or are there some other tests that I should run on the disk?

BTW, I wasn't sure how to capture the log for the "extended test". Any hints on that?

Thanks so much!
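On capturing the extended test result: the test runs inside the drive, so nothing is printed when it finishes. The outcome is appended to the drive's self-test log, which "smartctl -l selftest" prints on request. A minimal sketch, assuming the log layout shown in the output above, where entry # 1 is the most recent test; latest_selftest is a hypothetical helper name.

```shell
# latest_selftest: read `smartctl -l selftest` (or -a) output on stdin and
# print "description: status" for the most recent self-test log entry (# 1).
# Columns are assumed to be separated by two or more spaces.
latest_selftest() {
    awk -F'  +' '/^# *1 / { print $2 ": " $3; exit }'
}

# Typical use (as root), once the estimated completion time has passed:
#   smartctl -l selftest /dev/sda | latest_selftest
#   smartctl -l selftest /dev/sda > selftest.log    # keep a copy of the full log
```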

From your response above:

Code:

Self-test execution status:      ( 249) Self-test routine in progress...
                                        90% of test remaining.
Total time to complete Offline
data collection:                 (1742) seconds.
...
Please wait 93 minutes for test to complete.
Test will complete after Wed Jun  6 15:40:16 2007