LinuxQuestions.org

internal HDD r/w ~ 1MB/s (in Linux only) (http://www.linuxquestions.org/questions/showthread.php?t=4175444999)

captainralf 01-10-2013 09:17 AM

internal HDD r/w ~ 1MB/s (in Linux only)
 
1 Attachment(s)
Hi everyone,

I was trying to install Fedora on an older laptop, which "only" took about 10h ;)
Anyway, at first I thought it might be a Fedora issue and tried to get advice on the Fedora forum. Unfortunately, it seems not to be Fedora-related, as I have the exact same issue with Linux Mint. The previous Windows install was fast, and its HDD access rates were normal.

After looking at a few stats, I've identified the HDD as the culprit for the long install and the generally unresponsive system.

the system:
Sony Vaio A517B
Intel 915PM
Intel Pentium M 740 / 1.73 GHz
ATI Mobility Radeon X600
2GB RAM

lspci:
Code:

00:1f.1 IDE interface: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) IDE Controller (rev 03)
00:1f.2 IDE interface: Intel Corporation 82801FBM (ICH6M) SATA Controller (rev 03)

Linux Mint kernel version: 3.5.0-17. I'm not sure about the one in Fedora, as it takes rather long to boot...

smart output for the HD:
smartctl -a /dev/sda2:
Code:

smartctl 5.43 2012-06-30 r3573 [i686-linux-3.5.0-17-generic] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:    Hitachi Travelstar 5K100
Device Model:    HTS541080G9SA00
Serial Number:    MPBDL0X6H43YWM
Firmware Version: MB4OC60D
User Capacity:    80,026,361,856 bytes [80.0 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:  7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 1
Local Time is:    Thu Jan 10 13:37:24 2013 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)        Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (  0)        The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  645) seconds.
Offline data collection
capabilities:                          (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003)        Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01)        Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:          (  2) minutes.
Extended self-test routine
recommended polling time:          (  55) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate    0x000b  100  100  062    Pre-fail  Always      -      0
  2 Throughput_Performance  0x0005  100  100  040    Pre-fail  Offline      -      0
  3 Spin_Up_Time            0x0007  253  253  033    Pre-fail  Always      -      0
  4 Start_Stop_Count        0x0012  098  098  000    Old_age  Always      -      4491
  5 Reallocated_Sector_Ct  0x0033  100  100  005    Pre-fail  Always      -      0
  7 Seek_Error_Rate        0x000b  100  100  067    Pre-fail  Always      -      0
  8 Seek_Time_Performance  0x0005  100  100  040    Pre-fail  Offline      -      0
  9 Power_On_Hours          0x0012  083  083  000    Old_age  Always      -      7722
 10 Spin_Retry_Count        0x0013  100  100  060    Pre-fail  Always      -      0
 12 Power_Cycle_Count      0x0032  099  099  000    Old_age  Always      -      3061
191 G-Sense_Error_Rate      0x000a  100  100  000    Old_age  Always      -      0
192 Power-Off_Retract_Count 0x0032  100  100  000    Old_age  Always      -      105
193 Load_Cycle_Count        0x0012  079  079  000    Old_age  Always      -      217167
194 Temperature_Celsius    0x0002  141  141  000    Old_age  Always      -      39 (Min/Max 17/56)
196 Reallocated_Event_Count 0x0032  100  100  000    Old_age  Always      -      0
197 Current_Pending_Sector  0x0022  100  100  000    Old_age  Always      -      0
198 Offline_Uncorrectable  0x0008  100  100  000    Old_age  Offline      -      0
199 UDMA_CRC_Error_Count    0x000a  200  253  000    Old_age  Always      -      0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


Warning! SMART Selective Self-Test Log Structure error: invalid SMART checksum.
SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

hdparm output for HDD:
hdparm -v /dev/sda2:
Code:

/dev/sda2:
 multcount    = 16 (on)
 IO_support    =  1 (32-bit)
 readonly      =  0 (off)
 readahead    = 256 (on)
 geometry      = 9729/255/63, sectors = 8384512, start = 40960000

hdparm -t /dev/sda2
Code:

/dev/sda2:
 Timing buffered disk reads:  4 MB in  3.08 seconds =  1.30 MB/sec
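
As a quick sanity check on that figure, here is a minimal Python sketch that parses the hdparm -t result line and recomputes the rate from the megabytes and seconds it reports. The function name parse_hdparm_timing is made up for illustration, and the regular expression only assumes the line layout shown in the output above.

```python
import re

# Result line copied from the hdparm -t output above.
SAMPLE = " Timing buffered disk reads:  4 MB in  3.08 seconds =  1.30 MB/sec"


def parse_hdparm_timing(line):
    """Return (megabytes, seconds, reported_rate) from an hdparm -t result line.

    Hypothetical helper: it assumes the "N MB in S seconds = R MB/sec" layout
    seen in this thread and raises ValueError for anything else.
    """
    number = r"(\d+(?:\.\d+)?)"
    pattern = number + r"\s*MB in\s+" + number + r"\s*seconds\s*=\s*" + number + r"\s*MB/sec"
    match = re.search(pattern, line)
    if match is None:
        raise ValueError("not an hdparm timing line: %r" % line)
    return tuple(float(value) for value in match.groups())


megabytes, seconds, reported = parse_hdparm_timing(SAMPLE)
computed = megabytes / seconds  # MB read divided by elapsed seconds
print("computed %.2f MB/sec, reported %.2f MB/sec" % (computed, reported))
```

Recomputing the rate only confirms that the number was read correctly; it says nothing about why the disk is slow.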

I also noticed when running the Mint live CD that the internet connection goes down during HDD read/write operations, while the system remained usable in general.
ping shows:
Code:

ping: sendmsg: No buffer space available

On the installed Fedora system (a fresh Fedora 17 install), on the other hand, the system became unresponsive for several seconds at regular intervals during HDD operations.

Please let me know if you need any additional information. I attached the dmesg output from the Linux Mint live session, taken right after boot; no further dmesg messages came up after that.

I hope someone can help me out!

Cheers

TobiSGD 01-10-2013 09:28 AM

There seems to be something wrong with that machine. I would recommend running some basic tests: Memtest86+ and the disk manufacturer's diagnostic tool. Also have a look at the temperatures of that machine.

captainralf 01-10-2013 09:35 AM

Why do you think it's the machine?

It runs fine under Windows. Also, I forgot to mention that attached USB disks work fine under Linux.

TobiSGD 01-10-2013 09:53 AM

Windows is not Linux; they handle hardware differently, so it may be that Windows (especially with manufacturer-provided drivers) has a different way of handling the errors you are seeing in your dmesg log.
From my experience, Linux is more prone than Windows to act weird when hardware failures occur.
So to rule out hardware errors, I recommended running some basic hardware tests first. If the tests pass, we know for sure that it must be a software issue.

business_kid 01-10-2013 01:34 PM

A hidden issue in installing distros like Fedora is that they have to sit down and figure out the dependencies of each rpm and check to see if you have or are going to install them. The smaller your memory, the longer that takes.
Quote:

/dev/sda2:
Timing buffered disk reads: 4 MB in 3.08 seconds = 1.30 MB/sec
In a case with a disk that slow, I have always built a kernel with the correct chipset driver compiled in and the generic driver excluded. What tends to happen is that the generic driver is tried first, and you get a message
Quote:

pci not 100% native, will probe irqs later
But then no other chipset driver can claim the IDE controller, because 'generic' already has it. Using the correct chipset driver instead can give a huge improvement. This sort of thing can produce bottlenecks, so I wouldn't worry about the net going down on a live CD. It does mean you'll probably have to install overnight on the slow disk, and then build the kernel to get your improvement.
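
One way to see which driver actually holds each controller, before rebuilding anything, is to read the "Kernel driver in use:" lines that lspci -k prints under each device. The Python sketch below parses such output. The two device lines are copied from the lspci listing in the first post, but the driver names in the sample (ata_piix, ahci) are assumptions for illustration, not output from this laptop, and drivers_in_use is a made-up helper name.

```python
def drivers_in_use(lspci_k_text):
    """Map each device line of `lspci -k` output to its bound kernel driver.

    Devices without a "Kernel driver in use:" line map to None.
    """
    result = {}
    device = None
    for line in lspci_k_text.splitlines():
        if line and not line[0].isspace():
            # Unindented lines start a new device entry.
            device = line.strip()
            result[device] = None
        elif device is not None and "Kernel driver in use:" in line:
            result[device] = line.split(":", 1)[1].strip()
    return result


# Device lines from the first post; the driver lines are hypothetical.
SAMPLE = "\n".join([
    "00:1f.1 IDE interface: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) IDE Controller (rev 03)",
    "\tKernel driver in use: ata_piix",
    "00:1f.2 IDE interface: Intel Corporation 82801FBM (ICH6M) SATA Controller (rev 03)",
    "\tKernel driver in use: ahci",
])

for device, driver in drivers_in_use(SAMPLE).items():
    print(device.split()[0], "->", driver)
```

On a real system the text would come from running lspci -k; if a generic driver shows up for the disk controller, that would support the theory above.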

captainralf 01-10-2013 03:16 PM

Thanks for the explanation Tobi.

would you mind letting me know what in the dmesg made you think it could be a hardware issue? - just for future reference :)

I did the memtest and an extensive HDD test with the manufacturer's tool; both reported no errors.
I had a look at the temps and they were all pretty average.

There was another suggestion on the Fedora forum to run fsck.
That returned "Bad magic number", which could not be fixed with the backup superblocks, only by partitioning the drive again. GParted for some reason resulted in the same error, so I used a program from Hiren's Boot CD...
Anyway, after that fsck was fine, but the original problem remained the same :(


@business_kid:
Uhm... I did build a kernel ONCE, and I had a step-by-step tutorial. While I'm sure there is a good tutorial for compiling the kernel, I'm not sure how I'd have to change the configuration to get rid of this issue. That is what I'd have to do, right? Also, is there a way to avoid another 10h install before building the kernel?

Thanks both for your input and I appreciate any additional input :)
cheers

TobiSGD 01-10-2013 04:35 PM

Quote:

Originally Posted by captainralf (Post 4867292)
Thanks for the explanation Tobi.

would you mind letting me know what in the dmesg made you think it could be a hardware issue? - just for future reference :)

I thought about these:
Code:

[    1.364376] irq 17: nobody cared (try booting with the "irqpoll" option)
[    1.364380] Pid: 0, comm: swapper/0 Not tainted 3.5.0-17-generic #28-Ubuntu
[    1.364383] Call Trace:
[    1.364393]  [<c10c6ae9>] __report_bad_irq+0x29/0xd0
[    1.364397]  [<c10c6db5>] note_interrupt+0x175/0x1c0
[    1.364403]  [<c15c8fb5>] ? _raw_spin_unlock_irqrestore+0x15/0x20
[    1.364409]  [<c10c4bdf>] handle_irq_event_percpu+0x9f/0x1d0
[    1.364413]  [<c10c4d4b>] handle_irq_event+0x3b/0x60
[    1.364417]  [<c10c7710>] ? unmask_irq+0x30/0x30
[    1.364420]  [<c10c775e>] handle_fasteoi_irq+0x4e/0xd0
[    1.364422]  <IRQ>  [<c15d0692>] ? do_IRQ+0x42/0xc0
[    1.364431]  [<c15d04f0>] ? common_interrupt+0x30/0x38
[    1.364437]  [<c106007b>] ? worker_rebind_fn+0x5b/0xc0
[    1.364441]  [<c15c8fb5>] ? _raw_spin_unlock_irqrestore+0x15/0x20
[    1.364446]  [<c109669d>] ? clockevents_notify+0x3d/0x110
[    1.364453]  [<c134d301>] ? lapic_timer_state_broadcast+0x36/0x39
[    1.364457]  [<c134d4c1>] ? acpi_idle_enter_simple+0x135/0x152
[    1.364463]  [<c149b7b5>] ? cpuidle_enter+0x15/0x20
[    1.364466]  [<c149bd18>] ? cpuidle_idle_call+0x88/0x220
[    1.364471]  [<c101870a>] ? cpu_idle+0xaa/0xe0
[    1.364477]  [<c159f715>] ? rest_init+0x5d/0x68
[    1.364482]  [<c18ae9be>] ? start_kernel+0x35d/0x363
[    1.364486]  [<c18ae4ed>] ? do_early_param+0x80/0x80
[    1.364490]  [<c18ae303>] ? i386_start_kernel+0xa6/0xad
[    1.364492] handlers:
[    1.364497] [<c140e380>] ahci_interrupt
[    1.364499] Disabling IRQ #17

Code:

[  320.499222] irq 17: nobody cared (try booting with the "irqpoll" option)
[  320.499230] Pid: 1548, comm: sh Not tainted 3.5.0-17-generic #28-Ubuntu
[  320.499232] Call Trace:
[  320.499242]  [<c10c6ae9>] __report_bad_irq+0x29/0xd0
[  320.499246]  [<c10c6db5>] note_interrupt+0x175/0x1c0
[  320.499252]  [<c12dfe92>] ? lzma_main+0x8c2/0xa70
[  320.499258]  [<c10c4bdf>] handle_irq_event_percpu+0x9f/0x1d0
[  320.499262]  [<c10c4d4b>] handle_irq_event+0x3b/0x60
[  320.499266]  [<c10c7710>] ? unmask_irq+0x30/0x30
[  320.499269]  [<c10c775e>] handle_fasteoi_irq+0x4e/0xd0
[  320.499271]  <IRQ>  [<c15d0692>] ? do_IRQ+0x42/0xc0
[  320.499280]  [<c15d04f0>] ? common_interrupt+0x30/0x38
[  320.499284]  [<c15d04f0>] ? common_interrupt+0x30/0x38
[  320.499289]  [<c12dfe92>] ? lzma_main+0x8c2/0xa70
[  320.499293]  [<c12e05e2>] ? xz_dec_lzma2_run+0x5a2/0x7c0
[  320.499297]  [<c12deead>] ? xz_dec_run+0x50d/0x950
[  320.499316]  [<f853a355>] ? squashfs_xz_uncompress+0x85/0x200 [squashfs]
[  320.499323]  [<f8536424>] ? squashfs_read_data+0x424/0x5d0 [squashfs]
[  320.499327]  [<c15d04f0>] ? common_interrupt+0x30/0x38
[  320.499332]  [<f8536856>] ? squashfs_cache_get+0x286/0x330 [squashfs]
[  320.499339]  [<f8536dc0>] ? squashfs_read_metadata+0x60/0xf0 [squashfs]
[  320.499344]  [<f8538bb8>] ? squashfs_lookup+0x248/0x3f0 [squashfs]
[  320.499348]  [<c15d04f0>] ? common_interrupt+0x30/0x38
[  320.499353]  [<c15c8f7d>] ? _raw_spin_lock+0xd/0x10
[  320.499358]  [<c1163961>] ? d_alloc+0x51/0x70
[  320.499363]  [<c1157e96>] ? __lookup_hash+0x46/0xe0
[  320.499367]  [<c1158a4c>] ? lookup_one_len+0x9c/0xd0
[  320.499374]  [<f85229c0>] ? ovl_lookup+0x160/0x390 [overlayfs]
[  320.499378]  [<c11636e4>] ? __d_alloc+0xf4/0x120
[  320.499381]  [<c1163961>] ? d_alloc+0x51/0x70
[  320.499386]  [<c1157e96>] ? __lookup_hash+0x46/0xe0
[  320.499390]  [<c15c19cf>] ? lookup_slow+0x36/0x8a
[  320.499394]  [<c115b6af>] ? link_path_walk+0x71f/0x770
[  320.499398]  [<c1163961>] ? d_alloc+0x51/0x70
[  320.499404]  [<c106d356>] ? lg_local_lock+0x16/0x20
[  320.499408]  [<c115b820>] ? path_lookupat+0x50/0x640
[  320.499412]  [<c15d04f0>] ? common_interrupt+0x30/0x38
[  320.499416]  [<c115806a>] ? getname_flags+0x2a/0xd0
[  320.499420]  [<c115be3a>] ? do_path_lookup+0x2a/0xb0
[  320.499424]  [<c115c786>] ? user_path_at_empty+0x46/0x80
[  320.499429]  [<c115c7df>] ? user_path_at+0x1f/0x30
[  320.499432]  [<c11530d2>] ? vfs_fstatat+0x42/0x80
[  320.499436]  [<c1153130>] ? vfs_lstat+0x20/0x30
[  320.499440]  [<c1153506>] ? sys_lstat64+0x16/0x30
[  320.499443]  [<c10c7773>] ? handle_fasteoi_irq+0x63/0xd0
[  320.499448]  [<c104ce1c>] ? irq_exit+0x5c/0xa0
[  320.499452]  [<c15d069b>] ? do_IRQ+0x4b/0xc0
[  320.499456]  [<c15cff5f>] ? sysenter_do_call+0x12/0x28
[  320.499460]  [<c15c007b>] ? perf_cgroup_mark_enabled.isra.15+0x2e/0x95
[  320.499462] handlers:
[  320.499467] [<c140e380>] ahci_interrupt
[  320.499478] [<f868a470>] rtl8169_interrupt [r8169]
[  320.499481] Disabling IRQ #17
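
To pull these reports out of a long dmesg, a small Python sketch like the following can list each "nobody cared" event together with the handlers registered on that IRQ. It assumes the message layout of the 3.5 kernel log quoted above (other kernel versions may print handlers differently), and find_unhandled_irqs is a made-up name. The sample is an abridged copy of the second trace.

```python
import re

# Abridged copy of the second trace above.
SAMPLE = """\
[  320.499222] irq 17: nobody cared (try booting with the "irqpoll" option)
[  320.499232] Call Trace:
[  320.499242]  [<c10c6ae9>] __report_bad_irq+0x29/0xd0
[  320.499462] handlers:
[  320.499467] [<c140e380>] ahci_interrupt
[  320.499478] [<f868a470>] rtl8169_interrupt [r8169]
[  320.499481] Disabling IRQ #17
"""


def find_unhandled_irqs(dmesg_text):
    """Return a list of {"irq": number, "handlers": [names]} dictionaries."""
    events = []
    current = None
    in_handlers = False
    for line in dmesg_text.splitlines():
        # Drop the leading "[  123.456789] " timestamp if present.
        message = re.sub(r"^\[\s*\d+\.\d+\]\s?", "", line)
        start = re.match(r"irq (\d+): nobody cared", message)
        if start:
            current = {"irq": int(start.group(1)), "handlers": []}
            events.append(current)
            in_handlers = False
        elif current is not None and message.strip() == "handlers:":
            in_handlers = True
        elif in_handlers:
            handler = re.match(r"\s*\[<[0-9a-f]+>\]\s+(\S+)", message)
            if handler:
                current["handlers"].append(handler.group(1))
            else:
                # First non-handler line ends this report.
                in_handlers = False
                current = None
    return events


for event in find_unhandled_irqs(SAMPLE):
    print("IRQ %d: %s" % (event["irq"], ", ".join(event["handlers"])))
```

In the second trace, IRQ 17 is shared by ahci_interrupt (the SATA controller) and rtl8169_interrupt (the network card). Once the kernel disables that IRQ, both devices would be affected, which would be consistent with the slow disk and the network dropping out during disk activity.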


captainralf 01-10-2013 04:42 PM

OK, thanks. I saw those but didn't have a clue what they were supposed to tell me ;)

Do you have any other suggestions for what might be worth trying?

TobiSGD 01-10-2013 06:33 PM

As the error messages suggest, it may help to boot with the irqpoll kernel option.
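
After adding the option at the boot prompt, it is worth confirming that the running kernel actually received it. A minimal Python sketch, assuming the standard /proc/cmdline file on Linux; the helper names and the example command line are made up for illustration.

```python
def has_kernel_option(cmdline, option):
    """True if `option` appears as a whole token, bare or as option=value."""
    for token in cmdline.split():
        if token == option or token.startswith(option + "="):
            return True
    return False


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as handle:
        return handle.read()


# Hypothetical live-CD command line, not taken from this laptop.
EXAMPLE = "BOOT_IMAGE=/casper/vmlinuz boot=casper quiet splash irqpoll --"
print(has_kernel_option(EXAMPLE, "irqpoll"))
```

On the machine itself, has_kernel_option(read_cmdline(), "irqpoll") performs the same check against the live kernel.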

captainralf 01-10-2013 07:20 PM

1 Attachment(s)
yes... please pardon my ignorance :redface:

So, I just tried that with the Mint live CD. Unfortunately nothing changed, although I should have done it correctly, since irqpoll shows up in the kernel command line in the dmesg output. On the other hand, dmesg still suggests using irqpoll...
Please have a look at the attached dmesg.


All times are GMT -5.