[SOLVED] openSUSE 13.1 has several non-interruptible 2 to 7 second delays every minute
I have been doing all my development and desktop work with SUSE Linux for years and never had serious problems. 3 years ago I switched to openSUSE 13.1 (Bottle) (x86_64) using Linux 3.11.10-29-desktop and KDE 4.11.5. My hardware is a COMPAQ Presario CQ57 Laptop with a dual processor Intel(R) Pentium(R) CPU B940 @ 2.00GHz with 4 GByte of memory and a 300 GByte SATA disk, half of which is used for Linux. The root partition is 25 GByte (40% used). The Home partition is 117 GByte (14% used). 150 GByte is Windows XP, which I never (and now no longer can) use. I am loath to switch to Leap 42.1 or later because I am in the final stages of a large project and do not want the hassle of setting up and configuring a new environment. Also the reports on Leap 42.1 are not encouraging. The SUSE Software updater reports "Your system is up to date".
About 6 months ago I started noticing delays in echoing keystrokes of about 1 second. Over the months this has gradually increased to the current delays of up to 7 seconds. The delays are most noticeable with keystrokes, but they also slow down switching tabs on the Konsole or switching to another program. This makes working with the system very arduous and tacky.
In an attempt to locate where the delays are coming from, I first noticed that the monitor program "xosview" showed frequent blocks of "WIO" activity on both CPUs, which roughly coincide with the keyboard and other blockages I was noticing. In the article "Linux Wait IO Problem" at http://www.chileoffshore.com/en/interesting-articles/126-linux-wait-io-problem the author (chile) points out: The main cause are those background processes with "D" status code which means "Uninterruptible sleep". Later he points out that the ext4 journal processes (jbd2) are the most likely culprits. This proved to be the case on my system, which I could nail down by running the following script:
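# Once a second, print any jbd2 (ext4 journal) or kdmflush line that also contains a "D", followed by a timestamp
while true; do
    if ps auxf | grep D | grep -E "(jbd2/sda.*|kdmflush)"; then
        date
    fi
    sleep 1
done | tee jbd_21060912.log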
The following is the output of jbd_21060912.log (with 3 columns removed and consecutive delays marked manually)
7 delays of 5 seconds on average in 3 minutes 32 seconds. This happens all the time. I chose the beginning of a run done just now with only Firefox running. The output is similar with no processes of mine running at all.
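One caveat about the loop: `grep D` matches any line containing a capital D, not only processes whose state is D (a start date in December would match too). A slightly stricter sketch, assuming a procps `ps` that accepts `-eo` with header-less columns, tests the STAT column itself. The names `filter_d_state` and `watch_d_state` are made up for illustration.

```shell
#!/bin/bash
# Read "STAT PID COMMAND" lines on stdin and keep only those whose state
# begins with D (uninterruptible sleep) and whose command is a jbd2 or
# kdmflush kernel thread. filter_d_state is a hypothetical helper name.
filter_d_state() {
    awk '$1 ~ /^D/ && $3 ~ /^(jbd2|kdmflush)/'
}

# Poll once a second and print a timestamp after each hit.
# Runs until interrupted with Ctrl-C.
watch_d_state() {
    while true; do
        hits=$(ps -eo stat=,pid=,comm= | filter_d_state)
        if [ -n "$hits" ]; then
            printf '%s\n%s\n' "$hits" "$(date)"
        fi
        sleep 1
    done
}
```

It could be run as `watch_d_state | tee dstate.log` (the log name is arbitrary). This is still polling once a second, so short D-state episodes between samples will be missed.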
The solution is not simple.
(chile) points out: the reason of high WA is not always the same. But the solution will always on those processes which are with STAT as D. In this case, the configuration of "Journal Disk" should be reconsidered. If the server is a machine for development, it is not recommended to use Journal to protect the hard disk.
The problem I can see is that reconfiguring journalling can only be done by re-formatting the disk, which I definitely do not want to do - I have a lot of work on that disk and need to work on my project.
So my Linux Question is: what can I do to pin down and eradicate this continuous disk activity (most probably journalling - what!) and bring my system back to normal?
PS: I have not recently done a hardware test on the disk - any suggestions on the best non-destructive disk test would be most welcome. I did run Memtest 86+ v4.28 last night without error.
PPS: It could be that "xosview" is the culprit, although I have switched it off a few weeks ago and it has not made any difference.
Thanks for the quick reply. I was not aware of the 13.1 end of life issue - Guess I have to bite the bullet and upgrade. As far as misaligned sectors are concerned I will check with 'gparted' now.
Some thoughts/questions:
- if it worked ok 6 months ago, why would you think journalling is now the problem?
- tasks in "D" are generally victims (they're waiting for disk I/O completion), not the cause. The author you quote does not understand tasks in uninterruptible sleep - they certainly do not consume CPU.
- your loop is very crude; try installing and running latencytop.
- Firefox (F/F) is a pig, and has been worse recently. I ran some kernel function traces on ext4 and F/F was the predominant caller. Try the following (as root) and see if it has any benefit
Code:
echo 1500 | sudo tee /proc/sys/vm/dirty_writeback_centisecs
This will slow down the rate at which I/O is forced out to disk (use 500 to set it back to default).
- run smartctl on your disk - you may be able to get to it from "disk" or similar in openSUSE.
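For the smartctl suggestion, a rough sketch, assuming smartmontools is installed and the disk is /dev/sda (which the jbd2/sda thread names suggest). The helper `smart_suspect_attrs` is a made-up name; it only picks three commonly watched attributes (IDs 5, 197 and 198) out of the `smartctl -A` table and prints those with a non-zero raw value. Which attributes matter can differ by drive vendor, so treat it as a first look, not a verdict.

```shell
#!/bin/bash
# Read `smartctl -A` output on stdin and print "NAME RAW_VALUE" for
# attributes 5 (reallocated), 197 (pending) and 198 (offline uncorrectable)
# when the raw value (10th column) is greater than zero.
# smart_suspect_attrs is a hypothetical helper, not part of smartmontools.
smart_suspect_attrs() {
    awk '($1 == 5 || $1 == 197 || $1 == 198) && $10 + 0 > 0 { print $2, $10 }'
}

# Typical use, as root (non-destructive):
#   smartctl -H /dev/sda                        # overall health verdict
#   smartctl -A /dev/sda | smart_suspect_attrs  # any worrying counters?
#   smartctl -t short /dev/sda                  # start a short self-test
#   smartctl -l selftest /dev/sda               # read the result a few minutes later
```

Empty output from the helper just means those three counters are zero; it does not prove the disk is healthy.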
apart from "swap" both Linux partitions are aligned on the 4096 boundary.
For what it is worth the two NTFS partitions for Windows are also aligned.
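For anyone wanting to check alignment without gparted, a minimal sketch: sysfs reports each partition's start in 512-byte sectors, so the start is on a 4096-byte boundary when that byte offset divides evenly by 4096. The function name `is_4k_aligned` and the partition name sda2 in the usage comment are only examples.

```shell
#!/bin/bash
# Succeed (exit 0) if a partition starting at the given 512-byte sector
# lies on a 4096-byte boundary. is_4k_aligned is a hypothetical helper.
is_4k_aligned() {
    local start_sector=$1
    [ $(( start_sector * 512 % 4096 )) -eq 0 ]
}

# Example, assuming the partition of interest is sda2:
#   if is_4k_aligned "$(cat /sys/block/sda/sda2/start)"; then
#       echo "sda2 is aligned"
#   else
#       echo "sda2 is NOT aligned"
#   fi
```

A start sector of 2048 passes; the old DOS-style start at sector 63 does not.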
My concern is: why did this 13.1 distribution run flawlessly (as far as delays are concerned) for 2 1/2 years and then deteriorate in the way described after the system reached its official "End of Life"? One cannot help feeling that something was planted to force users to change. But why? I thought this only happened with closed shop software.
- if it worked ok 6 months ago, why would you think journalling is now the problem?
- tasks in "D" are generally victims (they're waiting for disk I/O completion), not cause.
Because there are real delays - the keyboard is fully blocked for up to 7 seconds - several times a minute. In my view as an engineer that is a sure sign of an uninterruptible sleep. The primary source of the problem is probably Firefox. In fact I switched off Firefox altogether for an hour while having dinner. There were only 80 D events in that hour. (Previously 205 in 3 1/2 minutes = 3,600/hour with Firefox)
Quote:
Try the following (as root) and see if it has any benefit
Code:
echo 1500 | sudo tee /proc/sys/vm/dirty_writeback_centisecs
This will slow down the rate at which I/O is forced out to disk (use 500 to set it back to default).
I have run your code snippet and then logged with my crude script for the last 20 minutes while editing this reply - it has only caused 24 D events = 72/hour. Definitely a vast improvement over 3,600/hour. Also no noticeable delays in typing. Thanks.
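Note that a value written to /proc/sys is lost at the next reboot. If the setting keeps helping, one way to make it stick is a sysctl.d drop-in. This is only a sketch: it assumes the system reads /etc/sysctl.d at boot and that `sysctl --system` is available, and the file name 90-writeback.conf and the helper `writeback_conf_line` are arbitrary choices.

```shell
#!/bin/bash
# Emit a sysctl configuration line for the writeback interval,
# given in centiseconds (1500 = 15 s, 500 = the kernel default of 5 s).
# writeback_conf_line is a hypothetical helper name.
writeback_conf_line() {
    printf 'vm.dirty_writeback_centisecs = %d\n' "$1"
}

# Example, as root:
#   writeback_conf_line 1500 > /etc/sysctl.d/90-writeback.conf
#   sysctl --system                        # reload all sysctl config files
#   sysctl vm.dirty_writeback_centisecs    # confirm the running value
```

The trade-off is that dirty data sits in memory longer before being flushed, so a crash or power loss can lose up to that much more recent work.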
Just as a heads up on 13.1's EOL - it is now supported by Evergreen, so you still receive updates, just from the Evergreen community rather than from SUSE. Just keep updating as you did before, although beware that Evergreen support will end in another 2 months.
I'd be worried about that disk - smartctl will confirm that or not.
Me, I'd get a full backup and a new disk. Just in case.
Have installed openSUSE Leap 42.1 on a new 250 GB SSD disk, after backing up the 2 Linux partitions and the Windows partition as tarballs. Restored all important data from those 3 backups to Leap 42.1, abandoning Windows 7 in the process. Leap 42.1 is running very smoothly. Ran my test script: while true; do ps auxf | grep D etc.
There was one D event in the next hour:
root 292 0.0 0.0 0 0 ? D 15:37 0:00 \_ [jbd2/sda2-8]
I guess one journalling event per hour is reasonable.