LinuxQuestions.org - Can't understand CPU utilization during md resync

Can't understand CPU utilization during md resync

I just built a new server with 2x 2TB SATA disks. The mainboard has a dual-core Atom D525 and 4GB of memory. I have installed 32-bit Debian Wheezy.

This is the output of the top command:

Code:

jlinkels@homeserv:~$ top

top - 09:29:20 up 14 min,  2 users,  load average: 1.14, 1.01, 0.63

Tasks: 104 total,  1 running, 103 sleeping,  0 stopped,  0 zombie

%Cpu0  :  0.0 us,  0.3 sy,  0.0 ni, 99.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st

%Cpu1  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st

%Cpu2  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st

%Cpu3  :  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st

And this is the output of /proc/mdstat:

Code:

md2 : active raid1 sda6[0] sdb6[1]

      1842053952 blocks super 1.2 [2/2] [UU]

      [>....................]  resync =  0.0% (1047936/1842053952) finish=29579.1min speed=1036K/sec

      

md1 : active raid1 sda5[0] sdb5[1]

      97589120 blocks super 1.2 [2/2] [UU]

        resync=DELAYED

      

md0 : active raid1 sda1[0] sdb1[1]

      9756544 blocks super 1.2 [2/2] [UU]

First, I don't understand the load average. It is above 1.00, yet all cores are more than 99% idle?
Secondly, the sync speed is extremely low, about 1 MB/s.
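
To put the 1 MB/s figure in perspective, here is a minimal sketch of the arithmetic behind the "finish=" estimate. The function name resync_eta_min is made up for illustration, and the kernel averages the speed over its own window, so its figure (29579.1min) differs slightly from this plain division.

```shell
#!/bin/sh
# resync_eta_min DONE_K TOTAL_K SPEED_K  (hypothetical helper)
# Prints the whole minutes left, given the blocks already synced, the total
# blocks (both in 1K units, as /proc/mdstat shows them) and the speed in K/sec.
resync_eta_min() {
    echo $(( ($2 - $1) / $3 / 60 ))
}

# Figures taken from the md2 line in the mdstat output above.
resync_eta_min 1047936 1842053952 1036
```

At that rate the resync of md2 alone would take roughly 20 days.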

I know that an Atom is not exactly the fastest processor in the world, but 1 MB/s is really low. This is what I get on another Atom system when copying 4GB from one partition to the next (thousands separators inserted manually):

Code:

sent 4265657063 bytes  received 445 bytes  22,044,741.64 bytes/sec

total size is 4265134703  speedup is 1.00

jlinkels

Hello

About your first question: the load average indicates the number of processes in the running state (i.e. waiting for CPU resources). You can find them like this:

Code:

ps -eo state,cmd | grep "^R"

However, a load average of 1 is extremely low, you should not worry about that.
Also, the RAID building operations generate a lot of disk I/O, which is all handled by cpu0, if I remember correctly. That could explain why you've got that little load average (because it's only waiting for cpu0 and no other core).

About the second question: I'm not an expert in that domain, but I think you can't compare these two speeds. The RAID building process is far more complex, and the copying one is optimized when you are in RAID 1... I don't think these indicators show the real hardware speed.

Quote:

Originally Posted by akiuni (Post 5115394)

About your first question: the load average indicates the number of processes in the running state (i.e. waiting for CPU resources). You can find them like this:

Code:

ps -eo state,cmd | grep "^R"

That doesn't give any indication.

Code:

jlinkels@homeserv:~$ ps -e -o state,cmd | grep "^R"

R ps -e -o state,cmd

jlinkels@homeserv:~$

Quote:

Originally Posted by akiuni (Post 5115394)

However, a load average of 1 is extremely low, you should not worry about that.

No, it is high. My desktop has a load average of 0.04 or so, and it is running KDE4 on an Atom.

Quote:

Originally Posted by akiuni (Post 5115394)

Also, the RAID building operations generate a lot of disk I/O, which is all handled by cpu0, if I remember correctly. That could explain why you've got that little load average (because it's only waiting for cpu0 and no other core).

If disk I/O were blocking anything, I should see more than 0.0% wa. And if CPU0 were busy, I would not expect to see 99.7% idle on that core and 100% idle on all the others.

Quote:

Originally Posted by akiuni (Post 5115394)

About the second question: I'm not an expert in that domain, but I think you can't compare these two speeds. The RAID building process is far more complex, and the copying one is optimized when you are in RAID 1... I don't think these indicators show the real hardware speed.

Well, what about this? I switched the computer on again today to get the statistics for this reply, and this is what I saw:

Code:

root@homeserv:/home/jlinkels# cat /proc/mdstat

Personalities : [raid1] 

md2 : active raid1 sda6[0] sdb6[1]

      1842053952 blocks super 1.2 [2/2] [UU]

      [==>..................]  resync = 12.0% (221808192/1842053952) finish=638.4min speed=42294K/sec

      

md1 : active raid1 sda5[0] sdb5[1]

      97589120 blocks super 1.2 [2/2] [UU]

        resync=DELAYED

      

md0 : active raid1 sda1[0] sdb1[1]

      9756544 blocks super 1.2 [2/2] [UU]

And this is what top shows:

Code:

top - 14:13:33 up  1:35,  2 users,  load average: 0.71, 0.69, 0.68

Tasks: 108 total,  1 running, 107 sleeping,  0 stopped,  0 zombie

%Cpu(s):  0.2 us,  0.4 sy,  0.0 ni, 98.8 id,  0.7 wa,  0.0 hi,  0.0 si,  0.0 st

KiB Mem:  4137248 total,  150808 used,  3986440 free,    20076 buffers

KiB Swap:  3906556 total,        0 used,  3906556 free,    62680 cached



  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND

 1403 root      20  0    0    0    0 D  1.7  0.0  2:04.36 md2_resync

  231 root      20  0    0    0    0 S  1.3  0.0  1:03.14 md2_raid1

 3024 jlinkels  20  0  9796 1536  904 S  0.7  0.0  0:00.46 sshd

10797 root      20  0  4496 1320 1020 R  0.7  0.0  0:00.03 top

Not that I fully understand why I still have 98.9% idle time and an average load of 0.7. But the resync speed is what I expect.

That doesn't mean I understand what was going on when I started this thread. I am happy I pasted the /proc/mdstat output in here, so I am sure I am not insane.

jlinkels

Quote:

Originally Posted by jlinkels (Post 5115704)

That doesn't give any indication.
No, it is high. My desktop has a load average of 0.04 or so, and it is running KDE4 on an Atom.

Well, I'm not used to laptops/desktops, but on servers the load average usually starts to become worrying at around 15 and often ends in a crash at around 30-40, depending of course on the power of your machine and the kind of load. But anyway, that's not the point, if I understand correctly.

Quote:

Originally Posted by jlinkels (Post 5115704)

But the resync speed is what I expect.

Have you tried this? --> http://www.linuxhowtos.org/Tips%20an...id1speedup.htm

loadavg has little to do with CPU% - it is a decaying moving average of running and runnable processes. Adjust the command line offered above to search for [RD] (egrep).
It is also unnormalised, so it is unaffected by the number of cores - a single looping/busy task in an otherwise idle system will drive the loadavg toward 1 regardless of the number of cores.
iowait is similarly misunderstood by most - searching here on LQ (and elsewhere of course) will bring up many and varied theories. But basically it doesn't mean the whole box has ground to a halt, unless you have a single core/CPU and a (userland) process (really) busy with I/O, and nothing else.
Essentially a useless metric these days. But then if you look at the code for loadavg, you might be inclined to say the same - using blocked rather than uninterruptible makes much more sense IMHO.
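
As a rough sketch of the [RD] search suggested above: the helper below counts the tasks that are in state R (running or runnable) or D (uninterruptible sleep) at one instant. The function name count_rd is made up for illustration, and a single snapshot is only one sample of what loadavg averages over time.

```shell
#!/bin/sh
# count_rd (hypothetical helper): reads "STATE COMMAND" lines on stdin, as
# printed by "ps -e -o state,cmd", and prints how many start with R or D.
# The ps header line starts with "S", so it is not counted.
count_rd() {
    grep -c '^[RD]'
}

# Live usage on a Linux box (the ps process itself shows up as R):
#   ps -e -o state,cmd | count_rd
#   cat /proc/loadavg
```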

akiuni: I am still puzzled. The next time I started the machine, the resync speed was going up and down between 7 MB/s and 20 MB/s. When I put the value 40,000 in the speed_limit_min, the resync speed showed a sustained 40+MB/s.

The processor load avg and idle times remain the same regardless of these settings. Only the disk light flashes more often when the resync speed is higher.
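
For reference, a minimal sketch of how the md resync limits can be inspected and changed. The values are in KiB/s per device, so 40000 corresponds to roughly 40 MB/s. The RAID_DIR variable and the two function names are made up for illustration; writing to the real files requires root.

```shell
#!/bin/sh
# Directory holding the md speed limits; overridable so the functions can be
# tried against a scratch directory (RAID_DIR is an illustrative name).
RAID_DIR="${RAID_DIR:-/proc/sys/dev/raid}"

# Print the current minimum and maximum resync speed in KiB/s.
show_limits() {
    printf 'min=%s max=%s\n' \
        "$(cat "$RAID_DIR/speed_limit_min")" \
        "$(cat "$RAID_DIR/speed_limit_max")"
}

# Set the guaranteed minimum resync speed, e.g. "set_min_limit 40000".
set_min_limit() {
    echo "$1" > "$RAID_DIR/speed_limit_min"
}
```

The usual default minimum is 1000, which is close to the 1036K/sec reported at the start of the thread. That would be consistent with md throttling the resync down to its minimum, although nothing in the thread confirms that this is what happened.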

syg00: I know, the significance of the load average is among the most asked questions on Linux forums. Still, in my experience the load average rises when user and sys time go over 10% or so. I have not seen all processor cores idling at > 99.8% together with a load average of 1.0 before.

Running egrep with "R|D" didn't show many more runnable processes.

Code:

root@homeserv:/proc/sys/dev/raid# ps -e -o state,cmd | egrep "^R|^D"

D [md2_resync]

R ps -e -o state,cmd

root@homeserv:/proc/sys/dev/raid#

jlinkels

Yeah, but that state "D" kernel thread contributes to the loadavg count - and it is going to be either "R" or "D" all the time.
No, I can't explain why it wouldn't cause the loadavg to be at least 1 all the time ... ;)
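
A small simulation may help picture the "decaying moving average". It is a sketch under stated assumptions: the kernel samples the number of R and D tasks every 5 seconds and, for the 1-minute figure, decays the old value by exp(-5/60). The function name simulate_load is made up, and the task count is held constant, which a real system does not do.

```shell
#!/bin/sh
# simulate_load SAMPLES N  (hypothetical helper)
# Starts from a load of 0, applies SAMPLES five-second updates with N tasks
# counted at every sample, and prints the resulting 1-minute load average.
simulate_load() {
    awk -v samples="$1" -v n="$2" 'BEGIN {
        decay = exp(-5 / 60)          # weight kept from the previous value
        avg = 0
        for (i = 0; i < samples; i++)
            avg = avg * decay + n * (1 - decay)
        printf "%.2f\n", avg
    }'
}

simulate_load 12 1     # one always-busy task, after 1 minute
simulate_load 60 1     # the same task, after 5 minutes
```

With one task that is always in R or D, the value climbs toward 1 and stays there. Passing 0.7 as N approximates a thread that is caught in R or D at only 70% of the sample instants, for example because it sleeps between throttled requests, and the result settles near 0.70. Whether md2_resync behaves that way here is an assumption, not something the thread establishes.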