Old 11-17-2017, 08:53 PM   #1
alex3857
LQ Newbie
 
Registered: Nov 2017
Posts: 3

Rep: Reputation: Disabled
Performance issues in updating from Centos 7.2 to 7.4; plus performance monitoring issues!


Hi. I have a 'general' question about my server's performance and a more detailed couple of questions about how the kernel works in terms of swapping and memory management.

I have an Amazon AWS EC2 virtual machine (a T2.medium instance) with two VCPUs and 4GB of memory. The machine is running a LAMP stack of web sites - Apache 2.4 httpd, PHP-FPM and Mariadb. Plus I have an old copy of Firefox running headless to a VNC server being used to generate PDF files (it works!).

This machine was running with Centos 7.2 for 1.5 years. The 8GB swap device would creep up in utilisation over the months to something like 2GB used, but the system was stable and performed well with very little swapping.

Until I did a 'mass update' of Centos two weeks ago. I did a simple 'yum update', stood back, waited 15 minutes, rebooted, everything worked (other than having to remove (again) a couple of default configuration files that got reinstalled). Great.

But we soon started seeing very poor performance. Mariadb was reporting lots of queries to its 'slow log' (SQL statements taking more than 10 seconds). My analysis of the Apache httpd access_log showed that many web pages were taking 2-3 times longer to run than usual. lfd was sending me messages multiple times a day telling me about a high load average (6 - 9+).

iostat showed me that the disk is 100% utilised during these patches of slow performance, pegging out at 62.5MB/s throughput to the sole (Amazon EBS) disk volume. This is *exactly* the peak throughput I'd seen over the past year when the disk was running flat out (for backups and such). Nothing on the (virtual) hardware side of things has changed at all; just my doing a 'yum update all'.

The periods of bad performance come and go, once every half hour or more frequently. I think this *might* coincide with when the Firefox process kicks off (every 15 minutes) to do its PDF runs. The mariadb process would typically take 1.5GB - 2GB of virtual memory, Firefox roughly 1GB.

The Amazon performance graphs show that, while the maximum disk throughput limit is unaltered at 62.5MB/s, there's a lot more disk I/O going on these past two weeks than all of the months before.

At the moment I'm assuming that the system is simply swapping heavily, thrashing, at these times of poor performance. I'll discuss that further with my 'detailed' questions about my performance monitoring results below. Or it could be something that's just demanding more disk activity, and maybe swapping isn't actually the issue. Either way, I don't know why this happened immediately after a simple 'yum update' from Centos 7.2 to 7.4.1708. I don't know if a Linux kernel parameter has changed - swappiness? Disk readahead? Disk scheduling? Something else? Or if mariadb has changed its memory use somehow. The Firefox I'm running is an old version; that didn't change, but maybe linked libraries did?
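
For reference, this is how I've been checking the current values of those knobs. I don't actually know whether any of them changed in the update - this is just my checklist (device name as per my iostat output):

Code:
sysctl vm.swappiness vm.dirty_ratio vm.dirty_background_ratio
blockdev --getra /dev/xvda           # read-ahead, in 512-byte sectors
cat /sys/block/xvda/queue/scheduler  # the active I/O scheduler is shown in [brackets]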

So my general question is - does anyone know of any changes in memory use, any change in memory footprint, any kernel disk/memory changes, in going from Centos 7.2 (a couple of years old) to the current 7.4.1708? Has anyone had any problems like this? Any ideas?

I'd also appreciate any help in understanding the output of a couple of the Linux performance monitoring tools. When the system is slow everything points to the prime reason being disk activity - top says that 80% - 100% of the time is spent in I/O 'wait', iostat reports that the disk is 100% utilised. Here's the 60-second iostat output for a situation this morning when the machine slowed to a crawl for just over a minute, at 11:56:

Code:
11/18/2017 11:56:04 AM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.29    0.00    6.92   86.47    0.24    2.08

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda             26.70   115.22 1140.21    7.54 64626.80   510.73   113.50   107.98   93.86   93.90   88.09   0.89 101.89
xvdap1            9.97     0.56 1125.92    3.42 64485.09    35.62   114.26   106.08   93.71   93.89   31.67   0.90 101.89
xvdap2           16.73   114.66   14.29    4.12   141.71   475.11    67.00     1.91  103.24   94.10  134.95  34.55  63.61
The system is spending 86% of its time in I/O wait, and the disk is 100%+ utilised. So the primary symptom of the problem is that of extreme disk activity. Which makes me automatically think that the system is heavily swapping, as I know from experience that Linux performance dives when it starts swapping badly.

But note that the swap device itself - xvdap2 - is barely being used. Almost all of the throughput is going through the sole root disk partition/mount. The *utilisation* of the swap device is high because its I/O is being queued with all of the other requests, but it only read & wrote 600KB/second.

vmstat output is consistent with that of iostat:

Code:
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu----- -----timestamp-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st                AEST
 0 10 1072284 114196      0 461412   64    9 33366   425 2034  764  8  2 53 37  0 2017-11-18 11:55:20
 1  0 1153372 118272      0 523608  180 1398 53783  1534 2165  985 12  8  8 72  0 2017-11-18 11:56:20
vmstat didn't run at exactly the same time as iostat for its 60-second sample so I include the two lines of output that showed the heavy system activity.

Here the bi & bo reflect pretty much what iostat told us - throughput close to the 62.5MB/s limit of the disk volume (in other tests the bi/bo is practically identical to that reported by iostat). Wait time is near 80% on the second line, which is where vmstat saw most of the performance hit.

And the si/so - swap in/out - is very low, again consistent with iostat.

So is the system swapping, or is there something else to blame for the periodic increase in heavy disk I/O and associated abrupt loss of system performance?

I have the output of one other system utility to present - I also have iotop running to see what processes are using up the disk. Here are the first few lines of the iotop output around this time, sorted on disk read throughput:

Code:
11:56:05 Total DISK READ :      49.51 M/s | Total DISK WRITE :      75.94 K/s
11:56:05 Actual DISK READ:      63.16 M/s | Actual DISK WRITE:    1384.80 K/s
    TIME  PID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN      IO    COMMAND
11:56:05  5406 be/4 psrv       28.45 M/s   62.89 K/s  0.00 %  3.63 % ./firefox/firefox http://pdf.php
11:56:05  5274 be/4 mysql       3.74 M/s   12.19 K/s  0.00 %  1.52 % mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --log-error=/var/log/mariadb/mariadb.log --pid-file=/var/run/mariadb/mariadb.pid --socket=/var/lib/mysql/mysql.sock
11:56:05 13187 be/4 userxxxx 1908.05 K/s    0.00 B/s  0.00 % 20.44 % php-fpm: pool xxxxxxxx
11:56:05  1530 be/4 apache   1657.59 K/s   62.71 B/s  0.00 % 59.76 % httpd -DFOREGROUND
My first 'what is Linux doing?' question is - is the system swapping or not?

My second is "just what is iotop doing?".

Notice that the 'Total DISK READ' is 49MB/s while the 'Actual DISK READ' is 63MB/sec, the top limit of the disk's throughput. The man page for iotop says that these two figures "may not be equal at any given moment of time due to data caching and I/O operations reordering that take place inside Linux kernel".

During all of these periods of poor system performance the ACTUAL is normally right at the limit of 62MB/s whereas the TOTAL is usually much less, sometimes only a third, around 20MB/s.

The other thing to note is that the Firefox process is using up half of the disk's throughput - almost 30MB/s - but mariadb is using a measly 4MB/s, and nothing else is using the disk much at all.

I'll offer my thoughts on what's happening and then beg for help from the floor. :-)

I think the system is swapping. But I think most of the data being shovelled on and off the disk is program *text* - the binary code - and not program *data*. So the I/O isn't going to the swap device at all, since program text isn't written to the swap device, as the binaries are always present on the root volume.

Is that reasonable? Are Firefox and mariadb's mysqld - the two big players here - pretty big, in terms of code size?
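
(One way I could get a rough answer myself is pmap, which breaks a process's address space down by mapping, the file-backed r-x mappings being the code and shared libraries. Something like this - untested on my side, and assuming single mysqld/firefox processes:)

Code:
# Rough total of executable (r-x) mappings per process, in kB
pmap -x $(pidof mysqld)  | awk '$5 ~ /r-x/ {sum += $2} END {print sum " kB of text for mysqld"}'
pmap -x $(pidof firefox) | awk '$5 ~ /r-x/ {sum += $2} END {print sum " kB of text for firefox"}'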

So I think that's why iostat shows that the actual swap device is only being lightly used. And I'm *guessing* that the si/so metrics of vmstat refer likewise to data being moved on and off the *swap device*, rather than reflecting actual kernel *swapping activity*. (The man page says 'swapped in/to disk', but doesn't say *what* disk.) Something I've never come across until now.
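
If I've read the documentation right, /proc/vmstat keeps these as separate counters: pswpin/pswpout only count pages moved to and from the swap device, while pgmajfault counts major faults of any kind, including text pages re-read from the root volume. So a crude check would be something like this (my interpretation - happy to be corrected):

Code:
# Snapshot the counters a minute apart and compare
grep -E '^(pswpin|pswpout|pgpgin|pgpgout|pgmajfault) ' /proc/vmstat > /tmp/vm.1
sleep 60
grep -E '^(pswpin|pswpout|pgpgin|pgpgout|pgmajfault) ' /proc/vmstat > /tmp/vm.2
diff /tmp/vm.1 /tmp/vm.2
# A big jump in pgmajfault/pgpgin with little change in pswpin would point at
# file-backed (text) pages being re-read rather than swap-device traffic.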

And I'm assuming that - in my iotop example above - the 14MB/s of disk activity that is the difference between the TOTAL and ACTUAL disk figures (a difference that in other observations is more like 20MB/s - 30MB/s) was swapping overhead ... to the root volume, not the swap device.

So, I would very much welcome any ideas. Obviously I'd love it if someone could say, well, yes, there is a well-known issue in going from Centos 7.2 to 7.4 - Mariadb uses memory more aggressively, the kernel does more disk I/O, the scheduler changed, something - to solve my primary problem:

Q1. Does anyone know of any changes going from Centos 7.2 to 7.4 that would cause a system to suddenly start incurring twice the amount of disk I/O and suffer severe intermittent performance degradation? (And is there a fix?)

And on the kernel side I'd appreciate any advice on the other four secondary questions:

Q2. Is my system swapping?

Q3. Does swapping program text get reported by vmstat, or is it 'invisible' to the vmstat si/so metrics? Is it reasonable to think that most of my swap activity is program text/code rather than data to the swap device?

Q4. Is swapping activity a well-known reason for the difference between the TOTAL and ACTUAL disk throughput reported by iotop?

Q5. Is there any other metric in Linux that shows swapping *activity* rather than disk I/O to the *swap device*? So I can know for sure whether Linux is madly swapping code/text pages to/from the root volume, even if vmstat and iostat say there is no activity with the swap device?
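
(In case it helps anyone answering Q5: one thing I mean to try is sampling per-process major fault counts, which I believe ps can report - a rough sketch, not yet tested:)

Code:
# Processes ranked by accumulated major (disk-backed) page faults
ps -eo maj_flt,min_flt,pid,comm --sort=-maj_flt | head -15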

Thanks for any help!

Last edited by alex3857; 11-18-2017 at 04:27 PM.
 
Old 11-19-2017, 10:02 PM   #2
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,120

Rep: Reputation: 4120
Nothing is ever as simple as you might imagine - memory management especially. Post the contents of /proc/meminfo for an inkling of what I mean. As a totally blind guess I'd reckon that hugepages are now in use by someone - probably the DB. These are "pinned" in memory and reduce the pool of available page frames that everyone else can use - for file-mapped I/O or swap.
Which can of course lead to "thrashing" - which again isn't limited to swapping. If a page is reclaimed (and used by the new task) and the original contents are immediately referenced again, the data that just got paged out gets paged in again. Classic thrashing.
I'm not sure any tool will show it properly other than a specific kernel trace.
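
At a pinch you can watch the reclaim counters in /proc/vmstat while it's happening - not a proper trace, but pgscan/pgsteal climbing in step with pgmajfault is a fair hint that reclaim is evicting pages that are wanted straight back. Something like:

Code:
watch -d -n 5 'grep -E "pgscan|pgsteal|pgmajfault" /proc/vmstat'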

Q1 - check the RHEL release notes
Q2 - if si/so are non-zero you are swapping. End of discussion.
Q3 - swap is only used for anonymous memory. The userspace tools all simply report kernel stats. There are no "hidden" numbers.
Q4 - I/O is I/O once it reaches the driver, doesn't matter where it came from, although (I/O) priority may be applied.
Q5 - there is no other tool I know of that will convince you of the veracity of the numbers reported.

Last edited by syg00; 11-19-2017 at 11:20 PM. Reason: nm - redacted
 
Old 11-21-2017, 06:23 AM   #3
alex3857
LQ Newbie
 
Registered: Nov 2017
Posts: 3

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by syg00 View Post
Post the contents of /proc/meminfo for an inkling of what I mean.
I've been forced to up-size the system to have double the memory - from 4GB to 8GB - which has totally eliminated the disk bottleneck/swapping issue. But I'd still like to know what caused it.

This is the contents of /proc/meminfo on exactly the same system, same software, same applications, same load, just running with 8GB rather than 4GB:

Code:
MemTotal:        8009180 kB
MemFree:          145400 kB
MemAvailable:     795552 kB
Buffers:               0 kB
Cached:           535936 kB
SwapCached:        33720 kB
Active:          5631884 kB
Inactive:        1349504 kB
Active(anon):    5273344 kB
Inactive(anon):  1244488 kB
Active(file):     358540 kB
Inactive(file):   105016 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       8388604 kB
SwapFree:        8289104 kB
Dirty:                60 kB
Writeback:             0 kB
AnonPages:       6422348 kB
Mapped:            57624 kB
Shmem:             72380 kB
Slab:             637684 kB
SReclaimable:     489484 kB
SUnreclaim:       148200 kB
KernelStack:        5904 kB
PageTables:        93828 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    12393192 kB
Committed_AS:    8550700 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       27220 kB
VmallocChunk:   34359705340 kB
HardwareCorrupted:     0 kB
AnonHugePages:   1538048 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      126976 kB
DirectMap2M:     8261632 kB
Quote:
As a totally blind guess I'd reckon that hugepages are now in use by someone - probably the DB. These are "pinned" in memory and reduce the pool of available page frames that every one else can use - for file mapped I/O or swap.

Which can of course lead to "thrashing" - which again isn't limited to swapping. If a page is reclaimed (and used by the new task) and the original contents are immediately again referenced, the data that just got paged out gets paged in again. Classic thrashing.
I wish huge pages were being used - that would have been a nice answer to my question of what changed - but the system shows that only Transparent Huge Pages (THP) are being used, and they are swappable:

Code:
# sysctl -a | fgrep -i hugepages
vm.nr_hugepages = 0
vm.nr_hugepages_mempolicy = 0
vm.nr_overcommit_hugepages = 0
# fgrep -i huge /proc/meminfo
AnonHugePages:   1538048 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
# cat /sys/kernel/mm/transparent_hugepage/enabled 
[always] madvise never
Thank you for the suggestion about huge pages, some nice reading to discover what they are.

As you can see from the above, 'explicit' huge pages are not in use. THP are being used, but they are swappable.

So I'm back to asking 'is the system swapping - even though vmstat & iostat say no - or else what is causing all of the disk activity?'.

Quote:
I'm not sure any tool will show it properly other than a specific kernel trace.
Sadly I don't have the skills to do that. I'm disappointed there doesn't seem to be a metric anywhere that can help me out.
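
From a bit of reading it does look like perf can at least count major faults per command without a full-blown trace, so I may try something along these lines (an untested guess on my part; it needs the perf package installed):

Code:
perf record -e major-faults -a -g -- sleep 60   # sample system-wide for a minute
perf report --sort comm --stdio | head -30      # which commands took the major faults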

Quote:
Quote:
Q1. Does anyone know of any changes going from Centos 7.2 to 7.4 that would cause a system to suddenly start incurring twice the amount of disk I/O and suffer severe intermittent performance degradation? (And is there a fix?)
Q1 - check the RHEL release notes
I had a quick skim of the Centos 7 release notes, couldn't see anything pertinent.

Quote:
Quote:
Q2. Is my system swapping?
Q2 - if si/so are non-zero you are swapping. End of discussion.
si/so are non-zero but are trivial in size, a couple of MB/s out of the 62MB/s thrashing observed on the disk. From the perspective of 'what is causing my system to bog down with a saturated disk?' si/so are saying "you are not swapping enough to cause this behaviour".

Quote:
Quote:
Q3. Does swapping program text get reported by vmstat, or is it 'invisible' to the vmstat si/so metrics? Is it reasonable to think that most of my swap activity is program text/code rather than data to the swap device?
Q3 - swap is only used for anonymous memory. the userspace tools all simply report kernel stats. There are no "hidden" numbers.
My question is - do the userspace tools (essentially vmstat) report kernel stats of swapping activity *to the swap device only*? That's unclear, I think, from the documentation. And if vmstat si/so is supposed to show swapping to *all* disks, it's inconsistent with what iostat is telling me.

Quote:
Quote:
Q4. Is swapping activity a well-known reason for the difference between the TOTAL and ACTUAL disk throughput reported by iotop?
Q4 - I/O is I/O once it reaches the driver, doesn't matter where it came from, although (I/O) priority may be applied.
But iotop tries to tell 'where it comes from', and shows a huge amount of disk I/O for which it cannot account. If I knew where that I/O was coming from I would have a much better idea of what my problem is, so for me and this problem I think it does matter. iostat tells me the disk is saturated but iotop can't tell me what's to blame for it. Nothing seems to be able to do that.

Quote:
Quote:
Q5. Is there any other metric in Linux that shows swapping *activity* rather than disk I/O to the *swap device*? So I can know for sure whether Linux is madly swapping code/text pages to/from the root volume, even if vmstat and iostat say there is no activity with the swap device?
Q5 - there is no other tool I know of that will convince you of the veracity of the numbers reported.
I vaguely recall something from my Solaris days that gave an indication of kernel swap activity, paging to/from disk - *any* disk - and I was hoping Linux might have something like that.
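
The closest thing I've found so far is 'sar -B' from the sysstat package, which reports paging rates (pgpgin/s, pgpgout/s, fault/s, majflt/s, pgscan/pgsteal) for all paging activity, not just the swap device - so that may be the nearest Linux equivalent:

Code:
sar -B 60        # one-minute samples of system-wide paging activity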
 
  

