LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Linux - General (https://www.linuxquestions.org/questions/linux-general-1/)
-   -   Tracking down which files are causing disk read/write spikes (https://www.linuxquestions.org/questions/linux-general-1/tracking-down-which-files-are-causing-disk-read-write-spikes-886548/)

kaligula1234 06-15-2011 12:13 PM

Tracking down which files are causing disk read/write spikes
 
Hi All,

I'm currently working on speeding up my company's database servers; some MySQL inserts are taking over a second during peak times, which is about 500 times longer than they usually take. I've been logging statistics on CPU load, RAM and block IO, and the only thing regularly causing a bottleneck seems to be disk access, which I suppose is to be expected.

To combat this problem, what I'd ideally like to do is find out which tables are responsible for the most disk reading and writing. Our system is incredibly poorly written and I suspect there will be a lot of places I can speed things up using memory tables, memcache, or just rewriting how that part of the system works.

I'm sure I've seen a Linux command somewhere that would let me log the read and write stats for all open files at a particular time. It was something like lsof, but it also had info on how much data was queued to be written to or read from each file. If anyone can tell me what that might have been, or anything similar, that would be very useful, as I could just apply it to the MySQL table files.
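To give an idea of the sort of numbers I'm after: per process I can already get this from the /proc/&lt;pid&gt;/io counters (which, as far as I know, is what tools like iotop and pidstat -d also read; needs a 2.6.20+ kernel). What I'm missing is the same breakdown per file:

```shell
# Minimal sketch: sample the per-process I/O counters in /proc/<pid>/io
# twice and print byte rates for the interval. read_bytes/write_bytes
# count what actually hit the block layer, not just the page cache.
proc_io_delta() {
  pid="$1"
  interval="${2:-5}"
  r1=$(awk '/^read_bytes/ {print $2}' "/proc/$pid/io")
  w1=$(awk '/^write_bytes/ {print $2}' "/proc/$pid/io")
  sleep "$interval"
  r2=$(awk '/^read_bytes/ {print $2}' "/proc/$pid/io")
  w2=$(awk '/^write_bytes/ {print $2}' "/proc/$pid/io")
  echo "read:  $(( (r2 - r1) / interval )) bytes/s"
  echo "write: $(( (w2 - w1) / interval )) bytes/s"
}

# Example: watch the first mysqld process for 5 seconds
# proc_io_delta "$(pidof mysqld)" 5
```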

Failing that I imagine there are other ways of doing this that I'm overlooking, perhaps even in mysql itself. Any suggestions would be welcome!

Thanks :)

kaligula1234 06-16-2011 04:07 AM

I'm toying with the idea of taking a snapshot of queries over a minute with the query log, then parsing it to see how much data is being read from and written to each table by each query. Seems excessive though, and might only be vaguely accurate.

Just feels like there must be a better way of doing it in linux, any ideas???
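Something like this rough sketch is what I have in mind for the parsing step. It just grabs whatever identifier follows FROM/INTO/UPDATE, so it only counts queries per table rather than bytes, and joins or subqueries would need a real SQL parser:

```shell
# Naive query-log table counter: tally how many logged statements
# mention each table, by the identifier after FROM, INTO or UPDATE.
count_table_hits() {
  awk '
    {
      line = toupper($0)
      if (match(line, /(FROM|INTO|UPDATE)[ \t]+`?[A-Z0-9_]+/)) {
        # e.g. "FROM `ORDERS" -> parts[1]="FROM", parts[2]="ORDERS"
        split(substr(line, RSTART, RLENGTH), parts, /[ \t]+`?/)
        hits[parts[2]]++
      }
    }
    END { for (t in hits) print hits[t], t }
  ' "$1" | sort -rn
}

# Usage: count_table_hits /var/log/mysql/query.log
```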

JZL240I-U 06-16-2011 09:05 AM

"hdparm" has a longish list of parameters. AFAIR there was some possible way also with "ps". Anything of use for you there? You could probably connect inodes to files or are your hard disks raw devices?

kaligula1234 06-17-2011 11:01 AM

Hi, thanks for your suggestions, all of them sound feasible. I've had a good look at the ps and hdparm commands though, and can't see a way of using either to gauge how much data is being written to or read from a particular file. Could you maybe be a bit more specific?

From what I understand about inodes, they are used by the filesystem to keep track of basic file details like last access time, type, permissions, that sort of thing, but nothing about how much data is written to or read from the file. How could I use them?

Appreciate your help!

TB0ne 06-17-2011 11:12 AM

Quote:

Originally Posted by kaligula1234 (Post 4387315)
I'm toying with the idea of taking a snapshot of queries over a minute with the query log, then parsing it to see how much data is being read from and written to each table by each query. Seems excessive though, and might only be vaguely accurate.

Just feels like there must be a better way of doing it in linux, any ideas???

There are a couple of things I'd ask first. How big is the database, not only in physical size, but in the number of tables? How is it accessed? (web via PHP, Java, or 'fat' client??) What kind of disk is the database on now? Is it located on the same physical device as the operating system? On the same server as the (maybe) web server? What kind of network speed between the two?

As a general rule, it's best to keep your DB on its own server, and the database ITSELF on its own storage media, preferably on its own controller, too. If you've got a big database, and it's on the same server as the web server, and all running off one disk....your bottleneck is easy to spot. You can even spread a database across multiple spindles, so tables will be on their OWN drives, to really speed things up. Once you get the DB tuned, the app would be the next place I'd look (and you say yours is poorly written). A bad app can knock your performance down too, but DB tuning is the clearest piece of the puzzle to attack first.
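If the tables are MyISAM, getting one hot table onto its own spindle can be as simple as a symlink, since MyISAM follows symlinks for its data files on Linux. A sketch only -- the paths and table name here are hypothetical, and the server has to be stopped while the file moves:

```shell
# Move one table's data file to a second disk and symlink it back.
/etc/init.d/mysql stop
mv /var/lib/mysql/mydb/orders.MYD /disk2/mysql/orders.MYD
ln -s /disk2/mysql/orders.MYD /var/lib/mysql/mydb/orders.MYD
/etc/init.d/mysql start
```

Do the same with the .MYI index file if you want the index on its own spindle too.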

Some things to help you get started.
http://forums.mysql.com/read.php?24,92131,92131
http://www.mysqlperformanceblog.com/...-installation/
http://dev.mysql.com/doc/refman/5.1/...imization.html

Also, I'd spring the $99 for the MySQL workbench tool...gives you lots of stats to help you.

kaligula1234 06-17-2011 12:18 PM

Quote:

Originally Posted by TB0ne (Post 4388706)
There are a couple of things I'd ask first. How big is the database, not only in physical size, but in the number of tables? How is it accessed? (web via PHP, Java, or 'fat' client??) What kind of disk is the database on now? Is it located on the same physical device as the operating system? On the same server as the (maybe) web server? What kind of network speed between the two?

About 50GB, hundreds if not thousands of tables, but the majority of the 50GB will be spread across only 30-40 tables.

It's on its own server, accessed from separate webservers running PHP, on a Gb connection. I believe the db is running on the same device and partition as the operating system; I'm not really going to be able to change that unless it's likely to make a big difference though, as the bosses aren't fans of downtime. For the moment I'm looking for the best improvements I can get with little or no downtime, of which I'm sure there are a few.

I don't know much about the type of disk it's on; I'm pretty sure it's just whatever Rackspace gives as standard on one of their 16GB cloud servers. I realise that using a cloud server could be part of the spikes too, as we are likely sharing disk access with someone else, but there's not much I can do about that.

Quote:

Originally Posted by TB0ne (Post 4388706)
As a general rule, it's best to keep your DB on its own server, and the database ITSELF on its own storage media, preferably on its own controller, too. If you've got a big database, and it's on the same server as the web server, and all running off one disk....your bottleneck is easy to spot. You can even spread a database across multiple spindles, so tables will be on their OWN drives, to really speed things up. Once you get the DB tuned, the app would be the next place I'd look (and you say yours is poorly written). A bad app can knock your performance down too, but DB tuning is the clearest piece of the puzzle to attack first.

This sounds very interesting. Could anything be done by just repartitioning the one drive here, or would I definitely need separate physical drives? I might be able to get network drives added fairly easily; would the network access time ruin any possible benefits there though?

Again, this is the sort of solution where it would be nice to have some idea of which tables were responsible for the bottlenecks. I'll read through the links you gave me tonight, but do any of them specifically give me a way of measuring that? Or the workbench tool?

Sorry if I'm asking any stupid questions here, my knowledge is a little patchy :S

Your help is much appreciated!

TB0ne 06-17-2011 12:43 PM

Quote:

Originally Posted by kaligula1234 (Post 4388733)
About 50GB, hundreds if not thousands of tables, but the majority of the 50GB will be spread across only 30-40 tables.
It's on its own server, accessed from separate webservers running PHP, on a Gb connection. I believe the db is running on the same device and partition as the operating system; I'm not really going to be able to change that unless it's likely to make a big difference though, as the bosses aren't fans of downtime. For the moment I'm looking for the best improvements I can get with little or no downtime, of which I'm sure there are a few.

I don't know much about the type of disk it's on; I'm pretty sure it's just whatever Rackspace gives as standard on one of their 16GB cloud servers. I realise that using a cloud server could be part of the spikes too, as we are likely sharing disk access with someone else, but there's not much I can do about that.

Well, if uptime is important to the bosses, then they should already have a hot-standby copy of the DB on another server, right? If not, it's just a matter of time before one of two things happens: the DB server dies totally and you're down for a LONG time, or the wait/lag gets too long and you're down for a long time fixing it. Take the time now to solve the problems, and get yourself a safety net while you're at it. Get another server, copy the DB over, and tune it.
Quote:

This sounds very interesting. Could anything be done by just repartitioning the one drive here, or would I definitely need separate physical drives? I might be able to get network drives added fairly easily; would the network access time ruin any possible benefits there though?

Again, this is the sort of solution where it would be nice to have some idea of which tables were responsible for the bottlenecks. I'll read through the links you gave me tonight, but do any of them specifically give me a way of measuring that? Or the workbench tool?
The workbench tool can give you some, and there are other methods in those links. If you're not going to get separate physical media, it's pointless. Disk IO is what you need to address...a separate partition won't do it. You're still on one disk channel, one drive, one set of heads, etc. Network drives? That's a big "it depends". If you're talking about NFS mounts or the like, then you'll be worse off. SAN disks...maybe, if you make sure you're on 2Gb fiber rather than 1Gb.

JZL240I-U 06-20-2011 01:15 AM

TB0ne's suggestions are much better than mine, so please disregard what I suggested. Not being a DB guy, I just didn't want to leave you with no answer at all, so I tried my meager best.

One more little thought: does the whole installation use a lot of temp files (in /tmp)? If so, you might mount /tmp on a RAM disk, provided you have the spare memory. Or use an SSD for it.
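For example, something along these lines (the size is just a guess, you'd match it to your RAM; and if MySQL's own temp tables are the culprit, point its tmpdir setting in my.cnf at the same mount):

```shell
# Mount /tmp on a RAM disk. To make it permanent, add to /etc/fstab:
#   tmpfs  /tmp  tmpfs  size=2g,mode=1777  0  0
mount -t tmpfs -o size=2g,mode=1777 tmpfs /tmp
```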

