Hi,
we have a similar problem.
We are running Red Hat AS with a 2.4.x kernel on HP blades in SMP mode, with 2 processors and 4 GB RAM each. The blades are connected to an HP SAN via a QLogic adapter.
We are running a large search engine (FAST) on these machines. The nodes we have problems with are the crawler nodes: iowait is very high there and the document crawl rate is very low, while CPU usage is low and memory consumption is fine.
I have already done the things mentioned earlier in this thread:
iostat -x 1 showed me the following:
Code:
Device:    rrqm/s  wrqm/s    r/s     w/s  rsec/s   wsec/s   rkB/s    wkB/s  avgrq-sz  avgqu-sz  await  svctm  %util
/dev/sda    24.05  827.85  56.96  391.14  648.10  9772.15  324.05  4886.08     23.25     58.35  12.94   1.10  49.37
/dev/sda1   24.05  827.85  56.96  391.14  648.10  9772.15  324.05  4886.08     23.25     58.35  12.94   1.10  49.37
which tells me that the busiest devices are the mounted SAN devices, /dev/sda and /dev/sda1.
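To keep an eye on this over time I sample the device periodically with a small script like the one below (just a sketch; it assumes sysstat's iostat and that await and %util are fields 12 and 14, as in the header above):
Code:
#!/bin/sh
# Append a timestamped await/%util sample for sda1 each cycle.
# "iostat -x 10 2" prints a since-boot report first and a 10-second
# interval report second; "tail -1" keeps only the interval report.
while true; do
    sample=`iostat -x 10 2 | grep sda1 | tail -1 | awk '{ print "await=" $12, "util=" $14 }'`
    echo "`date '+%H:%M:%S'` $sample"
done >> /var/log/sda1_iostat.log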
Then I executed the suggested "fuser -vm /dev/sda1" command, which returned the output below and showed me that the processes writing to the SAN all belong to our search engine.
Code:
USER PID ACCESS COMMAND
/dev/sda1 fastusr 16212 f.... cachemanager
fastusr 16218 f.... frtsobj
fastusr 16266 f.... statusserver
fastusr 16283 f.... fsearch
fastusr 16291 f.... fsearch
fastusr 16299 f.... fsearch
fastusr 16306 f.... anchorserver
fastusr 16355 f.c.. mysqld
fastusr 16356 f.c.. mysqld
fastusr 16357 f.c.. mysqld
fastusr 16358 f.c.. mysqld
fastusr 16359 f.c.. mysqld
fastusr 16360 f.c.. mysqld
fastusr 16363 f.c.. mysqld
fastusr 16364 f.c.. mysqld
fastusr 16365 f.c.. mysqld
fastusr 16366 f.c.. mysqld
fastusr 16647 f.... crawler
fastusr 16650 f.... crawlerfs
fastusr 16652 f.... uberslave
fastusr 16654 f.... uberslave
fastusr 16655 f.... uberslave
fastusr 17083 f.... uberslave
fastusr 17140 f.... uberslave
fastusr 17142 f.... uberslave
fastusr 17200 f.... postprocess
fastusr 17202 f.... uberslave
fastusr 21036 f.c.. mysqld
fastusr 21421 f.c.. mysqld
fastusr 21422 f.c.. mysqld
fastusr 21423 f.c.. mysqld
fastusr 21424 f.c.. mysqld
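To cross-check the fuser output I also counted the open files per command with lsof (a rough sketch, assuming lsof is installed; given a device name, lsof lists all open files on that filesystem):
Code:
# Count open files on the SAN filesystem, grouped by command name,
# busiest commands first. NR > 1 skips the lsof header line.
lsof /dev/sda1 | awk 'NR > 1 { files[$1]++ } END { for (cmd in files) print files[cmd], cmd }' | sort -rn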
Then I had a look at dmesg for filesystem errors and got the following output:
Code:
EXT3-fs error (device sd(8,1)): ext3_readdir: bad entry in directory #2670799: rec_len % 4 != 0 - offset=0, inode=926102069, rec_len=13874, name_len=10
EXT3-fs warning (device sd(8,1)): empty_dir: bad directory (dir #2670799) - no `.' or `..'
EXT3-fs error (device sd(8,1)): ext3_free_blocks: bit already cleared for block 5342380
EXT3-fs error (device sd(8,1)): ext3_free_blocks: bit already cleared for block 6750760
EXT3-fs error (device sd(8,1)): ext3_free_blocks: bit already cleared for block 4654459
EXT3-fs error (device sd(8,1)): ext3_free_blocks: bit already cleared for block 4481402
EXT3-fs error (device sd(8,1)): ext3_readdir: bad entry in directory #2588774: rec_len % 4 != 0 - offset=0, inode=1702129263, rec_len=29806, name_len=45
EXT3-fs warning (device sd(8,1)): empty_dir: bad directory (dir #2588774) - no `.' or `..'
EXT3-fs error (device sd(8,1)): ext3_free_blocks: bit already cleared for block 5178169
EXT3-fs error (device sd(8,1)): ext3_free_blocks: bit already cleared for block 4358984
EXT3-fs error (device sd(8,1)): ext3_free_blocks: bit already cleared for block 2753246
(this is just an excerpt of the errors I got)
There are no errors in the /var/log/messages file.
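Given those ext3 errors, my current plan is to stop the FAST processes, unmount the volume, and force a full filesystem check in the next maintenance window, roughly like this (a sketch; /san is just a placeholder for the real mount point of /dev/sda1):
Code:
# Stop all the processes that fuser listed above first, otherwise
# the unmount will fail because files are still held open.
umount /san
e2fsck -f -C 0 /dev/sda1   # -f forces a full check, -C 0 prints progress
mount /san                 # remount once the check comes back clean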
Please give me some hints on what I can do to reduce the iowait.
Normally the crawler machines should be able to crawl and process a few thousand documents per minute; at the moment we are running at 9 documents per second, which is far from speedy.
Thanks in advance for every answer.