Old 07-04-2011, 01:54 AM   #16
JZL240I-U
Senior Member
 
Registered: Apr 2003
Location: Germany
Distribution: openSuSE 13.1 / 12.3_64-KDE, Ubuntu 14.04, Fedora 20, Mint 17, Chakra
Posts: 3,665

Rep: Reputation: Disabled

Quote:
Originally Posted by michael@actrix View Post

md5sum /dev/zero
Hehe. Nice trick.
 
Old 07-04-2011, 03:46 PM   #17
imperialbeachdude
LQ Newbie
 
Registered: Jul 2007
Posts: 25

Original Poster
Rep: Reputation: 0
Sure - pretty vanilla, I think:

Code:
CFLAGS   = -v -m64 -Wall
CPPFLAGS = -m64 -Wall
LDFLAGS  = -m64 -lpthread

Linker: Linker.o AStore.o DataLayer.o
        g++ $(LDFLAGS) -o Linker Linker.o AStore.o DataLayer.o


Quote:
Originally Posted by Tinkster View Post
Could you post the compiler options and such that you've used to build
your program?


Cheers,
Tink
 
Old 07-04-2011, 03:49 PM   #18
imperialbeachdude
LQ Newbie
 
Registered: Jul 2007
Posts: 25

Original Poster
Rep: Reputation: 0
Thanks, htop was very informative. I think I may have another problem that I did not describe in the original question. This is a multithreaded app that reads and writes to one very large file. Each thread reads and writes to a different area of the file, but perhaps there is a common lock in Linux? That could explain the processing problem, I think. Hmm... more to work on -

Quote:
Originally Posted by michael@actrix View Post
Try htop - it displays a bar for each CPU - see if they're all high or not. There is also iotop for I/O.
md5sum /dev/zero
will keep a CPU busy indefinitely. Run up several and use htop to see if all the CPUs get going.
 
Old 07-04-2011, 06:38 PM   #19
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 12,309

Rep: Reputation: 1031
That was why I suggested you try 4 cores on Linux - to see if you get comparable performance to Windoze, and (perhaps) provide evidence that your design won't scale properly.

There is no "common" lock for I/O that I'm aware of, but individual filesystems may use something similar. Striping the file across multiple devices may help.
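
One shared resource that is easy to overlook, though: threads using plain read()/write() on a single shared descriptor also share that descriptor's file offset, and the kernel serialises updates to it. Below is a minimal sketch of one way around that, assuming each thread owns a disjoint region of the file; pread()/pwrite() take an explicit offset, so no lseek() is needed and no file position is shared. All the names here (worker, REGION_SIZE, bigfile.dat) are illustrative, not from the actual program.

Code:
/* Sketch only: per-thread I/O on disjoint regions of one file. */
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define NTHREADS    4
#define REGION_SIZE (64UL * 1024 * 1024)   /* one thread's window */
#define IO_BLOCK    (1UL << 20)            /* 1 MiB per I/O call  */

struct task { int fd; off_t base; };       /* region start for this thread */

static void *worker(void *arg)
{
    struct task *t = arg;
    char *buf = malloc(IO_BLOCK);
    if (!buf) return NULL;

    for (off_t off = 0; off < (off_t)REGION_SIZE; off += IO_BLOCK) {
        ssize_t n = pread(t->fd, buf, IO_BLOCK, t->base + off);
        if (n <= 0) break;
        /* ... process buf[0..n) here ... */
        if (pwrite(t->fd, buf, n, t->base + off) != n) break;
    }
    free(buf);
    return NULL;
}

int main(void)
{
    int fd = open("bigfile.dat", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    pthread_t tid[NTHREADS];
    struct task tasks[NTHREADS];
    for (int i = 0; i < NTHREADS; i++) {
        tasks[i] = (struct task){ fd, (off_t)i * REGION_SIZE };
        pthread_create(&tid[i], NULL, worker, &tasks[i]);
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    close(fd);
    return 0;
}

(Build with something like gcc -pthread sketch.c.)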
 
Old 07-04-2011, 07:29 PM   #20
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 942
The I/O patterns are very important.

If each thread does random accesses within its window, you might try using mmap() instead of file I/O. You would use msync() to commit changes to disk if some other program needs to see the changes in real time, and madvise() to give the kernel hints if you know what data you'll need sometime in the future. Note that you cannot use stdio here; you'll need open() and the other low-level I/O routines from <unistd.h> instead. stdio is terribly slow anyway.
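
For instance, a minimal sketch of that pattern, assuming the file already exists, is non-empty, and fits comfortably in the address space (the file name and the placeholder updates are illustrative, and error handling is trimmed):

Code:
/* Sketch: mmap() a data file for in-place random access. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("data.bin", O_RDWR);
    struct stat st;
    if (fd < 0 || fstat(fd, &st) < 0) { perror("open/fstat"); return 1; }

    char *map = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) { perror("mmap"); return 1; }

    madvise(map, st.st_size, MADV_RANDOM);    /* access-pattern hint */

    map[0] ^= 1;                              /* placeholder in-place updates */
    map[st.st_size - 1] ^= 1;

    msync(map, st.st_size, MS_ASYNC);         /* schedule writeback to disk */
    munmap(map, st.st_size);
    close(fd);
    return 0;
}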

If the data files are small enough to fit in RAM, use mmap().

If the per-thread windows into the data are not aligned to page-sized units, you may get serious cacheline ping-pong in the boundary regions. The delays can be surprisingly long on a machine with this many cores. You can use sysconf(_SC_PAGESIZE) to query the page size at runtime, as sketched below.
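
A small sketch of how to place the boundaries, assuming you get to choose where each thread's window starts (the function and parameter names are illustrative): carve the file into windows whose starts are exact multiples of the page size, so no two threads ever dirty the same page.

Code:
/* Sketch: page-aligned per-thread windows. Thread i's window is
 * [window_start(total, n, i), window_start(total, n, i + 1)),
 * clamped to the file size. */
#include <sys/types.h>
#include <unistd.h>

static off_t window_start(off_t total, int nthreads, int i)
{
    off_t page  = (off_t)sysconf(_SC_PAGESIZE);       /* page size at runtime */
    off_t pages = (total + page - 1) / page;          /* pages in the file    */
    off_t per   = (pages + nthreads - 1) / nthreads;  /* pages per thread     */
    return (off_t)i * per * page;   /* always lands on a page boundary */
}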

If you cannot align the access windows, consider doing odd and even segments in two different passes (so that each segment worked on is surrounded by fallow segments). Or consider splitting the data into multiple separate files first, at the desired boundaries.

If each thread does linear read and write passes over its data window (even simultaneously, as long as the write trails the read), and the datasets do not fit into memory, read() and write() often surpass mmap() in speed. Use very large blocks, though; I'd recommend 2MB (2097152 bytes) if possible. Use posix_fadvise() to tell the kernel which blocks you won't need for a while, and which blocks are going to be read next, so that the kernel can do the real I/O while you process the data. You might also consider handing off the writes to a dedicated thread (all threads push their writes to a single writer thread through a queue). The writer thread consumes very little CPU time, so it's fine if it gets scheduled on a core used for the computations, and your worker threads are free to move on to the next block.
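
A minimal single-threaded sketch of the read side of such a pipeline (the file name is illustrative, the write/hand-off side is left out, and error handling is trimmed):

Code:
/* Sketch: linear pass in 2 MiB blocks, prefetching the next block
 * and dropping the one just finished. Read-only for brevity. */
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BLOCK (2UL * 1024 * 1024)   /* 2097152 bytes */

int main(void)
{
    int fd = open("data.bin", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    char *buf = malloc(BLOCK);
    off_t off = 0;
    ssize_t n;

    /* Ask the kernel to start loading the first block right away. */
    posix_fadvise(fd, 0, BLOCK, POSIX_FADV_WILLNEED);

    while ((n = pread(fd, buf, BLOCK, off)) > 0) {
        /* Prefetch the next block while we work on this one... */
        posix_fadvise(fd, off + BLOCK, BLOCK, POSIX_FADV_WILLNEED);

        /* ... process buf[0..n) here ... */

        /* ... and drop the pages we are done with. */
        posix_fadvise(fd, off, n, POSIX_FADV_DONTNEED);
        off += n;
    }
    free(buf);
    close(fd);
    return 0;
}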

If you use reasonably large blocks, say 65536 bytes or larger powers of two, and you can inform the kernel sufficiently far ahead of actually needing a block, you should be able to keep processing data while the kernel loads new data into memory. (The page cache in the Linux kernel is pretty efficient. Amazing, if you ask me.)

Most disk systems cannot provide maximum throughput when small data blocks are used. You may need chunks as large as two megabytes to read and write your data at optimum speed. This applies mostly to read() and write(), but to a lesser degree to msync() too.

If you can tell us how your program walks the data while processing it, whether it updates the data in place or writes results somewhere else, and whether the worker threads access the same areas of the file(s), we might be able to suggest something more specific to try. As it is, we're only talking in very general terms here.

If you are unfamiliar with mmap(), check my example program in this post. It uses an ephemeral sparse data file (creates it, and deletes it when done) to modify a very sparse data structure spanning a terabyte, using basic mmap() techniques.
 
1 member found this post helpful.
Old 07-05-2011, 03:37 AM   #21
michael@actrix
Member
 
Registered: Jul 2003
Distribution: OpenSUSE 11.4
Posts: 55
Blog Entries: 1

Rep: Reputation: 16
I don't think locking would be an issue if you're using a native Linux filesystem. Things like fflush() might cause problems if called too frequently, though. I would be tempted to use gprof or some other profiler to find out where the app is spending its time. strace might also give some quick clues at the system call level (man strace); for example:
strace -c -S time myProgram
If very large is not too very large, then /dev/shm might be a fast place to build the file, and you can copy it somewhere permanent once it's finished being created.

Quote:
Originally Posted by imperialbeachdude View Post
Thanks, htop was very informative. I think I may have another problem that I did not describe in the original question. This is a multithreaded app that reads and writes to one very large file. Each thread reads and writes to a different area of the file, but perhaps there is a common lock in Linux? That could explain the processing problem, I think. Hmm... more to work on -

Last edited by michael@actrix; 07-05-2011 at 03:38 AM.
 
  

