I recommend relying on the kernel for I/O caching, but do use posix_fadvise() to tell it about your file access patterns. Most importantly, if you know you won't need a part of a file again after reading it, use posix_fadvise() with POSIX_FADV_DONTNEED so the kernel can drop it from the page cache.
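A minimal sketch of that pattern, assuming a plain pread() loop (the chunk size and the send step are placeholders, error handling omitted):

```c
#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define CHUNK (1 << 20)            /* 1 MiB; tune for your workload */

void serve_file(int fd, off_t filesize)
{
    char *buf = malloc(CHUNK);

    /* We'll read front to back: let the kernel ramp up readahead. */
    posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    for (off_t off = 0; off < filesize; off += CHUNK) {
        ssize_t n = pread(fd, buf, CHUNK, off);
        if (n <= 0)
            break;

        /* ... send buf[0..n) to the client here ... */

        /* We won't touch this range again: drop it from the page
         * cache so it doesn't push out more useful data. */
        posix_fadvise(fd, off, n, POSIX_FADV_DONTNEED);
    }
    free(buf);
}
```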
If you have multiple concurrent connections, I recommend using asynchronous I/O (aio_read()) for the disk side and nonblocking I/O for the socket side. You'll basically always have the next file block posted as an asynchronous read, with a signal delivered whenever a read completes and the block is available; for best performance you'll want space for about three blocks per connection (one being sent, one ready to send, one being read from disk). A single thread (for example the main process thread itself) can handle the socket communication for all connections.
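Here's a sketch of the disk half of that arrangement, assuming one outstanding aio_read() per connection and SIGUSR1 as the completion signal; the conn struct and block size are illustrative, and on glibc you link with -lrt:

```c
#include <aio.h>
#include <signal.h>
#include <string.h>

#define BLOCK 65536

struct conn {                      /* illustrative per-connection state */
    int file_fd;
    off_t next_off;
    char buf[BLOCK];
    struct aiocb cb;
    volatile sig_atomic_t ready;   /* set by the signal handler */
};

static void on_block_done(int sig, siginfo_t *si, void *uctx)
{
    /* sival_ptr carries our connection back to us; just flag it -
     * the main loop does the real work (handlers must stay minimal). */
    struct conn *c = si->si_value.sival_ptr;
    c->ready = 1;
    (void)sig; (void)uctx;
}

static void post_next_read(struct conn *c)
{
    memset(&c->cb, 0, sizeof(c->cb));
    c->cb.aio_fildes = c->file_fd;
    c->cb.aio_buf    = c->buf;
    c->cb.aio_nbytes = BLOCK;
    c->cb.aio_offset = c->next_off;

    /* Deliver SIGUSR1 when the read completes, tagged with the conn. */
    c->cb.aio_sigevent.sigev_notify          = SIGEV_SIGNAL;
    c->cb.aio_sigevent.sigev_signo           = SIGUSR1;
    c->cb.aio_sigevent.sigev_value.sival_ptr = c;

    aio_read(&c->cb);
    c->next_off += BLOCK;
}

int main(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_block_done;
    sa.sa_flags     = SA_SIGINFO;
    sigaction(SIGUSR1, &sa, NULL);

    /* ... open files and sockets, call post_next_read() per connection;
     * in the main loop, when c->ready is set, check aio_error(&c->cb)
     * and aio_return(&c->cb), queue the block on the (nonblocking)
     * socket, and post the next read. */
    return 0;
}
```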
This way the kernel is free to decide the order of disk accesses and can make full use of the I/O elevator. posix_fadvise() tells it which parts of each file should be read ahead (POSIX_FADV_WILLNEED) and which can be dropped from the page cache (POSIX_FADV_DONTNEED). Because your program doesn't burn precious RAM on internal caching, the OS has more RAM for the page cache, giving it leeway in its caching decisions.
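Concretely, that advice can be issued as a small sliding window per connection; an illustrative sketch, assuming block-aligned offsets:

```c
#define _XOPEN_SOURCE 600
#include <fcntl.h>

#define BLOCK 65536

/* Called each time a connection finishes sending the block at "off":
 * kick off readahead for the block after next, and evict the one
 * we just sent. Offsets are assumed block-aligned. */
void advance_window(int fd, off_t off)
{
    posix_fadvise(fd, off + 2 * BLOCK, BLOCK, POSIX_FADV_WILLNEED);
    posix_fadvise(fd, off, BLOCK, POSIX_FADV_DONTNEED);
}
```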
On the socket side, you may need to use a memory-mapped packet ring (Documentation/networking/packet_mmap.txt in the kernel source; see the wiki) to achieve gigabit rates with small packets, since per-packet system call overhead becomes the limiting factor there. If you map the circular send (TX) ring into userspace, you can construct multiple packets to multiple destinations in the buffer, and hand them all to the kernel at once with a single send() call.
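A condensed sketch of such a sender, following the TPACKET_V1 setup described in packet_mmap.txt; the interface name, frame counts, and the zeroed placeholder frames are assumptions (real code must build complete link-layer frames, since PF_PACKET bypasses the IP stack), and the socket requires CAP_NET_RAW:

```c
#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <net/if.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <unistd.h>

#define FRAME_SIZE 2048
#define FRAME_NR   512

int main(void)
{
    int fd = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

    /* Request the kernel-side TX ring, then map it into our space. */
    struct tpacket_req req = {
        .tp_block_size = FRAME_SIZE * FRAME_NR,   /* one big block */
        .tp_block_nr   = 1,
        .tp_frame_size = FRAME_SIZE,
        .tp_frame_nr   = FRAME_NR,
    };
    setsockopt(fd, SOL_PACKET, PACKET_TX_RING, &req, sizeof(req));

    char *ring = mmap(NULL, req.tp_block_size * req.tp_block_nr,
                      PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    /* Bind the ring to one interface ("eth0" is an assumption). */
    struct sockaddr_ll addr = {
        .sll_family   = AF_PACKET,
        .sll_protocol = htons(ETH_P_ALL),
        .sll_ifindex  = if_nametoindex("eth0"),
    };
    bind(fd, (struct sockaddr *)&addr, sizeof(addr));

    /* Fill several frames - each could go to a different destination -
     * marking each one ready for the kernel to transmit. */
    for (unsigned i = 0; i < 16; i++) {
        struct tpacket_hdr *hdr = (void *)(ring + i * FRAME_SIZE);
        char *data = (char *)hdr + TPACKET_HDRLEN
                                 - sizeof(struct sockaddr_ll);
        size_t len = 60;           /* build a real frame here */
        memset(data, 0, len);      /* placeholder contents */
        hdr->tp_len    = len;
        hdr->tp_status = TP_STATUS_SEND_REQUEST;
    }

    /* One system call pushes every queued frame out. */
    send(fd, NULL, 0, 0);

    munmap(ring, req.tp_block_size * req.tp_block_nr);
    close(fd);
    return 0;
}
```

The point of the ring is that single send(fd, NULL, 0, 0): its cost is amortized over every frame marked TP_STATUS_SEND_REQUEST, instead of paying one system call per packet.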