Yes, it's very hard to get a physically contiguous DMA buffer larger than 128 KB at runtime.
I think it's because physical memory gets fragmented once the system has been running for a while.
You can keep retrying kmalloc() with GFP_ATOMIC until it succeeds,
but that seems like pretty risky business to me, since GFP_ATOMIC allocations can't sleep and eat into the kernel's emergency reserves.
Download the sample code for Linux Device Drivers, 2nd edition, here:
http://examples.oreilly.com/linuxdrive2/
and check out the allocator folder.
I don't know that much about it or what you're doing, but there is a driver I/O abstraction layer:
struct kiobuf, from <linux/iobuf.h> (2.4 kernels).
You can allocate a vector of kiobufs (I/O buffer descriptors):
Code:
int alloc_kiovec(int nr, struct kiobuf **iovec);
It lets you ignore the underlying VM details, since a kiobuf describes its data as a list of pages rather than one contiguous block.
Or, for block devices, look at the kernel file
./drivers/char/raw.c
where the raw device moves data directly between user buffers and the disk instead of going through the kernel's buffer cache.