ProgrammingThis forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
You are currently viewing LQ as a guest. By joining our community you will have the ability to post topics, receive our newsletter, use the advanced search, subscribe to threads and access many other special features. Registration is quick, simple and absolutely free. Join our community today!
Note that registered members see fewer ads, and ContentLink is completely disabled once you log in.
I am working on a project which requires me to write a program in C. This program will build a suffix tree for a string, and at the same time it is suppose to do memory management. For example, if the string size is larger than the RAM, it will do paging strategy which will read in part of the string into the RAM. ( Is this possible? Sorry, I'm a newbie... )
So do I have to modify the Linux Kernel? Or do I make my program to access the kernel to implement the paging strategy? Or is my whole concept of memory management is wrong?
You don't have to muck about with the page tables to do memory management in C. Simply use malloc and free to allocate/deallocate memory, and let the kernel worry about when to put particular pages into memory and when to swap them. Simple, easy, and 100% portable. If you really feel like mucking with the page table, the kernel provides system calls that allow you to do so -- see mmap, mlock, mprotect, and friends.
BTW, you might want to request a mod to move this to the programming forum, as it probably fits better there.
Moved: This thread is more suitable in Programming and has been moved accordingly to help your thread/question get the exposure it deserves.
I understand you want to make a simulation of memory management, not the management itself. To do it in a very basic way you need to kernel modifications. I'd split the strings into smaller parts (let's say 4KB - page size; if they're really that long) and keep them as separate variables. Now when you memory calcualtion shows you're running out of memory you can write a numbe of them to a file and mark them, so you can readtham back when needed.
To do it in a very basic way you need to kernel modifications. I'd split the strings into smaller parts (let's say 4KB - page size; if they're really that long) and keep them as separate variables. Now when you memory calcualtion shows you're running out of memory you can write a numbe of them to a file and mark them, so you can readtham back when needed. [/B]
How do I do these? I have basic knowledge of C programming. Is it enough to do the things that you mentioned? Or do I need to learn more? If so, do you have any recommendations on websites or books that will be helpful? Thanks!
How basic is basic. Do you know how to use fopen, fclose, fread, fwrite, fseek, and fprintf to do file I/O? That's really all you need to do what Mara suggested, along with enough knowledge to account for how much memory your are using. Also, is it a requirement that you do your own memory management? If not, yo can do what I suggested a couple posts up. I'm a bit confused as to exactly what the requirements of this project are?
Yea, I need to do memory management and I got a rough idea of fopen, fseek, etc...
My project is on building Suffix Trees based on a string. The string can be DNA sequences of up to 30GB, and the built suffix tree will be about 8 to 10 times larger. Thus, the string and suffix tree data structure will not be able to store in the memory. So how do I do these?
For example, if I use fread to read in the 30Gb string and declare malloc on a pointer pointing to the string input, the pointer will be NULL, because there is not enough memory for it. I think I'll need to do paging strategy, maybe page in a section of the string and suffix tree data structure. But does Linux help me do all these? Or I will have to code these paging strategy?
The programming module that I learnt only taught me how to code, with the assumption that everything can be in the memory. Now, I'm facing the problem that the data structure will be bigger than the memory, which is totally new to me...
Then it that case your best strategy is to open up a file and write data that you don't need to use right away to it, as Mara suggested. Try to keep as little in memory as possible. If you know the basics of file I/O this should be relatively easy. Your program is going to have to keep track of what's in memory and what's is your scratch file.
The problem is, 30 GB is probably going to be bigger than memory + swap on just about every machines out there, so even trying to have the kernel move some of the pages to swap space is not likely to help, since there simply isn't enough to hold that amount of data.
I suggest to keep the data not in one variable (one pointer), but in many. In fact, you don't have a choice - you can't have 30GB structure that way.
Divide it and store pointers to the fragments you currently have in memory (you probably need to perform calculations on the data, right?). 4KB fragments will be probably too small - 64KB or maybe longer.
Now how to do it. Keep the data in a file(s). When you need a fragment, read it and process. You can keep more fragments in memory at the same time and store pointers to them with the place in data when the fragment starts). When you need information, you first look into the list (or tree) of structures keeping pointers to the loaded fragments. If you find the right one - you can process. If not, load the needed fragment and add pointer to it to your tree. Remember to check memory usage from time to time (in short; how many fragments you have loaded) and free if threre are too many.
How would the structure look like:
char *ptr; /* pointer to the fragment */
unsigned long location; /* offset of the beginning of this fragment */
More information about lists, trees, memory allocations, reading/writing to/from file can be found in any C book.