How Many Kernels Have Garbage Collection?

Peatmoss · 11-30-2007, 06:04 PM

Hi,

My question is, in the more commonly-available Linux distros, is there any garbage collection available that is done by the memory manager (e.g. malloc/kmalloc)?

Since I'm new to Linux, I'm still not clear on the difference between a program running in User Mode and one running in the Kernel, e.g. a driver. I'm guessing there are different memory managers for each mode given that there are two different calls available, malloc (User Mode) and kmalloc (Kernel Mode).

I'd like to know mainly about memory management in User Mode. With this in mind, does the stock Linux User Mode memory manager even have garbage collection (to eliminate fragmentation) at all? If so, does this feature come automatically enabled, or does an application designer need to factor this into the design and either a) first enable it before using it, or b) use a different memory manager?

I'm happy to do the reading, but haven't found the definitive source to answer my question yet, so any links would be appreciated. Failing that, any replies to this post would be greatly appreciated.

Regards

Peatmoss/Alleria

Couling · 11-30-2007, 06:56 PM

I must stress that I'm a newbie to Linux, but knowing what I do of programming and operating systems in general, a garbage collector seems highly unlikely.

Linux will be able to clean up after a process has terminated (I think). All pages of memory assigned to that process can be freed.

But while your process is still running, the kernel will have no way of knowing which memory pages are still in use by your application.

From the depths of my memory of uni... I believe running in kernel mode (in Linux) has something to do with your program running as part of the kernel process rather than a separate one. Don't quote me on that.

rsashok · 11-30-2007, 07:42 PM

For starters, I googled "linux memory management" and got a bunch of readable pages. You might try this one:

http://www.linuxhq.com/guides/TLK/mm/memory.html

Memory management is not a trivial topic. But I don't think you really meant to ask about Linux internal memory management (MM). Your question is more about garbage collection with regard to the malloc()/free() pair. Some languages, my favorite C and least favorite C++, don't provide garbage collection - it is the responsibility of the programmer to free allocated memory. Java, on the other hand, does it for you: when an object is no longer in use, it frees the memory allocated for that object.

In Linux, if you start a process, for example by running a "Hello World" program you carefully crafted from a tutorial, memory pages will be allocated to your process; let's call this its Virtual Memory (VM). Your program cannot access (read or write) other programs' memory spaces, and other processes cannot access yours. That protects you from dodos who don't know how to write code: their programs will crash while yours is still proudly outputting "Hello World" on the console. Each process has its own Virtual Memory, and it cannot access anyone else's. Just to confuse you more, I have to mention that Virtual Memories can occupy the same physical memory space, and if after reading about MM in more detail on the Web you understand why, consider yourself as having passed the course.
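To make the isolation concrete, here is a minimal sketch in C, assuming a POSIX system with fork(). The function name value_after_child_write is made up for this example. The child process overwrites its own copy of a variable, and the parent's copy should be unaffected because each process has its own address space.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the parent's copy of `value` after a
 * child process has overwritten its own copy.
 * Returns -1 if fork() or waitpid() fails. */
int value_after_child_write(void)
{
    int value = 1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: this write lands in the child's address space only. */
        value = 99;
        _exit(value == 99 ? 0 : 1);
    }

    /* Parent: wait for the child, then look at our own copy. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    assert(WIFEXITED(status));   /* the child ended through _exit() */

    return value;
}
```

Calling value_after_child_write() from main() and printing the result is enough to see that the parent never observes the child's write.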

The kernel is a different story - it is the brain of the system, it has its own memory space (physical), and nobody except the kernel itself can touch it, thanks to the wise folks who designed Linux. But the kernel also manages virtual memory for the processes it creates; as a result you have two functions, malloc() and kmalloc(), for two fundamentally different memories (not exactly 'fundamentally', they are both bits and bytes, and electrons are flying between transistors, but you know what I mean...)

Now my ultimate advice: if you are serious about learning Linux then get a good book about it. I know there are plenty of good links on the Web which would give you enough info, but I hate to see people taking their laptops to the restroom or to bed, and that is how serious people learn Linux. I found this book extremely useful:
http://www.linuxhq.com/guides/TLK/mm/memory.html

Good luck.

syg00 · 11-30-2007, 07:51 PM

And to steer this back on topic, have a read of this. As stated/implied above, nothing is ever as easy as it may seem it could be.

Nice post rsashok, and a handy link for the overview.

Peatmoss · 11-30-2007, 08:31 PM

Hi rsashok,

Thank you for the detailed response, it is much appreciated. After I posted that question I too did some reading and you are right, I should have asked about languages instead. That reduces the question to something I already could have answered myself >:-|, C being my favourite also. I will read the material at the link you posted, as it provides a lot of useful information about the Linux MM subsystem, and looks very detailed, but only in the morning after some coffee. If I actually did take it to bed, I'd probably find it a good insomnia cure, lol.

Some other valuable links I came across in my research are...

... a general-purpose discussion of memory issues, including Garbage Collection
http://www.memorymanagement.org/

A downloadable Garbage Collection tool which works for C and C++:
http://www.hpl.hp.com/personal/Hans_Boehm/gc/

An article which describes a Linux Enthusiast's experimentation with it - plus some background info on GC itself
http://www.linuxjournal.com/article/6679

Thanks to all who took the time to read this post and especially to you who have generously taken the time to respond.

Regards

Peatmoss/Alleria

sundialsvcs · 11-30-2007, 09:58 PM

Welcome aboard! That's why we're here!

As you discovered, the kernel is what IBM used to call "the system-control program." Its only job is to run the physical hardware, and to create the environment in which your programs run.

Within that environment, "it depends upon the language you're using." Furthermore, as with all operating-systems, languages rely upon standard libraries to perform services for them ... such as memory management. All of this stuff is executing in "user mode," that is to say, in the world created for you by the kernel.

Peatmoss · 12-03-2007, 04:58 PM

Hi Sundialsvcs

Well, there's a piece of the puzzle still missing, related to what really happens when a C or C++ program/process makes a malloc() call. There are two possibilities I can think of:

1) the compiler/linker arranges for the malloc() call to have a large heap of memory for it to allocate to programs. During the life of that process, programs may be calling malloc() and free() frequently for whatever reason. If so, then fragmentation is guaranteed. (BTW, it's Fragmentation I'm mostly concerned about in this thread and when I refer to GC, it's the algorithms the collector uses to coalesce these fragments back into longer blocks of unallocated memory). If the process remained running for "too long" then it's conceivable that, without GC, memory would be so fragmented it would be impossible for the process to continue.

2) the compiler/linker simply calls an entry point in the OS and passes on the request for a chunk of memory. The OS allocates the memory and returns control to the malloc() call which, in turn, passes control back to the caller. In this scenario, if the process continually mallocs() and frees() memory, you would still get fragmentation just as in 1) above, but now it becomes the operating system's headache.
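Either way, the fragmentation worry itself can be sketched with a toy allocator. This is only an illustration under made-up assumptions: a fixed pool of 16 equal units tracked with a byte map and a first-fit search. The names toy_alloc, toy_free, toy_free_units and toy_fragmentation_demo are hypothetical, and no real malloc() is implemented this way. It shows half the pool being free while a request for 4 contiguous units still fails.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POOL_UNITS 16   /* hypothetical fixed heap of 16 equal units */

static unsigned char pool_used[POOL_UNITS];   /* 1 = unit is allocated */

/* First fit: return the index of the first run of n free units, or -1. */
int toy_alloc(size_t n)
{
    if (n == 0 || n > POOL_UNITS)
        return -1;
    for (size_t start = 0; start + n <= POOL_UNITS; start++) {
        size_t len = 0;
        while (len < n && !pool_used[start + len])
            len++;
        if (len == n) {
            memset(&pool_used[start], 1, n);
            return (int)start;
        }
    }
    return -1;
}

/* Free n units starting at `start`. Adjacent free units merge for free
 * here, because the map has no block headers to coalesce. */
void toy_free(int start, size_t n)
{
    assert(start >= 0 && (size_t)start + n <= POOL_UNITS);
    memset(&pool_used[start], 0, n);
}

/* Total number of free units, contiguous or not. */
size_t toy_free_units(void)
{
    size_t n = 0;
    for (size_t i = 0; i < POOL_UNITS; i++)
        n += !pool_used[i];
    return n;
}

/* Fill the pool with 2-unit blocks, free every other block, then ask
 * for 4 contiguous units. Returns the result of that last request. */
int toy_fragmentation_demo(void)
{
    int blocks[8];

    memset(pool_used, 0, sizeof pool_used);
    for (int i = 0; i < 8; i++)
        blocks[i] = toy_alloc(2);
    for (int i = 0; i < 8; i += 2)
        toy_free(blocks[i], 2);

    /* 8 units are free in total, but no free run is longer than 2. */
    return toy_alloc(4);
}
```

In this sketch the request only succeeds again once a neighbouring block is freed, so that two holes join into one longer run.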

Additional Comments:
===================
In 1, the catch is that the compiler/linker must make a ruddy-good guess as to how much memory it should put into its heap and this would require it to know something about how the application/process is going to behave, so it seems very unlikely that Scenario 1 is actually the way things work. Scenario 2 seems much more likely because memory allocation is already a service provided by most O/Ss and the O/S already knows how much free memory there is in the system, etc.

I know that it is not unreasonable for an application to be left running for very long stretches of time - therefore I would expect memory to become increasingly fragmented the longer that application ran. Yet experience indicates that Linux does not suddenly terminate a well-written application because it has exhausted its available memory. This leads me to two possible conclusions:

1) There are no "well-written" applications which continually do malloc() and free() during runtime. BTW, this is how I write all my code, but I come from an embedded background using other RTOS and the only thing the processor is doing is running my application. We design it this way on purpose so all required memory is allocated during boot and then held forever more by the application. I'm guessing that at least *some* Linux applications are mallocing and freeing during runtime.

2) Applications are merrily mallocing and freeing their memory with gay abandon and it's the O/S that is performing some kind of elementary periodic defragmentation to ensure the system keeps running properly.
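For comparison, the allocate-once style from 1) can be sketched as a fixed-block pool. This is a minimal illustration, not anyone's production code: BLOCK_SIZE, BLOCK_COUNT and the pool_init/pool_get/pool_put names are all assumptions made for the example. Because every block has the same size and the storage is reserved once, the pool cannot fragment, whatever order blocks are taken and returned in.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE  64   /* hypothetical: bytes per block */
#define BLOCK_COUNT 8    /* hypothetical: blocks reserved at start-up */

typedef union block {
    union block *next;                  /* link while the block is free */
    unsigned char payload[BLOCK_SIZE];  /* storage while it is in use */
} block_t;

static block_t pool[BLOCK_COUNT];   /* reserved once, held for the whole run */
static block_t *free_list;

/* Thread every block onto the free list. Call once at start-up. */
void pool_init(void)
{
    free_list = NULL;
    for (size_t i = 0; i < BLOCK_COUNT; i++) {
        pool[i].next = free_list;
        free_list = &pool[i];
    }
}

/* Take one block from the pool, or NULL if all blocks are in use. */
void *pool_get(void)
{
    block_t *b = free_list;
    if (b != NULL)
        free_list = b->next;
    return b;
}

/* Return a block obtained from pool_get() to the pool. */
void pool_put(void *p)
{
    block_t *b = p;
    assert(b >= pool && b < pool + BLOCK_COUNT);
    b->next = free_list;
    free_list = b;
}
```

The trade-off is the one described above: the worst-case block count has to be known up front, and a request larger than BLOCK_SIZE cannot be served at all.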

I'd be willing to bet that compilers use scenario 2 (way above) and pass along malloc() requests to the O/S, so I'll re-ask my original question in this thread. Does the O/S implement some elementary periodic defrag for a process?

And now I'll take a look at the link provided by rsashok to see if the answer is buried in there somewhere. But if you know the answer to this I'd be interested in reading it here too!

Regards

Peatmoss/Alleria

rsashok · 12-03-2007, 07:14 PM

Peatmoss,

You are right with your scenario #2 - it is OS and only OS which is dealing with memory management. There are two sets of libraries: 1. the compiler (C) library and 2. the system library. Functions in the compiler library make calls to functions implemented in the system library. For example, fopen() is a C-library function, but it makes a call to a system library function, open(). The same mechanism applies to calloc()/malloc(): they are C-library functions that end up asking the kernel for memory.
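The layering can be seen from user code. The sketch below assumes a POSIX system, and stdio_over_fd_demo is a made-up name. It writes a string through the C library's stdio layer (a FILE wrapped around a pipe with fdopen()) and then reads the same bytes back with the raw read() system call, which shows the library call ending up at a kernel file descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: sends `msg` through stdio into a pipe and reads
 * it back with read(). Returns the number of bytes read, or -1 on error.
 * `msg` should be short enough to fit in the pipe buffer. */
long stdio_over_fd_demo(const char *msg, char *out, size_t out_len)
{
    assert(msg != NULL && out != NULL);

    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    FILE *fp = fdopen(fds[1], "w");   /* C-library object over a kernel fd */
    if (fp == NULL) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    ssize_t n = -1;
    if (fputs(msg, fp) != EOF && fflush(fp) != EOF)
        n = read(fds[0], out, out_len);   /* system call, no stdio involved */

    fclose(fp);       /* also closes fds[1] */
    close(fds[0]);
    return (long)n;
}
```

One caveat worth adding for malloc(): as far as I know, in glibc the allocator itself runs in user space and only goes to the kernel (brk() or mmap()) when its own heap runs out, so not every malloc() call turns into a system call.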

Strictly speaking, you could exhaust the memory available to your process by doing allocs without free. In this scenario, calloc() would return NULL as a pointer, and if you use it without checking, it will crash your process, but the rest of the system will still be running. If your process requires extraordinary amounts of memory, I am sure there is a mechanism to ask the OS to give you more on the heap, but I don't know how exactly.
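A minimal sketch of the check, with make_int_array as a made-up helper name: the NULL test comes before the first write, so a failed allocation is reported to the caller instead of crashing the process.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: allocates an array of `count` ints and fills it
 * with `fill`. Returns NULL, without touching any memory, if the
 * request cannot be met. The caller must free() the result. */
int *make_int_array(size_t count, int fill)
{
    assert(count > 0);   /* calloc(0, ...) may legally return NULL */

    int *a = calloc(count, sizeof *a);
    if (a == NULL)       /* always check before the first dereference */
        return NULL;

    for (size_t i = 0; i < count; i++)
        a[i] = fill;
    return a;
}
```

A caller would then write something like `int *a = make_int_array(n, 0); if (a == NULL) { /* report and recover */ }` and call free(a) when done.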

Regarding fragmentation: the OS manages memory in pages with a fixed granularity. There are a few tables in the kernel with available memory and free memory. Basically, when you ask for less memory than a page, it picks the first available page in the free memory table; if the requested size is more than a page, it finds a few sequential pages to fit the request (the OS manages these tables as linked lists).
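The page granularity can be queried from user space. A small sketch, assuming a POSIX system where sysconf(_SC_PAGESIZE) is available; page_size and pages_needed are names made up for this example. It rounds a byte count up to whole pages, which is the unit the kernel deals in.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Page size in bytes as reported by the system (commonly 4096). */
size_t page_size(void)
{
    long ps = sysconf(_SC_PAGESIZE);
    assert(ps > 0);
    return (size_t)ps;
}

/* Number of whole pages needed to hold `bytes` bytes. */
size_t pages_needed(size_t bytes)
{
    size_t ps = page_size();
    return bytes / ps + (bytes % ps != 0);
}
```

So a one-byte request still costs a full page at the kernel level, which is one reason a user-space allocator carves small requests out of pages it already holds.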

Having fixed granularity helps with the fragmentation problem, but there are other tricks and algorithms (I am not an expert on this) which help keep memory contiguous and hole-free.

But again, the compiler doesn't have knowledge about memory management; it only provides a front end to the operating system (through traps, or software interrupts) and its underlying functionality.

BTW: the OS also manages your stack. It is highly unlikely in Linux, but possible, that you might overrun your stack. Try writing a recursive function which has some huge structure as an automatic variable and see what happens.

Peatmoss · 12-03-2007, 08:06 PM

Hi rsashok

I've read through the link you provided in your first response. I was particularly interested to read about "the buddy system" algorithm which actively tries to reduce fragmentation when pages are returned to the free pool. What is still unclear to me is whether this happens during the life of a process, or just when one terminates. From your response I think that it probably happens during the life of a process.
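As I understand the buddy system, the merging step comes down to simple arithmetic on page indices. The sketch below is only an illustration of that arithmetic, not the kernel's code, and buddy_of and merged_start are made-up names. A free block of 2^order pages has exactly one buddy, found by flipping one bit of its starting page index, and if that buddy is also free the two merge into one block of twice the size.

```c
#include <assert.h>
#include <stddef.h>

/* Starting page index of the buddy of the block of 2^order pages that
 * starts at `page`. The block must be aligned to its own size. */
size_t buddy_of(size_t page, unsigned order)
{
    assert((page & (((size_t)1 << order) - 1)) == 0);
    return page ^ ((size_t)1 << order);
}

/* Starting page index of the merged block of 2^(order+1) pages formed
 * when the block at `page` and its buddy are both free. */
size_t merged_start(size_t page, unsigned order)
{
    return page & ~((size_t)1 << order);
}
```

For example, with 4-page blocks (order 2), the block at page 8 and the block at page 12 are buddies, and together they form one 8-page block starting at page 8.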

Quote:

Originally Posted by rsashok

Peatmoss,
You are right with your scenario #2 - it is OS and only OS which is dealing with memory management.

Your response is slightly surprising. For instance, if it's possible for a language such as Java or Python to implement garbage collection (by which I mean defragmentation mostly) then scenario 1 seems to be the winner. Since GC is a function of language, and the language is not the OS, then the language is playing around with memory allocation in some way.

On the other hand, since all roads lead to the OS, as you say, then it makes me wonder how the language can get involved at all. An interesting contradiction, kinda like wave/particle duality I suppose.

Back to brass tacks... If I read Chapter 3, Memory Management, correctly, then:

- None/very little of that chapter applies if the CPU doesn't include an MMU

- Even with the buddy system, it's still possible for a process to fragment memory badly enough that it could receive a NULL from a call to malloc()

I am asking a lot of questions because we are planning a new design but using an existing product ported to Linux. I'm trying to ascertain whether a lot of runtime malloc/frees has the possibility of fragmenting memory so badly that the system would stop running, or run very poorly. Hence all my questions. It's been a fun bit of research and thank you for your replies!

Peatmoss/Alleria

chrism01 · 12-03-2007, 10:15 PM

iirc, garbage collection within a lang, eg C++, Java etc, simply reclaims mem that 'belongs' to an obj created by the prog, that has now gone out of scope. This does not free the mem back to the OS to allocate to a diff prog, it's internal (to the prog) only.
The OS/kernel manages mem allocation for each prog (actually each process) (gc at another level if you like).
Then again, I don't have a degree in CS, so you should check this...