LinuxQuestions.org (/questions/)
-   Linux - Newbie (https://www.linuxquestions.org/questions/linux-newbie-8/)
-   -   /dev/shm not behaving right (https://www.linuxquestions.org/questions/linux-newbie-8/dev-shm-not-behaving-right-4175479126/)

sysbox 09-30-2013 05:49 PM

/dev/shm not behaving right
 
I have a CentOS 6.4 64-bit PC with 24 GB of memory. I typically use /dev/shm extensively for large temporary files. However, about a month ago it started behaving weird. When I copy large files to /dev/shm, the PC slows down and starts swapping to disk. If I keep copying large files to /dev/shm, the machine reboots.

I am trying to figure out what causes this problem, and how to fix it.

When this problem occurs, 'cat /proc/meminfo' shows MemFree at almost zero. So it looks like I'm running out of memory, but I'm not running any large-memory processes.

When the machine reboots, MemFree shows almost 24 GB initially, but it gradually drops to zero and the 'Cached' entry climbs toward 24 GB as I copy or create large files. So it looks like the memory is being consumed by the disk cache.
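
In case it helps, this is roughly how I'm watching those fields (they come straight from /proc/meminfo):

Code:
grep -E '^(MemFree|Cached|SwapFree):' /proc/meminfo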

But these problems all started suddenly about a month ago, right around the time I configured a software RAID of 4 hard drives. So it's possible I changed something, but I don't remember what.

Anyway, how can I figure out why this problem is occurring, and how can I fix it? Does anyone have any ideas?

Keith Hedger 10-01-2013 11:02 AM

/dev/shm is a RAM disk, so of course if you copy large files to it you will run out of memory.

sysbox 10-01-2013 11:28 AM

Quote:

Originally Posted by Keith Hedger (Post 5038047)
/dev/shm is a RAM disk, so of course if you copy large files to it you will run out of memory.

Yes, I know that. The problem is that I can't copy large files to /dev/shm anymore. For example, my /dev/shm is 18 GB in size and the PC has 24 GB of physical memory, so I should be able to copy 18 GB of files to /dev/shm. I used to be able to do that, until about a month ago. Now, if I copy a single 1 GB file to /dev/shm, the machine starts swapping and crashes long before the copy completes.
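
To be concrete about the sizes involved, df is the easy check:

Code:
df -h /dev/shm   # shows the 18G size and current usage
free -m          # overall memory and swap totals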

I'm trying to figure out what is causing the problem. What changed and how can I unchange it?

jpollard 10-01-2013 04:38 PM

Is /dev/shm a tmpfs mount?

If it is (which I think is likely), your system is subject to deadlock crashes.

This is especially likely if you don't have enough swap space.

With 24 GB of memory, the default tmpfs limit is half of RAM - so you can only use 12 GB for a tmpfs, and only if other processes are not also using more than 12 GB of main memory (which also includes any shared memory segments, as that is what /dev/shm is actually supposed to be used for).

It gets even worse if you have two tmpfs mounts - each one gets a default limit of 12 GB, so if /tmp is also mounted as tmpfs, processes can trivially crash the system just by filling up /tmp while you are copying those large files.
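
You can see every tmpfs mount and its size limit at a glance:

Code:
df -h -t tmpfs     # each tmpfs mount with its size cap and usage
mount | grep tmpfs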

BTW, tmpfs mounts are nothing but cache memory. They can be paged out to swap, but only if swap space is available.
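
Checking whether swap is actually available takes one command:

Code:
swapon -s   # lists active swap areas; empty output means no swap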

Technically, what you are doing is not what tmpfs was designed for, and as you found out, it doesn't do it very well...

It is possible the software RAID is using a good bit of cache itself - it has to compute parity (assuming RAID 5), and such buffers are not shareable, so they would be allocated for use by the driver itself.

sysbox 10-02-2013 08:21 AM

Yes, /dev/shm is a tmpfs mount. I remount /dev/shm so it has 18 GB of space. There are no other users on this system. My software RAID is RAID-0, so there are no parity computations.
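
For reference, the remount I do looks something like this (the 18G matches what I described above):

Code:
mount -o remount,size=18G /dev/shm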

Everything worked fine until a month ago. Then the same things I've been doing for 3 years stopped working. Basically, even though /dev/shm has 18 GB of free space, the system crashes after I copy a single 1 GB file. So something changed. Do you know where I can look to find the answer? This is probably a deeper issue than most newbie questions.

jpollard 10-02-2013 04:36 PM

Even RAID 0 has extra buffer usage - at least one buffer per disk, plus read-ahead buffers, plus whatever is using the RAID storage. RAID 0 provides faster I/O by spreading the data across different disks, which puts more pressure on the cache memory.

In an idle system though, things should still work.

unSpawn 10-02-2013 05:04 PM

Quote:

Originally Posted by sysbox (Post 5038581)
Everything worked fine until a month ago.

Then assess what happened a month ago in terms of (re-)configuration, software updates, and so on.
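
On CentOS, a few stock commands can help reconstruct that (assuming yum and rpm are in place):

Code:
yum history list               # recent package transactions
rpm -qa --last | head -20      # most recently installed/updated packages
find /etc -type f -mtime -35   # files in /etc modified in the last ~35 days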

sysbox 10-03-2013 07:00 AM

A month ago I configured a software RAID-0. It went fairly smoothly, so I didn't change many things - but I can't remember for sure; I may have had to change something. However, there are no new files in the /etc directory tree, so it's difficult to know what changed.

jpollard 10-03-2013 07:05 AM

You could always try disabling the RAID (start with just not mounting it).

sysbox 10-03-2013 11:06 AM

Already tried that. I rebooted the system and never mounted the RAID, but the problem persists.

jpollard 10-03-2013 05:09 PM

The only thing left to try is to unload the RAID software.
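
As a sketch - the device name /dev/md0 and the mount point are assumptions, substitute your own:

Code:
umount /mnt/raid        # if the array is mounted anywhere
mdadm --stop /dev/md0   # stop the array
lsmod | grep raid       # see which RAID personality modules are loaded
rmmod raid0             # unload the RAID-0 module if nothing else uses it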

sysbox 10-04-2013 08:23 AM

I don't think I loaded any RAID software. mdadm is installed by default on the system.

Do you have suggestions on where else I can ask these questions?

Thanks

jpollard 10-04-2013 08:47 AM

You can try the CentOS forums at http://www.centos.org/modules/newbb/

I'm not sure how much help they can be, but something has changed in your system (possibly due to updates) that has altered how memory is being utilized.

As I said before, tmpfs was never designed for what you are using it for - it can cause system deadlocks very easily, and there is no way to prevent that other than limiting the size of tmpfs mounts.
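
Capping a tmpfs mount is a one-line /etc/fstab entry; the 12G figure here is just an example:

Code:
tmpfs  /dev/shm  tmpfs  defaults,size=12G  0 0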

sysbox 10-04-2013 11:13 AM

I have not done any updates since I initially installed the OS several months ago.

Was tmpfs not designed for this, or was /dev/shm not designed for this? Is there another way for me to utilize a large 18 GB RAM disk filesystem? What's the proper way to do RAM disk filesystems in Linux?

jpollard 10-04-2013 01:18 PM

It has nothing to do with /dev/shm as the mount point for a tmpfs. /dev/shm holds the files that back POSIX shared memory segments. Used that way, it never consumes much memory (just the overhead for directories); the contents of those files are shared by all the processes attached to the segment. As shared memory segments are deallocated, these "files" disappear.

On my system (with two logins plus gdm) there are 7 entries for shared memory segments (65 MB each) used by PulseAudio, and one 1 MB entry for a bunch of semaphores. Since these do occupy memory, they add to the usage of /dev/shm.

One possibility is that these hang around longer than they should (PulseAudio is not known - to me, anyway - to be that well behaved about deallocation...). You can try deleting those that are not active (check with fuser).
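
To see what is in there and who is holding it (fuser comes from the psmisc package):

Code:
ls -lh /dev/shm
fuser -v /dev/shm/*   # shows which processes have each segment open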

The original ramdisk driver is being deprecated due to its internal limits - a fixed maximum size and non-pageable memory - and some kernels may not have it available (my Fedora system doesn't even have the driver anymore). And putting an 18 GB file in a tmpfs mount is not guaranteed to stay in memory anyway - it will be paged out to disk as other memory usage occurs.

I'm not even sure there IS a safe way to do it. The only time I have personally used a tmpfs was for semaphore files - zero-length, quick to use as file-based semaphores, and simple (even shell scripts can do that kind of locking).
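
A minimal sketch of that kind of locking with flock from util-linux (the lock file path is arbitrary):

Code:
(
    flock -n 9 || exit 1   # give up if another process holds the lock
    # ... critical section ...
) 9>/dev/shm/mylock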

